Original course link: https://huggingface.co/deep-rl-course/unit5/curiosity?fw=pt

Mid-way Quiz

The best way to learn and to avoid the illusion of competence is to test yourself. This will help you find where you need to reinforce your knowledge.

Q1: What are the two main approaches to finding an optimal policy?

Policy-based methods

Random-based methods

Value-based methods

Evolution-strategies methods

Q2: What is the Bellman Equation?

Solution
The Bellman equation is a recursive equation that works like this: instead of starting from the beginning for each state and calculating the full return, we can consider the value of any state as:

$R_{t+1} + \gamma \cdot V(S_{t+1})$

The immediate reward + the discounted value of the state that follows
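
To make the recursion concrete, here is a minimal sketch on a toy deterministic chain of three states followed by a terminal state. The environment, the `rewards` list, the value of `gamma`, and the function names are all hypothetical and chosen only for illustration. Because the transitions are deterministic, the expectation in the Bellman equation drops out, and the recursive values can be compared directly with the returns computed the long way.

```python
# Toy deterministic chain: state 0 -> 1 -> 2 -> terminal.
# rewards[i] is the (hypothetical) reward received when leaving state i.
rewards = [1.0, 0.0, 2.0]
gamma = 0.9  # discount factor (assumed value)


def full_return(state):
    """The long way: sum every discounted reward from `state` to the end."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[state:]))


def bellman_values():
    """The recursive way: V(s) = R_{t+1} + gamma * V(s'), with V(terminal) = 0."""
    values = [0.0] * (len(rewards) + 1)  # last entry is the terminal state
    # Sweep backwards so V(s') is already known when V(s) is computed.
    for state in reversed(range(len(rewards))):
        values[state] = rewards[state] + gamma * values[state + 1]
    return values[:-1]  # drop the terminal state


values = bellman_values()
for state, value in enumerate(values):
    print(f"V({state}) = {value:.3f}, full return = {full_return(state):.3f}")
```

Both columns agree for every state, which is the point of the equation: the value of a state only needs the immediate reward and the value of its successor.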

Q3: Define each part of the Bellman Equation

[Figure: Bellman equation quiz]
Solution
[Figure: Bellman equation solution]
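
As a reference for this question, the state-value form of the Bellman equation is commonly written as below, with each part labelled. This is a sketch in standard notation; the symbols $V_\pi$ and $\mathbb{E}_\pi$ are assumptions here and may differ from the notation in the course figure.

```latex
\underbrace{V_\pi(s)}_{\text{value of state } s}
= \underbrace{\mathbb{E}_\pi}_{\text{expected value}}
\Big[
  \underbrace{R_{t+1}}_{\text{immediate reward}}
  + \underbrace{\gamma \, V_\pi(S_{t+1})}_{\text{discounted value of the next state}}
  \;\Big|\; S_t = s
\Big]
```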

Q4: What is the difference between Monte Carlo and Temporal Difference learning methods?

With Monte Carlo methods, we update the value function from a complete episode

With Monte Carlo methods, we update the value function from a step

With TD learning methods, we update the value function from a complete episode

With TD learning methods, we update the value function from a step
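
The difference in when each method updates can be sketched in a few lines. This is a minimal tabular example, not the course's implementation: the function names `mc_update` and `td0_update`, the default `alpha` and `gamma` values, and the toy episode are all assumptions made for illustration. The Monte Carlo variant shown is every-visit Monte Carlo.

```python
def mc_update(values, episode, alpha=0.1, gamma=0.9):
    """Monte Carlo: wait for the whole episode, then move each visited state
    toward the actual return G_t observed from that step onward.

    `episode` is a list of (state, reward) pairs, where reward is R_{t+1}.
    """
    g = 0.0
    # Walk backwards so the return can be accumulated incrementally.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        values[state] += alpha * (g - values[state])
    return values


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """TD(0): update after a single step, using the current estimate
    V(S_{t+1}) in place of the rest of the return."""
    td_target = reward + gamma * values[next_state]
    values[state] += alpha * (td_target - values[state])
    return values


# Hypothetical episode: A -> B -> end, with rewards 1.0 and then 2.0.
mc_values = mc_update({"A": 0.0, "B": 0.0}, [("A", 1.0), ("B", 2.0)])

# TD(0) can already update V(A) after the first transition A -> B.
td_values = td0_update({"A": 0.0, "B": 0.0, "end": 0.0}, "A", 1.0, "B")

print(mc_values)
print(td_values)
```

Note that `mc_update` needs the full list of transitions before it can run, while `td0_update` only needs one transition and the current value estimates.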

Q5: Define each part of the Temporal Difference learning formula

[Figure: TD Learning exercise]
Solution
[Figure: TD Exercise]
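
As a reference for this question, the one-step TD update, TD(0), is commonly written as below, with each part labelled. This is a sketch in standard notation, which is an assumption here and may differ from the course figure.

```latex
\underbrace{V(S_t)}_{\text{new value of state } S_t}
\leftarrow
\underbrace{V(S_t)}_{\text{former estimate}}
+ \underbrace{\alpha}_{\text{learning rate}}
\Big[
  \underbrace{
    \underbrace{R_{t+1}}_{\text{reward}}
    + \underbrace{\gamma \, V(S_{t+1})}_{\text{discounted value of the next state}}
  }_{\text{TD target}}
  - V(S_t)
\Big]
```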

Q6: Define each part of the Monte Carlo learning formula

[Figure: MC Learning exercise]
Solution
[Figure: MC Exercise]
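
As a reference for this question, the Monte Carlo update is commonly written as below, with each part labelled. This is a sketch in standard notation, which is an assumption here and may differ from the course figure. The only change from the TD update is the target: the actual return $G_t$ observed at the end of the episode replaces the one-step estimate.

```latex
\underbrace{V(S_t)}_{\text{new value of state } S_t}
\leftarrow
\underbrace{V(S_t)}_{\text{former estimate}}
+ \underbrace{\alpha}_{\text{learning rate}}
\Big[
  \underbrace{G_t}_{\text{return at timestep } t}
  - V(S_t)
\Big]
```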

Congrats on finishing this quiz 🥳. If you missed some elements, take the time to reread the previous sections to reinforce (😏) your knowledge.