E4-Unit_2-Introduction_to_Q_Learning-G6-way_Quiz
中英文对照学习,效果更佳!
原课程链接:https://huggingface.co/deep-rl-course/unit5/curiosity?fw=pt
Mid-way Quiz
中途测验
The best way to learn and to avoid the illusion of competence is to test yourself. This will help you to find where you need to reinforce your knowledge.
学习和避免能力幻觉的最好方法是测试自己。这将帮助你找到你需要加强知识的地方。
Q1: What are the two main approaches to find optimal policy?
问题1:寻找最佳政策的两种主要方法是什么?
Policy-based methods
基于政策的方法
Random-based methods
基于随机的方法
Value-based methods
基于价值的方法
Evolution-strategies methods
进化策略方法
Q2: What is the Bellman Equation?
问2:什么是贝尔曼方程式?
Solution
The Bellman equation is a recursive equation that works like this: instead of starting for each state from the beginning and calculating the return, we can consider the value of any state as:
解决方案Bellman方程是一个递归方程,其工作原理如下:我们可以将任何状态的值视为:
Rt+1 + gamma * V(St+1)
RT+1+伽马*V(ST+1)
The immediate reward + the discounted value of the state that follows
直接回报+随后状态的贴现价值
Q3: Define each part of the Bellman Equation
问题3:定义贝尔曼方程式的每一部分

Solution
贝尔曼方程测验解贝尔曼方程解
Q4: What is the difference between Monte Carlo and Temporal Difference learning methods?
问4:蒙特卡洛学习方法和时差学习方法有什么不同?
With Monte Carlo methods, we update the value function from a complete episode
使用蒙特卡罗方法,我们从一个完整的情节更新值函数
With Monte Carlo methods, we update the value function from a step
使用蒙特卡罗方法,我们从一个步骤更新值函数
With TD learning methods, we update the value function from a complete episode
使用TD学习方法,我们从完整的一集更新值函数
With TD learning methods, we update the value function from a step
使用TD学习方法,我们从一个步骤更新值函数
Q5: Define each part of Temporal Difference learning formula
问题5:定义时差学习公式的各个部分

Solution
TD学习练习解决方案TD练习
Q6: Define each part of Monte Carlo learning formula
问题6:定义蒙特卡罗学习公式的每个部分

Solution
Congrats on finishing this Quiz 🥳, if you missed some elements, take time to read again the previous sections to reinforce (😏) your knowledge.
MC学习练习解决方案MC练习祝贺您完成本次测验🥳,如果您遗漏了一些元素,请花时间再次阅读前面的部分,以巩固(😏)您的知识。