E4-Unit_2-Introduction_to_Q_Learning-A0-Introduction
中英文对照学习,效果更佳!
原课程链接:https://huggingface.co/deep-rl-course/unit4/advantages-disadvantages?fw=pt
Introduction to Q-Learning
Q-Learning简介
In the first unit of this class, we learned about Reinforcement Learning (RL), the RL process, and the different methods to solve an RL problem. We also trained our first agents and uploaded them to the Hugging Face Hub.
在本课程的第一个单元中,我们学习了强化学习(RL)、强化学习过程以及解决强化学习问题的不同方法。我们还训练了我们的第一批智能体,并将它们上传到了Hugging Face Hub。
In this unit, we’re going to dive deeper into one of the Reinforcement Learning methods: value-based methods and study our first RL algorithm: Q-Learning.
在本单元中,我们将深入研究强化学习方法中的一类:基于价值的方法,并学习我们的第一个RL算法:Q-Learning。
We’ll also implement our first RL agent from scratch, a Q-Learning agent, and will train it in two environments:
我们还将从零开始实现我们的第一个RL智能体(一个Q-Learning智能体),并在两个环境中对其进行训练:
- Frozen-Lake-v1 (non-slippery version): where our agent will need to **go from the starting state (S) to the goal state (G)** by walking only on frozen tiles (F) and avoiding holes (H).
- An autonomous taxi: where our agent will need to learn to navigate a city to transport its passengers from point A to point B.

Concretely, we will:
- Frozen-Lake-v1(非滑动版本):我们的智能体需要只在冰冻方格(F)上行走并避开冰洞(H),**从起始状态(S)走到目标状态(G)**。
- 自动驾驶出租车:我们的智能体需要学会在城市中导航,将乘客从A点运送到B点。

具体来说,我们将:
- Learn about value-based methods.
- Learn about the differences between Monte Carlo and Temporal Difference Learning.
- Study and implement our first RL algorithm: Q-Learning.
This unit is fundamental if you want to be able to work on Deep Q-Learning: the first Deep RL algorithm that played Atari games and beat the human level on some of them (Breakout, Space Invaders, etc.).
- 了解基于价值的方法。
- 了解蒙特卡罗(Monte Carlo)学习与时序差分学习(Temporal Difference Learning)之间的区别。
- 学习并实现我们的第一个RL算法:Q-Learning。

如果你想学习深度Q-Learning,本单元是必备基础:深度Q-Learning是第一个能玩Atari游戏、并在其中一些游戏(Breakout、Space Invaders等)上超越人类水平的深度RL算法。
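As a small preview of what we will build in this unit, here is a minimal, self-contained sketch of tabular Q-Learning on a tiny hand-coded, non-slippery 2x2 grid inspired by Frozen-Lake. This is not the course's implementation: the grid, reward values, and hyperparameters below are illustrative assumptions, and the real FrozenLake-v1 environment is larger and comes from the Gym library.

作为本单元内容的一个小预览,下面是在一个受Frozen-Lake启发的、手工编写的2x2非滑动网格上进行表格型Q-Learning的极简示例(并非本课程的正式实现,网格与超参数均为示意性假设):

```python
# A minimal tabular Q-Learning sketch (illustrative, not the course's code) on a
# tiny hand-coded, deterministic ("non-slippery") 2x2 grid inspired by Frozen-Lake:
#
#   S F        S = start, F = frozen (safe), H = hole, G = goal
#   H G
#
# States are numbered 0..3 row-major; actions: 0=left, 1=down, 2=right, 3=up.
import random

N_STATES, N_ACTIONS = 4, 4
START, HOLE, GOAL = 0, 2, 3

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    row, col = divmod(state, 2)
    if action == 0:
        col = max(col - 1, 0)
    elif action == 1:
        row = min(row + 1, 1)
    elif action == 2:
        col = min(col + 1, 1)
    else:
        row = max(row - 1, 0)
    nxt = row * 2 + col
    if nxt == GOAL:
        return nxt, 1.0, True   # reached G: reward 1, episode ends
    if nxt == HOLE:
        return nxt, 0.0, True   # fell into H: no reward, episode ends
    return nxt, 0.0, False

# Q-table: one value per (state, action) pair, initialized to zero.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.99, 0.1  # learning rate, discount, exploration rate

random.seed(0)
for episode in range(500):
    s = START
    for _ in range(100):  # cap episode length
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-Learning update: Q(s,a) += alpha * (target - Q(s,a)),
        # where target = r + gamma * max_a' Q(s',a') (no bootstrap at terminal states).
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
        if done:
            break

# After training, the greedy policy should go right from S, then down from F to G.
print([max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in (START, 1)])
```

The update rule in the loop is the core of what this unit studies; the rest of the unit explains where it comes from and why it converges to the optimal action values.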
So let’s get started! 🚀
那么,让我们开始吧!🚀