Original course link: https://huggingface.co/deep-rl-course/unit1/what-is-rl?fw=pt
What is Reinforcement Learning?
To understand Reinforcement Learning, let’s start with the big picture.
The big picture
The idea behind Reinforcement Learning is that an agent (an AI) learns from the environment by interacting with it (through trial and error) and receiving rewards (negative or positive) as feedback for performing actions.
Learning from interaction with the environment comes from our natural experience.
For instance, imagine putting your little brother in front of a video game he has never played, giving him a controller, and leaving him alone.

Your brother interacts with the environment (the video game) by pressing right (an action). He gets a coin: that’s a +1 reward. Because the reward is positive, he now understands that in this game he must collect the coins.

But then he presses right again and touches an enemy. He dies, so that’s a -1 reward.

By interacting with his environment through trial and error, your little brother learns that in this environment he needs to collect coins but avoid the enemies.
Without any supervision, the child will get better and better at playing the game.
That’s how humans and animals learn: through interaction. Reinforcement Learning is just a computational approach to learning from actions.
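The coin-and-enemy example above can be sketched as a tiny simulation. Everything here is a hypothetical illustration and not part of the course: the two-position level, the "jump" action, the 0 reward for clearing the level, and the names `step`, `train`, and `best_actions` are all assumptions. The learner only averages the immediate reward it saw for each action in each position, which is enough for this toy game but is a simplification, not a full Reinforcement Learning algorithm.

```python
import random

# Hypothetical toy version of the game: a coin lies to the right of
# position 0, and an enemy stands to the right of position 1.
ACTIONS = ["right", "jump"]
POSITIONS = (0, 1)


def step(position, action):
    """Apply one action and return (next_position, reward, done)."""
    if position == 0:
        if action == "right":
            return 1, +1, False  # picked up the coin
        return 0, 0, False       # jumped in place: nothing happens
    if action == "right":
        return 1, -1, True       # touched the enemy: the episode ends
    return 2, 0, True            # jumped over the enemy: level cleared


def train(episodes=200, epsilon=0.2, seed=0):
    """Learn by trial and error; return the average reward per (position, action)."""
    rng = random.Random(seed)
    totals = {(p, a): 0.0 for p in POSITIONS for a in ACTIONS}
    counts = {(p, a): 0 for p in POSITIONS for a in ACTIONS}

    def average(p, a):
        return totals[(p, a)] / counts[(p, a)] if counts[(p, a)] else 0.0

    for _ in range(episodes):
        position, done, steps = 0, False, 0
        while not done and steps < 10:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # trial: try something at random
            else:
                # otherwise repeat whatever has paid off best so far
                action = max(ACTIONS, key=lambda a: average(position, a))
            next_position, reward, done = step(position, action)
            totals[(position, action)] += reward  # the reward is the only feedback
            counts[(position, action)] += 1
            position = next_position
            steps += 1
    return {(p, a): average(p, a) for p in POSITIONS for a in ACTIONS}


def best_actions(values):
    """Pick the action with the highest average reward in each position."""
    return {p: max(ACTIONS, key=lambda a: values[(p, a)]) for p in POSITIONS}


values = train()
print(best_actions(values))  # {0: 'right', 1: 'jump'}
```

Nobody tells the learner that coins are good or enemies are bad. It discovers both from the +1 and -1 rewards alone, much like the little brother in the example.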
A formal definition
Here is a more formal definition:
Reinforcement learning is a framework for solving control tasks (also called decision problems) by building agents that learn from the environment by interacting with it through trial and error and receiving rewards (positive or negative) as their only feedback.
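One way to read the definition above is as an interaction loop between two interfaces. The sketch below is an assumption-laden illustration, not the course's code or any library's API: the `reset`/`step`/`act`/`learn` method names are loosely modeled on common conventions, and `Corridor` and `AlwaysRightAgent` are hypothetical placeholders that exist only to make the loop runnable.

```python
from typing import List, Protocol, Tuple


class Environment(Protocol):
    """The control task: it reports a state and rewards the agent's actions."""

    def reset(self) -> int: ...
    def step(self, action: int) -> Tuple[int, float, bool]: ...


class Agent(Protocol):
    """The decision maker: it picks actions and learns from rewards."""

    def act(self, state: int) -> int: ...
    def learn(self, state: int, action: int, reward: float) -> None: ...


def run_episode(env: Environment, agent: Agent, max_steps: int = 100) -> float:
    """Run one episode and return the sum of rewards the agent received."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.learn(state, action, reward)  # the reward is the only feedback
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward


class Corridor:
    """Hypothetical control task: walk right from cell 0 to the last cell."""

    def __init__(self, length: int = 3) -> None:
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves right; any other action moves left (never below 0).
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position == self.length
        return self.position, (1.0 if done else 0.0), done


class AlwaysRightAgent:
    """Placeholder agent: fixed behavior, it only records the rewards it sees."""

    def __init__(self) -> None:
        self.rewards_seen: List[float] = []

    def act(self, state: int) -> int:
        return 1

    def learn(self, state: int, action: int, reward: float) -> None:
        self.rewards_seen.append(reward)


agent = AlwaysRightAgent()
print(run_episode(Corridor(length=3), agent))  # 1.0
```

The placeholder agent does not actually improve. A learning agent would use the rewards passed to `learn` to change what `act` returns, while `run_episode` stays the same.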
But how does Reinforcement Learning work?