Unit 8, Part 2: Proximal Policy Optimization (PPO) with Doom - Introduction
Original course link: https://huggingface.co/deep-rl-course/unit2/q-learning-example?fw=pt
Introduction to PPO with Sample-Factory
In this second part of Unit 8, we’ll get deeper into PPO optimization by using Sample-Factory, an asynchronous implementation of the PPO algorithm, to train our agent to play ViZDoom (an open-source version of Doom).
In the notebook, you’ll train your agent to play the Health Gathering level, where the agent must collect health packs to avoid dying. After that, you can train your agent to play more complex levels, such as Deathmatch.
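To preview what the launch code in the notebook looks like, here is a condensed sketch built from the helper functions in Sample-Factory 2.x’s `sf_examples.vizdoom` package. The environment name `doom_health_gathering_supreme` and the worker/step counts are illustrative values taken from the library’s example scripts, not requirements; treat the notebook as the authoritative version.

```python
import functools

from sample_factory.algo.utils.context import global_model_factory
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl
from sf_examples.vizdoom.doom.doom_model import make_vizdoom_encoder
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec

# Register every ViZDoom scenario so Sample-Factory can create it by name.
for env_spec in DOOM_ENVS:
    register_env(env_spec.name, functools.partial(make_doom_env_from_spec, env_spec))

# Register the convolutional encoder Sample-Factory uses for Doom's pixel observations.
global_model_factory().register_encoder_factory(make_vizdoom_encoder)

# Build the training config; the values below are illustrative, tune them to your machine.
argv = [
    "--env=doom_health_gathering_supreme",
    "--num_workers=8",
    "--num_envs_per_worker=4",
    "--train_for_env_steps=4000000",
]
parser, _ = parse_sf_args(argv=argv)
add_doom_env_args(parser)        # command-line options specific to Doom envs
doom_override_defaults(parser)   # default hyperparameters tuned for Doom
cfg = parse_full_cfg(parser, argv)

status = run_rl(cfg)  # launch asynchronous PPO training
```

Once this agent trains, moving to a harder level is mostly a matter of pointing the `--env` flag at another scenario registered in `DOOM_ENVS` (the deathmatch scenarios among them) and giving it more training steps.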

Sound exciting? Let’s get started! 🚀
The hands-on was made by Edward Beeching, a Machine Learning Research Scientist at Hugging Face. He worked on Godot Reinforcement Learning Agents, an open-source interface for developing environments and agents in the Godot Game Engine.