N13-Bonus_Unit_3-Advanced_Topics_in_Reinforcement_Learning-J9-Brief_introduction_to_RL_documentation

Original course link: https://huggingface.co/deep-rl-course/unit3/hands-on?fw=pt

Brief introduction to RL documentation

In this advanced topic, we address the question: how should we monitor and keep track of powerful reinforcement learning agents that we are training in the real world and
interfacing with humans?

As machine learning systems have increasingly impacted modern life, calls for documentation of these systems have grown.

Such documentation can cover aspects such as the training data used — where it is stored, when it was collected, who was involved, etc.
— or the model optimization framework — the architecture, evaluation metrics, relevant papers, etc. — and more.

Today, model cards and datasheets are becoming increasingly available, for example on the Hugging Face Hub
(see its documentation).

If you click on a popular model on the Hub, you can learn about its creation process.

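As an illustration of what such a card contains: a model card on the Hub is a Markdown file whose metadata (license, tags, datasets, etc.) lives in a YAML front-matter block. The following minimal, standard-library-only sketch extracts that metadata from a made-up card. Both the card text and the parser are illustrative assumptions, not the Hub's actual format handling; for real cards, the huggingface_hub library provides ModelCard utilities.

```python
# Illustrative sketch: a model card with YAML front-matter metadata.
# The card below is invented for this example, not a real Hub model.
MODEL_CARD = """---
license: mit
tags:
  - reinforcement-learning
---
# My PPO Agent

Trained with PPO on LunarLander-v2.
"""

def front_matter(card_text: str) -> dict:
    """Extract simple key/value pairs (and flat lists) from the front-matter block."""
    _, block, _ = card_text.split("---\n", 2)
    meta = {}
    current_key = None
    for line in block.splitlines():
        if line.startswith("  - ") and current_key is not None:
            meta[current_key].append(line[4:].strip())  # list item under the last key
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            # A key with no inline value starts a list (e.g. "tags:")
            meta[current_key] = value.strip() or []
    return meta

print(front_matter(MODEL_CARD))
```

Running this prints the card's metadata as a dictionary, the same information the Hub surfaces in a model's page header.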

These model- and dataset-specific records are designed to be completed when the model or dataset is created, leaving them un-updated when those models are later built into evolving systems.

Motivating Reward Reports

Reinforcement learning systems are fundamentally designed to optimize based on measurements of reward and time.
While the notion of a reward function maps nicely onto many well-understood settings of supervised learning (via a loss function),
our understanding of how machine learning systems evolve over time remains limited.

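The reward/loss correspondence above can be sketched concretely: a per-step reward can be defined as the negative of a supervised loss, but the RL objective accumulates these rewards over time, which is where temporal behavior enters. This is an illustrative example with made-up numbers, not code from the Reward Reports paper.

```python
# Sketch of the reward <-> loss correspondence (illustrative assumption).
# Supervised learning minimizes a loss on fixed data; RL maximizes a
# discounted sum of rewards collected over time, so the agent's behavior
# can drift as the data it sees changes.

def squared_loss(prediction: float, target: float) -> float:
    return (prediction - target) ** 2

def reward(prediction: float, target: float) -> float:
    # One-step "reward" as the negative loss: maximizing it = minimizing loss.
    return -squared_loss(prediction, target)

def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    # The RL objective: rewards accumulate over time, weighted by gamma.
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# An agent whose predictions improve over three steps.
per_step = [reward(p, 1.0) for p in (0.5, 0.8, 0.95)]
print(discounted_return(per_step))
```

The single-step reward recovers the supervised picture; the discounted sum is what makes the system's trajectory over time, and hence its documentation over time, matter.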

To that end, the authors introduce Reward Reports for Reinforcement Learning (the pithy naming is designed to mirror the popular papers Model Cards for Model Reporting and Datasheets for Datasets).
The goal is to propose a type of documentation focused on the human factors of reward and time-varying feedback systems.

Building on the documentation frameworks for model cards and datasheets proposed by Mitchell et al. and Gebru et al., we argue the need for Reward Reports for AI systems.

Reward Reports are living documents for proposed RL deployments that demarcate design choices.

However, many questions remain about the applicability of this framework to different RL applications, roadblocks to system interpretability,
and the resonances between deployed supervised machine learning systems and the sequential decision-making utilized in RL.

At a minimum, Reward Reports are an opportunity for RL practitioners to deliberate on these questions and begin the work of deciding how to resolve them in practice.

Capturing temporal behavior with documentation

The core piece of documentation specific to RL and feedback-driven ML systems is a change-log. The change-log records updates
from the designer (changed training parameters, data, etc.) along with changes noticed by users (harmful behavior, unexpected responses, etc.).

The change-log is accompanied by update triggers that encourage monitoring of these effects.

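A change-log entry and its update trigger could be sketched as a small data structure. The field names, dates, and trigger condition below are hypothetical illustrations, not a schema defined by the Reward Reports paper.

```python
# Hypothetical sketch of a Reward Report change-log entry (illustrative only).
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ChangeLogEntry:
    when: date
    designer_changes: list[str] = field(default_factory=list)   # e.g. changed training parameters
    observed_changes: list[str] = field(default_factory=list)   # e.g. user-reported harmful behavior
    triggered_by: str = ""                                      # which update trigger prompted this entry

change_log: list[ChangeLogEntry] = [
    ChangeLogEntry(
        when=date(2022, 6, 1),
        designer_changes=["Reduced learning rate from 3e-4 to 1e-4"],
        observed_changes=["Fewer abrupt recommendation shifts reported"],
        triggered_by="monthly scheduled review",
    ),
]

# An update trigger is just a condition that forces a new entry:
def needs_update(days_since_last: int, user_reports: int) -> bool:
    return days_since_last >= 30 or user_reports > 0

print(needs_update(31, 0))  # the scheduled trigger fires
```

The point of the trigger is that the document stays "living": whenever the condition holds, the deployment owner is expected to append a new entry rather than let the report go stale.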

Contributing

Some of the most impactful RL-driven systems are multi-stakeholder in nature and operate behind the closed doors of private corporations.
These corporations are largely unregulated, so the burden of documentation falls on the public.

If you are interested in contributing, we are building Reward Reports for popular machine learning systems on a public
record on GitHub.

For further reading, you can visit the Reward Reports paper
or look at an example report.

Author

This section was written by Nathan Lambert.
