1-Transformer_models-6-Sequence-to-sequence_models
Bilingual Chinese-English notes for better learning!
Original course link: https://huggingface.co/course/chapter1/7?fw=pt
Sequence-to-sequence models
序列到序列模型
Encoder-decoder models (also called sequence-to-sequence models) use both parts of the Transformer architecture. At each stage, the attention layers of the encoder can access all the words in the initial sentence, whereas the attention layers of the decoder can only access the words positioned before a given word in the input.
编解码器模型(也称为序列到序列模型)使用Transformer体系结构的两个部分。在每个阶段,编码器的注意力层可以访问初始句子中的所有单词,而解码器的注意力层只能访问输入中给定单词之前的单词。
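To make this difference concrete, here is a minimal sketch (in PyTorch, assuming a toy 5-token sentence) of the two self-attention patterns: the encoder can look at the whole sentence from every position, while the decoder uses a lower-triangular causal mask so each position only sees the positions before it.

```python
import torch

seq_len = 5  # assumed toy sentence of 5 tokens

# Encoder self-attention: every position may attend to every position in the input.
encoder_mask = torch.ones(seq_len, seq_len)

# Decoder self-attention: position i may only attend to positions 0..i,
# i.e. a lower-triangular ("causal") mask.
decoder_mask = torch.tril(torch.ones(seq_len, seq_len))

print(encoder_mask)
print(decoder_mask)
```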
The pretraining of these models can be done using the objectives of encoder or decoder models, but usually involves something a bit more complex. For instance, T5 is pretrained by replacing random spans of text (that can contain several words) with a single special mask token, and the objective is then to predict the text that this mask token replaces.
这些模型的预训练可以使用编码器或解码器模型的目标来完成,但通常涉及一些稍微复杂的东西。例如,通过将随机跨度的文本(可以包含多个单词)替换为单个掩码特殊单词来预训练T5,然后目标是预测该掩码单词所替换的文本。
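As a rough sketch of what that objective looks like in practice (the t5-small checkpoint and the example sentence are illustrative assumptions, not part of the original text): T5's tokenizer exposes sentinel tokens such as <extra_id_0> that stand in for the dropped spans, and the model is trained to reproduce the missing text.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Input: random spans are dropped and each one is replaced by a single sentinel token.
input_ids = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt").input_ids
# Target: the dropped spans, delimited by the same sentinel tokens.
labels = tokenizer("<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt").input_ids

# The denoising (span-corruption) loss used during pretraining.
loss = model(input_ids=input_ids, labels=labels).loss
print(loss.item())
```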
Sequence-to-sequence models are best suited for tasks revolving around generating new sentences depending on a given input, such as summarization, translation, or generative question answering.
序列到序列模型最适合于围绕根据给定输入生成新句子的任务,例如摘要、翻译或生成性问题回答。
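For illustration, a sequence-to-sequence checkpoint can be plugged into the high-level pipeline API for these tasks; the snippet below assumes the t5-small checkpoint and a made-up input sentence purely as a demonstration.

```python
from transformers import pipeline

# English-to-French translation with an encoder-decoder checkpoint (t5-small assumed).
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Sequence-to-sequence models generate a new sentence from an input sentence."))
```

A summarization pipeline works the same way, given a checkpoint fine-tuned for summarization.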
Representatives of this family of models include:
这一系列模型的代表包括:
BART
mBART
Marian
T5