2-Using_Transformers-2-Models
Original course link: https://huggingface.co/course/chapter2/3?fw=pt
Models
In this section we’ll take a closer look at creating and using a model. We’ll use the AutoModel class, which is handy when you want to instantiate any model from a checkpoint.
The AutoModel class and all of its relatives are actually simple wrappers over the wide variety of models available in the library. It's a clever wrapper, as it can automatically guess the appropriate model architecture for your checkpoint and then instantiate a model with that architecture.
However, if you know the type of model you want to use, you can use the class that defines its architecture directly. Let’s take a look at how this works with a BERT model.
Creating a Transformer
The first thing we’ll need to do to initialize a BERT model is load a configuration object:
```python
from transformers import BertConfig, BertModel

# Building the config
config = BertConfig()

# Building the model from the config
model = BertModel(config)
```
The configuration contains many attributes that are used to build the model:
```python
print(config)
```

```
BertConfig {
  [...]
  "hidden_size": 768,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  [...]
}
```
While you haven’t seen what all of these attributes do yet, you should recognize some of them: the hidden_size attribute defines the size of the hidden_states vector, and num_hidden_layers defines the number of layers the Transformer model has.
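These attributes can also be overridden when building a configuration, in case you want a custom architecture. A minimal sketch (the specific values here are illustrative, not from the course; note that hidden_size must be divisible by num_attention_heads):

```python
from transformers import BertConfig

# A smaller custom BERT architecture: 6 layers instead of 12,
# a 384-dimensional hidden state, and 6 attention heads (384 / 6 = 64 per head)
config = BertConfig(num_hidden_layers=6, hidden_size=384, num_attention_heads=6)
```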
Different loading methods
Creating a model from the default configuration initializes it with random values:
```python
from transformers import BertConfig, BertModel

config = BertConfig()
model = BertModel(config)

# Model is randomly initialized!
```
The model can be used in this state, but it will output gibberish; it needs to be trained first. We could train the model from scratch on the task at hand, but as you saw in Chapter 1, this would require a long time and a lot of data, and it would have a non-negligible environmental impact. To avoid unnecessary and duplicated effort, it's imperative to be able to share and reuse models that have already been trained.
Loading a Transformer model that is already trained is simple — we can do this using the from_pretrained() method:
```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-cased")
```
As you saw earlier, we could replace BertModel with the equivalent AutoModel class. We’ll do this from now on as this produces checkpoint-agnostic code; if your code works for one checkpoint, it should work seamlessly with another. This applies even if the architecture is different, as long as the checkpoint was trained for a similar task (for example, a sentiment analysis task).
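For example, a checkpoint-agnostic version of the loading snippet above could look like this (a sketch; for this particular checkpoint, AutoModel resolves to BertModel under the hood):

```python
from transformers import AutoModel

# AutoModel reads the checkpoint's configuration and instantiates
# the matching architecture automatically
model = AutoModel.from_pretrained("bert-base-cased")
```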
In the code sample above we didn’t use BertConfig, and instead loaded a pretrained model via the bert-base-cased identifier. This is a model checkpoint that was trained by the authors of BERT themselves; you can find more details about it in its model card.
This model is now initialized with all the weights of the checkpoint. It can be used directly for inference on the tasks it was trained on, and it can also be fine-tuned on a new task. By training with pretrained weights rather than from scratch, we can quickly achieve good results.
The weights have been downloaded and cached (so future calls to the from_pretrained() method won’t re-download them) in the cache folder, which defaults to ~/.cache/huggingface/transformers. You can customize your cache folder by setting the HF_HOME environment variable.
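As a minimal sketch of customizing the cache location (the path below is hypothetical), the variable needs to be set before 🤗 Transformers reads it, for example at the top of your script:

```python
import os

# HF_HOME must be set before transformers is imported, since the default
# cache location is resolved at import time ("/path/to/my/cache" is hypothetical)
os.environ["HF_HOME"] = "/path/to/my/cache"

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-cased")  # weights now cached under HF_HOME
```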
The identifier used to load the model can be the identifier of any model on the Model Hub, as long as it is compatible with the BERT architecture. The entire list of available BERT checkpoints can be found here.
Saving methods
Saving a model is as easy as loading one — we use the save_pretrained() method, which is analogous to the from_pretrained() method:
```python
model.save_pretrained("directory_on_my_computer")
```
This saves two files to your disk:
```
ls directory_on_my_computer

config.json  pytorch_model.bin
```
If you take a look at the config.json file, you’ll recognize the attributes necessary to build the model architecture. This file also contains some metadata, such as where the checkpoint originated and what 🤗 Transformers version you were using when you last saved the checkpoint.
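You could peek at a couple of those fields yourself. This sketch assumes the save directory from above; the exact keys vary by model and library version:

```python
import json

# Load the saved configuration as a plain dictionary
with open("directory_on_my_computer/config.json") as f:
    config = json.load(f)

print(config["model_type"])            # architecture family, e.g. "bert"
print(config["transformers_version"])  # metadata recorded when the checkpoint was saved
```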
The pytorch_model.bin file is known as the state dictionary; it contains all your model’s weights. The two files go hand in hand; the configuration is necessary to know your model’s architecture, while the model weights are your model’s parameters.
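To make the "state dictionary" idea concrete, here is a sketch of loading it directly with PyTorch (the parameter names shown in the comment are typical for BERT, but not guaranteed):

```python
import torch

# A state dictionary maps parameter names to weight tensors, e.g.
# 'embeddings.word_embeddings.weight' -> a vocab_size x hidden_size tensor
state_dict = torch.load("directory_on_my_computer/pytorch_model.bin")

for name, tensor in list(state_dict.items())[:3]:
    print(name, tuple(tensor.shape))
```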
Using a Transformer model for inference
Now that you know how to load and save a model, let’s try using it to make some predictions. Transformer models can only process numbers — numbers that the tokenizer generates. But before we discuss tokenizers, let’s explore what inputs the model accepts.
Tokenizers can take care of casting the inputs to the appropriate framework’s tensors, but to help you understand what’s going on, we’ll take a quick look at what must be done before sending the inputs to the model.
Let’s say we have a couple of sequences:
```python
sequences = ["Hello!", "Cool.", "Nice!"]
```
The tokenizer converts these to vocabulary indices which are typically called input IDs. Each sequence is now a list of numbers! The resulting output is:
```python
encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102],
    [101, 3835, 999, 102],
]
```
This is a list of encoded sequences: a list of lists. Tensors only accept rectangular shapes (think matrices). This “array” is already of rectangular shape, so converting it to a tensor is easy:
```python
import torch

model_inputs = torch.tensor(encoded_sequences)
```
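To see why the rectangular shape matters, note that a ragged list of lists cannot be converted directly; this is one reason padding exists (a quick illustration, not from the course):

```python
import torch

torch.tensor([[1, 2, 3], [4, 5, 6]])  # rectangular: every row has the same length, fine

# torch.tensor([[1, 2, 3], [4, 5]])   # ragged: raises ValueError;
#                                     # sequences must be padded to a common length first
```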
Using the tensors as inputs to the model
Making use of the tensors with the model is extremely simple — we just call the model with the inputs:
```python
output = model(model_inputs)
```
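As a quick sanity check (assuming the bert-base-cased model loaded earlier), you can inspect the hidden states the model returns; for a base BERT model they have shape [batch_size, sequence_length, hidden_size]:

```python
print(output.last_hidden_state.shape)
# torch.Size([3, 4, 768]): 3 sequences of 4 tokens each, hidden size 768
```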
While the model accepts a lot of different arguments, only the input IDs are necessary. We'll explain what the other arguments do and when they are required later, but first we need to take a closer look at the tokenizers that build the inputs that a Transformer model can understand.
