3-Fine-tuning_a_pretrained_model-2-Fine-tuning_a_model_with_the_Trainer_API_or_Keras

Original course link: https://huggingface.co/course/chapter3/3?fw=pt

Fine-tuning a model with the Trainer API

🤗 Transformers provides a Trainer class to help you fine-tune any of the pretrained models it provides on your dataset. Once you’ve done all the data preprocessing work in the last section, you have just a few steps left to define the Trainer. The hardest part is likely to be preparing the environment to run Trainer.train(), as it will run very slowly on a CPU. If you don’t have a GPU set up, you can get access to free GPUs or TPUs on Google Colab.
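
If you are unsure about your setup, a quick check like the following tells you whether PyTorch can see a GPU (the Trainer picks it up automatically when one is available):

import torch

# True means the Trainer will run on a CUDA GPU; False means CPU-only (slow).
print(torch.cuda.is_available())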

The code examples below assume you have already executed the examples in the previous section. Here is a short summary recapping what you need:

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)


def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

Training

The first step before we can define our Trainer is to define a TrainingArguments class that will contain all the hyperparameters the Trainer will use for training and evaluation. The only argument you have to provide is a directory where the trained model will be saved, as well as the checkpoints along the way. For all the rest, you can leave the defaults, which should work pretty well for a basic fine-tuning.

from transformers import TrainingArguments

training_args = TrainingArguments("test-trainer")

💡 If you want to automatically upload your model to the Hub during training, pass along push_to_hub=True in the TrainingArguments. We will learn more about this in Chapter 4.
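
For example, a minimal sketch of such a configuration (this assumes you are already authenticated with the Hub):

from transformers import TrainingArguments

# Checkpoints and the final model will be uploaded to the Hub during training.
training_args = TrainingArguments("test-trainer", push_to_hub=True)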

The second step is to define our model. As in the previous chapter, we will use the AutoModelForSequenceClassification class, with two labels:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

You will notice that unlike in Chapter 2, you get a warning after instantiating this pretrained model. This is because BERT has not been pretrained on classifying pairs of sentences, so the head of the pretrained model has been discarded and a new head suitable for sequence classification has been added instead. The warnings indicate that some weights were not used (the ones corresponding to the dropped pretraining head) and that some others were randomly initialized (the ones for the new head). It concludes by encouraging you to train the model, which is exactly what we are going to do now.
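
If you are curious, you can inspect the newly added head directly; for this checkpoint it is a small linear layer (a sketch assuming the standard BertForSequenceClassification layout):

# The classification head is randomly initialized and still needs training.
print(model.classifier)
# Expected output along the lines of:
# Linear(in_features=768, out_features=2, bias=True)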

Once we have our model, we can define a Trainer by passing it all the objects constructed up to now — the model, the training_args, the training and validation datasets, our data_collator, and our tokenizer:

from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

Note that when you pass the tokenizer as we did here, the default data_collator used by the Trainer will be a DataCollatorWithPadding as defined previously, so you can skip the line data_collator=data_collator in this call. It was still important to show you this part of the processing in section 2!
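
In other words, this shorter call should behave the same way (a sketch relying on that documented default):

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,  # a DataCollatorWithPadding is created for you
)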

To fine-tune the model on our dataset, we just have to call the train() method of our Trainer:

trainer.train()

This will start the fine-tuning (which should take a couple of minutes on a GPU) and report the training loss every 500 steps. It won’t, however, tell you how well (or badly) your model is performing. This is because:

  1. We didn’t tell the Trainer to evaluate during training by setting evaluation_strategy to either "steps" (evaluate every eval_steps) or "epoch" (evaluate at the end of each epoch); see the sketch after this list.
  2. We didn’t provide the Trainer with a compute_metrics() function to calculate a metric during said evaluation (otherwise the evaluation would just have printed the loss, which is not a very intuitive number).
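
For example, to evaluate every 500 steps instead of at every epoch boundary, the arguments might look like this (a sketch; only one of the two strategies is needed):

from transformers import TrainingArguments

# Run evaluation every 500 optimization steps during training.
training_args = TrainingArguments(
    "test-trainer",
    evaluation_strategy="steps",
    eval_steps=500,
)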

Evaluation

Let’s see how we can build a useful compute_metrics() function and use it the next time we train. The function must take an EvalPrediction object (which is a named tuple with a predictions field and a label_ids field) and will return a dictionary mapping strings to floats (the strings being the names of the metrics returned, and the floats their values). To get some predictions from our model, we can use the Trainer.predict() command:

predictions = trainer.predict(tokenized_datasets["validation"])
print(predictions.predictions.shape, predictions.label_ids.shape)
(408, 2) (408,)

The output of the predict() method is another named tuple with three fields: predictions, label_ids, and metrics. The metrics field will just contain the loss on the dataset passed, as well as some time metrics (how long it took to predict, in total and on average). Once we complete our compute_metrics() function and pass it to the Trainer, that field will also contain the metrics returned by compute_metrics().
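
You can peek at that field directly; with predict()'s default settings the keys carry a test_ prefix (the exact values will vary from run to run):

print(predictions.metrics)
# A dictionary along the lines of:
# {'test_loss': ..., 'test_runtime': ..., 'test_samples_per_second': ...}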

As you can see, predictions is a two-dimensional array with shape 408 x 2 (408 being the number of elements in the dataset we used). Those are the logits for each element of the dataset we passed to predict() (as you saw in the [previous chapter], all Transformer models return logits). To transform them into predictions that we can compare to our labels, we need to take the index with the maximum value on the second axis:

import numpy as np

preds = np.argmax(predictions.predictions, axis=-1)

We can now compare those preds to the labels. To build our compute_metrics() function, we will rely on the metrics from the 🤗 Evaluate library. We can load the metrics associated with the MRPC dataset as easily as we loaded the dataset, this time with the evaluate.load() function. The object returned has a compute() method we can use to do the metric calculation:

import evaluate

metric = evaluate.load("glue", "mrpc")
metric.compute(predictions=preds, references=predictions.label_ids)
{'accuracy': 0.8578431372549019, 'f1': 0.8996539792387542}

The exact results you get may vary, as the random initialization of the model head might change the metrics it achieves. Here, we can see our model has an accuracy of 85.78% on the validation set and an F1 score of 89.97. Those are the two metrics used to evaluate results on the MRPC dataset for the GLUE benchmark. The table in the BERT paper reported an F1 score of 88.9 for the base model; differences in the fine-tuning setup, plus the randomness noted above, account for our slightly better result.
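
One way to make such runs comparable is to fix the random seeds before instantiating the model, for instance with the set_seed() helper from 🤗 Transformers (a sketch):

from transformers import set_seed

# Seed Python, NumPy, and PyTorch RNGs so the new head is initialized
# identically on every run.
set_seed(42)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)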

Wrapping everything together, we get our compute_metrics() function:

def compute_metrics(eval_preds):
    metric = evaluate.load("glue", "mrpc")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
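
As a quick sanity check, you can call it by hand on the predictions computed earlier; the tuple mirrors the fields of an EvalPrediction:

# Should reproduce the accuracy and F1 values we saw above.
print(compute_metrics((predictions.predictions, predictions.label_ids)))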

And to see it used in action to report metrics at the end of each epoch, here is how we define a new Trainer with this compute_metrics() function:

training_args = TrainingArguments("test-trainer", evaluation_strategy="epoch")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

Note that we create a new TrainingArguments with its evaluation_strategy set to "epoch" and a new model — otherwise, we would just be continuing the training of the model we have already trained. To launch a new training run, we execute:

trainer.train()

This time, it will report the validation loss and metrics at the end of each epoch on top of the training loss. Again, the exact accuracy/F1 score you reach might be a bit different from what we found, because of the random head initialization of the model, but it should be in the same ballpark.

The Trainer will work out of the box on multiple GPUs or TPUs and provides lots of options, like mixed-precision training (use fp16=True in your training arguments). We will go over everything it supports in Chapter 10.
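
For instance, enabling mixed precision might look like this (a sketch; it requires a GPU with the appropriate hardware support):

training_args = TrainingArguments(
    "test-trainer",
    evaluation_strategy="epoch",
    fp16=True,  # mixed-precision training
)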

This concludes the introduction to fine-tuning using the Trainer API. An example of doing this for most common NLP tasks will be given in Chapter 7, but for now let’s look at how to do the same thing in pure PyTorch.

✏️ Try it out! Fine-tune a model on the GLUE SST-2 dataset, using the data processing you did in section 2.
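
A possible starting point (a sketch; note that SST-2 consists of single sentences, so the tokenizer receives only one text field):

raw_datasets = load_dataset("glue", "sst2")


def tokenize_function(example):
    return tokenizer(example["sentence"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)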
