1-Transformer_models-1-Natural_Language_Processing

Original course link: https://huggingface.co/course/chapter1/2?fw=pt

Natural Language Processing

Before jumping into Transformer models, let’s do a quick overview of what natural language processing is and why we care about it.

What is NLP?

NLP is a field of linguistics and machine learning focused on understanding everything related to human language. The aim of NLP tasks is not only to understand single words individually, but to be able to understand the context of those words.

The following is a list of common NLP tasks, with some examples of each:

  • Classifying whole sentences: Getting the sentiment of a review, detecting whether an email is spam, determining whether a sentence is grammatically correct, or deciding whether two sentences are logically related
  • Classifying each word in a sentence: Identifying the grammatical components of a sentence (noun, verb, adjective), or the named entities (person, location, organization)
  • Generating text content: Completing a prompt with auto-generated text, filling in the blanks in a text with masked words
  • Extracting an answer from a text: Given a question and a context, extracting the answer to the question based on the information provided in the context
  • Generating a new sentence from an input text: Translating a text into another language, summarizing a text

NLP isn’t limited to written text though. It also tackles complex challenges in speech recognition and computer vision, such as generating a transcript of an audio sample or a description of an image.
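To make the difference between the first two task types in the list concrete, here is a minimal sketch in plain Python. It is not how a real NLP model works: the word lists and the entity lookup are hypothetical stand-ins for a trained model, and the function names are invented for this illustration. The point is only the shape of the output: one label for the whole sentence versus one label per word.

```python
# Toy illustration only: hand-written lookups stand in for a trained model.
POSITIVE_WORDS = {"great", "good", "love", "excellent"}      # hypothetical lexicon
NEGATIVE_WORDS = {"bad", "terrible", "hate", "boring"}       # hypothetical lexicon
KNOWN_ENTITIES = {"alice": "PERSON", "paris": "LOCATION"}    # hypothetical lookup


def tokenize(sentence):
    """Split on whitespace and strip surrounding punctuation from each word."""
    words = [w.strip(".,!?;:\"'") for w in sentence.split()]
    return [w for w in words if w]


def classify_sentence(sentence):
    """Sentence-level task: return ONE label for the whole input."""
    words = [w.lower() for w in tokenize(sentence)]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


def classify_tokens(sentence):
    """Word-level task: return one (word, label) pair PER WORD; "O" means no entity."""
    return [(w, KNOWN_ENTITIES.get(w.lower(), "O")) for w in tokenize(sentence)]


print(classify_sentence("I love this great movie!"))
print(classify_tokens("Alice lives in Paris."))
```

A lookup like this breaks as soon as a word is missing from the lists or its meaning depends on context ("not good"), which is why these tasks are handled with learned models rather than fixed rules.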

Why is it challenging?

Computers don’t process information in the same way as humans. For example, when we read the sentence “I am hungry,” we can easily understand its meaning. Similarly, given two sentences such as “I am hungry” and “I am sad,” we’re able to easily determine how similar they are. For machine learning (ML) models, such tasks are more difficult. The text needs to be processed in a way that enables the model to learn from it. And because language is complex, we need to think carefully about how this processing should be done. A lot of research has gone into how to represent text, and we will look at some methods in the next chapter.
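As a rough illustration of what "representing text" can mean, the sketch below turns each sentence into a bag of word counts and compares the two example sentences with cosine similarity. This is one simple, classical representation chosen here for illustration; it is an assumption of this example, not the method the course goes on to use, and the function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Represent a sentence as word counts (lowercased, split on whitespace)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (1.0 = same direction)."""
    dot = sum(a[word] * b[word] for word in a)  # missing words count as 0
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


hungry = bag_of_words("I am hungry")
sad = bag_of_words("I am sad")
print(round(cosine_similarity(hungry, sad), 3))  # 0.667
```

The score comes only from the two shared words, "I" and "am". The representation has no notion of what "hungry" or "sad" mean, so it would give the same score for any pair of three-word sentences that share two words. Limitations like this are part of why text representation needs careful thought.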
