Original course link: https://huggingface.co/course/chapter5/8?fw=pt

End-of-chapter quiz

This chapter covered a lot of ground! Don’t worry if you didn’t grasp all the details; the next chapters will help you understand how things work under the hood.

Before moving on, though, let’s test what you learned in this chapter.

The load_dataset() function in 🤗 Datasets allows you to load a dataset from which of the following locations?

Locally, e.g. on your laptop

The Hugging Face Hub

A remote server

Suppose you load one of the GLUE tasks as follows:

from datasets import load_dataset

dataset = load_dataset("glue", "mrpc", split="train")

Which of the following commands will produce a random sample of 50 elements from dataset?

dataset.sample(50)

dataset.shuffle().select(range(50))

dataset.select(range(50)).shuffle()

Suppose you have a dataset about household pets called pets_dataset, which has a name column that denotes the name of each pet. Which of the following approaches would allow you to filter the dataset for all pets whose names start with the letter “L”?

pets_dataset.filter(lambda x : x['name'].startswith('L'))

pets_dataset.filter(lambda x['name'].startswith('L'))

Create a function like def filter_names(x): return x['name'].startswith('L') and run pets_dataset.filter(filter_names).

What is memory mapping?

A mapping between CPU and GPU RAM

A mapping between RAM and filesystem storage

A mapping between two files in the 🤗 Datasets cache

Which of the following are the main benefits of memory mapping?

Accessing memory-mapped files is faster than reading from or writing to disk.

Applications can access segments of data in an extremely large file without having to read the whole file into RAM first.

It consumes less energy, so your battery lasts longer.
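For a feel of what memory mapping does, the sketch below uses Python's standard mmap module rather than 🤗 Datasets itself (the library relies on Apache Arrow for this, which is not shown here). It maps a 10 MB temporary file and reads only the last few bytes; the file size and the marker bytes are arbitrary choices for this example.

```python
import mmap
import os
import tempfile

size = 10 * 1024 * 1024  # 10 MB file, mostly empty
marker = b"needle"  # arbitrary bytes placed at the very end

# Create the file by seeking close to the end and writing the marker there.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.seek(size - len(marker))
    f.write(marker)

# Map the file and slice it like a bytes object. Only the pages that are
# touched need to be brought into RAM by the operating system.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        total_length = len(mm)
        tail = mm[size - len(marker):size]

os.remove(path)
print(total_length, tail)
```

The slice at the end of the file is read without calling f.read() on the other 10 MB, which is the behavior the question about large files refers to.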

Why does the following code fail?

from datasets import load_dataset

dataset = load_dataset("allocine", streaming=True, split="train")
dataset[0]

It tries to stream a dataset that’s too large to fit in RAM.

It tries to access an IterableDataset.

The allocine dataset doesn’t have a train split.

Which of the following are the main benefits of creating a dataset card?

It provides information about the intended use and supported tasks of the dataset so others in the community can make an informed decision about using it.

It helps draw attention to the biases that are present in a corpus.

It improves the chances that others in the community will use my dataset.

What is semantic search?

A way to search for exact matches between the words in a query and the documents in a corpus

A way to search for matching documents by understanding the contextual meaning of a query

A way to improve search accuracy
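The difference between word matching and meaning-based matching can be sketched with a toy ranking by cosine similarity. Everything below is an assumption for illustration: the three-dimensional vectors are hand-made, whereas a real system would compute embeddings with a model and typically search them with an index such as FAISS.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made "embeddings" (hypothetical values, not produced by any model).
corpus = {
    "How to load a CSV file": [0.9, 0.1, 0.0],
    "Training loop tips": [0.1, 0.9, 0.1],
    "Reading tabular data from disk": [0.8, 0.2, 0.1],
}

# Hypothetical embedding for the query "opening spreadsheets", which shares
# no words with the documents it should match.
query_embedding = [0.85, 0.15, 0.05]

# Rank documents from most to least similar to the query.
ranked = sorted(
    corpus,
    key=lambda doc: cosine_similarity(query_embedding, corpus[doc]),
    reverse=True,
)
print(ranked)
```

An exact word match would return nothing for this query, while the vector comparison ranks the two data-loading documents ahead of the unrelated one.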

For asymmetric semantic search, you usually have:

A short query and a longer paragraph that answers the query

Queries and paragraphs that are of about the same length

A long query and a shorter paragraph that answers the query

Can I use 🤗 Datasets to load data for use in other domains, like speech processing?

No

Yes
