5-The_Datasets_library-6-_Datasets_check

中英文对照学习,效果更佳!
原课程链接:https://huggingface.co/course/chapter5/7?fw=pt

🤗 Datasets, check!

🤗 Datasets,完成!


Well, that was quite a tour through the 🤗 Datasets library — congratulations on making it this far! With the knowledge that you’ve gained from this chapter, you should be able to:

这真是一场贯穿 🤗 Datasets 库的长途旅行,祝贺您走到了这一步!凭借您从本章中学到的知识,您应该能够:

  • Load datasets from anywhere, be it the Hugging Face Hub, your laptop, or a remote server at your company.
  • Wrangle your data using a mix of the Dataset.map() and Dataset.filter() functions.
  • Quickly switch between data formats like Pandas and NumPy using Dataset.set_format().
  • Create your very own dataset and push it to the Hugging Face Hub.
  • Embed your documents using a Transformer model and build a semantic search engine using FAISS.

In Chapter 7, we’ll put all of this to good use as we take a deep dive into the core NLP tasks that Transformer models are great for. Before jumping ahead, though, put your knowledge of 🤗 Datasets to the test with a quick quiz!

  • 从任何地方加载数据集,无论是 Hugging Face Hub、您的笔记本电脑,还是公司的远程服务器。
  • 混合使用 Dataset.map() 和 Dataset.filter() 函数来整理数据。
  • 使用 Dataset.set_format() 在 Pandas 和 NumPy 等数据格式之间快速切换。
  • 创建您自己的数据集并将其推送到 Hugging Face Hub。
  • 使用 Transformer 模型嵌入您的文档,并使用 FAISS 构建语义搜索引擎。

在第 7 章中,我们将充分利用所有这些功能,深入研究 Transformer 模型所擅长的核心 NLP 任务。不过,在继续之前,先通过一个快速测验来检验您对 🤗 Datasets 的掌握程度!