Original course link: https://huggingface.co/course/chapter5/7?fw=pt
🤗 Datasets, check!
Well, that was quite a tour through the 🤗 Datasets library — congratulations on making it this far! With the knowledge that you’ve gained from this chapter, you should be able to:
- Load datasets from anywhere, be it the Hugging Face Hub, your laptop, or a remote server at your company.
- Wrangle your data using a mix of the Dataset.map() and Dataset.filter() functions.
- Quickly switch between data formats like Pandas and NumPy using Dataset.set_format().
- Create your very own dataset and push it to the Hugging Face Hub.
- Embed your documents using a Transformer model and build a semantic search engine using FAISS.
In [Chapter 7], we’ll put all of this to good use as we take a deep dive into the core NLP tasks that Transformer models are great for. Before jumping ahead, though, put your knowledge of 🤗 Datasets to the test with a quick quiz!
