How to download and explore the Illiad dataset using Python and TensorFlow?

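TensorFlow is a machine learning framework from Google. It is open source and is used with Python to implement algorithms, deep learning applications, and more. It is used in both research and production.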

The "tensorflow" package can be installed on Windows with the following command:

pip install tensorflow
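A quick way to confirm the installation worked is to import the package and print its version. This is a minimal check; the exact version string depends on what pip installed.

```python
import tensorflow as tf

# If the install succeeded, the import works and the version string is available.
print("TensorFlow version:", tf.__version__)
```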

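A tensor is the data structure TensorFlow works with. Tensors are the edges of a flow graph: they carry data between the operations that sit at the graph's nodes. This graph is called a "data flow graph". A tensor itself is simply a multidimensional array or list.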
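To see tensors acting as edges between operations, the sketch below traces a small function into a graph with tf.function and lists each operation together with the tensors flowing into and out of it. It assumes TensorFlow 2.x; the function name affine is hypothetical and chosen only for illustration.

```python
import tensorflow as tf

@tf.function
def affine(x, w):
    # A matrix multiply followed by an addition; the intermediate result
    # is a tensor that flows from one operation (node) to the next.
    return tf.matmul(x, w) + 1.0

# Tracing with fixed input specs builds the data flow graph without running it.
concrete = affine.get_concrete_function(
    tf.TensorSpec(shape=[2, 2], dtype=tf.float32),
    tf.TensorSpec(shape=[2, 2], dtype=tf.float32),
)
graph = concrete.graph

# Each operation is a node; its input and output tensors are the edges.
for op in graph.get_operations():
    ins = [t.name for t in op.inputs]
    outs = [t.name for t in op.outputs]
    print(f"{op.type:12s} in={ins} out={outs}")
```

The inputs x and w appear as Placeholder nodes, and the MatMul node consumes their output tensors, which is exactly the "tensors as edges" picture described above.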

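A tensor is described by three main attributes:

Rank: the number of dimensions of the tensor, also called its order. A scalar has rank 0, a vector rank 1, a matrix rank 2, and so on up to n dimensions.
Type: the data type of the tensor's elements, for example 32-bit integers, floats, or strings.
Shape: the number of elements along each dimension; for a two-dimensional tensor, this is the number of rows and columns.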
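The three attributes can be read directly from any tensor. This sketch builds a small 2 x 3 integer matrix (the values are arbitrary) and queries its rank, element type, and shape.

```python
import tensorflow as tf

# A two-dimensional tensor: 2 rows and 3 columns of 32-bit integers.
t = tf.constant([[1, 2, 3],
                 [4, 5, 6]], dtype=tf.int32)

print("Rank:", tf.rank(t).numpy())   # number of dimensions
print("Type:", t.dtype)              # data type of the elements
print("Shape:", t.shape)             # size along each dimension
```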

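We will use the Illiad dataset, which contains the text of three English translations of Homer's Iliad, by William Cowper, Edward (Earl of Derby), and Samuel Butler. The model is trained to identify the translator when given a single line of text. The text files have already been preprocessed: document headers and footers, line numbers, and chapter titles were removed.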
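Framing the task as "line of text in, translator out" means every line needs a label. A minimal sketch, assuming labels are assigned by file order (0 for cowper.txt, 1 for derby.txt, 2 for butler.txt) and using hypothetical placeholder strings in place of the real lines, is to tag each line with its file's index using tf.data and then concatenate the per-translator datasets. The helper names make_labeler and build_labeled_dataset are illustrative, not part of any library.

```python
import tensorflow as tf

def make_labeler(index):
    """Return a map() function that attaches the translator index as the label."""
    def labeler(line):
        return line, tf.constant(index, dtype=tf.int64)
    return labeler

def build_labeled_dataset(lines_per_translator):
    """lines_per_translator: one list of text lines per translator, in label order."""
    combined = None
    for index, lines in enumerate(lines_per_translator):
        ds = tf.data.Dataset.from_tensor_slices(lines).map(make_labeler(index))
        # Append this translator's (text, label) pairs to the running dataset.
        combined = ds if combined is None else combined.concatenate(ds)
    return combined

# Hypothetical placeholder lines standing in for cowper.txt, derby.txt, butler.txt.
sample = [
    ["cowper line 1", "cowper line 2"],
    ["derby line 1"],
    ["butler line 1"],
]
for text, label in build_labeled_dataset(sample).as_numpy_iterator():
    print(label, text.decode("utf-8"))
```

With the real files, each list would be replaced by tf.data.TextLineDataset(path) for the corresponding file; the labeling and concatenation steps stay the same. A factory function is used for the labeler so each dataset clearly keeps its own index.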

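We use Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser, requires no setup, and provides free access to GPUs (graphics processing units). Colaboratory is built on top of Jupyter Notebook. The code snippet is as follows: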
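Whether a GPU is actually attached depends on the notebook's runtime settings (in Colab, typically chosen under Runtime > Change runtime type). This small check lists the GPUs TensorFlow can see; an empty list means the code will run on the CPU, which is still enough for downloading and exploring the dataset.

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list means CPU only.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", gpus if gpus else "none (running on CPU)")
```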


Example

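import pathlib

from tensorflow.keras import utils

print("Loading the Illiad dataset")
# Base URL of the directory that holds the three translations
DIRECTORY_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/illiad/'
FILE_NAMES = ['cowper.txt', 'derby.txt', 'butler.txt']

print("Iterating through the name of the files")
for name in FILE_NAMES:
    # Download each file (cached under ~/.keras/datasets) and return its local path
    text_dir = utils.get_file(name, origin=DIRECTORY_URL + name)

# All three files are saved in the same cache directory
parent_dir = pathlib.Path(text_dir).parent
print("The list of files in the directory")
print(list(parent_dir.iterdir()))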

Code credit: https://www.tensorflow.org/tutorials/load_data/text

Output

Loading the Illiad dataset
Iterating through the name of the files
Downloading data from
https://storage.googleapis.com/download.tensorflow.org/data/illiad/cowper.txt
819200/815980 [==============================] - 0s 0us/step
Downloading data from
https://storage.googleapis.com/download.tensorflow.org/data/illiad/derby.txt
811008/809730 [==============================] - 0s 0us/step
Downloading data from
https://storage.googleapis.com/download.tensorflow.org/data/illiad/butler.txt
811008/807992 [==============================] - 0s 0us/step
The list of files in the directory
[PosixPath('/root/.keras/datasets/derby.txt'), PosixPath('/root/.keras/datasets/cowper.txt'),
PosixPath('/root/.keras/datasets/butler.txt')]
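Once the files are on disk, a natural first exploration step is to count the lines in each translation and look at the first few. The sketch below does this with tf.data.TextLineDataset, which yields one string tensor per line with the newline stripped. The function name explore and its n_lines parameter are hypothetical, introduced only for this illustration.

```python
import pathlib

import tensorflow as tf

def explore(parent_dir, file_names, n_lines=3):
    """Return {file name: (line count, first n_lines lines)} for each text file."""
    summary = {}
    for name in file_names:
        path = pathlib.Path(parent_dir) / name
        lines_ds = tf.data.TextLineDataset(str(path))
        # The number of lines is not known in advance, so iterate once to count them.
        count = sum(1 for _ in lines_ds)
        # Peek at the first few lines, decoding the raw bytes to str.
        first = [line.decode("utf-8")
                 for line in lines_ds.take(n_lines).as_numpy_iterator()]
        summary[name] = (count, first)
    return summary
```

With the variables defined by the download code above, calling explore(parent_dir, FILE_NAMES) and printing each entry shows how many lines every translator contributes, which is worth checking before training because an imbalance between the three files affects the classifier.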