How can Python be used to iterate through a dataset and display sample data?

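TensorFlow is a machine learning framework provided by Google. It is open source and is used with Python to implement algorithms, deep learning applications, and more. It is widely used in both research and production. It includes optimization techniques that help it perform complicated mathematical operations quickly, because it works with NumPy and multi-dimensional arrays. These multi-dimensional arrays are also known as 'tensors'. The framework supports working with deep neural networks. It is highly scalable and ships with many popular datasets. It uses GPU computation and manages resources automatically. It comes with a large set of machine learning libraries and is well supported and documented. The framework can run deep neural network models, train them, and build applications that predict relevant features of the respective datasets.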

The 'tensorflow' package can be installed on Windows with the following command:

pip install tensorflow

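A tensor is the data structure used in TensorFlow. Tensors form the edges of a flow diagram known as the 'dataflow graph', carrying data between its operations. A tensor is simply a multi-dimensional array or list. It can be identified by three main attributes: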

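Rank: the number of dimensions of the tensor. It can be understood as the order of the tensor. A tensor can be one-dimensional, two-dimensional, or n-dimensional.
Type: the data type of the tensor's elements, for example a floating-point number, an integer, or a string.
Shape: the size of the tensor along each dimension, for example the number of rows and the number of columns of a matrix.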
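The three attributes can be illustrated without TensorFlow, since a tensor is essentially a multi-dimensional array or list. The sketch below is only an illustration under that assumption: `describe` is a hypothetical helper written for this example (it is not part of TensorFlow), and it expects rectangular nested lists.

```python
def describe(nested):
    """Return (rank, shape, element type) for a rectangular nested list.

    Hypothetical helper for illustration only; not a TensorFlow function.
    """
    shape = []
    level = nested
    # Walk down the first element of each nesting level to collect sizes.
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0] if level else None
    return len(shape), tuple(shape), type(level).__name__


scalar = 7                        # rank 0: a single value
vector = [1.0, 2.0, 3.0]          # rank 1: one dimension
matrix = [[1, 2, 3], [4, 5, 6]]   # rank 2: rows and columns

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    rank, shape, dtype = describe(value)
    print(name, "rank:", rank, "shape:", shape, "type:", dtype)
```

With an actual TensorFlow tensor `t`, the same three properties are available as `tf.rank(t)`, `t.dtype`, and `t.shape`.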

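The code below is run in Google Colaboratory. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook.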

Example

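# Assumes the earlier steps of the source tutorial have already run:
# `preprocessing` is presumably tensorflow.keras.preprocessing, and
# `raw_train_ds`, `dataset_dir`, `train_dir`, `batch_size` and `seed`
# are already defined.
print("Iterating through the training data")
# class_names lists the labels in index order
for i, label in enumerate(raw_train_ds.class_names):
    print("Label", i, "maps to", label)
print("The training parameters have been defined")
# Hold out 25% of the training files as the validation subset
raw_val_ds = preprocessing.text_dataset_from_directory(
    train_dir,
    batch_size=batch_size,
    validation_split=0.25,
    subset='validation',
    seed=seed)
print("The test dataset is being prepared")
test_dir = dataset_dir/'test'
# The test set is loaded in full, without a split
raw_test_ds = preprocessing.text_dataset_from_directory(
    test_dir, batch_size=batch_size)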

Code credit: https://www.tensorflow.org/tutorials/load_data/text

Output

Iterating through the training data
Label 0 maps to csharp
Label 1 maps to java
Label 2 maps to javascript
Label 3 maps to python
The training parameters have been defined
Found 8000 files belonging to 4 classes.
Using 2000 files for validation.
The test dataset is being prepared
Found 8000 files belonging to 4 classes.
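To make the label mapping and the validation count in the output easier to follow, here is a plain-Python sketch that imitates the idea using only the standard library. It is not how `text_dataset_from_directory` is implemented. The helpers `load_labelled_texts` and `batches`, the temporary directory, and the sample texts are all made up for this illustration. The sketch relies on the fact that class names are taken from the subdirectory names in alphanumeric order, and it does not shuffle the files, whereas the real function shuffles by default.

```python
import tempfile
from pathlib import Path


def load_labelled_texts(directory):
    """Read every .txt file found in the class subdirectories of `directory`.

    Returns (class_names, samples): class_names is sorted alphabetically and
    samples is a list of (text, label_index) pairs.
    """
    root = Path(directory)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        for path in sorted((root / name).glob("*.txt")):
            samples.append((path.read_text(encoding="utf-8"), index))
    return class_names, samples


def batches(samples, batch_size):
    """Yield consecutive slices of `samples` with at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Build a tiny, made-up directory tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["python", "csharp", "javascript", "java"]:
        class_dir = Path(tmp) / name
        class_dir.mkdir()
        for n in range(2):
            (class_dir / f"{n}.txt").write_text(
                f"sample {name} question {n}", encoding="utf-8")

    class_names, samples = load_labelled_texts(tmp)

# Each label index corresponds to one sorted subdirectory name.
for i, label in enumerate(class_names):
    print("Label", i, "maps to", label)

# Display the samples of the first batch together with their labels.
first_batch = next(batches(samples, batch_size=3))
for text, label in first_batch:
    print("Question:", text, "| Label:", label, "(" + class_names[label] + ")")

# Size of the validation subset for the split used in the example.
total_files, validation_split = 8000, 0.25
print("Validation files:", int(total_files * validation_split))
```

The sorted directory names give the same index order as the output above (csharp, java, javascript, python), and a 0.25 split of 8000 files accounts for the 2000 validation files reported.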