How can TensorFlow and Python be used to shuffle preprocessed data?

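TensorFlow is a machine learning framework provided by Google. It is an open-source framework used together with Python to implement algorithms, deep learning applications, and more. It is used in both research and production. It includes optimization techniques that help perform complex mathematical operations quickly, because it works with NumPy and multidimensional arrays. These multidimensional arrays are also known as 'tensors'. The framework supports working with deep neural networks.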

The 'tensorflow' package can be installed on Windows using the following command:

pip install tensorflow

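A tensor is the data structure used in TensorFlow. Tensors travel along the edges that connect the nodes of a flow diagram, which is known as the 'data flow graph'. A tensor is simply a multidimensional array or list.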
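To make the idea of a tensor as a multidimensional array concrete, here is a minimal sketch. The values are made up for illustration and are not part of the Iliad dataset. It assumes TensorFlow 2.x running in eager mode, which is the default.

```python
import tensorflow as tf

# Illustrative values only: a 2 x 3 matrix stored as a rank-2 tensor.
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])

rank = len(matrix.shape)         # number of dimensions
shape = matrix.shape.as_list()   # size along each dimension
as_numpy = matrix.numpy()        # the same data as a NumPy array

print("rank:", rank)
print("shape:", shape)
print("dtype:", matrix.dtype.name)
```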

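We will use the Iliad dataset, which contains text from three translations of the work by William Cowper, Edward (Earl of Derby), and Samuel Butler. The model is trained to identify the translator when given a single line of text. The text files used have been preprocessed: document headers and footers, line numbers, and chapter titles have been removed.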
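The labeling step itself is not shown in this article, so the following is only a sketch of how each line might be paired with the index of its translator. The sample strings and the helper name label_lines are hypothetical stand-ins; the original tutorial reads the lines from downloaded text files instead.

```python
import tensorflow as tf

# Hypothetical stand-ins for the three translation files (one list per translator).
toy_lines = [
    ["line from translator zero", "another line from translator zero"],
    ["line from translator one"],
    ["line from translator two"],
]

def label_lines(lines, index):
    """Build a dataset of (text, label) pairs for a single translator."""
    lines_dataset = tf.data.Dataset.from_tensor_slices(lines)
    # Pair every line with the integer index of its translator.
    return lines_dataset.map(lambda text: (text, tf.cast(index, tf.int64)))

labeled_data_sets = [
    label_lines(lines, index) for index, lines in enumerate(toy_lines)
]

# Collect the pairs as plain Python values to inspect them.
pairs = [
    (text.numpy(), int(label.numpy()))
    for dataset in labeled_data_sets
    for text, label in dataset
]
print(pairs)
```

A list built this way has the same structure as the labeled_data_sets variable used in the example further down: one dataset per translator, each yielding a line of text and an integer label.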

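We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook.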

Example

Following is the code snippet:

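# labeled_data_sets is the list of labeled datasets (one per translator)
# built in an earlier step of the tutorial.
print("Combine the labeled datasets and reshuffle")
BUFFER_SIZE = 50000      # size of the shuffle buffer
BATCH_SIZE = 64          # defined here, not used in this snippet
VALIDATION_SIZE = 5000   # defined here, not used in this snippet

# Concatenate the per-translator datasets into a single dataset.
all_labeled_data = labeled_data_sets[0]
for labeled_dataset in labeled_data_sets[1:]:
    all_labeled_data = all_labeled_data.concatenate(labeled_dataset)

# Shuffle once; reshuffle_each_iteration=False keeps the same order on every pass.
all_labeled_data = all_labeled_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

print("Display a few samples of input data")
for text, label in all_labeled_data.take(8):
    print("Sentence:", text.numpy())
    print("Label:", label.numpy())

Code credit: https://www.tensorflow.org/tutorials/load_data/text

Output

Combine the labeled datasets and reshuffle
Display a few samples of input data
Sentence: b'But I have now both tasted food, and given'
Label: 0
Sentence: b'All these shall now be thine: but if the Gods'
Label: 1
Sentence: b'Their spiry summits waved. There, unperceived'
Label: 0
Sentence: b'"I pray you, would you show your love, dear friends,'
Label: 1
Sentence: b'Entering beneath the clavicle the point'
Label: 0
Sentence: b'But grief, his father lost, awaits him now,'
Label: 1
Sentence: b'in the fore-arm where the sinews of the elbow are united, whereon he'
Label: 2
Sentence: b'For, as I think, I have already chased'
Label: 0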
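
The role of the buffer size passed to shuffle can be illustrated with a plain-Python sketch of buffered shuffling: keep a fixed number of elements in a buffer, emit one at random, and refill the freed slot from the input. This is only an approximation of the idea described in the tf.data documentation, not TensorFlow's actual implementation, and the function name buffered_shuffle is hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in randomized order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer; nothing is emitted yet.
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot.
        slot = rng.randrange(buffer_size)
        yield buffer[slot]
        buffer[slot] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(100), buffer_size=10, seed=0))
unshuffled = list(buffered_shuffle(range(10), buffer_size=1))
print(shuffled[:5])
print(unshuffled)
```

With a buffer of size 1 the input order is unchanged, and with a small buffer each element can only move a limited distance forward. A buffer at least as large as the dataset mixes all elements, which is the reason for choosing a large value such as the BUFFER_SIZE of 50000 used above.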