How can Python and TensorFlow be used to vectorize text data from the Stack Overflow questions dataset?

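TensorFlow is a machine learning framework provided by Google. It is an open-source framework used together with Python to implement algorithms, deep learning applications, and much more. It is used in both research and production. It includes optimization techniques that help perform complex mathematical operations quickly.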

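It achieves this partly by working with NumPy and multidimensional arrays, which are also known as "tensors". The framework supports working with deep neural networks. It is highly scalable and comes with many popular datasets. It uses GPU computation and manages resources automatically. It comes with a wide range of machine learning libraries and is well supported and documented. The framework can run deep neural network models, train them, and build applications that predict relevant characteristics of the corresponding datasets.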
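As a small illustration of the NumPy interoperability mentioned above, the following sketch (assuming TensorFlow 2.x, where eager execution is the default) converts a NumPy array into a tensor, runs a TensorFlow op on it, and converts the result back into a NumPy array.

```python
import numpy as np
import tensorflow as tf

# A 2x3 NumPy array becomes a rank-2 tensor.
array = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
tensor = tf.convert_to_tensor(array)

# TensorFlow ops work on the tensor and return a new tensor.
doubled = tf.multiply(tensor, 2.0)

# .numpy() turns an eager tensor back into a NumPy array.
back = doubled.numpy()
print(tensor.shape, tensor.dtype)
print(back)
```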

The "tensorflow" package can be installed on Windows (or any other supported platform) with the following command:

pip install tensorflow
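A quick way to confirm that the installation worked is to import the package, print its version, and run a trivial computation. This is only a sanity check, not part of the vectorization workflow.

```python
import tensorflow as tf

# Print the installed version to confirm the package imports correctly.
print("TensorFlow version:", tf.__version__)

# A trivial computation; if this runs, TensorFlow works on this machine.
total = tf.reduce_sum(tf.constant([1, 2, 3]))
print("Sum:", int(total))
```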

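A tensor is the data structure used in TensorFlow. Tensors form the edges of a flow diagram known as the "data flow graph": operations are the nodes, and tensors carry data between them. A tensor is simply a multidimensional array or list.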
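The sketch below shows tensors of increasing rank and then uses tf.function to trace a small Python function into a data flow graph, listing the operation types that became graph nodes. The function name affine is just an illustrative choice.

```python
import tensorflow as tf

scalar = tf.constant(7)                    # rank 0: a single value
vector = tf.constant([1.0, 2.0, 3.0])      # rank 1: a list of values
matrix = tf.constant([[1, 2], [3, 4]])     # rank 2: a table of values

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "rank:", t.shape.rank, "shape:", t.shape)


# tf.function traces this Python function into a data flow graph:
# operations (such as MatMul) become nodes and tensors become edges.
@tf.function
def affine(x, w, b):
    return tf.matmul(x, w) + b


concrete = affine.get_concrete_function(
    tf.TensorSpec([None, 2], tf.float32),
    tf.TensorSpec([2, 1], tf.float32),
    tf.TensorSpec([1], tf.float32),
)
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)
```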

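We use Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser, requires no configuration, and provides free access to GPUs (graphics processing units). Colaboratory is built on top of Jupyter Notebook.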
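In Colab, a GPU is typically enabled through the notebook's runtime settings. The following sketch checks which GPUs TensorFlow can see and places a small matrix multiplication on the first GPU if one exists, falling back to the CPU otherwise, so it also runs on machines without a GPU.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)

# Use the first GPU when available, otherwise the CPU.
device = "/GPU:0" if gpus else "/CPU:0"
with tf.device(device):
    product = tf.matmul(tf.ones([2, 3]), tf.ones([3, 2]))

print("Computed on:", product.device)
print(product.numpy())
```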


Example

The following code snippet vectorizes the text data:

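import tensorflow as tf

# raw_train_ds, binary_vectorize_layer and int_vectorize_layer are created in
# earlier steps of the TensorFlow tutorial credited below (not shown here).
print("Defining the vectorization functions")
def binary_vectorize_text(text, label):
    # Add a trailing axis so the layer receives batched input.
    text = tf.expand_dims(text, -1)
    return binary_vectorize_layer(text), label

def int_vectorize_text(text, label):
    text = tf.expand_dims(text, -1)
    return int_vectorize_layer(text), label

print("Retrieving a batch of data from the dataset")
text_batch, label_batch = next(iter(raw_train_ds))
first_question, first_label = text_batch[0], label_batch[0]
print("Question:", first_question)
print("Label:", first_label)

print("Binary-vectorized question:",
      binary_vectorize_text(first_question, first_label)[0])
print("Integer-vectorized question:",
      int_vectorize_text(first_question, first_label)[0])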
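The snippet above depends on raw_train_ds and on two TextVectorization layers built earlier in the tutorial. The following self-contained sketch creates small stand-ins for them from three made-up questions, so both vectorization functions can be run end to end. The corpus and the small VOCAB_SIZE and SEQUENCE_LENGTH values are illustrative assumptions (the shapes in the output below correspond to 10000 and 250). Recent TensorFlow versions call the binary mode "multi_hot"; older versions used output_mode="binary".

```python
import tensorflow as tf

VOCAB_SIZE = 10       # illustrative; the article's output implies 10000
SEQUENCE_LENGTH = 8   # illustrative; the article's output implies 250

# Hypothetical toy data standing in for the Stack Overflow questions.
texts = tf.constant([
    "how to sort a list in python",
    "how to read a file in java",
    "sort a dict in python",
])
labels = tf.constant([0, 1, 0])
raw_train_ds = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)

# One output slot per vocabulary entry, 1.0 if the token occurs.
binary_vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode="multi_hot")
# A sequence of token ids, padded or truncated to SEQUENCE_LENGTH.
int_vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode="int",
    output_sequence_length=SEQUENCE_LENGTH)

# Build both vocabularies from the text only (labels dropped).
train_text = raw_train_ds.map(lambda text, label: text)
binary_vectorize_layer.adapt(train_text)
int_vectorize_layer.adapt(train_text)


def binary_vectorize_text(text, label):
    text = tf.expand_dims(text, -1)
    return binary_vectorize_layer(text), label


def int_vectorize_text(text, label):
    text = tf.expand_dims(text, -1)
    return int_vectorize_layer(text), label


# Apply the integer vectorization to every batch of the dataset.
int_train_ds = raw_train_ds.map(int_vectorize_text)
int_batch, _ = next(iter(int_train_ds))
binary_vec, _ = binary_vectorize_text(texts[0], labels[0])
print(int_batch)
print(binary_vec)
```

Mapping the function over the dataset, rather than calling it per example, lets tf.data run the vectorization as part of the input pipeline.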

Code credit: https://www.tensorflow.org/tutorials/load_data/text

Output

Defining the vectorization functions
Retrieving a batch of data from the dataset
Question: tf.Tensor(b'"function expected error in blank for dynamically created check box
when it is clicked i want to grab the attribute value.it is working in ie 8,9,10 but not working in ie
11,chrome shows function expected error..<input type=checkbox checked=\'checked\'
id=\'symptomfailurecodeid\' tabindex=\'54\' style=\'cursor:pointer;\' onclick=chkclickevt(this);
failurecodeid=""1"" >...function chkclickevt(obj) { .
alert(obj.attributes(""failurecodeid""));.}"\n', shape=(), dtype=string)
Label: tf.Tensor(2, shape=(), dtype=int32)
Binary-vectorized question: tf.Tensor([[1. 1. 1. ... 0. 0. 0.]], shape=(1, 10000), dtype=float32)
Integer-vectorized question: tf.Tensor(
[[  37  464   65    7   16   12  879  262  181  448   44   10    6  700
     3   46    4 2085    2  473    1    6  156    7  478    1   25   20
   156    7  478    1  499   37  464    1 1846 1666    1    1    1    1
     1    1    1    1    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0]], shape=(1, 250), dtype=int64)
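In the integer output above, id 0 is padding added up to the fixed sequence length and id 1 stands for out-of-vocabulary words, which is how TextVectorization's "int" mode reserves its first two indices. The pure-Python sketch below imitates that encoding on a toy corpus to make the numbers easier to read. It is a simplified model, not TensorFlow's implementation: the standardization regex and the alphabetical tie-breaking are assumptions of this sketch.

```python
import re
from collections import Counter

PAD, OOV = "", "[UNK]"  # ids 0 and 1, mirroring TextVectorization's int mode


def standardize(text):
    # Lowercase and drop punctuation: a simplified stand-in for the
    # layer's default "lower_and_strip_punctuation" standardization.
    return re.sub(r"[^\w\s]", "", text.lower())


def build_vocab(corpus, max_tokens):
    counts = Counter(tok for doc in corpus for tok in standardize(doc).split())
    # Most frequent tokens first; ties broken alphabetically (this
    # sketch's own choice, not necessarily what the real layer does).
    ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return [PAD, OOV] + ranked[: max_tokens - 2]


def int_vectorize(text, vocab, sequence_length):
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in standardize(text).split()]  # unknown word -> 1
    ids = ids[:sequence_length]                      # truncate long texts
    return ids + [0] * (sequence_length - len(ids))  # pad short texts with 0


def decode(ids, vocab):
    # Map ids back to tokens, skipping the padding id 0.
    return " ".join(vocab[i] for i in ids if i != 0)


corpus = ["how to sort a list in python",
          "how to read a file in java",
          "sort a dict in python"]
vocab = build_vocab(corpus, max_tokens=8)
ids = int_vectorize("How to sort a list in Rust?", vocab, sequence_length=10)
print(vocab)              # ['', '[UNK]', 'a', 'in', 'how', 'python', 'sort', 'to']
print(ids)                # [4, 7, 6, 2, 1, 3, 1, 0, 0, 0]
print(decode(ids, vocab)) # how to sort a [UNK] in [UNK]
```

With the real layer, int_vectorize_layer.get_vocabulary() returns the learned vocabulary list, so an id such as 37 can be mapped back to its word the same way.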