How can text vectorization be applied to the Stack Overflow question dataset using TensorFlow and Python?

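TensorFlow is a machine learning framework provided by Google. It is an open-source framework used with Python to implement algorithms, deep learning applications, and more. It is used for both research and production.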

The "tensorflow" package can be installed on Windows with the following command:

pip install tensorflow

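A tensor is the data structure used in TensorFlow. Tensors are the values that flow along the edges of a computation graph, known as the "data flow graph", and so connect its operations. A tensor is simply a multidimensional array or list.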
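To make the "multidimensional array" description concrete, here is a minimal sketch that builds tensors of rank 0, 1, and 2 and prints their shape and data type. The variable names are illustrative only and do not come from the tutorial.

```python
import tensorflow as tf

# Illustrative tensors of increasing rank (names are hypothetical).
scalar = tf.constant(7)                        # rank 0: a single value
vector = tf.constant([1.0, 2.0, 3.0])          # rank 1: a list of values
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])   # rank 2: rows and columns

for name, tensor in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # The rank is the number of dimensions, i.e. the length of the shape.
    print(name, "rank:", len(tensor.shape),
          "shape:", tuple(tensor.shape),
          "dtype:", tensor.dtype.name)
```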

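We use Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser, requires zero configuration, and offers free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.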

Example

The code snippet is as follows:

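# Assumes int_vectorize_layer, binary_vectorize_text, int_vectorize_text and the
# raw_train_ds / raw_val_ds / raw_test_ds datasets were defined in the earlier
# steps of the TensorFlow tutorial cited below.

# Look up the tokens stored at two vocabulary indices. Note that the printed
# labels are plain strings; the indices actually queried are 1289 and 313.
vocabulary = int_vectorize_layer.get_vocabulary()
print("1234 ---> ", vocabulary[1289])
print("321 ---> ", vocabulary[313])
print("Vocabulary size is : {}".format(len(vocabulary)))

# Apply the binary vectorization function to each dataset split.
print("The text vectorization is applied to the training dataset")
binary_train_ds = raw_train_ds.map(binary_vectorize_text)
print("The text vectorization is applied to the validation dataset")
binary_val_ds = raw_val_ds.map(binary_vectorize_text)
print("The text vectorization is applied to the test dataset")
binary_test_ds = raw_test_ds.map(binary_vectorize_text)

# Apply the integer-sequence vectorization function to the same splits.
int_train_ds = raw_train_ds.map(int_vectorize_text)
int_val_ds = raw_val_ds.map(int_vectorize_text)
int_test_ds = raw_test_ds.map(int_vectorize_text)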

Code source: https://www.tensorflow.org/tutorials/load_data/text
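The snippet above relies on names that it does not define: the two vectorization layers, the two mapping functions, and the raw datasets. The following is a minimal sketch of how they might be set up, not the tutorial's exact code. It assumes a recent TensorFlow release where `tf.keras.layers.TextVectorization` is available and the multi-hot mode is named `"multi_hot"` (older releases called it `"binary"`). The three-sentence corpus, `VOCAB_SIZE`, and `MAX_SEQUENCE_LENGTH` are made-up stand-ins for the Stack Overflow data and the tutorial's settings.

```python
import tensorflow as tf

# Tiny stand-in corpus (hypothetical); the tutorial loads Stack Overflow questions.
texts = ["how to sort a list in python",
         "java null pointer exception",
         "python list comprehension"]
labels = [0, 1, 0]
raw_train_ds = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)

VOCAB_SIZE = 20           # assumed value, chosen for this toy corpus
MAX_SEQUENCE_LENGTH = 8   # assumed value, chosen for this toy corpus

# Multi-hot ("binary") layer: one 0/1 flag per vocabulary entry.
binary_vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode="multi_hot", pad_to_max_tokens=True)

# Integer layer: each token becomes its vocabulary index, padded or truncated.
int_vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode="int",
    output_sequence_length=MAX_SEQUENCE_LENGTH)

# Build both vocabularies from the training text only (labels dropped).
train_text = raw_train_ds.map(lambda text, label: text)
binary_vectorize_layer.adapt(train_text)
int_vectorize_layer.adapt(train_text)

def binary_vectorize_text(text, label):
    text = tf.expand_dims(text, -1)  # shape (batch,) -> (batch, 1)
    return binary_vectorize_layer(text), label

def int_vectorize_text(text, label):
    text = tf.expand_dims(text, -1)  # shape (batch,) -> (batch, 1)
    return int_vectorize_layer(text), label

binary_train_ds = raw_train_ds.map(binary_vectorize_text)
int_train_ds = raw_train_ds.map(int_vectorize_text)

# Inspect one batch from each vectorized dataset.
binary_batch, _ = next(iter(binary_train_ds))
int_batch, _ = next(iter(int_train_ds))
print("binary batch shape:", tuple(binary_batch.shape))
print("int batch shape:", tuple(int_batch.shape))
```

The vocabulary is adapted on the training split only, so the validation and test splits are encoded with the same token-to-index mapping and no information from them leaks into the vocabulary.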

Output

1234 ---> substring
321 ---> 20
Vocabulary size is : 10000
The text vectorization is applied to the training dataset
The text vectorization is applied to the validation dataset
The text vectorization is applied to the test dataset