How can text data be embedded into dimensional vectors using Python?

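TensorFlow is a machine learning framework provided by Google. It is an open-source framework used together with Python to implement algorithms, deep learning applications, and more. It is used in both research and production.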

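Keras was developed as part of the research for project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System). Keras is a deep learning API written in Python. It is a high-level API with a productive interface that helps solve machine learning problems. It runs on top of the TensorFlow framework and was designed to enable fast experimentation. It provides the essential abstractions and building blocks needed to develop and package machine learning solutions.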

Keras is already bundled with the TensorFlow package. It can be accessed with the following lines of code.

import tensorflow
from tensorflow import keras

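The Keras functional API helps create models that are more flexible than those built with the sequential API. The functional API can handle models with non-linear topology, shared layers, and multiple inputs and outputs. A deep learning model is usually a directed acyclic graph (DAG) of layers, and the functional API is a way to build that graph of layers.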
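To make the graph idea concrete, the sketch below describes a small two-input, two-output model as a DAG using only Python's standard library, without Keras. The node names are hypothetical labels chosen for illustration, and `graphlib.TopologicalSorter` (Python 3.9+) is used to obtain an order in which every layer comes after the layers that feed it. It is a toy picture of the topology, not a description of how Keras builds or runs models internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a node (layer); its value is the set of nodes feeding into it.
# The names are hypothetical labels, not Keras objects.
layer_graph = {
    "text_input": set(),
    "tags_input": set(),
    "embedding": {"text_input"},
    "lstm": {"embedding"},
    "concatenate": {"lstm", "tags_input"},  # two branches merge: non-linear topology
    "priority_output": {"concatenate"},     # both outputs reuse the merged features
    "class_output": {"concatenate"},
}

# A topological order lists every layer after all of its inputs.
order = list(TopologicalSorter(layer_graph).static_order())
print(order)
```

A sequential model would be a single chain in this picture. The merge node and the two output nodes are the parts that a plain stack of layers cannot express.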

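The code below was run in Google Colaboratory. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and offers free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook. The following code snippet embeds every word in the title into a 64-dimensional vector: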


Example

# `layers` is used throughout, so import it alongside keras.
from tensorflow import keras
from tensorflow.keras import layers

print("Number of unique issue tags")
num_tags = 12
print("Size of vocabulary while preprocessing text data")
num_words = 10000
print("Number of classes for predictions")
num_classes = 4

# Title and body are variable-length sequences of integer word indices.
print("Variable length int sequence")
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body")
# Tags are a fixed-size vector with one entry per tag.
tags_input = keras.Input(shape=(num_tags,), name="tags")

print("Embed every word in the title to a 64-dimensional vector")
title_features = layers.Embedding(num_words, 64)(title_input)
print("Embed every word into a 64-dimensional vector")
body_features = layers.Embedding(num_words, 64)(body_input)

print("Reduce sequence of embedded words into single 128-dimensional vector")
title_features = layers.LSTM(128)(title_features)
# LSTM(32) returns a 32-dimensional vector.
print("Reduce sequence of embedded words into single 32-dimensional vector")
body_features = layers.LSTM(32)(body_features)

print("Merge available features into a single vector by concatenating it")
x = layers.concatenate([title_features, body_features, tags_input])

# Dense(1) on the merged features acts as a logistic-regression head for
# priority; Dense(num_classes) is the classifier head. Both output raw scores.
print("Use logistic regression to predict the features")
priority_pred = layers.Dense(1, name="priority")(x)
department_pred = layers.Dense(num_classes, name="class")(x)

print("Instantiate a model that predicts priority and class")
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred],
)

Code credit: https://www.tensorflow.org/guide/keras/functional
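The layer sizes in the example can be checked with a little arithmetic. The sketch below uses plain Python, with no TensorFlow required, to compute the width of the concatenated feature vector and the number of trainable parameters per layer. The helper functions are hypothetical names introduced here, and the formulas assume Keras default settings (bias terms enabled, a single bias vector per LSTM gate). Treat the numbers as an estimate to compare against `model.summary()` rather than as an authoritative count.

```python
# Sizes taken from the example above.
num_tags = 12
num_words = 10000
num_classes = 4
embed_dim = 64
title_units = 128
body_units = 32

def embedding_params(vocab_size, dim):
    # One trainable vector of length `dim` per vocabulary entry.
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

# Width of the concatenated vector that feeds both output layers.
merged_dim = title_units + body_units + num_tags

param_counts = {
    "title_embedding": embedding_params(num_words, embed_dim),
    "body_embedding": embedding_params(num_words, embed_dim),
    "title_lstm": lstm_params(embed_dim, title_units),
    "body_lstm": lstm_params(embed_dim, body_units),
    "priority": dense_params(merged_dim, 1),
    "class": dense_params(merged_dim, num_classes),
}
total_params = sum(param_counts.values())

print("merged feature width:", merged_dim)
for layer_name, count in param_counts.items():
    print(layer_name, count)
print("total:", total_params)
```

The merged width is 128 + 32 + 12 = 172, so the body branch contributes 32 dimensions, matching `layers.LSTM(32)`.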

Output

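Number of unique issue tags
Size of vocabulary while preprocessing text data
Number of classes for predictions
Variable length int sequence
Embed every word in the title to a 64-dimensional vector
Embed every word into a 64-dimensional vector
Reduce sequence of embedded words into single 128-dimensional vector
Reduce sequence of embedded words into single 32-dimensional vector
Merge available features into a single vector by concatenating it
Use logistic regression to predict the features
Instantiate a model that predicts priority and class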
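Conceptually, an embedding step is a lookup table: each integer word index selects one row, and that row is the word's vector. The sketch below mimics this lookup in plain Python so it can be run without TensorFlow. The table values are random placeholders standing in for weights that training would learn, and `embedding_table`, `embed`, and the sample indices are hypothetical names and data chosen for illustration. It is not how Keras implements `layers.Embedding`.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

num_words = 10000  # vocabulary size, as in the example
embed_dim = 64     # embedding width, as in the example

# Lookup table: one vector of length embed_dim per word index.
# Random values stand in for weights that training would normally learn.
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(embed_dim)]
    for _ in range(num_words)
]

def embed(token_ids):
    """Map a sequence of integer word indices to a list of vectors."""
    return [embedding_table[i] for i in token_ids]

# A hypothetical title that preprocessing has turned into four word indices.
title_ids = [12, 7, 4051, 9]
title_vectors = embed(title_ids)

# One 64-dimensional vector per word in the title.
print(len(title_vectors), len(title_vectors[0]))
```

A title with four words becomes four vectors of length 64. In the model above, the LSTM layer then reduces such a sequence to a single fixed-size vector.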