How can a dataset containing Stack Overflow questions be loaded using Python?
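TensorFlow is a machine learning framework provided by Google. It is an open-source framework used with Python to implement algorithms, deep learning applications, and more. It is used for both research and production. It includes optimization techniques that help execute complex mathematical operations quickly.
Much of this speed comes from operating on multidimensional arrays, which interoperate with NumPy. These multidimensional arrays are also known as "tensors". The framework supports deep neural networks. It is highly scalable and comes with many popular datasets. It uses GPU computation and automates resource management. It ships with a range of machine learning libraries and is well supported and documented. The framework can run deep neural network models, train them, and build applications around the corresponding datasets.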
The "tensorflow" package can be installed on Windows using the following command:
pip install tensorflow
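We use Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook. The following code snippet loads a dataset containing Stack Overflow questions using Python: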
Example
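from tensorflow.keras import preprocessing

# train_dir is not defined in this snippet; it is assumed to point at the
# dataset's training folder, with one subdirectory per class.
batch_size = 32
seed = 42
print("The training parameters have been defined")
# Build a labeled dataset from the directory, holding out 25% for validation
raw_train_ds = preprocessing.text_dataset_from_directory(
    train_dir,
    batch_size=batch_size,
    validation_split=0.25,
    subset='training',
    seed=seed)
# Print the first 10 questions (truncated to 100 bytes) with their labels
for text_batch, label_batch in raw_train_ds.take(1):
    for i in range(10):
        print("Question:", text_batch.numpy()[i][:100], '...')
        print("Label:", label_batch.numpy()[i])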
Code credit: https://www.tensorflow.org/tutorials/load_data/text
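The loop above prints each question as a Python bytes object, which is why the printed questions start with b'. The following is a minimal sketch of turning such a value into readable text, assuming the files are UTF-8 encoded. The sample string and the helper name `preview` are made up for illustration and are not part of the tutorial.

```python
def preview(raw: bytes, limit: int = 100) -> str:
    """Return the first `limit` bytes of `raw` decoded as text."""
    # Slicing bytes yields bytes; errors="replace" guards against a
    # multi-byte character being cut in half by the slice.
    return raw[:limit].decode("utf-8", errors="replace")


# Hypothetical sample, shaped like one element of text_batch.numpy()
sample = b'"my tester is going to the wrong constructor i am new to programming'
print(preview(sample, limit=30))
```

Decoding only matters for display; the dataset itself can keep the raw bytes.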
Output
The training parameters have been defined
Found 8000 files belonging to 4 classes.
Using 6000 files for training.
Question: b'"my tester is going to the wrong constructor i am new to programming so if i ask a question that can' ...
Label: 1
Question: b'"blank code slow skin detection this code changes the color space to lab and using a threshold finds' ...
Label: 3
Question: b'"option and validation in blank i want to add a new option on my system where i want to add two text' ...
Label: 1
Question: b'"exception: dynamic sql generation for the updatecommand is not supported against a selectcommand th' ...
Label: 0
Question: b'"parameter with question mark and super in blank, i\'ve come across a method that is formatted like t' ...
Label: 1
Question: b'call two objects wsdl the first time i got a very strange wsdl. ..i would like to call the object (i' ...
Label: 0
Question: b'how to correctly make the icon for systemtray in blank using icon sizes of any dimension for systemt' ...
Label: 0
Question: b'"is there a way to check a variable that exists in a different script than the original one? i\'m try' ...
Label: 3
Question: b'"blank control flow i made a number which asks for 2 numbers with blank and responds with the corre' ...
Label: 0
Question: b'"credentials cannot be used for ntlm authentication i am getting org.apache.commons.httpclient.auth.' ...
Label: 1
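The file counts at the top of the output follow from the validation_split value. Here is a rough check of the arithmetic, assuming the utility simply holds out that fraction of the files; the exact rounding it uses internally is not shown in the source.

```python
total_files = 8000        # from "Found 8000 files belonging to 4 classes."
validation_split = 0.25   # value passed in the example

# Files held out for validation, and what remains for training
validation_files = int(total_files * validation_split)
training_files = total_files - validation_files

print("training:", training_files)
print("validation:", validation_files)
```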
Explanation
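- The data is loaded from disk and prepared in a form suitable for training.
- The "text_dataset_from_directory" utility is used to create a labeled dataset.
- "tf.data" is a powerful collection of tools for building input pipelines.
- A directory structure is passed to the "text_dataset_from_directory" utility.
- The Stack Overflow question dataset is split into training and test sets.
- A validation set is created using the "validation_split" argument.
- The labels are 0, 1, 2, or 3.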
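The directory structure mentioned above can be sketched with the standard library alone. This is a rough illustration, assuming one subdirectory per class and integer labels assigned in alphanumerical order of the subdirectory names. The four folder names and the file contents are placeholders, and `infer_labels` is a hypothetical helper, not a TensorFlow function.

```python
import tempfile
from pathlib import Path


def infer_labels(root: Path) -> dict:
    """Map each class subdirectory name to an integer label (sorted order)."""
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}


with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "train"
    # Assumed class folders; each would hold one text file per question.
    for name in ["python", "java", "csharp", "javascript"]:
        class_dir = train_dir / name
        class_dir.mkdir(parents=True)
        (class_dir / "0.txt").write_text("placeholder question text")

    labels = infer_labels(train_dir)
    print(labels)
```

With four class folders, the resulting labels are the integers 0 through 3, which matches the label values seen in the output.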