How can TensorFlow be used to load and process the flowers dataset?

We will use the flowers dataset, which contains several thousand images. The dataset has five subdirectories, one per class.
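
As a rough illustration of that layout, the sketch below builds an empty stand-in directory tree and lists its subdirectory names in sorted order. The temporary folder and the stray top-level file are hypothetical; the five folder names are taken from the output shown later in this article. It assumes class names are inferred from the subdirectory names in alphanumeric order, which is how the Keras loader documents its default behavior.

```python
import tempfile
from pathlib import Path

# Folder names taken from the output shown later in this article.
FLOWER_CLASSES = ["tulips", "daisy", "sunflowers", "roses", "dandelion"]

def list_class_dirs(root):
    """Return the names of the immediate subdirectories of root, sorted."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())

with tempfile.TemporaryDirectory() as tmp:
    # Build a stand-in for the dataset layout: one empty folder per class.
    for name in FLOWER_CLASSES:
        (Path(tmp) / name).mkdir()
    # A stray file at the top level is ignored because it is not a directory.
    (Path(tmp) / "LICENSE.txt").write_text("placeholder")
    class_names = list_class_dirs(tmp)

print(class_names)
```

The order in which the folders are created does not matter: sorting makes the class order, and therefore the integer label assigned to each class, reproducible.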


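Once the flowers dataset has been downloaded with the "get_file" method, it is loaded into the environment for processing. The loader parameters are stated explicitly, and the loaded data is split into a training set and a validation set.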

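We use Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.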

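import tensorflow as tf

# data_dir is assumed to hold the path of the flowers dataset
# downloaded earlier with get_file.
print("Loading parameters for the loader")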
batch_size = 32
img_height = 180
img_width = 180

print("Preprocessing the image dataset using Keras")
print("Splitting dataset into training and validation set ")

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
   data_dir,
   validation_split=0.2,
   subset="training",
   seed=123,
   image_size=(img_height, img_width),
   batch_size=batch_size)

print("Splitting dataset into training and validation set ")
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
   data_dir,
   validation_split=0.2,
   subset="validation",
   seed=123,
   image_size=(img_height, img_width),
   batch_size=batch_size)

print("Printing the class names present in sub-directories")
class_names = train_ds.class_names
print(class_names)

Code credit: https://www.tensorflow.org/tutorials/load_data/images

Output

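Loading parameters for the loader
Preprocessing the image dataset using Keras
Splitting dataset into training and validation set 
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Splitting dataset into training and validation set 
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
Printing the class names present in sub-directories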
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
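
The file counts in this output follow from the loader parameters. The sketch below is a simplified pure-Python model of the split, not the Keras implementation: the file names are hypothetical placeholders, and Python's random module stands in for whatever shuffling the loader performs internally. It shows why 20% of 3670 files gives 734 validation files, why reusing the same seed in both calls keeps the two subsets from overlapping, and how many batches of 32 each subset yields, assuming the final partial batch is kept.

```python
import math
import random

TOTAL_FILES = 3670       # "Found 3670 files" in the output above
VALIDATION_SPLIT = 0.2   # same value as validation_split in the loader calls
BATCH_SIZE = 32          # same value as batch_size in the loader calls
SEED = 123               # same seed for both subsets

def split_files(subset):
    """Shuffle placeholder file names with a fixed seed, then slice off one subset."""
    files = [f"img_{i}.jpg" for i in range(TOTAL_FILES)]  # hypothetical names
    random.Random(SEED).shuffle(files)  # same seed -> same order on every call
    num_val = int(VALIDATION_SPLIT * len(files))
    return files[:-num_val] if subset == "training" else files[-num_val:]

train_files = split_files("training")
val_files = split_files("validation")

print(len(train_files), len(val_files))
# Because both calls shuffle identically, the two subsets never overlap.
print(set(train_files).isdisjoint(val_files))

# The last batch of each subset is smaller when the count is not a multiple of 32.
train_batches = math.ceil(len(train_files) / BATCH_SIZE)
val_batches = math.ceil(len(val_files) / BATCH_SIZE)
print(train_batches, val_batches)
```

If the two calls used different seeds, each would shuffle the files into a different order before slicing, and some images could land in both the training and the validation subset.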