Handwritten Digit Recognition Using Neural Networks
Introduction
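Handwritten digit recognition is a form of image recognition, which is widely used in deep-learning-based computer vision. Image recognition is one of the most basic, preliminary stages of nearly every image- or video-related deep learning task. This article gives an overview of handwritten digit recognition and shows how image recognition extends to multiclass classification.
Before going further, let's look at the difference between binary and multiclass image classification.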
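Binary Image Classification
In binary image classification, the model has two classes to predict, for example when telling cats from dogs.
Multiclass Image Classification
In multiclass image classification, the model has more than two classes to predict. For example, in Fashion-MNIST classification or handwritten digit recognition, there are 10 classes to predict.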
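The difference shows up most clearly in the model's output layer. As a minimal sketch, assuming the common convention of a single sigmoid output for two classes and a softmax over one score per class for many classes, the prediction step could look like this. The scores and class names are invented for illustration and do not come from a trained model.

```python
import math


def sigmoid(z):
    """Map a single raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Map a list of raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Binary case: one output score, thresholded at 0.5.
# The score below is a made-up value, not a real model output.
binary_score = 1.2
p_dog = sigmoid(binary_score)
binary_label = "dog" if p_dog >= 0.5 else "cat"

# Multiclass case: one score per class (10 for digits); pick the largest.
# These scores are also made up for illustration.
digit_scores = [0.1, 0.3, 2.5, 0.2, 0.0, 0.4, 0.1, 1.9, 0.3, 0.2]
probs = softmax(digit_scores)
predicted_digit = max(range(len(probs)), key=probs.__getitem__)

print("binary prediction:", binary_label)
print("multiclass prediction:", predicted_digit)
```

In the binary case one number is enough, because the probability of the second class is simply one minus the first. With 10 classes the model needs 10 probabilities, and the predicted class is the index of the largest one.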
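Handwritten Digit Recognition
This task is a case of multiclass image classification: the model predicts which of the digits 0 through 9 an input image shows.
For the MNIST digit recognition task, we use a convolutional neural network (CNN) to build a model that recognizes handwritten digits. We will download the MNIST dataset, which consists of a training set of 60,000 images and a test set of 10,000 images. Each image is a 28×28-pixel grayscale picture of a handwritten digit from 0 to 9.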
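Before images like these reach a network, two preparation steps are typical: scaling the 8-bit pixel values to the range 0 to 1, and turning each integer label into a one-hot vector. The sketch below illustrates both steps in plain Python on a made-up 28×28 image. The helper names `one_hot` and `scale_pixels` are hypothetical and are not part of any library.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def scale_pixels(image):
    """Scale 8-bit pixel values (0..255) to floats in [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# A made-up 28x28 "image": black background with one white vertical stroke.
HEIGHT, WIDTH = 28, 28
image = [[255 if col == 14 else 0 for col in range(WIDTH)]
         for _ in range(HEIGHT)]

scaled = scale_pixels(image)
target = one_hot(1)  # pretend the stroke is the digit 1

print("image size:", len(scaled), "x", len(scaled[0]))
print("one-hot target:", target)
```

A one-hot target has the same length as the model's 10-way output, which is what lets a categorical cross-entropy loss compare the two element by element.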
Implementation in Python
Example
## Digit Recognition
import keras
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.utils import to_categorical
import matplotlib.pyplot as plt
# In a Jupyter notebook, uncomment the next line to render plots inline.
# %matplotlib inline

# Hyperparameters
input_shape = (28, 28, 1)  # height, width, channels (grayscale)
batch_size = 128
num_classes = 10  # digits 0-9
epochs = 10

# Load MNIST: 60000 training images and 10000 test images
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
print("Training data shape {} , training labels shape {}".format(X_train.shape, Y_train.shape))

# Display one training image as a sanity check
img = X_train[1]
plt.imshow(img, cmap='gray')
plt.show()

# Add a channel dimension so that each image has shape (28, 28, 1)
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

# One-hot encode the integer labels
Y_train = to_categorical(Y_train, num_classes)
Y_test = to_categorical(Y_test, num_classes)

# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
print('x_train shape:', X_train.shape)
print('train samples', X_train.shape[0])
print('test samples', X_test.shape[0])

# Build the CNN: two convolution layers, max pooling, then a dense classifier
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Categorical cross-entropy loss with the Adadelta optimizer (default settings)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

# Train; the test set is also used as validation data here
history = model.fit(X_train, Y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(X_test, Y_test))

# Evaluate on the test set
output_score = model.evaluate(X_test, Y_test, verbose=0)
print('Testing loss:', output_score[0])
print('Testing accuracy:', output_score[1])
Output
Training data shape (60000, 28, 28) , training labels shape (60000,)
x_train shape: (60000, 28, 28, 1)
train samples 60000
test samples 10000
Epoch 1/10
469/469 [==============================] - 13s 10ms/step - loss: 2.2877 - accuracy: 0.1372 - val_loss: 2.2598 - val_accuracy: 0.2177
Epoch 2/10
469/469 [==============================] - 4s 9ms/step - loss: 2.2428 - accuracy: 0.2251 - val_loss: 2.2058 - val_accuracy: 0.3345
Epoch 3/10
469/469 [==============================] - 5s 10ms/step - loss: 2.1863 - accuracy: 0.3062 - val_loss: 2.1340 - val_accuracy: 0.4703
Epoch 4/10
469/469 [==============================] - 5s 10ms/step - loss: 2.1071 - accuracy: 0.3943 - val_loss: 2.0314 - val_accuracy: 0.5834
Epoch 5/10
469/469 [==============================] - 4s 9ms/step - loss: 1.9948 - accuracy: 0.4911 - val_loss: 1.8849 - val_accuracy: 0.6767
Epoch 6/10
469/469 [==============================] - 4s 10ms/step - loss: 1.8385 - accuracy: 0.5744 - val_loss: 1.6841 - val_accuracy: 0.7461
Epoch 7/10
469/469 [==============================] - 4s 10ms/step - loss: 1.6389 - accuracy: 0.6316 - val_loss: 1.4405 - val_accuracy: 0.7825
Epoch 8/10
469/469 [==============================] - 5s 10ms/step - loss: 1.4230 - accuracy: 0.6694 - val_loss: 1.1946 - val_accuracy: 0.8078
Epoch 9/10
469/469 [==============================] - 5s 10ms/step - loss: 1.2229 - accuracy: 0.6956 - val_loss: 0.9875 - val_accuracy: 0.8234
Epoch 10/10
469/469 [==============================] - 5s 11ms/step - loss: 1.0670 - accuracy: 0.7168 - val_loss: 0.8342 - val_accuracy: 0.8353
Testing loss: 0.8342439532279968
Testing accuracy: 0.8353000283241272
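Several numbers in the listing and its log can be cross-checked with simple arithmetic. The sketch below assumes the Keras defaults for the layers used above (stride 1 and no padding for `Conv2D`, and a 2×2 pool that halves each spatial dimension). The helper names `conv2d_valid` and `dense` are hypothetical and exist only for this illustration.

```python
import math


def conv2d_valid(h, w, in_ch, filters, k=3):
    """Output shape and parameter count of a stride-1, unpadded k x k convolution."""
    params = (k * k * in_ch + 1) * filters  # +1 for each filter's bias
    return (h - k + 1, w - k + 1, filters), params


def dense(in_units, out_units):
    """Parameter count of a fully connected layer (weights plus biases)."""
    return (in_units + 1) * out_units


shape, p_conv1 = conv2d_valid(28, 28, 1, 32)   # Conv2D(32, (3, 3))
shape, p_conv2 = conv2d_valid(*shape, 64)      # Conv2D(64, (3, 3))
shape = (shape[0] // 2, shape[1] // 2, shape[2])  # MaxPooling2D((2, 2))
flat_units = shape[0] * shape[1] * shape[2]    # Flatten
p_dense1 = dense(flat_units, 256)              # Dense(256)
p_dense2 = dense(256, 10)                      # Dense(10)
total_params = p_conv1 + p_conv2 + p_dense1 + p_dense2

# Number of batches per epoch: 60000 training images in batches of 128.
steps_per_epoch = math.ceil(60000 / 128)

# Cross-entropy of a uniform guess over 10 classes.
chance_loss = math.log(10)

print("shape after pooling:", shape)
print("flattened units:", flat_units)
print("trainable parameters:", total_params)
print("steps per epoch:", steps_per_epoch)
print("chance-level loss:", round(chance_loss, 4))
```

The steps-per-epoch value matches the `469/469` counter in the log. The first-epoch training loss of about 2.29 sits just below the chance-level loss of ln(10), roughly 2.30, which is what one would expect from a network that has barely started learning.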
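Conclusion
In this article, we looked at how to recognize handwritten digits with a neural network. A small CNN trained on MNIST for 10 epochs reached a test accuracy of about 83.5%.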