AI_Python_VisualIntelligence_CNN_SmallSize_notMNIST_2024

Letter recognition (small size)

Indeed, I once even proposed that the toughest challenge facing AI workers is to answer the question: “What are the letters ‘A’ and ‘I’?” - Douglas R. Hofstadter (1995)

notMNIST

Data source: notMNIST (you need to download the notMNIST_small.mat file):

[The dataset's author took] some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A-J taken from different fonts.

Approaching 0.5% error rate on notMNIST_small would be very impressive. If you run your algorithm on this dataset, please let me know your results.

So, why not MNIST?

Many introductions to image classification with deep learning start with MNIST, a standard dataset of handwritten digits. This is unfortunate. Not only does it not produce a “Wow!” effect or show where deep learning shines, but it also can be solved with shallow machine learning techniques. In this case, plain k-Nearest Neighbors produces more than 97% accuracy (or even 99.5% with some data preprocessing!). Moreover, MNIST is not a typical image dataset, and mastering it is unlikely to teach you transferable skills that would be useful for other classification problems.

Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision]. - François Chollet’s tweet


Keras Update


!pip install keras-nightly
Collecting keras-nightly
  Downloading keras_nightly-3.6.0.dev2024101603-py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from keras-nightly) (1.4.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from keras-nightly) (1.26.4)
Requirement already satisfied: rich in /usr/local/lib/python3.10/dist-packages (from keras-nightly) (13.9.2)
Requirement already satisfied: namex in /usr/local/lib/python3.10/dist-packages (from keras-nightly) (0.0.8)
Requirement already satisfied: h5py in /usr/local/lib/python3.10/dist-packages (from keras-nightly) (3.11.0)
Requirement already satisfied: optree in /usr/local/lib/python3.10/dist-packages (from keras-nightly) (0.13.0)
Requirement already satisfied: ml-dtypes in /usr/local/lib/python3.10/dist-packages (from keras-nightly) (0.4.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from keras-nightly) (24.1)
Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.10/dist-packages (from optree->keras-nightly) (4.12.2)
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich->keras-nightly) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich->keras-nightly) (2.18.0)
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich->keras-nightly) (0.1.2)
Downloading keras_nightly-3.6.0.dev2024101603-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 8.9 MB/s eta 0:00:00
Installing collected packages: keras-nightly
Successfully installed keras-nightly-3.6.0.dev2024101603

!wget http://yaroslavvb.com/upload/notMNIST/notMNIST_small.mat
--2024-10-23 07:35:39--  http://yaroslavvb.com/upload/notMNIST/notMNIST_small.mat
Resolving yaroslavvb.com (yaroslavvb.com)... 129.121.4.193
Connecting to yaroslavvb.com (yaroslavvb.com)|129.121.4.193|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 117586976 (112M)
Saving to: ‘notMNIST_small.mat’

notMNIST_small.mat  100%[===================>] 112.14M  67.5MB/s    in 1.7s    

2024-10-23 07:35:41 (67.5 MB/s) - ‘notMNIST_small.mat’ saved [117586976/117586976]

import numpy as np
import matplotlib.pyplot as plt

from scipy import io

Data Loading


data = io.loadmat('notMNIST_small.mat')

x = data['images']
y = data['labels']

x.shape, y.shape
((28, 28, 18724), (18724,))

resolution = 28
classes = 10

x = np.transpose(x, (2, 0, 1))
print(x.shape)
x = x.reshape( (-1, resolution, resolution, 1) )
(18724, 28, 28)

# axes: (sample, height, width, channel)
x.shape, y.shape
((18724, 28, 28, 1), (18724,))
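
The transpose-then-reshape step above can be checked on a toy array. This is a minimal sketch, assuming a made-up stack of four 2x3 "images" stored sample-last like the .mat file; the names `toy`, `toy_nhw` and `toy_nhwc` are illustrative and not part of the notebook.

```python
import numpy as np

# Hypothetical stack: 4 "images" of size 2x3, stored as (H, W, N)
toy = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Move the sample axis to the front: (H, W, N) -> (N, H, W)
toy_nhw = np.transpose(toy, (2, 0, 1))

# Append a single grayscale channel axis: (N, H, W) -> (N, H, W, 1)
toy_nhwc = toy_nhw.reshape(-1, 2, 3, 1)

print(toy_nhwc.shape)
# Sample 0 is still the same image as toy[:, :, 0]
print(np.array_equal(toy_nhwc[0, :, :, 0], toy[:, :, 0]))

# Reshaping without the transpose keeps the shape but scrambles the pixels
scrambled = toy.reshape(-1, 2, 3, 1)
print(np.array_equal(scrambled[0, :, :, 0], toy[:, :, 0]))
```

The last check is the reason the transpose comes first: reshape alone only reinterprets the memory order and does not move the sample axis.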

  • Exploring the data

rand_i = np.random.randint(0, x.shape[0])

plt.title( f'idx: {rand_i} , y: {"ABCDEFGHIJ"[ int(y[rand_i]) ]}' )
plt.imshow( x[rand_i, :, :, 0], cmap='gray' )
plt.show()


rows = 5
fig, axes = plt.subplots(rows, classes, figsize=(classes,rows))

for letter_id in range(classes) :
    letters = x[y==letter_id]      # images whose label equals letter_id (0-9 maps to A-J)
    letters_len = len(letters)

    for row_i in range(rows) :
        axe = axes[row_i, letter_id]
        axe.imshow( letters[np.random.randint(letters_len)], cmap='gray', interpolation='none')
        axe.axis('off')


Data Preprocessing


  • Data split
    • training set : test set = 8 : 2
    • training set : validation set = 8 : 2
    • Fixed random seed for reproducibility : 2024

from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=2024)

train_x.shape, test_x.shape
((14979, 28, 28, 1), (3745, 28, 28, 1))

train_x, val_x, train_y, val_y = train_test_split(train_x, train_y, test_size=0.2, random_state= 2024 )

train_x.shape, val_x.shape, test_x.shape
((11983, 28, 28, 1), (2996, 28, 28, 1), (3745, 28, 28, 1))
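
Applying an 8 : 2 split twice leaves roughly 64% of the data for training, 16% for validation and 20% for testing. The arithmetic can be sketched as follows, assuming the held-out part is rounded up to a whole sample (an assumption that reproduces the shapes printed above); `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_total = 18724
n_trainval, n_test = split_sizes(n_total, 0.2)   # first split: train+val vs test
n_train, n_val = split_sizes(n_trainval, 0.2)    # second split: train vs val

print(n_train, n_val, n_test)
print(round(n_train / n_total, 2), round(n_val / n_total, 2), round(n_test / n_total, 2))
```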

  • Scaling
    • min-max scaling

train_x.max(), test_x.min()
(255.0, 0.0)

train_x = train_x / 255.0
test_x = test_x / 255.0
val_x = val_x / 255.0

train_x.max(), test_x.min(), val_x.max()
(1.0, 0.0, 1.0)
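
Min-max scaling maps a value x to (x - min) / (max - min). With min = 0 and max = 255 this reduces to the division by 255 used above. A minimal sketch on made-up pixel values (`toy_train`, `toy_test` and `min_max_scale` are illustrative names), taking the range from the training data only:

```python
import numpy as np

def min_max_scale(arr, lo, hi):
    """Map values from the range [lo, hi] to [0, 1]."""
    return (arr - lo) / (hi - lo)

# Hypothetical pixel values standing in for train_x / test_x
toy_train = np.array([0.0, 64.0, 255.0])
toy_test = np.array([128.0, 255.0])

# Take the range from the training data, then reuse it for every split
lo, hi = toy_train.min(), toy_train.max()
scaled_train = min_max_scale(toy_train, lo, hi)
scaled_test = min_max_scale(toy_test, lo, hi)

# With lo = 0 and hi = 255 this is the same as dividing by 255
print(np.allclose(scaled_train, toy_train / 255.0))
print(scaled_test)
```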

  • One-hot encoding

len(np.unique(train_y))
10

from keras.utils import to_categorical

train_y = to_categorical(train_y, 10)
test_y = to_categorical(test_y, 10)
val_y = to_categorical(val_y, 10)

  • Re-checking data shapes

len(np.unique(train_y)), train_y.shape, test_y.shape, val_y.shape
(2, (11983, 10), (3745, 10), (2996, 10))
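
The 2 in the output above is expected: after one-hot encoding the label array contains only the values 0 and 1. A small NumPy sketch of the same encoding on made-up labels (`labels` and `one_hot` are illustrative names; the notebook itself uses `to_categorical`):

```python
import numpy as np

# Hypothetical integer labels for 4 samples and 3 classes
labels = np.array([0, 2, 1, 2])
n_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes)[labels]

print(one_hot.shape)               # one row per sample, one column per class
print(np.unique(one_hot))          # only the values 0 and 1 remain
print(np.argmax(one_hot, axis=1))  # argmax recovers the original labels
```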

Image Preprocessing & Augmentation Layers

import keras

 

aug_layers = [keras.layers.RandomRotation(factor=(-0.3,0.3)),
              keras.layers.RandomTranslation(height_factor=(-0.3,0.3), width_factor=(-0.3,0.3)),
              keras.layers.RandomZoom(height_factor=(-0.2,0.2), width_factor=(-0.2,0.2)),
              keras.layers.RandomFlip(mode='horizontal_and_vertical')
              ]

def image_augmentation(images):
    for layer in aug_layers:
        images = layer(images)
    return images

aug_imgs = image_augmentation(train_x[0])

rand_n = np.random.randint(0, train_x.shape[0])   # upper bound is exclusive, so every index can be drawn

for i in range(9) :
    aug_imgs = image_augmentation(train_x[rand_n])

    plt.imshow( np.array(aug_imgs), cmap='gray' )
    plt.axis('off')
    plt.show()
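
To get a feel for the factors used above: Keras documents the RandomRotation factor as a fraction of a full turn (2π) and the RandomTranslation factors as fractions of the image height and width. The sketch below converts the two settings that appear in this notebook (0.3 in the preview, 0.1 in the model) into degrees and pixels; the helper names are hypothetical.

```python
resolution = 28  # image side length used in the notebook

def rotation_degrees(factor):
    """RandomRotation factors are fractions of a full turn (360 degrees)."""
    return factor * 360.0

def translation_pixels(factor, size=resolution):
    """RandomTranslation factors are fractions of the image height or width."""
    return factor * size

for name, factor in [("preview (aug_layers)", 0.3), ("model", 0.1)]:
    print(name,
          f"rotation up to ±{rotation_degrees(factor):.0f} degrees,",
          f"shift up to ±{translation_pixels(factor):.1f} px")
```

A rotation of more than 100 degrees, or a flip, can turn a letter into a shape that no longer resembles its class, which is one plausible reason for the milder factors and the commented-out RandomFlip in the model below.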

 


Modeling : CNN + Image Preprocessing & Augmentation


train_x.shape, train_y.shape
((11983, 28, 28, 1), (11983, 10))

keras.utils.clear_session()

il = keras.layers.Input(shape=(28, 28, 1))

hl = keras.layers.RandomRotation(factor=(-0.1,0.1))(il)
hl = keras.layers.RandomTranslation(height_factor=(-0.1,0.1), width_factor=(-0.1,0.1))(hl)
hl = keras.layers.RandomZoom(height_factor=(-0.1,0.1), width_factor=(-0.1,0.1))(hl)
#hl = keras.layers.RandomFlip(mode='horizontal_and_vertical')(hl)

hl = keras.layers.Conv2D(64,3,1,'same', activation = 'relu')(hl)
hl = keras.layers.Conv2D(64,3,1,'same', activation = 'relu')(hl)
hl = keras.layers.MaxPool2D(2,2)(hl)

hl = keras.layers.Conv2D(128,3,1,'same', activation = 'relu')(hl)
hl = keras.layers.Conv2D(128,3,1,'same', activation = 'relu')(hl)
hl = keras.layers.MaxPool2D(2,2)(hl)

hl = keras.layers.Flatten()(hl)
hl = keras.layers.Dense(1024, activation = 'relu')(hl)
ol = keras.layers.Dense(10, activation = 'softmax')(hl)

model = keras.models.Model(il, ol)

model.summary()
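
The summary output is not reproduced here, but the trainable parameter count can be worked out by hand from the layer definitions above. This is a rough sketch using the standard formulas for convolution and dense layers; the helper names are illustrative.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """kernel * kernel * in_ch weights per filter, plus one bias per filter."""
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    """One weight per input-output pair, plus one bias per output unit."""
    return (in_units + 1) * out_units

side = 28
# (kernel size, input channels, output channels) for the four Conv2D layers
conv_layers = [(3, 1, 64), (3, 64, 64), (3, 64, 128), (3, 128, 128)]
total = sum(conv2d_params(k, cin, cout) for k, cin, cout in conv_layers)

# Two 2x2 max-pooling layers halve the side twice: 28 -> 14 -> 7
side = side // 2 // 2
flat = side * side * 128

first_dense = dense_params(flat, 1024)
total += first_dense + dense_params(1024, 10)
print(flat, first_dense, total)
```

Under these formulas the first Dense layer holds about 96% of all trainable parameters, so it is the natural place to look when trying to shrink the model.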


from keras.callbacks import EarlyStopping

model.compile(optimizer = 'Adam', loss = 'categorical_crossentropy', metrics=['accuracy'])

  • Early Stopping

es = EarlyStopping(min_delta = 0, patience = 5, verbose = 1, restore_best_weights=True)

  • .fit( )

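# Note: validation_split=0.2 holds out a further 20% of train_x for early stopping;
# val_x / val_y are not used here and serve as a separate check in .evaluate( ) below.
model.fit(train_x, train_y, validation_split=0.2, epochs=500, callbacks = [es], verbose=1)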
Epoch 1/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 8s 16ms/step - accuracy: 0.6309 - loss: 1.1173 - val_accuracy: 0.8736 - val_loss: 0.4310
Epoch 2/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.8732 - loss: 0.4284 - val_accuracy: 0.9153 - val_loss: 0.3054
Epoch 3/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.8910 - loss: 0.3468 - val_accuracy: 0.9274 - val_loss: 0.2704
Epoch 4/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.9096 - loss: 0.2981 - val_accuracy: 0.9199 - val_loss: 0.2641
Epoch 5/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.9141 - loss: 0.2726 - val_accuracy: 0.9270 - val_loss: 0.2567
Epoch 6/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - accuracy: 0.9294 - loss: 0.2385 - val_accuracy: 0.9232 - val_loss: 0.2572
Epoch 7/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.9241 - loss: 0.2350 - val_accuracy: 0.9237 - val_loss: 0.2413
Epoch 8/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 13ms/step - accuracy: 0.9310 - loss: 0.2157 - val_accuracy: 0.9153 - val_loss: 0.2871
Epoch 9/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.9424 - loss: 0.1842 - val_accuracy: 0.9295 - val_loss: 0.2520
Epoch 10/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 6s 13ms/step - accuracy: 0.9435 - loss: 0.1779 - val_accuracy: 0.9282 - val_loss: 0.2577
Epoch 11/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.9460 - loss: 0.1669 - val_accuracy: 0.9345 - val_loss: 0.2567
Epoch 12/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.9437 - loss: 0.1695 - val_accuracy: 0.9320 - val_loss: 0.2354
Epoch 13/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9463 - loss: 0.1599 - val_accuracy: 0.9387 - val_loss: 0.2155
Epoch 14/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.9534 - loss: 0.1468 - val_accuracy: 0.9341 - val_loss: 0.2245
Epoch 15/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - accuracy: 0.9578 - loss: 0.1349 - val_accuracy: 0.9366 - val_loss: 0.2412
Epoch 16/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 13ms/step - accuracy: 0.9593 - loss: 0.1305 - val_accuracy: 0.9316 - val_loss: 0.2442
Epoch 17/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.9569 - loss: 0.1288 - val_accuracy: 0.9316 - val_loss: 0.2247
Epoch 18/500
300/300 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.9583 - loss: 0.1310 - val_accuracy: 0.9307 - val_loss: 0.2452
Epoch 18: early stopping
Restoring model weights from the end of the best epoch: 13.
<keras.src.callbacks.history.History at 0x7fa29010f190>
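
The stopping point in the log can be reproduced with a few lines of plain Python. This is a simplified re-implementation of the patience logic for illustration, not Keras's own code; it assumes the monitored quantity is val_loss (the EarlyStopping default) and uses the val_loss values copied from the log above.

```python
def simulate_early_stopping(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based; stop_epoch is None if never triggered."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# val_loss per epoch, copied from the training log
val_losses = [0.4310, 0.3054, 0.2704, 0.2641, 0.2567, 0.2572, 0.2413, 0.2871, 0.2520,
              0.2577, 0.2567, 0.2354, 0.2155, 0.2245, 0.2412, 0.2442, 0.2247, 0.2452]

stop_epoch, best_epoch = simulate_early_stopping(val_losses)
print(stop_epoch, best_epoch)
```

The best val_loss occurs at epoch 13, and five epochs without improvement follow, so training stops at epoch 18 and the weights from epoch 13 are restored.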

  • .evaluate( )

model.evaluate(val_x, val_y)
94/94 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9537 - loss: 0.1906
[0.1934709995985031, 0.9492656588554382]

  • .predict( )

y_pred = model.predict(test_x)
118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step

# Convert the one-hot vectors back to class indices
# Needed for the evaluation metrics and for inspecting actual samples

y_pred_arg = np.argmax(y_pred, axis=1)
test_y_arg = np.argmax(test_y, axis=1)

  • Evaluation metrics

from sklearn.metrics import accuracy_score, classification_report
accuracy_score(test_y_arg, y_pred_arg)
0.9455273698264353

print( classification_report(test_y_arg, y_pred_arg) )
              precision    recall  f1-score   support

           0       0.95      0.93      0.94       360
           1       0.96      0.95      0.95       382
           2       0.97      0.95      0.96       385
           3       0.90      0.96      0.93       373
           4       0.93      0.94      0.94       364
           5       0.95      0.97      0.96       392
           6       0.92      0.94      0.93       390
           7       0.98      0.96      0.97       364
           8       0.95      0.90      0.93       360
           9       0.96      0.95      0.96       375

    accuracy                           0.95      3745
   macro avg       0.95      0.95      0.95      3745
weighted avg       0.95      0.95      0.95      3745
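
The precision and recall columns in the report follow directly from a confusion matrix. A minimal NumPy sketch on made-up labels for three classes (`confusion_matrix_np`, `toy_true` and `toy_pred` are illustrative names, not part of the notebook):

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair
    return cm

# Hypothetical labels for a 3-class problem
toy_true = np.array([0, 0, 1, 1, 2, 2, 2])
toy_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix_np(toy_true, toy_pred, 3)
recall = np.diag(cm) / cm.sum(axis=1)      # correct / all samples of that true class
precision = np.diag(cm) / cm.sum(axis=0)   # correct / all predictions of that class
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(precision, recall, accuracy)
```

With the notebook's variables, calling `confusion_matrix_np(test_y_arg, y_pred_arg, 10)` should show which letter pairs are confused most often.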

Visualization


  • Inspecting actual samples

letters_str = "ABCDEFGHIJ"

rand_idx = np.random.randint(0, len(y_pred_arg))
test_idx = test_y_arg[rand_idx]
pred_idx = y_pred_arg[rand_idx]
class_prob = np.floor( y_pred[rand_idx]*100 )

print(f'idx = {rand_idx}')
print(f'The image at this index is {letters_str[test_idx]}')
print(f'Model prediction : {letters_str[pred_idx]}')
print(f'Per-class probabilities from the model (%) : ')
print('-------------------')
for idx, val in enumerate(letters_str) :
    print(val, class_prob[idx])
print('=================================================')

if test_y_arg[rand_idx] == y_pred_arg[rand_idx] :
    print('Correct')
else :
    print('Wrong')

plt.imshow(test_x[rand_idx], cmap='gray')
plt.show()


  • Looking only at misclassified images

len(test_y)
3745

false_idx = np.where(test_y_arg != y_pred_arg)[0]   # indices of misclassified test samples
false_len = len(false_idx)
false_len
204

letters_str = "ABCDEFGHIJ"

rand_idx = false_idx[np.random.randint(0, false_len)]
test_idx = test_y_arg[rand_idx]
pred_idx = y_pred_arg[rand_idx]
class_prob = np.floor( y_pred[rand_idx]*100 )

print(f'idx = {rand_idx}')
print(f'The image at this index is {letters_str[test_idx]}')
print(f'Model prediction : {letters_str[pred_idx]}')
print(f'Per-class probabilities from the model (%) : ')
print('-------------------')
for idx, val in enumerate(letters_str) :
    print(val, class_prob[idx])
print('=================================================')

if test_y_arg[rand_idx] == y_pred_arg[rand_idx] :
    print('Correct')
else :
    print('Wrong')

plt.imshow(test_x[rand_idx], cmap='gray')
plt.show()
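
Instead of drawing one random mistake at a time, it can be useful to look first at the errors the model was most sure about. A small sketch under the assumption that the true labels are class indices and the predictions are softmax rows; `most_confident_errors` and the toy arrays are hypothetical.

```python
import numpy as np

def most_confident_errors(y_true, y_prob, k=3):
    """Indices of misclassified samples, sorted by predicted-class probability (highest first)."""
    y_hat = np.argmax(y_prob, axis=1)
    wrong = np.where(y_hat != y_true)[0]
    confidence = y_prob[wrong, y_hat[wrong]]   # probability assigned to the wrong class
    order = np.argsort(-confidence)
    return wrong[order][:k]

# Hypothetical softmax outputs for 4 samples and 3 classes
toy_prob = np.array([[0.70, 0.20, 0.10],
                     [0.10, 0.80, 0.10],
                     [0.05, 0.05, 0.90],
                     [0.40, 0.35, 0.25]])
toy_true = np.array([0, 2, 0, 1])

print(most_confident_errors(toy_true, toy_prob))
```

With the notebook's variables this would be called as `most_confident_errors(test_y_arg, y_pred, k=9)`, and the returned indices can be passed to `plt.imshow(test_x[i], cmap='gray')` as above.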