Michael Tang, Joy Wang, Lucinda Zhou
Over the years, there has been a significant increase in the use of autonomous machines in the workforce, and thanks to machine learning, these machines are more capable than ever. Machine learning provides robots with more advanced behavior than traditional path-based or rule-based control strategies. There are now many communities and fields where workers have been replaced with machines. For example, in farming, a variety of agricultural robots harvest and even tend to crops regularly. While not perfect, robots are quick and efficient at their jobs, and they save humans from performing redundant, tiring, and even dangerous tasks.
For our final project, our group decided to create a classifier that can recognize different types of fruits and vegetables. This type of classifier has many real-life applications. For example, we could train an agricultural robot with this classifier so that it can assist with harvesting crops: as long as the robot is equipped with a camera to take pictures or videos, it can use this classifier to recognize and count the fruits and vegetables it sees in a cluttered field or greenhouse. Alternatively, we could train a robot to assist workers in grocery stores: such a robot could routinely inspect the fruit and vegetable aisles to identify out-of-place or understocked items, saving humans from these mundane tasks. We see many possible applications for our classifier, and we are excited to apply what we have learned about machine learning to produce something interesting and relevant!
The dataset we chose to classify is the Fruits 360 dataset from Kaggle (https://www.kaggle.com/datasets/moltean/fruits). This dataset consists of 90380 images of 131 different fruits and vegetables, sourced by the author from stores and his own garden. The pictures were taken with a Logitech C920 webcam against a piece of white paper as the background. To account for variations due to different lighting conditions, the background of each image was manually edited to be completely white. The original dataset was too large for our algorithm to run in a reasonable amount of time, so we tested our implementations on a subset of the data containing 6231 images of 24 different fruits and vegetables. The images are scaled down to 100x100 pixels, and each fruit is photographed in numerous rotations so that there is a larger variety of training data for each type of fruit and vegetable.
To classify the images from our dataset, we used a convolutional neural network (CNN). CNNs can work directly with raw images, without hand-engineered features. The input image passes through convolutional, pooling, and classification layers. In the convolutional layer, a filter matrix is slid over the image one step at a time, producing a feature map. This feature map is then passed into the pooling layer, which reduces its size to make later processing faster. These layers may be repeated several times in different architectures. Finally, the pooled feature map is flattened and passed through fully connected layers with an activation function, and the image is assigned a class. We will try different activation functions (ReLU, Sigmoid, Softmax, and Tanh) in our network and see which performs best. Source: https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
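As a concrete illustration of the convolution and pooling steps described above, here is a minimal NumPy sketch (this is our own toy example with a made-up 4x4 image and 2x2 filter, not the Keras implementation we use later):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over the image one step at a time ('valid' padding),
    producing a feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Shrink the feature map by keeping the max of each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1., 0.], [0., -1.]])           # toy 2x2 filter
fmap = conv2d_valid(image, kernel)                 # 3x3 feature map (every entry is -5.0 here)
pooled = max_pool2d(fmap)                          # 1x1 map after 2x2 max pooling
```

Keras performs the same sliding-window and pooling operations, just vectorized, over many filters, and with learned filter weights.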
First, we imported all of the libraries we needed.
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
Then we acquired the dataset that we'll build our classifier on.
path_to_training_directories = r".\fruit360\fruits-360-original-size\fruits-360-original-size\Training"
path_to_test_images = r".\fruit360\fruits-360-original-size\fruits-360-original-size\Test"
batch_size = 32
img_height = 256
img_width = 256
train_ds = tf.keras.utils.image_dataset_from_directory(
path_to_training_directories,
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
class_names = train_ds.class_names
print(class_names)
val_ds = tf.keras.utils.image_dataset_from_directory(
r".\fruit360\fruits-360-original-size\fruits-360-original-size\Validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
Found 6231 files belonging to 24 classes. ['apple_6', 'apple_braeburn_1', 'apple_crimson_snow_1', 'apple_golden_1', 'apple_golden_2', 'apple_golden_3', 'apple_granny_smith_1', 'apple_hit_1', 'apple_pink_lady_1', 'apple_red_1', 'apple_red_2', 'apple_red_3', 'apple_red_delicios_1', 'apple_red_yellow_1', 'apple_rotten_1', 'cabbage_white_1', 'carrot_1', 'cucumber_1', 'cucumber_3', 'eggplant_violet_1', 'pear_1', 'pear_3', 'zucchini_1', 'zucchini_dark_1'] Found 3114 files belonging to 24 classes.
Let's test the tutorial model from the TensorFlow Keras site (https://www.tensorflow.org/tutorials/images/classification). This model first rescales each RGB value of the image from 0-255 to 0-1. It then runs through 3 rounds of convolutional and max-pooling layers. Finally, it goes through a fully connected layer with 128 neurons, and then another fully connected layer that outputs one value per class.
num_classes = len(class_names)
model = Sequential([
layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
After constructing our model, we have to compile it.
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
Let's look at the model summary.
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling (Rescaling) (None, 256, 256, 3) 0
conv2d (Conv2D) (None, 256, 256, 16) 448
max_pooling2d (MaxPooling2D (None, 128, 128, 16) 0
)
conv2d_1 (Conv2D) (None, 128, 128, 32) 4640
max_pooling2d_1 (MaxPooling (None, 64, 64, 32) 0
2D)
conv2d_2 (Conv2D) (None, 64, 64, 64) 18496
max_pooling2d_2 (MaxPooling (None, 32, 32, 64) 0
2D)
flatten (Flatten) (None, 65536) 0
dense (Dense) (None, 128) 8388736
dense_1 (Dense) (None, 24) 3096
=================================================================
Total params: 8,415,416
Trainable params: 8,415,416
Non-trainable params: 0
_________________________________________________________________
This model has more than 8 million parameters to train. Let's run it and see how long it takes.
epochs=4
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/4 195/195 [==============================] - 88s 438ms/step - loss: 0.6434 - accuracy: 0.8126 - val_loss: 0.0969 - val_accuracy: 0.9695 Epoch 2/4 195/195 [==============================] - 82s 424ms/step - loss: 0.0273 - accuracy: 0.9912 - val_loss: 6.8223e-04 - val_accuracy: 1.0000 Epoch 3/4 195/195 [==============================] - 82s 420ms/step - loss: 4.4758e-04 - accuracy: 1.0000 - val_loss: 2.5450e-04 - val_accuracy: 1.0000 Epoch 4/4 195/195 [==============================] - 81s 417ms/step - loss: 1.9298e-04 - accuracy: 1.0000 - val_loss: 1.3936e-04 - val_accuracy: 1.0000
That took a long time (5m 33.9s), but the accuracy seems high. Let's plot the accuracies to get a better look.
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
After 2 epochs, we reach 100% training and validation accuracy. However, something is odd: why is validation accuracy higher than training accuracy, and validation loss lower than training loss? After some research (consulting Stack Overflow and Quora), we found a few possible reasons.
Potential reason number 1: Dropout
Dropout is used during training but not during validation, which makes the training metrics look worse.
However, we didn't use dropout.
Potential reason number 2: Data leak
Images from training are being reused in validation.
The training, testing, and validation images are in 3 separate folders, so there can't be any leaks.
However, something could simulate a leak: the validation images may simply be very similar to the images in the training dataset. This leads to our hypothesis that classifying these fruits is just a very easy problem for the network to solve.
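One way to double-check the exact-duplicate part of this reasoning is to hash every image file in the two directories and look for identical bytes. A sketch we could run against the dataset folders (`find_shared_images` is our own helper, not part of the dataset or Keras):

```python
import hashlib
from pathlib import Path

def file_hashes(directory):
    """Map MD5 digest -> file path for every file under `directory`."""
    return {
        hashlib.md5(p.read_bytes()).hexdigest(): p
        for p in Path(directory).rglob("*") if p.is_file()
    }

def find_shared_images(train_dir, val_dir):
    """Return validation files that are byte-identical to some training file."""
    train = file_hashes(train_dir)
    val = file_hashes(val_dir)
    return [val[h] for h in set(train) & set(val)]
```

An empty result only rules out exact duplicates; near-identical frames (e.g. adjacent rotations of the same fruit) would still pass this check, which is exactly the similarity we suspect.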
To test this hypothesis, let's run some simpler models and see if they can reproduce these seemingly impossible results.
For example, let's run a model with only one convolutional layer.
num_classes = len(class_names)
oneCNNmodel = Sequential([
layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(num_classes)
])
oneCNNmodel.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
oneCNNmodel.summary()
Model: "sequential_15"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling_15 (Rescaling) (None, 256, 256, 3) 0
conv2d_19 (Conv2D) (None, 256, 256, 16) 448
max_pooling2d_23 (MaxPoolin (None, 128, 128, 16) 0
g2D)
flatten_15 (Flatten) (None, 262144) 0
dense_17 (Dense) (None, 24) 6291480
=================================================================
Total params: 6,291,928
Trainable params: 6,291,928
Non-trainable params: 0
_________________________________________________________________
epochs=4
history = oneCNNmodel.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/4 195/195 [==============================] - 40s 207ms/step - loss: 3.9453 - accuracy: 0.8058 - val_loss: 0.0378 - val_accuracy: 0.9933 Epoch 2/4 195/195 [==============================] - 40s 206ms/step - loss: 0.0137 - accuracy: 0.9984 - val_loss: 0.0034 - val_accuracy: 1.0000 Epoch 3/4 195/195 [==============================] - 40s 206ms/step - loss: 0.0023 - accuracy: 1.0000 - val_loss: 0.0016 - val_accuracy: 1.0000 Epoch 4/4 195/195 [==============================] - 40s 207ms/step - loss: 0.0012 - accuracy: 1.0000 - val_loss: 8.9128e-04 - val_accuracy: 1.0000
That took less time (2m 41.1s), but again, validation seems to do better than training. Let's look at the plots to confirm.
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
We see the same issue that was pointed out during our presentation: validation accuracy is higher than training accuracy. Oddly enough, our research led us to Twitter: https://twitter.com/aureliengeron/status/1110839223878184960
Our hypothesis is that classifying fruits is an easy problem for neural networks. This could make validation look better than training because, as the Twitter post explains, Keras averages the training metrics over the whole epoch, while validation is run with the model as it exists at the end of the epoch. If the network only improves over time because the problem is easy, validation will always beat the averaged training metric: on average, the training metric reflects a network with about half an epoch less training.
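A toy calculation makes this concrete. Suppose accuracy improves steadily within one epoch: the reported training accuracy is the average over all batches, while validation uses the better end-of-epoch model (the per-batch numbers below are made up purely for illustration):

```python
import numpy as np

# Hypothetical per-batch accuracy of a model that improves during the epoch.
batch_acc = np.linspace(0.80, 1.00, 195)   # 195 batches, matching our runs

train_acc_reported = batch_acc.mean()   # Keras reports the average over the epoch (0.90 here)
val_acc = batch_acc[-1]                 # validation is run with the end-of-epoch weights (1.00)

# The averaged training metric lags the validation metric.
assert train_acc_reported < val_acc
```

This is consistent with what we see: whenever the model improves monotonically, validation comes out ahead even with no leak and no dropout.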
Let's do one final test of this hypothesis: a plain fully connected neural network, with no convolutional layers at all. CNNs are the go-to architecture for images, but if a plain fully connected network can perfectly classify our fruits, then the problem is definitely too easy.
num_classes = len(class_names)
denseNN = Sequential([
layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Flatten(),
layers.Dense(num_classes)
])
denseNN.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
denseNN.summary()
Model: "sequential_16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling_16 (Rescaling) (None, 256, 256, 3) 0
flatten_16 (Flatten) (None, 196608) 0
dense_18 (Dense) (None, 24) 4718616
=================================================================
Total params: 4,718,616
Trainable params: 4,718,616
Non-trainable params: 0
_________________________________________________________________
epochs=18
history = denseNN.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/18 195/195 [==============================] - 4s 21ms/step - loss: 35.3395 - accuracy: 0.6176 - val_loss: 2.4790 - val_accuracy: 0.8783 Epoch 2/18 195/195 [==============================] - 4s 22ms/step - loss: 2.0638 - accuracy: 0.8893 - val_loss: 2.3833 - val_accuracy: 0.9098 Epoch 3/18 195/195 [==============================] - 5s 24ms/step - loss: 3.1651 - accuracy: 0.8864 - val_loss: 1.2465 - val_accuracy: 0.8950 Epoch 4/18 195/195 [==============================] - 4s 21ms/step - loss: 2.4558 - accuracy: 0.9257 - val_loss: 10.3081 - val_accuracy: 0.7781 Epoch 5/18 195/195 [==============================] - 4s 20ms/step - loss: 3.1956 - accuracy: 0.9114 - val_loss: 0.7584 - val_accuracy: 0.9557 Epoch 6/18 195/195 [==============================] - 4s 22ms/step - loss: 0.1325 - accuracy: 0.9889 - val_loss: 0.2913 - val_accuracy: 0.9769 Epoch 7/18 195/195 [==============================] - 4s 22ms/step - loss: 2.6679 - accuracy: 0.9127 - val_loss: 4.2417 - val_accuracy: 0.9107 Epoch 8/18 195/195 [==============================] - 4s 21ms/step - loss: 2.5331 - accuracy: 0.9344 - val_loss: 5.4481 - val_accuracy: 0.9017 Epoch 9/18 195/195 [==============================] - 4s 20ms/step - loss: 1.7081 - accuracy: 0.9570 - val_loss: 2.3279 - val_accuracy: 0.9454 Epoch 10/18 195/195 [==============================] - 4s 22ms/step - loss: 0.9529 - accuracy: 0.9756 - val_loss: 0.0997 - val_accuracy: 0.9942 Epoch 11/18 195/195 [==============================] - 4s 22ms/step - loss: 0.0297 - accuracy: 0.9974 - val_loss: 4.5497e-05 - val_accuracy: 1.0000 Epoch 12/18 195/195 [==============================] - 4s 21ms/step - loss: 3.8046e-06 - accuracy: 1.0000 - val_loss: 4.4938e-05 - val_accuracy: 1.0000 Epoch 13/18 195/195 [==============================] - 4s 21ms/step - loss: 3.5852e-08 - accuracy: 1.0000 - val_loss: 4.4891e-05 - val_accuracy: 1.0000 Epoch 14/18 195/195 [==============================] - 4s 21ms/step - loss: 3.4972e-08 - accuracy: 1.0000 - 
val_loss: 4.4833e-05 - val_accuracy: 1.0000 Epoch 15/18 195/195 [==============================] - 4s 21ms/step - loss: 3.4015e-08 - accuracy: 1.0000 - val_loss: 4.4765e-05 - val_accuracy: 1.0000 Epoch 16/18 195/195 [==============================] - 4s 21ms/step - loss: 3.3001e-08 - accuracy: 1.0000 - val_loss: 4.4679e-05 - val_accuracy: 1.0000 Epoch 17/18 195/195 [==============================] - 4s 22ms/step - loss: 3.1968e-08 - accuracy: 1.0000 - val_loss: 4.4592e-05 - val_accuracy: 1.0000 Epoch 18/18 195/195 [==============================] - 4s 22ms/step - loss: 3.0897e-08 - accuracy: 1.0000 - val_loss: 4.4498e-05 - val_accuracy: 1.0000
Although it takes more epochs, we can still achieve 100% training and validation accuracy. But do we still see the validation > training anomaly?
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
The plot makes it plain that validation is sometimes better than training and sometimes worse. The curves are more varied, but still very close, which is unusual. Another commonly cited reason for validation matching or beating training is a validation set that is too small. In our case, the validation set was separate and we did not split the data ourselves; in fact, the validation set is about half the size of the training set, so we don't believe this is the issue.
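Using the file counts Keras reported when loading the data, we can check the split proportions directly (the counts below are copied from the loading output earlier):

```python
# File counts as reported by Keras when loading each directory.
n_train, n_val, n_test = 6231, 3114, 3110

total = n_train + n_val + n_test
print(f"val/train ratio: {n_val / n_train:.2f}")   # about 0.50
print(f"split: {n_train/total:.0%} train, {n_val/total:.0%} val, {n_test/total:.0%} test")
```

Roughly a 50/25/25 split, so "validation set too small" clearly does not apply here.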
Let's see how all three models perform on test data.
test_ds = tf.keras.utils.image_dataset_from_directory(
path_to_test_images,
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
results = model.evaluate(test_ds, batch_size=128)
print("Tutorial model with 3 CNN's and 2 Dense NN's: test loss, test acc:", results)
results = oneCNNmodel.evaluate(test_ds, batch_size=128)
print("Model with 1 CNN and 1 Dense NN: test loss, test acc:", results)
results = denseNN.evaluate(test_ds, batch_size=128)
print("1 Dense NN Model : test loss, test acc:", results)
Found 3110 files belonging to 24 classes. 98/98 [==============================] - 9s 93ms/step - loss: 1.3554e-04 - accuracy: 1.0000 Tutorial model with 3 CNN's and 2 Dense NN's: test loss, test acc: [0.000135544512886554, 1.0] 98/98 [==============================] - 6s 61ms/step - loss: 9.0204e-04 - accuracy: 1.0000 Model with 1 CNN and 1 Dense NN: test loss, test acc: [0.0009020409197546542, 1.0] 98/98 [==============================] - 2s 16ms/step - loss: 4.5574e-08 - accuracy: 1.0000 1 Dense NN Model : test loss, test acc: [4.557389488013541e-08, 1.0]
All 3 models achieved 100% accuracy, and the only difference was the number of epochs it took to train. Our conclusion is that this is an easy task for neural networks, so we will use the number of epochs and the time it takes to train each epoch as a way to judge the effectiveness of different configurations of our model.
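Since wall-clock time per epoch is now one of our judging criteria, it can also be recorded programmatically rather than read off the progress bar. A minimal sketch using a custom Keras callback (`EpochTimer` is our own name; we did not actually use this in the runs above):

```python
import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    """Record the wall-clock duration of every epoch during model.fit()."""

    def on_train_begin(self, logs=None):
        self.durations = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        self.durations.append(time.perf_counter() - self._start)

# Usage (sketch):
#   timer = EpochTimer()
#   model.fit(train_ds, validation_data=val_ds, epochs=4, callbacks=[timer])
#   print(timer.durations)   # seconds per epoch
```

This would let us compare configurations on exact per-epoch timings instead of eyeballing the fit logs.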
So let's try a network with one CNN layer but with a different activation function to compare.
oneCNNmodelSigmoid = Sequential([
layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='sigmoid'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(num_classes)
])
oneCNNmodelSigmoid.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
oneCNNmodelSigmoid.summary()
Model: "sequential_18"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling_18 (Rescaling) (None, 256, 256, 3) 0
conv2d_21 (Conv2D) (None, 256, 256, 16) 448
max_pooling2d_25 (MaxPoolin (None, 128, 128, 16) 0
g2D)
flatten_18 (Flatten) (None, 262144) 0
dense_20 (Dense) (None, 24) 6291480
=================================================================
Total params: 6,291,928
Trainable params: 6,291,928
Non-trainable params: 0
_________________________________________________________________
epochs=10
history = oneCNNmodelSigmoid.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/10 195/195 [==============================] - 44s 225ms/step - loss: 85.1649 - accuracy: 0.2696 - val_loss: 1.9925 - val_accuracy: 0.6082 Epoch 2/10 195/195 [==============================] - 44s 226ms/step - loss: 2.6711 - accuracy: 0.6466 - val_loss: 2.8921 - val_accuracy: 0.6599 Epoch 3/10 195/195 [==============================] - 44s 225ms/step - loss: 1.6803 - accuracy: 0.7593 - val_loss: 2.4335 - val_accuracy: 0.8234 Epoch 4/10 195/195 [==============================] - 44s 226ms/step - loss: 1.3157 - accuracy: 0.8186 - val_loss: 0.5476 - val_accuracy: 0.8513 Epoch 5/10 195/195 [==============================] - 44s 225ms/step - loss: 0.6971 - accuracy: 0.8832 - val_loss: 0.3294 - val_accuracy: 0.9204 Epoch 6/10 195/195 [==============================] - 44s 225ms/step - loss: 0.8125 - accuracy: 0.8827 - val_loss: 0.2002 - val_accuracy: 0.9560 Epoch 7/10 195/195 [==============================] - 44s 225ms/step - loss: 0.3333 - accuracy: 0.9459 - val_loss: 0.4959 - val_accuracy: 0.9162 Epoch 8/10 195/195 [==============================] - 44s 226ms/step - loss: 0.2425 - accuracy: 0.9477 - val_loss: 0.0805 - val_accuracy: 0.9772 Epoch 9/10 195/195 [==============================] - 44s 225ms/step - loss: 0.5293 - accuracy: 0.9485 - val_loss: 0.5592 - val_accuracy: 0.8838 Epoch 10/10 195/195 [==============================] - 45s 228ms/step - loss: 0.5588 - accuracy: 0.9212 - val_loss: 0.0155 - val_accuracy: 0.9942
This took 7m 20s. As expected, ReLU is much faster computationally, and it seems that sigmoid could not reach the expected 100% accuracy. This is also expected, because the main argument for ReLU over sigmoid is that ReLU avoids the vanishing gradient problem.
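The vanishing-gradient argument can be seen numerically: the sigmoid's derivative is at most 0.25, so gradients shrink as they pass backward through sigmoid layers, while ReLU passes gradients through unchanged for positive inputs. A small sketch (the layer count and inputs are made up for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # never exceeds 0.25 (its value at x = 0)

def relu_grad(x):
    return float(x > 0)       # exactly 1 for any positive input

# Gradient magnitude surviving a chain of n layers (best case for sigmoid: x = 0).
n = 10
sig_chain = sigmoid_grad(0.0) ** n    # 0.25**10, roughly 1e-6
relu_chain = relu_grad(1.0) ** n      # still 1.0
```

Our network is shallow, so the effect is milder than this worst case, but the direction matches what we observed: sigmoid trains more slowly than ReLU.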
We can run this model for a few more epochs to see, but we expect the accuracy not to improve much, as the gradient may have vanished and the internal weights are no longer updating by much.
epochs=10
history2 = oneCNNmodelSigmoid.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/10 195/195 [==============================] - 44s 225ms/step - loss: 0.3809 - accuracy: 0.9480 - val_loss: 0.3086 - val_accuracy: 0.9380 Epoch 2/10 195/195 [==============================] - 44s 225ms/step - loss: 0.0886 - accuracy: 0.9841 - val_loss: 3.3552e-04 - val_accuracy: 1.0000 Epoch 3/10 195/195 [==============================] - 44s 226ms/step - loss: 0.0057 - accuracy: 0.9994 - val_loss: 8.0966e-05 - val_accuracy: 1.0000 Epoch 4/10 195/195 [==============================] - 44s 225ms/step - loss: 0.0261 - accuracy: 0.9973 - val_loss: 7.7770e-05 - val_accuracy: 1.0000 Epoch 5/10 195/195 [==============================] - 44s 227ms/step - loss: 0.2320 - accuracy: 0.9852 - val_loss: 0.9320 - val_accuracy: 0.8927 Epoch 6/10 195/195 [==============================] - 44s 226ms/step - loss: 0.8674 - accuracy: 0.9140 - val_loss: 1.0456 - val_accuracy: 0.8606 Epoch 7/10 195/195 [==============================] - 44s 225ms/step - loss: 0.5202 - accuracy: 0.9499 - val_loss: 0.1205 - val_accuracy: 0.9778 Epoch 8/10 195/195 [==============================] - 44s 225ms/step - loss: 0.0771 - accuracy: 0.9881 - val_loss: 0.0017 - val_accuracy: 1.0000 Epoch 9/10 195/195 [==============================] - 44s 227ms/step - loss: 0.0554 - accuracy: 0.9912 - val_loss: 1.4377e-04 - val_accuracy: 1.0000 Epoch 10/10 195/195 [==============================] - 45s 230ms/step - loss: 1.3496e-04 - accuracy: 1.0000 - val_loss: 6.2941e-05 - val_accuracy: 1.0000
This also took 7m 20s. If we plot these epochs combined with the first 10, we see some very interesting behavior.
acc = history.history['accuracy'] + history2.history['accuracy']
val_acc = history.history['val_accuracy'] + history2.history['val_accuracy']
loss = history.history['loss'] + history2.history['loss']
val_loss = history.history['val_loss'] + history2.history['val_loss']
epochs_range = range(20)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
There are a few epochs (12-14) in which validation accuracy hit 100%, then dipped back down, and then reached 100% again later on. This could be random variation, since during these epochs the training accuracy was not 100%.
As for training accuracy, it hit 100% after the full 20 epochs.
Usually, 100% training accuracy raises concerns of overfitting, but the track record of all previous models hitting 100% training, validation, and test accuracy dispels these concerns.
Let's test this model.
results = oneCNNmodelSigmoid.evaluate(test_ds, batch_size=128)
print("1 CNN Sigmoid Activation Function model: test loss, test acc:", results)
98/98 [==============================] - 6s 66ms/step - loss: 6.7692e-05 - accuracy: 1.0000 1 CNN Sigmoid Activation Function model: test loss, test acc: [6.769162428099662e-05, 1.0]
Again, we hit 100% accuracy, but the main takeaway is as expected: sigmoid takes longer to run per epoch and also performs worse than ReLU, consistent with the vanishing gradient problem, even though our network is not particularly deep (only a few layers).
What about softmax?
oneCNNmodelSoftmax = Sequential([
layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='softmax'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(num_classes)
])
oneCNNmodelSoftmax.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
oneCNNmodelSoftmax.summary()
Model: "sequential_19"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling_19 (Rescaling) (None, 256, 256, 3) 0
conv2d_22 (Conv2D) (None, 256, 256, 16) 448
max_pooling2d_26 (MaxPoolin (None, 128, 128, 16) 0
g2D)
flatten_19 (Flatten) (None, 262144) 0
dense_21 (Dense) (None, 24) 6291480
=================================================================
Total params: 6,291,928
Trainable params: 6,291,928
Non-trainable params: 0
_________________________________________________________________
epochs=10
history2 = oneCNNmodelSoftmax.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/10 195/195 [==============================] - 59s 301ms/step - loss: 8.0228 - accuracy: 0.4474 - val_loss: 0.5239 - val_accuracy: 0.8417 Epoch 2/10 195/195 [==============================] - 58s 300ms/step - loss: 0.3999 - accuracy: 0.8809 - val_loss: 0.3038 - val_accuracy: 0.8992 Epoch 3/10 195/195 [==============================] - 59s 303ms/step - loss: 0.2565 - accuracy: 0.9215 - val_loss: 0.0390 - val_accuracy: 0.9917 Epoch 4/10 195/195 [==============================] - 59s 304ms/step - loss: 0.4169 - accuracy: 0.9037 - val_loss: 0.0381 - val_accuracy: 0.9878 Epoch 5/10 195/195 [==============================] - 60s 307ms/step - loss: 0.0695 - accuracy: 0.9795 - val_loss: 0.0067 - val_accuracy: 0.9984 Epoch 6/10 195/195 [==============================] - 59s 302ms/step - loss: 0.0974 - accuracy: 0.9753 - val_loss: 0.9641 - val_accuracy: 0.9123 Epoch 7/10 195/195 [==============================] - 59s 302ms/step - loss: 0.1833 - accuracy: 0.9618 - val_loss: 0.6531 - val_accuracy: 0.9017 Epoch 8/10 195/195 [==============================] - 59s 303ms/step - loss: 0.2443 - accuracy: 0.9536 - val_loss: 0.0399 - val_accuracy: 0.9888 Epoch 9/10 195/195 [==============================] - 59s 305ms/step - loss: 0.0037 - accuracy: 0.9989 - val_loss: 2.1772e-04 - val_accuracy: 1.0000 Epoch 10/10 195/195 [==============================] - 59s 303ms/step - loss: 1.6798e-04 - accuracy: 1.0000 - val_loss: 1.6656e-04 - val_accuracy: 1.0000
acc = history2.history['accuracy']
val_acc = history2.history['val_accuracy']
loss = history2.history['loss']
val_loss = history2.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
results = oneCNNmodelSoftmax.evaluate(test_ds, batch_size=128)
print("1 CNN Softmax Activation Function model: test loss, test acc:", results)
98/98 [==============================] - 6s 61ms/step - loss: 0.6823 - accuracy: 0.7958 1 CNN Softmax Activation Function model: test loss, test acc: [0.6823172569274902, 0.7958199381828308]
Each epoch took about 1 minute to train, for a total of 10 minutes. Softmax is the slowest to train per epoch, though it reached 100% training and validation accuracy faster than sigmoid. However, its test accuracy did not hit 100%, which leads us to believe it actually overfit.
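Part of the problem is that softmax in a hidden layer normalizes each pixel's channel vector to sum to 1, discarding the overall magnitude information that ReLU would have kept. A quick NumPy illustration (the 16-channel shape mirrors our Conv2D layer; the random activations are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
pre = rng.normal(size=(4, 4, 16))   # one 4x4 patch with 16 channels (hypothetical activations)

# Softmax over the channel axis, which is how Keras applies activation='softmax' by default.
e = np.exp(pre - pre.max(axis=-1, keepdims=True))
post = e / e.sum(axis=-1, keepdims=True)

channel_sums = post.sum(axis=-1)    # every pixel's 16 channel values now sum to 1
```

This per-pixel normalization is appropriate for a final classification layer, but mid-network it constrains what the feature maps can represent, which is consistent with the slower, less stable training we observed.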
The last function we wanted to test was Tanh.
oneCNNmodelTanh = Sequential([
layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='tanh'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(num_classes)
])
oneCNNmodelTanh.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
epochs=4
history2 = oneCNNmodelTanh.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/4 195/195 [==============================] - 43s 218ms/step - loss: 4.0354 - accuracy: 0.8374 - val_loss: 0.0111 - val_accuracy: 1.0000 Epoch 2/4 195/195 [==============================] - 43s 221ms/step - loss: 0.0087 - accuracy: 0.9989 - val_loss: 0.0037 - val_accuracy: 1.0000 Epoch 3/4 195/195 [==============================] - 43s 218ms/step - loss: 0.0031 - accuracy: 1.0000 - val_loss: 0.0025 - val_accuracy: 1.0000 Epoch 4/4 195/195 [==============================] - 42s 218ms/step - loss: 0.0021 - accuracy: 1.0000 - val_loss: 0.0017 - val_accuracy: 1.0000
acc = history2.history['accuracy']
val_acc = history2.history['val_accuracy']
loss = history2.history['loss']
val_loss = history2.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
results = oneCNNmodelTanh.evaluate(test_ds, batch_size=128)
print("1 CNN TanH Activation Function model: test loss, test acc:", results)
98/98 [==============================] - 6s 62ms/step - loss: 0.0017 - accuracy: 1.0000 1 CNN TanH Activation Function model: test loss, test acc: [0.0017495382344350219, 1.0]
Each epoch was about 3s slower to run than with ReLU, but the accuracies are similar to ReLU's. We think this is because the problem is easy and not complex enough to expose the differences between the activation functions.
A conclusion we can draw is that ReLU > TanH > Sigmoid > Softmax for our purposes. ReLU reached correct results the fastest, with TanH a close second. Sigmoid took the largest number of epochs to produce correct results, and Softmax was the worst of both worlds: it took the most time per epoch and overfit, never producing 100% accuracy on the test data.
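The ranking above can be made concrete by evaluating each activation on a sample vector of pre-activations. A minimal NumPy sketch (the input values are illustrative; our models use the Keras layer versions of these functions):

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # sample pre-activations

relu = np.maximum(0.0, z)               # zero for negatives, identity otherwise
tanh = np.tanh(z)                       # squashes into (-1, 1), zero-centered
sigmoid = 1.0 / (1.0 + np.exp(-z))      # squashes into (0, 1)
softmax = np.exp(z) / np.exp(z).sum()   # normalizes the whole vector to sum to 1

print("relu:   ", relu)
print("tanh:   ", tanh)
print("sigmoid:", sigmoid)
print("softmax:", softmax, "sum =", softmax.sum())
```

Note that softmax couples every neuron's output to the whole layer's, which is one reason it behaves poorly as a hidden-layer activation.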
However, these experiments used softmax as the activation function for the internal neurons, when softmax is really meant to be used as the final layer.
num_classes = len(class_names)

oneCNNmodelSoftmaxLastLayer = Sequential([
    layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(num_classes),
    layers.Activation('softmax')
])

oneCNNmodelSoftmaxLastLayer.compile(optimizer='adam',
                                    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                                    metrics=['accuracy'])

epochs = 4
history = oneCNNmodelSoftmaxLastLayer.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
Epoch 1/4
195/195 [==============================] - 40s 206ms/step - loss: 5.4257 - accuracy: 0.8073 - val_loss: 0.0172 - val_accuracy: 1.0000
Epoch 2/4
195/195 [==============================] - 40s 207ms/step - loss: 0.0048 - accuracy: 1.0000 - val_loss: 9.9097e-04 - val_accuracy: 1.0000
Epoch 3/4
195/195 [==============================] - 40s 205ms/step - loss: 5.1952e-04 - accuracy: 1.0000 - val_loss: 2.8329e-04 - val_accuracy: 1.0000
Epoch 4/4
195/195 [==============================] - 40s 207ms/step - loss: 1.6582e-04 - accuracy: 1.0000 - val_loss: 1.1496e-04 - val_accuracy: 1.0000
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
results = oneCNNmodelSoftmaxLastLayer.evaluate(test_ds, batch_size=128)
print("1 CNN ReLU Softmax last layer: test loss, test acc:", results)
98/98 [==============================] - 6s 61ms/step - loss: 1.1704e-04 - accuracy: 1.0000
1 CNN ReLU Softmax last layer: test loss, test acc: [0.00011703643394866958, 1.0]
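This model uses `from_logits=False` because the softmax is now part of the model, whereas the earlier models emitted raw logits and used `from_logits=True`. The two setups compute the same loss, which a small NumPy sketch can verify (the logits below are hypothetical, not values from our models):

```python
import numpy as np

logits = np.array([2.0, 0.5, -1.0])  # hypothetical raw outputs for 3 classes
true_class = 0                        # sparse integer label, as in our dataset

# from_logits=True: cross-entropy is computed directly from the logits
loss_a = np.log(np.exp(logits).sum()) - logits[true_class]

# from_logits=False: the model's softmax layer outputs probabilities first,
# and the loss is the negative log of the true class's probability
probs = np.exp(logits) / np.exp(logits).sum()
loss_b = -np.log(probs[true_class])

print(loss_a, loss_b)  # the two paths agree
```

Letting Keras work from logits directly (the `from_logits=True` path) is generally the more numerically stable option, but for this easy problem both give the same result.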
This problem was too easy to show the effect of the softmax layer. Let's try it with just the dense NN.
denseNN = Sequential([
    layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    layers.Flatten(),
    layers.Dense(num_classes),
    layers.Activation('softmax')
])

denseNN.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                metrics=['accuracy'])

epochs = 18
history = denseNN.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
Epoch 1/18
195/195 [==============================] - 4s 21ms/step - loss: 21.5245 - accuracy: 0.6572 - val_loss: 1.5884 - val_accuracy: 0.8622
Epoch 2/18
195/195 [==============================] - 4s 22ms/step - loss: 3.4056 - accuracy: 0.8543 - val_loss: 0.8680 - val_accuracy: 0.9258
Epoch 3/18
195/195 [==============================] - 4s 22ms/step - loss: 3.6604 - accuracy: 0.8828 - val_loss: 0.6956 - val_accuracy: 0.9486
Epoch 4/18
195/195 [==============================] - 4s 22ms/step - loss: 1.3850 - accuracy: 0.9305 - val_loss: 4.8154 - val_accuracy: 0.8719
Epoch 5/18
195/195 [==============================] - 4s 21ms/step - loss: 2.7831 - accuracy: 0.9101 - val_loss: 1.7505 - val_accuracy: 0.9477
Epoch 6/18
195/195 [==============================] - 4s 22ms/step - loss: 1.3164 - accuracy: 0.9480 - val_loss: 0.1320 - val_accuracy: 0.9859
Epoch 7/18
195/195 [==============================] - 4s 22ms/step - loss: 0.4695 - accuracy: 0.9742 - val_loss: 0.5156 - val_accuracy: 0.9560
Epoch 8/18
195/195 [==============================] - 4s 21ms/step - loss: 1.9492 - accuracy: 0.9483 - val_loss: 0.0594 - val_accuracy: 0.9910
Epoch 9/18
195/195 [==============================] - 4s 22ms/step - loss: 0.1845 - accuracy: 0.9902 - val_loss: 0.0811 - val_accuracy: 0.9917
Epoch 10/18
195/195 [==============================] - 4s 22ms/step - loss: 2.1161 - accuracy: 0.9411 - val_loss: 2.1613 - val_accuracy: 0.9220
Epoch 11/18
195/195 [==============================] - 4s 22ms/step - loss: 4.3883 - accuracy: 0.9204 - val_loss: 0.0855 - val_accuracy: 0.9942
Epoch 12/18
195/195 [==============================] - 4s 21ms/step - loss: 0.3662 - accuracy: 0.9851 - val_loss: 0.0062 - val_accuracy: 0.9987
Epoch 13/18
195/195 [==============================] - 4s 22ms/step - loss: 0.0378 - accuracy: 0.9982 - val_loss: 9.2258e-09 - val_accuracy: 1.0000
Epoch 14/18
195/195 [==============================] - 4s 22ms/step - loss: 0.0050 - accuracy: 0.9997 - val_loss: 9.2258e-09 - val_accuracy: 1.0000
Epoch 15/18
195/195 [==============================] - 4s 21ms/step - loss: 8.0305e-05 - accuracy: 1.0000 - val_loss: 9.2258e-09 - val_accuracy: 1.0000
Epoch 16/18
195/195 [==============================] - 4s 23ms/step - loss: 2.3532e-09 - accuracy: 1.0000 - val_loss: 9.1875e-09 - val_accuracy: 1.0000
Epoch 17/18
195/195 [==============================] - 4s 22ms/step - loss: 2.3532e-09 - accuracy: 1.0000 - val_loss: 9.1492e-09 - val_accuracy: 1.0000
Epoch 18/18
195/195 [==============================] - 4s 22ms/step - loss: 2.3149e-09 - accuracy: 1.0000 - val_loss: 9.1492e-09 - val_accuracy: 1.0000
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
With softmax, we reached 100% accuracy at 15 epochs, which is slower than without it. Softmax converts the raw outputs (logits) of the last layer into a probability distribution over the classes, rather than leaving them as unnormalized scores.
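As a concrete illustration of what the softmax output looks like next to a one-hot label (the class names and scores below are hypothetical, not output from our model):

```python
import numpy as np

class_names = ["Apple", "Banana", "Cherry"]    # hypothetical subset of our labels

logits = np.array([1.2, 3.4, 0.1])             # raw scores from the Dense layer
probs = np.exp(logits) / np.exp(logits).sum()  # the softmax layer's output

one_hot = np.array([0.0, 1.0, 0.0])            # one-hot label for "Banana"
predicted = class_names[int(np.argmax(probs))]

print("probs:", probs, "sum =", probs.sum())
print("predicted:", predicted)
```

Unlike the hard one-hot label, the softmax output spreads probability mass over every class, which is what the cross-entropy loss compares against during training.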
results = denseNN.evaluate(test_ds, batch_size=128)
print("Dense NN Softmax last layer: test loss, test acc:", results)
98/98 [==============================] - 2s 16ms/step - loss: 7.6662e-10 - accuracy: 1.0000
Dense NN Softmax last layer: test loss, test acc: [7.666189905108922e-10, 1.0]
But it is still 100% accurate on the test set.
Obviously, there is room for improvement in our project, and we see multiple ways in which we could expand its scope or make it more successful. For one, we could build our classifier on the full Fruits 360 dataset instead of the smaller subset. We are aware that more data would help us build a better and more accurate classifier, but the full dataset was too much to run in a reasonable amount of time. Even the smaller dataset crashed Google Colaboratory when we attempted to run it there, forcing us to run everything on our local machines, and when we tried the full Fruits 360 dataset locally, a single epoch took 50 minutes. Given the limited computational power of our resources, we chose a smaller dataset so that we could run more models in a reasonable amount of time; with more time and more powerful machines, the larger dataset would give more accurate and detailed results. We also see room for improvement within the dataset itself that could help address some of the anomalies we noticed in our results. Many of the pictures in the dataset are too “nice”: they all have white backgrounds, and each picture contains a single, clearly visible fruit. These images lack noise, and because of that, our classifier’s job is a little too easy. In the future, this could be addressed by testing on pictures with multiple fruits or messier backgrounds, so that we can actually see how successful our classifier is.
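One cheap way to approximate "messier" test images without collecting new photos would be to synthetically degrade the existing ones, replacing the white background with random clutter and adding pixel noise. A minimal NumPy sketch of the idea (the threshold and noise level are arbitrary assumptions, not tuned values, and the sample image is a fake stand-in for a real 100x100 dataset image):

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(image, bg_threshold=240, noise_std=10.0):
    """Replace near-white background pixels with random clutter and add
    Gaussian pixel noise. `image` is an HxWx3 uint8 array."""
    img = image.astype(np.float64)
    # Pixels where all three channels are near-white are treated as background
    background = (img > bg_threshold).all(axis=-1)
    clutter = rng.uniform(0, 255, size=img.shape)
    img[background] = clutter[background]
    img += rng.normal(0.0, noise_std, size=img.shape)  # sensor-like noise
    return np.clip(img, 0, 255).astype(np.uint8)

# Fake "fruit on white" image: a red square centered on a white background
fake = np.full((100, 100, 3), 255, dtype=np.uint8)
fake[30:70, 30:70] = (200, 30, 30)
hard = degrade(fake)
print(hard.shape, hard.dtype)
```

This only simulates clutter, of course; real photos with natural backgrounds and multiple fruits per image would be a much stronger test of the classifier.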