MOOC Reflection: Convolutional Neural Networks in TensorFlow

I have decided to jump into the second course of the DeepLearning.AI TensorFlow Developer series (see the course page). This course should focus on more realistic datasets and encourage me as a learner to practice the regular tasks of ML work. Such tasks aren't what we usually think of as machine learning. They don’t seem to be as ‘hot’, but I personally value them a lot. It’s the routine work, such as data manipulation, cleaning and visualisation, that I'm not used to right now. Playing with neural nets in a super-prepared sandbox is nice, but very distant from the everyday work in ML. I still think the course will be very lightweight when it comes to “doing the real work”. After all, the purpose of the course is to be beginner-friendly and allow progression. I hope that I will see and practice some of the habits that ML practitioners take as a matter of course.

Week 1

This week was about exploring a larger dataset, and it headed towards implementing a neural net that classifies whether a picture shows a cat or a dog, based on 25k training images. I don’t have much to say about the architecture and training of the neural net. I used an architecture very close to the one from my previous assignment, and it matched the desired accuracy on the validation data pretty well. The training time was very long compared to other assignments. The model was slightly overfitting, but I assume the task was designed that way so that I can later learn how to avoid overfitting by manipulating the training images to get a more varied training set. As I mentioned in the introduction, this week also focused on some data manipulation techniques, so I worked on those and made sure I understood everything.

Week 2

This week was all about TensorFlow’s built-in image augmentation methods. Image augmentation is a good way to simulate more training examples to avoid overfitting. It basically creates different variations of the images by slightly skewing them, rotating them, zooming in on them, cropping parts of them and so on. The tradeoff is that it requires more computing power and the training takes more time, because the images have to be transformed on the fly and the network effectively sees many more distinct examples. I would really appreciate a more detailed explanation of how the augmentation works so I could imagine how it affects the actual training and results.
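To make the idea a bit more concrete, here is a minimal sketch in plain Python rather than TensorFlow. It is only an illustration under simplifying assumptions: the image is a tiny list of rows, and the helpers `flip_horizontal`, `shift_right` and `augment` are hypothetical names made up for this post, not part of any library. The point it tries to show is that the stored image never changes; each pass just gets a freshly transformed copy.

```python
import random


def flip_horizontal(image):
    """Mirror each row of a 2D image (a list of rows)."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift the image right by `pixels` columns, padding the left edge with `fill`."""
    if pixels <= 0:
        return [row[:] for row in image]
    width = len(image[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + row[:width - pixels] for row in image]


def augment(image, rng, max_shift=1):
    """Return a randomly transformed copy; the original image is left untouched."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return shift_right(out, rng.randint(0, max_shift))


image = [
    [1, 2, 3],
    [4, 5, 6],
]
rng = random.Random(0)  # fixed seed only so that the sketch is repeatable

# Each "epoch" sees a freshly transformed copy of the same stored image.
variants = [augment(image, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

Real augmentation pipelines add rotation, zoom and shear on top of this, but the structure is the same: a random transformation applied to a copy every time the image is requested.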

This week’s final assignment focused on classifying cats and dogs using a Kaggle dataset from 9 years ago and all the techniques shown in the course series so far. I chose the wrong activation function for the output layer, which led to weird training results. This was the first time in the MOOC that I had to do some debugging and find an error in my code. The debugging itself was OK, but the training time spent during the debugging was too much! As I'm writing this reflection, I'm running another experiment on the dataset which has already been running for 40 minutes and will take at least 30 more.
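As an illustration of how a wrong output activation can produce strange results, consider one classic case in binary classification: putting softmax on a single output neuron. This is an assumed example, not necessarily the exact mistake described above. The sketch below reimplements `sigmoid` and `softmax` in plain Python, and the logits are made-up numbers.

```python
import math


def sigmoid(z):
    """Squash a single logit into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Normalise a list of logits so that the outputs sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-3.0, 0.0, 2.5]  # hypothetical raw outputs of a single output neuron

# Sigmoid reacts to the logit, so the prediction can move between the classes.
sigmoid_outputs = [sigmoid(z) for z in logits]

# Softmax over ONE value always normalises it to 1.0, whatever the logit is.
softmax_outputs = [softmax([z])[0] for z in logits]

print(sigmoid_outputs)
print(softmax_outputs)
```

With a constant output the network cannot express "cat" versus "dog" at all, which is the kind of symptom that shows up as accuracy stuck at one value during training.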

I should note that by the start of this week, I had hit the monthly usage limit of Google Colab, the interactive notebook application I use in the course. Since I certainly didn’t feel like setting up TensorFlow on my machine this time, I was forced to buy the Pro version of Colab to continue with the course. It’s fine because I really like the product. Let's see if I can utilize the paid compute power during January :).

Week 3

The third week introduced the concept of transfer learning. Transfer learning means reusing existing pre-trained models, or parts of them, in our own projects. I had already come across transfer learning when reading machine learning papers, so I knew people reuse and compose multiple existing models into one model to deal with their specific task. Sadly, I had never tried it until now.

This week contained only one interactive notebook example, which reuses the Inception model, a really deep network with more than 20M parameters, to classify objects in images. Using the layers of a pre-trained model means that we need to cut out the parts of the model we want to reuse and build our own initial model “skeleton” into which we can then load the weights from the pre-trained model. The example also showed how to freeze those pre-trained parts so we can train the final model on our own dataset without touching the pre-trained parameters. The notebook was well described; I immediately grasped from it how the construction of the model works. The pre-trained model was trained to classify 1000 object classes, but in the example, we used it to classify only two. At first, it was quite hard for me to imagine how exactly the model composition helps us deal with a different scenario like this one. But it clicked on second thought: we only take the hidden layers without the output layer, so we keep the features the model learned for distinguishing 1000 classes of objects, and the new output layers we add on top learn to use those features for just our two classes. The final assignment was about reimplementing the model from the example notebook. It was quick.
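The freezing idea can be sketched without TensorFlow at all. The following is a toy illustration in plain Python, not the notebook's code: `FROZEN_WEIGHTS` is a made-up stand-in for the pre-trained layers, `extract_features`, `predict` and `mean_loss` are hypothetical helpers, and the four data points are invented. Only the small new "head" is updated during training.

```python
import math

# Stand-in for a "pre-trained" feature extractor: two fixed linear features.
# In the course example this role is played by the Inception layers.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x, weights):
    """Frozen part: linear features followed by ReLU, never updated below."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def predict(features, head):
    """New trainable head: one sigmoid unit for a two-class problem."""
    z = sum(w * f for w, f in zip(head["w"], features)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(data, frozen, head):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = predict(extract_features(x, frozen), head)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


# Tiny made-up dataset: class 1 when the first coordinate is the larger one.
data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]

frozen_before = [row[:] for row in FROZEN_WEIGHTS]
head = {"w": [0.0, 0.0], "b": 0.0}
loss_before = mean_loss(data, FROZEN_WEIGHTS, head)

learning_rate = 0.1
for _ in range(100):
    for x, y in data:
        features = extract_features(x, FROZEN_WEIGHTS)
        error = predict(features, head) - y  # gradient of the loss w.r.t. z
        # Only the head is updated; FROZEN_WEIGHTS is deliberately left alone.
        head["w"] = [w - learning_rate * error * f
                     for w, f in zip(head["w"], features)]
        head["b"] -= learning_rate * error

loss_after = mean_loss(data, FROZEN_WEIGHTS, head)
print(loss_before, loss_after)
```

The loss drops even though the feature extractor never changes, which is the whole appeal: the expensive part is reused as-is and only a handful of new parameters need training.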

In the meantime, I explored some corners of the TensorFlow API and documentation beyond the course materials:

I read about the TensorFlow functional API, which allows for non-linear model topologies, that is, models that are not a simple stack of layers.
I also found the TensorFlow Hub project, which is an open repository of models for transfer learning. We can find collections of models for different kinds of domains. I would like to take some of those models and use them in my own study project.
I skimmed two relevant research papers: “Rethinking the Inception Architecture for Computer Vision” which was about the model used in this week’s course example, and “U-Net: Convolutional Networks for Biomedical Image Segmentation” which introduces the model I would like to implement myself after finishing the course.

Week 4

The last week was all about building a multiclass classification model. There was only one workbook, which introduced the concept on a dataset of photos of hands playing rock/paper/scissors. The final assignment was then a slight variation of the task in that workbook. Instead of classifying 3 types of images, I had to classify letters based on photos of hand signs from the Sign Language MNIST dataset.

As always, the assignment was well structured, so it was pretty easy to jump into it. However, this assignment was different from the previous ones because the dataset was provided in CSV format that had to be parsed in the right way. This also affected the model compilation step, because the letter labels of the photos of hands were provided as integers instead of one-hot vectors. It took me a while to figure out that I had to use the sparse_categorical_crossentropy loss for this to work as expected. Once I had debugged the model enough to at least be able to run the training, I had to iterate on it to meet the criteria of at least .99 training accuracy and .95 validation accuracy. Below is a summary of the iterations I made and my reasoning behind them.
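The difference between the two loss variants comes down to the label format only. Here is a small sketch in plain Python that reimplements both formulas for a single example; the functions share their names with the Keras losses for readability but are local, simplified stand-ins, and the probabilities are made up.

```python
import math


def one_hot(label, num_classes):
    """Turn an integer class index into a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(one_hot_label, probs):
    """Expects the label as a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


def sparse_categorical_crossentropy(int_label, probs):
    """Expects the label as a plain integer class index."""
    return -math.log(probs[int_label])


probs = [0.1, 0.7, 0.2]  # hypothetical softmax output for three classes
label = 1                # integer label, as it would come out of a CSV file

sparse = sparse_categorical_crossentropy(label, probs)
dense = categorical_crossentropy(one_hot(label, len(probs)), probs)
print(sparse, dense)
```

Both calls compute the same number, so the sparse version is simply the convenient choice when the labels are already integers and there is no reason to expand them into one-hot vectors first.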

At first, I tried running the model with the same parameters as the model that classified three types of images, just to see how it performed. It performed poorly, of course.
I added more convolutions and reconfigured the image data generator to use a bigger rotation range and smaller zoom range as I expect the dataset photos to be pretty similar when it comes to zoom.
The training accuracy is now about .92 and the validation accuracy is .96. I'm thinking I might add one more hidden layer to make the network complex enough to handle the classes.
After tweaking the hidden layers, the accuracies are still ~.94. First, I'm just going to double the number of neurons in the hidden layers to see what happens.
The results are still the same! I decided to change the dense layers to 2048, 4096 and 2048 now. If this doesn't work, I'm going to revise the image data generator to produce less variable training examples, because the fact that the validation accuracy is pretty high while the training accuracy is stuck between .9 and .94 probably means I provided the model with training examples that are too different.
Accuracies are now about .98. Good, let's simplify the convolutions and change the order of hidden layers to see what happens. The results are promising, so I simplified the model again. I removed one hidden layer and simplified image augmentation.
The model now meets the required criteria. In fact, the validation accuracy is about .99.

Conclusion

The second course from the TensorFlow series was fun. I learned that when creating ML models, I need to work with the library documentation to understand the techniques more deeply. When it comes to working with data, I was still working a lot with prepared datasets in a well-structured environment of workbooks and assignments. The data were provided in various formats, and I encountered a few debugging situations, but I feel I'm still too far removed from real-world ML work scenarios. As a next step in my ML journey, I would like to implement an algorithm straight from a research paper for the first time.