MOOC Reflection: Convolutional Neural Networks in TensorFlow
I decided to jump into this second course of the DeepLearning.AI TensorFlow Developer series (see the course page). This course should focus on more realistic datasets and push me, as a learner, to practice the routine tasks of everyday ML work. Such tasks aren't what we usually picture as machine learning and don't seem as "hot", but I personally value them a lot. It's the routine work such as data manipulation, cleaning, and visualisation that I'm not used to right now. Playing with neural nets in a super-prepared sandbox is nice, but very distant from the everyday work in ML. I still expect the course to be fairly lightweight when it comes to "doing the real work"; after all, its purpose is to be beginner friendly and allow progression. I hope I will see and practice some of the habits that ML practitioners take as a matter of course.
This week was about exploring a larger dataset, building up to a neural net that classifies whether a picture contains a cat or a dog, based on 25k training images. I don't have much to say about the architecture and training of the network: I used an architecture very close to my previous assignment's and it matched the desired accuracy on the validation data quite well. The training time was very long compared to other assignments. The model was slightly overfitting, but I assume the task was designed that way so that I can learn how to avoid overfitting by manipulating the training images into a more varied training set. As I mentioned in my motivation, this week also covered some data manipulation techniques, so I worked on those and made sure I understood everything.
This week was all about TensorFlow's built-in image augmentation methods. Image augmentation is a good way to simulate more training examples and thus avoid overfitting: it generates variations of the existing images by slightly skewing them, rotating them, zooming in, cropping parts out, and so on. The tradeoff is that it requires more computing power and the training takes more time, as the neural network trains on more data. I would really appreciate a more detailed explanation of how the augmentation works so I could picture how it affects the actual training and results.
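For the record, the augmentation options described above can be configured through Keras' `ImageDataGenerator`. This is a minimal sketch, not the course notebook; the parameter values are illustrative choices, not the ones used in the assignments.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings (values are my own, not the course's).
datagen = ImageDataGenerator(
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # horizontal shifts up to 20% of the width
    height_shift_range=0.2,  # vertical shifts up to 20% of the height
    shear_range=0.2,         # random shearing (skew)
    zoom_range=0.2,          # random zooming in/out
    horizontal_flip=True,    # random left-right flips
    fill_mode="nearest",     # fill pixels exposed by the transforms
)

# Apply one random transform to a dummy image to see the effect:
# the shape stays the same, only the content is warped.
image = np.random.rand(150, 150, 3)
augmented = datagen.random_transform(image)
print(augmented.shape)  # (150, 150, 3)
```

In the course-style training loop the same generator is used via `flow_from_directory`, so each epoch sees freshly transformed copies of the images instead of a fixed enlarged dataset.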
This week's final assignment focused on classifying cats and dogs using a nine-year-old Kaggle dataset and all the techniques shown in the course series. I chose the wrong activation function for the output layer, which led to weird training results. This was the first time in the MOOC that I had to do some debugging and hunt down an error in my code. The debugging itself was fine, but the training time spent during the debugging was too much! As I'm writing this reflection, I'm running another experiment on the dataset; it has already been running for 40 minutes and will take at least 30 more.
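For a binary cat-vs-dog classifier, the output layer that works is a single sigmoid unit paired with binary cross-entropy. A sketch of a correct head follows; the convolutional base here is a placeholder of my own, not the assignment's architecture.

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    # Placeholder convolutional base, just for illustration.
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    # One sigmoid unit: outputs P(dog) in [0, 1]. A common version of the
    # wrong-activation mistake is softmax over a single unit, which always
    # outputs 1.0 and makes the loss meaningless.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 1)
```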
I should note that by the start of this week, I hit the monthly usage limit of Google Colab, the interactive notebook application I use in the course. Since I certainly didn't feel like setting up TensorFlow on my machine this time, I was forced to buy the Pro version of Colab to continue with the course. It's fine because I really like the product. Let's see if I can make use of the paid compute power during January :).
The third week introduced the concept of transfer learning. Transfer learning means sharing models so that existing pre-trained models can be reused in our own projects. I had already encountered transfer learning when reading machine learning papers, so I knew people reuse and compose multiple existing models into one to deal with their specific task. Sadly, I had never tried it until now.
This week contained only one interactive notebook example, which reuses the Inception model, a really deep network with more than 20M parameters, to classify objects in images. Reusing the layers of a pre-trained model means cutting off the parts of the model we want to keep and building our own initial model "skeleton" onto which we can then load the pre-trained weights. The example also showed how to exclude those pre-trained parts from training, so we can retrain the final model on our own dataset without touching the pre-trained parameters. The notebook was well described: I immediately grasped from it how the construction of the model works. The pre-trained model was trained to classify 1000 object classes, but in the example we used it to classify only two. At first it was quite hard for me to imagine how exactly this model composition helps us deal with such a different scenario, but it clicked on second thought: we only extract the hidden layers without the output layer, so the model has learned features that distinguish 1000 classes of objects, while we force it to focus on just two. The final assignment was about reimplementing the model from the example notebook. It was quick.
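The recipe described above (cut off the output head, freeze the pre-trained layers, attach a small trainable head) can be sketched with Keras' bundled InceptionV3. This is an assumption-laden sketch rather than the course notebook, which loads the weights from a downloaded file; pass `weights="imagenet"` instead of `None` to actually fetch the pre-trained parameters.

```python
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3

# include_top=False cuts off the 1000-class output head;
# weights=None keeps this sketch offline (use "imagenet" in practice).
base = InceptionV3(input_shape=(150, 150, 3),
                   include_top=False,
                   weights=None)

# Freeze the pre-trained layers so training only updates our new head.
for layer in base.layers:
    layer.trainable = False

# Attach a small trainable head for the two-class (cat/dog) problem.
x = tf.keras.layers.Flatten()(base.output)
x = tf.keras.layers.Dense(1024, activation="relu")(x)
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(base.input, output)
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 1)
```

Because only the Dense head is trainable, retraining on a small dataset is fast and the pre-trained feature extractor stays untouched.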
In the meantime, I explored some corners of the TensorFlow API and documentation beyond the course materials:
- I read about the TensorFlow functional API, which allows for non-linear model composition.
- I also found the TensorFlow Hub project, an open repository of models for transfer learning. It hosts collections of models for different kinds of domains, and I would like to take some of them and use them in my own study project.
- I skimmed two relevant research papers: “Rethinking the Inception Architecture for Computer Vision” which was about the model used in this week’s course example, and “U-Net: Convolutional Networks for Biomedical Image Segmentation” which introduces the model I would like to implement myself after finishing the course.
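The "non-linear model composition" that the functional API enables can be shown in a few lines: the same input feeds two parallel branches that are then merged, a topology the Sequential API cannot express. This toy model is my own illustration, not taken from the course or documentation.

```python
import tensorflow as tf

# One input, two parallel branches, merged back together:
# a graph-shaped model rather than a straight layer stack.
inputs = tf.keras.Input(shape=(64,))
branch_a = tf.keras.layers.Dense(32, activation="relu")(inputs)
branch_b = tf.keras.layers.Dense(32, activation="tanh")(inputs)
merged = tf.keras.layers.Concatenate()([branch_a, branch_b])
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
print(model.output_shape)  # (None, 1)
```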