MOOC reflection: Introduction to TensorFlow
personal goal/motto: write reflections as quickly as possible and leave them unpolished. sorry for the grammar.
I decided to choose an introductory course on TensorFlow (a framework for machine learning) because I have never used a framework for such a task. Most of my experience with machine learning comes from another Coursera course, in which I tried writing simple neural nets in Octave, the open-source alternative to Matlab. It's been quite a long time since I experimented with ML, and I feel that 1) machine learning frameworks and tooling are becoming more mature and user-friendly, and the field of ML ops is one of the most interesting things for me lately, and 2) I miss implementing the algorithms and would like to implement some algorithms from popular papers that folks from the community have recommended to me.

The first week was, to my surprise, much friendlier and easier than I expected. It was nice to see how easy it can be to set up a simple neural network using the library, and how different optimizers and loss functions are already built into the lib, ready to choose from. I am looking forward to the next week; hopefully, it will be a little more in-depth.
Finished under an hour.
The second week introduced common activation functions (ReLU and softmax) and the concept of using callbacks to hook into various events that may occur during the training of a machine learning model. Callbacks basically let us decide what code we want to execute, e.g. when the training ends or when a training epoch ends, so we can stop the training once we hit a desired threshold (this is the most common use case). The concepts were shown on the Fashion-MNIST training dataset (a collection of pictures of clothing with their corresponding labels). I spent about an hour tinkering and playing with the example to understand what the code meant.
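To make the callback idea concrete, here is a minimal sketch of the pattern (the class name and threshold are my own illustration, not the course's code): a Keras callback that hooks into the end-of-epoch event and stops training once accuracy passes a threshold.

```python
import tensorflow as tf

# Sketch: stop training once a chosen accuracy threshold is hit.
# Class name and threshold are illustrative, not from the course.
class StopAtAccuracy(tf.keras.callbacks.Callback):
    def __init__(self, threshold=0.95):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # "accuracy" appears in `logs` when it is listed in the
        # model's metrics at compile time.
        if logs.get("accuracy", 0.0) >= self.threshold:
            self.model.stop_training = True

# Usage: model.fit(x_train, y_train, epochs=50,
#                  callbacks=[StopAtAccuracy(0.99)])
```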
The final assignment was to define a simple model that predicts digits from their 28x28-pixel pictures. The assignment seemed quite easy but made me use the TensorFlow documentation. It can be solved using the course materials alone, but I am taking this MOOC as an opportunity to get started with the library hands-on.
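For reference, a model of the kind the assignment asks for might look like the sketch below. The layer sizes and optimizer are my guesses, not the course's reference solution.

```python
import tensorflow as tf

# A minimal classifier for 28x28 grayscale digit images (10 classes).
# Layer sizes and optimizer are illustrative guesses.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                       # 28*28 -> 784
    tf.keras.layers.Dense(128, activation="relu"),   # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"), # one unit per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```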
I'm a little scared by the simplicity of the course, because the assignments are designed in a way that lets me focus on small pieces and problems. Whereas I feel (and know) that in practice, most of the work I would need to get done before even thinking about how to compose a machine learning model would consist of manipulating the data and finding ways to look at it and make sense of it in the first place.
This week took me about 2:35 because I took a break and had to get into the course again.
This MOOC's week took me substantially more time to finish. The primary reason is that I didn't work on it in a single learning session; instead, I interacted with the course over three evenings.
The main concepts presented were convolutions and pooling techniques for neural networks. I like to think of them as image compression that highlights the most distinguishable features of an image while reducing the overall amount of raw data, so that the neural network's learning gets more accurate.
A convolution is essentially an image filter: a moving matrix (kernel) that gets applied across the image. For example, Instagram got famous in the beginning by using these primitive filters, which enabled people to make their images look artsy even though they were taken on lousy phone cameras. In neural nets, convolution is usually used to highlight features of an image, like in the following example from Wikipedia:
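The "moving matrix" idea can be sketched in a few lines of NumPy (the function below is my own illustration; real frameworks compute this far more efficiently): the kernel slides over the image, and each output pixel is the weighted sum of the patch under it. With an edge-detection kernel, flat regions go to zero and edges stand out.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid (no padding) 2D convolution of a grayscale image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # weighted sum of the patch under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A classic edge-detection kernel: it sums to zero, so flat regions
# of the image map to zero and only edges survive.
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])
```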
Thanks to these filters, the neural nets deal with much simpler pictures than the original, which is good for some applications. Image pooling is then a way to compress the image in terms of its size while making the highlighted parts even more significant. In information science slang, such compression actually increases the information density of the image for the neural network, which has to learn which features of the image are the most important and weight them accordingly.
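Pooling is even simpler to sketch: 2x2 max pooling keeps only the strongest activation in each 2x2 block, quartering the amount of data while preserving the highlighted features (again a NumPy illustration of my own, not course code).

```python
import numpy as np

def max_pool_2x2(image):
    """2x2 max pooling: keep the maximum of each 2x2 block."""
    h, w = image.shape
    # trim odd edges, group pixels into 2x2 blocks, take each block's max
    trimmed = image[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```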
Learning these concepts wasn't hard, as I had some prior knowledge of them (of course, I could always dig deeper and discover that I don't understand them in detail). But it took me time to practice applying them on two datasets, as the MOOC's teachers recommended before starting to work on the final assignment, which was easy after all that work. I was quite grateful that they said we should devote about an hour or more to going through the examples and playing with them. Having a time threshold helped me understand that I should not just scan the examples but really try some experiments, tweak them, etc., which I enjoyed. It was also very interesting to see that not all activities in the MOOC have to be structured. Sometimes getting a few hints and suggestions can be even more interesting for me as a learner, and I spent quite a lot of time on them.
Some of my experiments were based on totally not understanding what's going on under the hood (lol), and some were based on more thoughtful predictions of mine. Here are some of the experiments I made. They are described very poorly but helped me during the experimenting phase!
One thing I hated about the experiments was their speed. Training neural nets took up to 10 minutes, which is enough time for me to get bored, browse GitHub, or try to do something else. So another practical thing I tried was running the data science notebooks on my personal computer instead of the Deepnote cloud, since I believed my GPU should support the TensorFlow library natively and run with better performance than the free instances in the cloud. I wasn't able to run the code on my computer after 2+ hours of trying to configure it, so I consider it a fail for now, but I will try again anyway. Python dependency hell is a known thing, and I lost myself in it…
This week took me about 6 hours.
The last week focused on fetching datasets based on the file system structure (so that images get labels based on the folders we put them in) and on training in batches. It turns out that training in batches of a reasonable size can be more efficient, but it also requires some tweaks to the optimization.
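The folders-as-labels idea is roughly this (a hypothetical, stdlib-only sketch of what utilities like `tf.keras.utils.image_dataset_from_directory` do under the hood): class names come from the subdirectory names, and every file inherits its folder's label.

```python
import os

def infer_labels(root):
    """Map each file under root/<class>/... to an integer label
    derived from its (alphabetically sorted) class folder name."""
    class_names = sorted(
        d for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d)))
    samples = []
    for label, name in enumerate(class_names):
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), label))
    return class_names, samples
```

So a tree like `data/cats/a.png` and `data/dogs/b.png` yields the classes `["cats", "dogs"]` with labels 0 and 1, with no separate label file needed.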
As in the previous weeks, I spent as much time as I could on the hands-on examples and tried to play with them, as I was encouraged to, to get used to the framework. I took my time to explore TensorFlow's documentation again. I focused on making sure that I really understand what I'm doing, which methods of the framework I use, and what they do according to the documentation, so now I feel a little more prepared to play with the framework on my own. During the experiments, I realised one thing: the hardest part of these courses is that the training of the model takes too much time. I hated waiting, so I tried to use the time productively. I made a decent amount of notes, reasoned about the experiments, and thought about my next steps.
Finished in 6 hours.
What I enjoyed about this course is that I got to use TensorFlow on fun datasets very quickly. There is some limitation to my ability to use it on real data right now, because all the datasets in the course were selected to be just right for the learners. I have a good sense of a TensorFlow project's structure, and I can see its value proposition when it comes to developer experience. I can now compose very simple neural nets using the library; however, the course didn't touch on many aspects of the ML workflow. If I were asked to design a neural network in a real-life scenario, I would be making guesses with very little confidence. The course does not deal with managing training data, selecting the right optimizers or loss functions, and many other things. I guess there are only a few mentions of the internals to avoid information overload for the students. When there were links to complementary materials, I went through them right away. I would likely recommend the course to anyone who wants to get a sense of what a simple machine-learning project looks like while also getting some hands-on experience with the TensorFlow library.