I would like to present an interesting way to dive into deep learning, without a need of involving powerful machines and long time spent on neural networks training. Convolutional Neural Networks offer an extraordinary performance for feature detection. Once trained, they might be reused for different applications – for example, for the fascinating field of autonomous driving. Here, an already trained model is going to be reused for lane detection in different settings than original.
Deep Learning is a buzzword that has recently gained a big popularity and has been used in numbers of recent articles, blog posts and conferences. Not it is only a seasonal trend, but also a powerful tool, which brings machine learning as close to the original meaning of artificial intelligence, as never before. A very intuitive explanation of this approach might be found in the “Deep Learning” book by Ian Goodfellow, Joshua Bengio and Aaron Courville:
“this solution allows computers to learn from experience and understand the world in terms of hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts (…) without the need for human operators to formally specify all of the knowledge the computer needs.”
The popularity of deep learning has exploded in recent years mainly due to advancement made by computational technology and appearance of large, widely available datasets. Taking the benefit of parallelization and processing speed of modern GPUs, the time required for training a large network has been reduced from weeks, to days, or even hours. Secondly labelled datasets, like ImageNet, allow to start experimenting straight away without a need for a long process of data collection.
Convolutional neural networks
An interesting example of deep learning is a convolutional neural network (CNN), which architecture is inspired by the cell arrangement in animals visual cortex. Therefore the main application of such networks is image recognition. What distinguish this architecture from the others, are local receptive field, shared weights and pooling. If you aren’t an expert in that matter of convolutional networks or mentioned concepts don’t sound familiar to you, I encourage you to take a look at one of many available resources before proceeding to further part of the article.
Complex convolutional architectures like AlexNet require a significant time for training and large datasets. However, it is not always necessary – if handled problem it is similar to the one, which original networks was taught to solve, existing models might be reused. One way of doing this is to treat a trained CNN as a feature detector – the output of the last hidden layer (learned features) might be used as an input to a new, different classifier.
There is a rich database of already pre-trained models, called Model Zoo. Although the models presented there were trained to handle particular tasks, they might be reused in the mentioned way and obtain astounding performance.
Lane detection issue
In my previous article, I have described a couple of ways to start an adventure with autonomous driving. In this article I would like to focus on the DeepDriving project, which involves convolutional neural network trained in a virtual environment of a computer game. Unfortunately the in-game road settings try to resemble american road conditions so the left edge line is yellow. Therefore the original model can’t be used on european roads straight away and it needs some extra work.
In order to take a closer look at this problem, I modified Torcs’ environment so I could run tests with different line settings. I prepared two additional textures – the first one contains european line settings (all lines are white), the second one does not contain the edge lines at all.
As seen on the video below, the system indeed fails to detect correctly the position of the car.
Fixing the issue
Training the deep learning network from scratch with examples of different edge line color or no edge lines, would most probably fix lane detection issue. However it would require collecting a new dataset and training the network against it – both solutions require time and resources.
Rather than that, I used Torcs to collect a relatively small dataset, where an image was taken from car’s forward mounted camera while driving one lap on each of the three lanes on 3 different tracks. It was repeated for three different line settings presented on the image above. Finally, I have collected three datasets, each containing around 11k images and lane number (0, 1, 2) associated with each image.
The next step was to use original network as a feature extractor. I passed each of the images from the previous step into the neural net, but instead of taking the final output, I have collected the output of the last hidden layer (CNN codes) and stored it as the input for new a classifier. And again, it gave me 3 datasets, each representing different line settings.
Finally I trained three different SVMs, each on a single dataset from the previous step. I called them yellow, white and no-line SVM. Each of them was taught to “detect” the lane number, taking as an input the CNN codes. Those models were then evaluated against remaining data collections. Below, I present accuracy obtained by each classifier.
The results I obtained seem to confirm my worries. The yellow edge line dataset, makes the model fit well training conditions, but does not generalize well to others. The same happens when trained with white edge lines. When analysing the confusion matrices, with predictions made by the yellow SVM for white and no-lines datasets (see below), we observe, that the left lane is classified as the middle lane in all the cases for the white-dataset, or left/middle for no-lines dataset.
|Confusion matrix: Yellow SVM vs White-dataset
|Confusion matrix: Yellow SVM vs No-line-dataset
On the other hand the SVMs trained on no-lines dataset offers great performance in all types of conditions. It is rather expected as in this case the classifier seem to learn to distinguish road edges, instead of lines. However it may also require to be trained with a more diverse dataset, containing different types of roads surroundings (sand, grass, concrete, etc), if we consider it to be useful in real world problems.
Going from game to reality
In the previous paragraph, I presented how to use CNN as a feature extractor for evaluation of different lane settings. Now I would like to present the results I obtained, when using mentioned approach to a real problem – driving on a highway near Wroclaw.
In the video I show the result of our SVM built on top of CNN, which was trained using white lane examples collected while driving in Torcs. Obviously the video is speeded up and I did not get a ticket for speeding.
Other ways of reusing existing networks
Other approach that makes it possible to reuse existing model is fine tunning. It requires using a new classifier and training the remaining part of the network with a new data. It also introduces the necessity to collect a large dataset to prevent the model from overfitting.
We may also consider a net surgery, that requires a really careful network weights analyses and manual change of the weights. However it is a very tricky approach that becomes especially hard in case of deep architectures.
As next steps, I could integrate our SVM into the original system and adjust it to european conditions. Sounds like an idea for the next article ;)
Even if deep learning requires big computational power, a user of an average class laptop may still benefit from it. There is no need to train a model from scratch every time we need to fit the model to slightly different dataset, instead we can use one of the mentioned techniques. That’s an important remark as experiments with different deep learning models might be quite costly, If you operate on a short deadline. We know that from our own experience ;)
Graphics by Ania Langiewicz