Image classification with Deep Learning
The Paris-Saclay Center for Data Science has started to offer hands-on training sessions for scientists interested in data science. The event format follows the RAMP (Rapid Analytics and Model Prototyping) approach: solving a scientific problem on real data in one day using (and possibly learning) a given technology. The 5th event was hosted by Proto204 on October 8 and was dedicated to the use of deep learning techniques for image classification.
As you can imagine, Heuritech could not miss the event! Hence, we (Hedi and Florian) accepted the challenge. This note gives a brief overview of our contribution.
This time, the goal was to classify images of pollinating insects from SPIPOLL (Suivi Photographique des Insectes POLLinisateurs), a crowdsourcing project led by the Paris Museum of Natural History (MNHN). For this task, we were encouraged to use deep learning techniques. To speed up the process, we were able to use GPUs graciously provided by the Université de Champagne-Ardenne ROMEO HPC Center and NVIDIA.
The sampled dataset consisted of 20,348 labeled pictures of insects from 18 different species. Each picture was a 64×64 color image. Note that the real dataset comes with more pictures, bigger images and some extra information (e.g. the place and the time where the picture was taken). However, for the purpose of the exercise we restricted ourselves to small pictures (as below), slightly blurred by the downsizing process.
A descriptive analysis of the dataset, kindly provided by the organizers’ team, revealed an important class imbalance: around 28% of the dataset represents a single species, « L’Abeille mellifère » (the honey bee), while at the other extreme « Les Araignées crabes sombres » (the dark crab spiders) represents only ~1% of it. The color intensity of pixels was also not homogeneous.
Following the RAMP approach, each participant was invited to contribute different models, potentially updating models proposed by others, in a beneficial incremental process. Eventually, all fast-prototyped models were combined into a single hybrid model to reach an overall higher score. Unfortunately, we do not have any information on how the different models were mixed together to form this hybrid model.
The different models submitted during the day were evaluated on two criteria:
- prediction score: how well the model predicts labels on an unknown test set (percentage of correct classifications). A classifier that always predicts « L’Abeille mellifère » would score ~0.3 on this test set, providing us with a baseline.
- contributivity: how much the model contributed to the prediction capacities of the global hybrid model (a percentage of contribution).
To better distinguish prediction scores from contributivities, we use the 0.x notation for the former and percentages for the latter.
The initial models (provided mainly by the organizers as a basis on which to iterate) reached a score of 0.31 (i.e. correctly classifying 31% of the pictures of the test set). At the end of the day, the best model reached a score of 0.71. Interestingly, its contributivity is only 30%: other models with different approaches, while reaching lower scores (0.68, 0.66, 0.65), contribute significantly to the hybrid model (15%, 17% and 10% respectively).
While the prediction score is independent of other models, the contributivity of a model varies with the addition of new models. For example, the second best model reaches a score of 0.68 and, while it started with a contributivity of 37%, its current contributivity is only 2%. The reason is that the best model is a slight improvement of the former, which makes the previous version somewhat « useless » for the hybrid model. In the remainder of this post, we therefore mainly focus on the prediction score.
Without further ado, let us discuss our most successful model. Did I forget to mention that the best model is actually ours? 🙂 Note that, following the RAMP approach, this model is the result of a succession of small improvements made on top of other participants’ contributions. We did not reach a prediction score of 0.71 in one shot, but after applying several tricks and manually tuning some parameters.
The model is available on github: https://github.com/deleron/pollenating_insects_ramp.
Following a classical approach to image processing with deep learning, we use a deep neural network composed of a succession of convolutional and max-pooling layers, followed by several dense layers (cf. diagram below). Its architecture is mainly inspired by the model of another participant (Yousra B), which had a prediction score of 0.65.
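To give a feel for this kind of stack, here is a small sketch of how the spatial dimensions shrink through successive convolutional and max-pooling layers before reaching the dense layers. The kernel and layer sizes below are illustrative assumptions, not the exact ones of our submission (see the GitHub repository for the real architecture).

```python
# Trace of the spatial size of a 64x64 picture through a hypothetical
# conv/pool stack (illustrative sizes, not our exact architecture).

def conv_out(size, kernel, stride=1, pad=0):
    """Output width/height of a convolution on a square input."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=None):
    """Output width/height of a max-pooling layer."""
    stride = stride or kernel
    return (size - kernel) // stride + 1

size = 64                  # 64x64 input pictures
size = conv_out(size, 5)   # 5x5 convolution  -> 60x60
size = pool_out(size, 2)   # 2x2 max-pooling  -> 30x30
size = conv_out(size, 3)   # 3x3 convolution  -> 28x28
size = pool_out(size, 2)   # 2x2 max-pooling  -> 14x14
print(size)  # 14: the 14x14 feature maps are then flattened into dense layers
```

Each pooling stage halves the resolution, so the dense layers at the end see a compact summary of the picture rather than raw pixels.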
The previous version of the model, without dropout layers, reached a prediction score of 0.68 but had a tendency to overfit. Hence, a first regularization is achieved using dropout after the first dense layer. For lack of time, we did not try multiple dropout layers or different dropout parameters; more time would definitely have allowed us to reach a better prediction score. Note that older models used too many dropout layers and were clearly underfitting (i.e. reaching a prediction score on the train set equal to that on the test set). Hence, tuning is key.
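For readers unfamiliar with the trick, dropout can be sketched in a few lines of NumPy. This is the standard “inverted dropout” formulation, not our actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: zero out a fraction `rate` of units during
    training and rescale the survivors, so nothing special is needed
    at test time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

h = np.ones((4, 8))                     # a toy dense-layer activation
h_train = dropout(h, rate=0.5)          # roughly half the units are zeroed
h_test = dropout(h, training=False)     # identity at test time
```

Because each unit is randomly silenced, the network cannot rely on any single co-adapted feature, which is exactly the regularization effect we needed against overfitting.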
The weights of the network are optimized with the AdaGrad optimization method, using a mini-batch size of 128. Taking inspiration from the work of others, before feeding mini-batches to the network, some pictures in the mini-batch are randomly flipped (with a probability of 0.25). This is a classical trick to force the network to be invariant to such transformations. A second trick is to randomly shut down half of the pixels of each picture, preventing the network from always relying on the same pixels.
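These two mini-batch perturbations can be sketched as follows in NumPy. This is a simplified illustration of the idea; the function and parameter names are ours, not those of the actual submission:

```python
import numpy as np

rng = np.random.default_rng(42)

def perturb_minibatch(batch, flip_prob=0.25, pixel_drop=0.5):
    """On-the-fly perturbations applied before a mini-batch is fed to
    the network: random horizontal flips and random pixel shutdown.
    `batch` has shape (n, height, width, channels)."""
    out = batch.copy()
    for i in range(len(out)):
        if rng.random() < flip_prob:
            out[i] = out[i][:, ::-1].copy()      # horizontal flip
    # Shut down half of the pixels (all channels) of every picture.
    keep = rng.random(out.shape[:3]) >= pixel_drop
    return out * keep[..., None]

batch = rng.random((128, 64, 64, 3))    # a mini-batch of 128 pictures
perturbed = perturb_minibatch(batch)
```

Since the perturbations are re-sampled at every mini-batch, the network effectively never sees the exact same input twice.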
As the number of pictures in the training set is not that big, a classical trick consists in augmenting the dataset by applying several transformations (translations, rotations, flips, stretching, …) to the original pictures. An exploratory analysis of the data showed us that insects, while always in the center of the pictures, could appear in all kinds of orientations. Hence, before training, the size of the dataset is quadrupled by rotating each image by 0°, 30°, 90° and 135°. This makes our model more robust to rotations but also considerably slows down training.
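The quadrupling step can be sketched as below. We assume SciPy is available for the interpolation needed by non-right-angle rotations; the helper name is ours, for illustration only:

```python
import numpy as np
from scipy.ndimage import rotate  # assumes SciPy is installed

def augment_with_rotations(images, labels, angles=(0, 30, 90, 135)):
    """Quadruple the training set by rotating every picture by each
    angle (in degrees). `images` has shape (n, height, width, channels);
    labels are repeated accordingly."""
    rotated = [
        rotate(img, angle, axes=(0, 1), reshape=False, mode='nearest')
        for angle in angles
        for img in images
    ]
    return np.stack(rotated), np.tile(labels, len(angles))

images = np.random.rand(10, 64, 64, 3)
labels = np.arange(10)
aug_images, aug_labels = augment_with_rotations(images, labels)
# aug_images has shape (40, 64, 64, 3): four rotated copies per picture
```

With `reshape=False` the rotated picture keeps its 64×64 size, at the cost of clipping the corners, which is acceptable here since the insect sits in the center.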
One of the first submitted models to reach an interesting score mainly consisted in applying the reference model provided by the organizers to pre-processed data. It was noted that, as the insects were always centered, cropping the pictures could actually improve performance. We would like to interpret this as allowing the network to focus on the insect rather than its surroundings. While cropping is an interesting idea, the surroundings might actually carry some information. In our last model, rather than cropping the pictures, we preprocess the data by applying a Gaussian window to the images, flattening the intensity of border pixels and making them less salient (but still usable) than those in the center. It might actually be possible to learn an even smarter window (not necessarily Gaussian) by adding an extra convolutional layer to the model just after the input.
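A minimal NumPy sketch of this Gaussian windowing follows. The `sigma` value is an illustrative assumption, not the one tuned in our submission:

```python
import numpy as np

def gaussian_window(height, width, sigma=0.5):
    """2-D Gaussian weighting, close to 1 in the center and decaying
    toward the borders. `sigma` is relative to the half-size of the
    image (illustrative value, not the tuned one)."""
    ys = np.linspace(-1, 1, height)
    xs = np.linspace(-1, 1, width)
    yy, xx = np.meshgrid(ys, xs, indexing='ij')
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2))

def apply_window(images):
    """Flatten border pixel intensities while keeping the center
    intact. `images` has shape (n, height, width, channels)."""
    win = gaussian_window(images.shape[1], images.shape[2])
    return images * win[None, :, :, None]

pictures = np.random.rand(5, 64, 64, 3)
windowed = apply_window(pictures)
```

Unlike a hard crop, the window never sets border pixels exactly to zero, so the surroundings remain usable, just de-emphasized.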
To sum up, our model is a mix of state-of-the-art techniques, data-science tricks and domain-specific tricks. More time would definitely have allowed us to improve its performance, but that was not part of the deal 🙂 For those interested in pursuing the challenge, here are some suggestions for improvement: meta-parameter optimization, using multiple modalities, or plugging a pretrained network (such as VGG or GoogLeNet) on top of the current architecture…
To conclude, we thank the organizers for such a thrilling event. This was a really nice opportunity to show the interest and power of applying deep learning to all kinds of interesting problems and domains. It is certain that with a bit more time, taking our suggestions into account, and using the full-sized pictures in addition to the other data of the dataset, one could achieve a pretty good prediction score. We hope that our contribution will help the people of MNHN to achieve interesting research.