An insightful story by our very own AI Resident
Have you ever wondered what it’s like to work on your own research idea in the field of Machine Learning? How does it feel to have powerful GPUs at your disposal, brilliant people to talk to, the opportunity to focus on a single inspiring project, a very good salary, and a modern office to work in? Well, I have. A lot. I thought such positions only existed at top companies with research centers, like Google, Microsoft or DeepMind. It turns out I was wrong. My name is Michał Stypułkowski and I am the first AI Resident at Tooploox.
Journey
My adventure with Machine Learning started last summer. After getting my Bachelor’s degree in Mathematics, I dived into the world of AI. I met the Tooploox team for the first time at the PL in ML 2018 conference in Warsaw. They told me about the AI Residency program and encouraged me to test my skills and potential during the recruitment process. I was afraid I wouldn’t do well in the interviews, as I didn’t have much experience in ML or a computer science background. Still, I knew this was a huge opportunity, so… I quit my job the next month to have time to study.
Recruitment process
The first interview was mostly non-technical. I was asked about my background, my skill set, preferred programming language and Deep Learning framework, previous experience and achievements. I also had to share my research idea, which was not really specified back then. I only knew that I’d like to work on generative models, preferably Variational Autoencoders (VAEs). Overall, the atmosphere was great. Despite my lack of strong experience in ML, I was given a chance to prove my abilities.
The next stage was an assignment. The main task was to reproduce the results described in an ICLR paper and apply additional quality-measurement techniques. I also had to write an extended abstract of my work and, optionally, try to extend the model. The whole assignment was very time-consuming. I mean it. I spent long hours working on my implementation and tuning hyperparameters, but in the end I was happy with my results. I had a lot of fun, because I treated it as a good opportunity to add a solid project to my portfolio. Every interviewee had 2 weeks (10 working days) for the task. Tasks varied based on the individual research idea.
Finally, I was invited to the last interview. I received feedback on my task and was asked to improve my results before the meeting. I didn’t succeed. My results got a little better, but they were far from perfect. The last interview was fully technical. For 70% of the time I was defending my solution to the task and explaining the motivation behind the decisions I made. The rest of the time I spent answering general ML questions from two Tooploox researchers. Some of them were tricky, but none of them were designed to make me fail. I loved the time I spent there. It was a real pleasure and reassured me that the last two weeks of grinding had been worth it. It was my best interview so far.
Working as AI Researcher at Tooploox
On the first day, after onboarding meetings and the office tour, I was introduced to the team. All the hardware was waiting on my desk: a MacBook Pro, keyboard, mouse and two additional monitors. I also received a swag package with gadgets and clothing. To this day, I haven’t had to ask for anything. It was all perfect from day one!
It is really good to have people around you at work who share your passion. I didn’t realize how much until I started talking with others from the AI and Computer Vision teams. The amount of knowledge and experience is almost overwhelming. It’s great to listen to people explaining projects from fields very different from mine, or to hear stories about studying abroad. This scientific atmosphere is truly motivating and gives you a real push to deepen your knowledge.
The main assumption of the AI Residency program is that the Resident works exclusively on their own project in cooperation with a Mentor. In my case, I also didn’t have to cut ties with my alma mater: my Residency project (also my Master’s thesis) is co-supervised by dr Jan Chorowski from the University of Wrocław. The Tooploox Mentor is an experienced researcher who shares knowledge and helps clarify ideas and solutions to emerging problems. He does not work on the code or make the crucial decisions in the project. My Mentor is dr inż. Maciej Zięba, an assistant professor at Wrocław University of Science and Technology. He was assigned to me because he is also interested in generative models. He was a perfect fit for this role and our collaboration is really fruitful. He digs deeply into my ideas and problems, which is very helpful for my thinking process. A couple of times I came up with solutions to issues thanks to his questions and doubts. Keep in mind that the Mentor is not someone who will get your project done for you. 90% of the work is, in fact, your own.
Research
I’ve been working on my project for 3 months now. The first couple of weeks were dedicated to reviewing the literature and clarifying my research idea. After diving deeper into recent publications, I finally decided which path to take. I’m currently developing a new generative model for point clouds based on a family of models called Normalizing Flows [1], [2].
There are two major groups of generative models: Variational Auto-Encoders [3] and GANs [4]. Despite the great popularity of these methods, they have some drawbacks. VAEs are trained by maximizing the ELBO (Evidence Lower Bound), which is only a lower bound on the likelihood, so we optimize an approximation of the value we actually wish to maximize. The quality of newly generated objects isn’t that good, either. Much better results can be obtained with GANs. The problem with these methods is that training is a min-max game between a generator and a discriminator, and the objective value does not provide much useful information during training. The training process is often considered a lottery. Flow-based models are trained using the exact likelihood of the data, which makes them more mathematically principled. They are still somewhat underdeveloped, and I think there is much to discover and improve.
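For reference, the ELBO maximized by VAEs is the standard lower bound from [3] (notation here is mine, not from the rest of this post):
\begin{align*}
\log p_\theta(x) \geq \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big),
\end{align*}
whereas flow-based models optimize $\log p_X(x)$ exactly, as described below.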
The main idea is to find an invertible transformation between a simple distribution (e.g. Gaussian) $p_Z$ and a complex one $p_X$, which in this case is the distribution over the surface of a point cloud. We assume that there exists a bijection $f$ such that $z = f(x)$, where $z \sim p_Z$ and $x \sim p_X$. We make use of the change of variables formula for density functions:
\begin{align*}
p_X(x) = p_Z(f(x))\,\Big|\det \frac{\partial f(x)}{\partial x}\Big|,
\end{align*}
where $\Big|\det \frac{\partial f(x)}{\partial x}\Big|$ is the absolute value of the determinant of the Jacobian of the transformation $f$.
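To make the formula concrete, here is a minimal numerical sanity check in 1D. It is only an illustration under my own assumptions (the affine map, the names `f` and `abs_det_jacobian`, and the Gaussian example are not from the project):

```python
# Minimal 1D check of the change-of-variables formula:
# p_X(x) = p_Z(f(x)) * |d f(x) / d x|
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 0.5

def f(x):
    """Bijection mapping data x ~ N(mu, sigma^2) to z ~ N(0, 1)."""
    return (x - mu) / sigma

def abs_det_jacobian(x):
    """|d f(x) / d x| for the affine map above."""
    return np.full_like(x, 1.0 / sigma)

x = np.linspace(0.0, 4.0, 5)

p_x_via_flow = norm.pdf(f(x)) * abs_det_jacobian(x)  # change of variables
p_x_direct = norm.pdf(x, loc=mu, scale=sigma)        # true density of X

print(np.allclose(p_x_via_flow, p_x_direct))  # True
```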
The next step is to define $f$. We want it to be invertible and to have a Jacobian determinant that is easy to compute. To do that, let’s assume our data point $x$ is $D$-dimensional. Let $x^{(1)} = x_{1:d}$ and $x^{(2)} = x_{d+1:D}$ be a partition of the point’s dimensions. We define $f$ in the following way:
\begin{align*}
&z^{(1)} = x^{(1)}, \\
&z^{(2)} = x^{(2)} \odot \exp\big(M(z^{(1)})\big) + A(z^{(1)}).
\end{align*}
The inverse transformation $f^{-1}$ is given by
\begin{align*}
&x^{(1)} = z^{(1)}, \\
&x^{(2)} = \big(z^{(2)} - A(x^{(1)})\big) \odot \exp\big(-M(x^{(1)})\big).
\end{align*}
In this case, $\Big|\det \frac{\partial f(x)}{\partial x}\Big|$ equals $\exp\big(\sum_{i=1}^{D-d} M(x^{(1)})_i\big)$, where the sum runs over the output coordinates of $M$. Note that we don’t need to compute derivatives or inverses of $M$ and $A$. These functions can be arbitrarily complex, so we choose them to be neural networks.
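For illustration, here is a minimal affine coupling layer in PyTorch that follows the equations above. This is my own sketch, not the project’s code; the class name `AffineCoupling`, the MLP architectures for `M` and `A`, and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Minimal affine coupling layer in the spirit of Real NVP [2]."""

    def __init__(self, dim, d, hidden=64):
        super().__init__()
        self.d = d  # size of the part x^(1) that passes through unchanged
        # M (log-scale) and A (translation) map R^d -> R^(dim - d)
        self.M = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, dim - d))
        self.A = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, dim - d))

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        m = self.M(x1)
        z1 = x1
        z2 = x2 * torch.exp(m) + self.A(x1)          # z^(2) = x^(2) ⊙ exp(M(x^(1))) + A(x^(1))
        log_det = m.sum(dim=1)                       # log|det J| = sum of log-scales
        return torch.cat([z1, z2], dim=1), log_det

    def inverse(self, z):
        z1, z2 = z[:, :self.d], z[:, self.d:]
        x1 = z1
        x2 = (z2 - self.A(x1)) * torch.exp(-self.M(x1))
        return torch.cat([x1, x2], dim=1)
```

In practice several such layers are stacked with alternating partitions, so that every coordinate eventually gets transformed.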
The objective to be optimized is the exact likelihood of the data or, more precisely, the log-likelihood:
\begin{align*}
\log p_X(x) = \log p_Z(f(x)) + \log\Big|\det \frac{\partial f(x)}{\partial x}\Big|.
\end{align*}
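Putting it together, here is a hedged training sketch that maximizes this log-likelihood using the `AffineCoupling` layer sketched earlier and a standard normal base distribution $p_Z$. The batch below is random placeholder data, not real point clouds, and the dimensions and hyperparameters are my own assumptions.

```python
# Maximize log p_X(x) = log p_Z(f(x)) + log|det df/dx| by minimizing the NLL.
import torch

dim = 3                      # e.g. 3D points of a point cloud
flow = AffineCoupling(dim=dim, d=1)
base = torch.distributions.Normal(torch.zeros(dim), torch.ones(dim))  # p_Z
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)

x = torch.randn(256, dim)    # placeholder batch; real data would be point-cloud samples

for step in range(1000):
    z, log_det = flow(x)
    log_px = base.log_prob(z).sum(dim=1) + log_det   # per-sample log-likelihood
    loss = -log_px.mean()                            # negative log-likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()
```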
The GIF below shows the training process. We feed our model with Gaussian noise, and it manages to find a proper function $f$.
Conclusion
I love the time I’m spending at Tooploox. I’m really hyped about my research, and the company creates a perfect environment for it. With such great conditions, there’s nothing left to do but make a great contribution to the AI world.
If you have any questions, don’t hesitate to leave comments below. I’ll be more than happy to answer them.
References
[1] Laurent Dinh, David Krueger, Yoshua Bengio. NICE: Non-linear Independent Components Estimation. 2015.
[2] Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio. Density estimation using Real NVP. 2017.
[3] Diederik P. Kingma, Max Welling. Auto-Encoding Variational Bayes. 2014.
[4] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative Adversarial Networks. 2014.