Getting Your Hands Dirty With Artificial Paint
When we imagine a truly intelligent system from the future of science fiction, its capabilities don’t usually include counting road traffic with high precision or recognizing cats and dogs in images. We dream of (or fear) AI that is able to reason, hold personal opinions, and perhaps even feel a range of emotions.
Feelings and sensitivity, as well as aesthetic sensibility, are among the qualities we believe set us apart from machines. While such capabilities in AI still lie in the unforeseeable future, there is already a growing number of artists and researchers who use machine learning as a tool for artistic exploration.
The unpredictability of a deep neural network’s training process makes it somewhat similar to watercolor painting: just as you cannot predict exactly where the paint will flow, you cannot predict the exact path of gradient descent through n-dimensional space.
With my background in photography as well as artificial intelligence, I was more than excited about this new trend and eager to create some works of my own. In this blog post, I will outline some background on the current state of the AI Art scene as well as share experiences from my personal project of synthesizing analog photographs with machine learning.
The Hype is Real
There are already some established artists creating amazing works in the AI Art scene. Names I recommend checking out include Sofia Crespo (who works with biology-inspired technologies), Scott Eaton (who explores the representation of the human figure through drawing), Robbie Barrat (who works at the boundary between neural networks and the traditional art world), and Refik Anadol (who creates large-scale interactive AI projects).
There is also growing interest in creative applications of AI at major scientific conferences: both NeurIPS and SIGGRAPH host dedicated workshops on the topic, such as Machine Learning for Creativity and Design.
When it comes to the artistic toolbox, the key player is definitely the Generative Adversarial Network (GAN), which generates new synthetic images by mimicking the distribution of an original collection of visual works. The learning process is somewhat imperfect, and the mistakes the network makes while learning the visual representations of our world forge new ground for creative and artistic exploration.
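To make this concrete, here is a minimal sketch of the adversarial training loop in PyTorch. The `Generator` and `Discriminator` modules are placeholders for whatever architectures you choose, and `dataloader` is assumed to yield batches of real images:

```python
import torch
import torch.nn.functional as F

G, D = Generator(), Discriminator()  # placeholder modules
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for real in dataloader:                  # batches of real images
    z = torch.randn(real.size(0), 512)   # random latent codes
    fake = G(z)

    # Discriminator step: learn to tell real images from generated ones.
    real_logits, fake_logits = D(real), D(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to fool the discriminator into calling fakes real.
    fake_logits = D(fake)
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The “mistakes” mentioned above live exactly in this tug-of-war: wherever the generator fails to fully fool the discriminator, interesting artifacts appear.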
Getting Creative With Generative Adversarial Networks
The development of GAN capabilities has gained pace in the last few years, and synthetic images have drastically improved in quality. It is now possible to generate high-resolution images with great attention to detail, something that is crucial for visual exhibits: nobody wants to look at small pixelated thumbnails in an actual gallery.
Furthermore, just a couple of years ago, the amazing high-quality images generated by BigGAN or StyleGAN were only possible with tons of training data (e.g., 70,000 images) and powerful computational resources (such as those available to researchers at Google or Nvidia). This is no longer the case: with the introduction of techniques such as adaptive discriminator augmentation, we can now train powerful models with as few as 2,000 images.
Transferring The Style
There are also other tools used for creative applications. One of the earliest and most popular examples is neural style transfer, which is still widely used and has been implemented in easy-to-use applications available to everyone. Other researchers, such as Ahmed Elgammal, have developed their own custom solutions and even try to answer the question of whether AI can be creative on its own, without human intervention.
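For the curious, here is a condensed sketch of the core idea behind the original neural style transfer of Gatys et al.: optimize the pixels of an image so that its deep VGG features match the content photo while its Gram matrices match the style image. The layer indices, step count, and loss weight below are illustrative choices, not canonical ones:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG-19 feature extractor, as in the original paper.
vgg = vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(img, layers=(1, 6, 11, 20, 29)):   # roughly relu1_1 ... relu5_1
    feats, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats.append(x)
    return feats

def gram(f):
    # Channel-by-channel feature correlations; this is what encodes "style".
    b, c, h, w = f.shape                          # assumes batch size 1
    f = f.view(c, h * w)
    return (f @ f.t()) / (c * h * w)

# content, style: preprocessed image tensors of shape (1, 3, H, W)
style_grams = [gram(f) for f in features(style)]
content_feats = features(content)

target = content.clone().requires_grad_(True)
opt = torch.optim.Adam([target], lr=0.02)
for step in range(300):
    t_feats = features(target)
    style_loss = sum(F.mse_loss(gram(t), g) for t, g in zip(t_feats, style_grams))
    content_loss = F.mse_loss(t_feats[-1], content_feats[-1])
    loss = content_loss + 1e6 * style_loss        # weight balances the two goals
    opt.zero_grad(); loss.backward(); opt.step()
```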
The main problem with designing networks that optimize for artistic value and creativity is, of course, defining these concepts, and with them an appropriate loss function. While many have tried, so far no one has been able to say exactly how to calculate creativity. Nonetheless, some particularly interesting approximations exist.
The process of using AI for creative applications involves a lot of human-in-the-loop interaction at all stages. First of all, there are the usual problems that come with training generative models, such as mode collapse or overfitting. These are usually tackled with standard solutions, such as generous amounts of dropout and other regularization techniques.
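As a small illustration of the dropout point, a discriminator block in PyTorch might look like this; the block structure and rates here are made up for the example:

```python
import torch.nn as nn

def disc_block(in_ch, out_ch, p_drop=0.25):
    # Dropout after the activation makes the discriminator's feedback
    # noisier, which can help against overfitting on small datasets.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Dropout2d(p_drop),
    )

discriminator = nn.Sequential(
    disc_block(3, 64),
    disc_block(64, 128),
    disc_block(128, 256),
    nn.Flatten(),
    nn.LazyLinear(1),   # single real/fake logit
)
```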
Art And Data
When using smaller datasets, it also makes sense to use smaller architectures with fewer parameters. Transfer learning is also quite popular as a tool for artistic exploration: some artists confess that they never start training from scratch with random weights but rather build on a model trained for a previous project.
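In practice, that warm start can be as simple as loading a checkpoint before training; the file name, checkpoint layout, and `Generator` class below are hypothetical:

```python
import torch

generator = Generator()  # same (or similar) architecture as the previous run
checkpoint = torch.load("previous_project.pt", map_location="cpu")

# strict=False tolerates small architectural differences, e.g. when the
# new model is a trimmed-down version with fewer parameters.
generator.load_state_dict(checkpoint["generator"], strict=False)
```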
Another very important step that requires careful manual curation is the preparation of the dataset. Neural networks are very good at capturing the most prominent aspects of the data distribution.
While for other tasks, such as classification or segmentation, a bit of noise in the training data does not hurt, for creative applications you need to make sure that all of your training images are clean and aesthetically pleasing. If you use any automated data gathering, for example scraping Van Gogh’s portraits from the internet, it might be a good idea to first run the data through a classification model and filter out images with low category confidence, as in the sketch below.
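Something along these lines would do, assuming you have a classifier that fits your categories; here a generic ImageNet ResNet stands in, and the confidence threshold is arbitrary:

```python
import torch
from torchvision import models, transforms
from PIL import Image
from pathlib import Path

model = models.resnet50(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

kept = []
for path in Path("scraped_images").glob("*.jpg"):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(img), dim=1)
    if probs.max().item() > 0.5:   # drop images the model is unsure about
        kept.append(path)
```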
For smaller datasets, you might even go through the whole dataset manually and check for any irrelevant or low-quality images. One important thing to remember is never to leave any ambiguity in the data, as it might lead to unexpected outcomes. Unless, of course, you are looking for unexpected outcomes. As they say, in art and creativity, all rules are made to be broken.
Many AI artists approach the data collection step as an artistic process in itself and create beautiful datasets that can be exhibited on their own. For example, Anna Ridler’s Myriad (Tulips) was first exhibited as an installation of thousands of hand-labeled photographs forming a dataset of unique tulips. Other artists, who work with different mediums such as paintings, sculptures, or doodles, use photography to capture their works of art and discover new works influenced by their unique style. Creating your own training data has many advantages, above all total control over what is fed into the model.
The First Experiment: Artificial Analog
My first camera was my grandfather’s LOMO LC-A film camera (famous for starting the whole Lomography movement), and I still regularly buy rolls of film and shoot analog images. Analog photography is known for a feel that is impossible to replicate with digital cameras: the grain, the colors, the mood.
After seeing some high-quality results produced by the StyleGAN2 architecture, I realized it would be a great tool for experimenting with recreating and modifying the fabric of analog images. The latest implementation by the StyleGAN2 creators at Nvidia, introduced in October 2020, is especially useful for generating high-quality images from a limited set of training data, thanks to its use of adaptive discriminator augmentation.
The Non-Leaky Augmentation
This technique works by applying data augmentation in a non-leaky manner to all the images the discriminator sees, both real and generated. What is more, the strength of the augmentation is chosen adaptively during training, so that the regularization grows with the amount of overfitting detected.
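A condensed sketch of the adaptive part, following the heuristic from the ADA paper (Karras et al., 2020); the target value and step size are illustrative:

```python
class AdaptiveAugment:
    def __init__(self, target=0.6, step=0.01):
        self.p = 0.0          # probability of augmenting each image
        self.target = target  # desired value of the heuristic r_t
        self.step = step

    def update(self, d_real_logits):
        # r_t approaches +1 when the discriminator grows too confident
        # that real images are real, i.e. when it starts to overfit.
        r_t = d_real_logits.sign().mean().item()
        self.p += self.step if r_t > self.target else -self.step
        self.p = min(max(self.p, 0.0), 1.0)
```

During training, every image the discriminator sees, real or fake, is then augmented with probability `p`, which is what keeps the augmentations from leaking into the generated images themselves.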
In the case of analog images, training data is scarce and literally expensive. For the first training run, I gathered all of my scanned negatives and grouped them into categories. While the images were varied and contained details from a number of different topics, I decided to focus on the categories with the least ambiguity and proceeded with photographs falling into the broadly understood category of night photography.
I trained the model on random square crops of images that contained out-of-context shapes and colors. My goal was to experiment with form, and I was happy to notice the emergence of a texture resembling a scanned negative.
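The cropping itself needs nothing fancy; a minimal Pillow version, with an assumed crop size, might look like this:

```python
import random
from PIL import Image

def random_square_crop(img: Image.Image, size: int = 512) -> Image.Image:
    # Take a random square patch so shapes and colors lose their context.
    assert img.width >= size and img.height >= size
    x = random.randint(0, img.width - size)
    y = random.randint(0, img.height - size)
    return img.crop((x, y, x + size, y + size))
```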
Another thing the model was quick to pick up from my analog images was bright colors intersecting smoothly, almost watercolor-like, and forming exciting abstract patterns.
And some very weird shapes…
These Humans Did Not Feel
After the initial experiments, and after seeing more artificially generated human faces, I started wondering about the influence traditional photography has on us as viewers.
We are accustomed to taking photography at face value and usually do not question the reality of the events presented, even in the age of digital post-production. One of the goals of photography is to affect the spectator by conveying a story or a particular problem, so there is often an emotional charge in seeing photographs.
How I Did It
I decided to generate artificial human faces displaying a range of emotions from a dataset of old photographs. The data was collected from The Metropolitan Museum of Art using their public API, downloading only images released under an open license for unrestricted use.
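The download step can be sketched like this; the endpoints and field names follow The Met’s public API documentation (https://metmuseum.github.io/), but double-check them before relying on this sketch:

```python
import requests

BASE = "https://collectionapi.metmuseum.org/public/collection/v1"

# Search for objects with images; the query term is just an example.
ids = requests.get(f"{BASE}/search",
                   params={"q": "portrait photograph", "hasImages": True}
                   ).json().get("objectIDs") or []

for object_id in ids[:100]:
    obj = requests.get(f"{BASE}/objects/{object_id}").json()
    if obj.get("isPublicDomain") and obj.get("primaryImage"):
        image = requests.get(obj["primaryImage"]).content
        with open(f"met_{object_id}.jpg", "wb") as f:
            f.write(image)
```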
The first step was to extract the faces from the old photographs and perform some data cleaning (e.g., removing faces that were too small or of too low quality). I also used a pre-trained deep neural network to select the images that scored highest for expressed emotions.
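One simple way to do the face extraction is with OpenCV’s bundled Haar cascade (not necessarily the only option); the minimum face size below is an illustrative threshold:

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_faces(path, min_size=128):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Faces smaller than min_size x min_size pixels are dropped.
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(min_size, min_size))
    return [img[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```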
The faces generated after training bear an uncanny resemblance to actual human beings, and the emotions that can be read from them seem genuine, with the potential to affect the viewer. You can see the results below: