The dawn of machine learning is upon us, and we already find ourselves walking among humanoid robots. The exponential advancement in machine learning applications will lead to more sophisticated language models and simulation programs capable of rendering a digital reality. Although we are still far from an artificial intelligence that is self-aware, progress is rapidly approaching a machine singularity, and some important decisions will need to be made to determine the fate of interactions between humans and machines. The open-source nature of AI development is leading to exponential improvements in the algorithms and techniques that will set a new precedent and shape our future. Although I sorely want to delve into the potential perils and delights of this automation, I'll divert a bit and talk about using machine learning in a rather benign application: creating art such as anime.
The brain is the ultimate natural neural network, with neurons and activation functions that have been shaped over millennia to make us so uniquely human. Those same structures can be re-created artificially to learn a vast range of tasks such as language, drawing, or even playing chess. Some examples of artificial neural nets include convolutional neural nets, recurrent neural nets, and long short-term memory (LSTM) models. I won't go into the details, but the various layers, nodes, and activation functions try to emulate some of the functions the brain performs to retain information and learn.
Luckily, some of the world's best researchers are improving these architectures and making structures and pre-trained nets available for you to build upon. I'll go through an example of how you can use some of these models to train on and create anime drawings.
To create the images, you will need to start with a training set containing the kind of anime figures you would like the net to generate. I chose anime faces to keep the complexity low. You will want to resize all of the images to the same size, for example 64×64 pixels. There are many packages in Python, like PIL, that can do this for you.
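Here is a minimal sketch of that preprocessing step (the anime_faces directory name, file extensions, and image size are illustrative assumptions; adjust them to your own data):

import os
import numpy as np
from PIL import Image

IMG_SIZE = 64          # resize every image to 64x64 pixels
DATA_DIR = "anime_faces"

images = []
for fname in os.listdir(DATA_DIR):
    if not fname.lower().endswith((".png", ".jpg", ".jpeg")):
        continue
    # Open the image, force RGB, and resize it to a common size
    img = Image.open(os.path.join(DATA_DIR, fname)).convert("RGB")
    img = img.resize((IMG_SIZE, IMG_SIZE))
    images.append(np.asarray(img, dtype=np.float32))

# Stack everything into a single array of shape (num_images, 64, 64, 3)
X_train = np.stack(images)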
The code above reads the images from the directory, resizes them, and saves them into an array that can later be used to train the network.
Next, we need an appropriate deep learning model that can learn what distinguishes anime pictures and retain that information. For this particular application, we use generative adversarial networks (GANs).
What if you could take random noise and shape it into anime pictures? That is exactly what a GAN can be trained to do. It does this with discriminator and generator networks that perform different functions. The discriminator is fed the training images and tries to learn the qualities that make those pictures similar, or "anime"-like. The generator's job is to try to fool the discriminator: it transforms random noise into a fake image to see if it can trick the discriminator into thinking it is an anime picture. You train over many iterations until you feel the discriminator is doing a decent job of telling the difference between real and fake images. Overall, the topology or architecture used is illustrated below rather simplistically:
I've included some sample code below that you can use to build the discriminator and generator, view the topology, and train the GAN. Keras layers, activation functions, and convolutional nets provide the back end:
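A sketch of how such a model could be put together in Keras follows (the layer sizes, strides, and the 100-dimensional noise vector are illustrative choices rather than the exact architecture behind the results shown later):

from tensorflow.keras import layers, models, optimizers, initializers

LATENT_DIM = 100
init = initializers.RandomNormal(stddev=0.02)   # Gaussian weight initialization

def build_generator():
    # Map a noise vector to a 64x64 RGB image
    model = models.Sequential([
        layers.Dense(8 * 8 * 256, input_dim=LATENT_DIM, kernel_initializer=init),
        layers.LeakyReLU(0.2),
        layers.Reshape((8, 8, 256)),
        # Upsample 8x8 -> 16x16 -> 32x32 -> 64x64 with strided transposed convolutions
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", kernel_initializer=init),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", kernel_initializer=init),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                               activation="tanh", kernel_initializer=init),
    ])
    return model

def build_discriminator():
    # Classify a 64x64 RGB image as real (1) or fake (0)
    model = models.Sequential([
        # Downsample 64x64 -> 32x32 -> 16x16 with strided convolutions
        layers.Conv2D(64, 4, strides=2, padding="same",
                      input_shape=(64, 64, 3), kernel_initializer=init),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same", kernel_initializer=init),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid", kernel_initializer=init),
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer=optimizers.Adam(learning_rate=0.0002, beta_1=0.5))
    return model

discriminator = build_discriminator()
generator = build_generator()

# Stack the two into the combined model used to train the generator;
# the discriminator is frozen inside this combined model
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(loss="binary_crossentropy",
            optimizer=optimizers.Adam(learning_rate=0.0002, beta_1=0.5))

generator.summary()        # view the topology of each network
discriminator.summary()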
In the paper that introduced GANs, the generator tries to minimize the following function while the discriminator tries to maximize it:
Ex[log(D(x))]+Ez[log(1−D(G(z)))]
D(x) is the discriminator’s estimate of the probability that real data instance x is real.
Ex is the expected value over all real data instances.
G(z) is the generator’s output when given noise z.
D(G(z)) is the discriminator’s estimate of the probability that a fake instance is real.
Ez is the expected value over all random inputs to the generator (in effect, the expected value over all generated fake instances G(z)).
The formula derives from the cross-entropy between the real and generated distributions.
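To see the connection concretely: the discriminator is trained with the standard binary cross-entropy loss, where a real image x carries label 1 and a generated image G(z) carries label 0. A real sample therefore contributes −log(D(x)) to the loss and a fake sample contributes −log(1−D(G(z))), so the discriminator's expected loss is

−(Ex[log(D(x))] + Ez[log(1−D(G(z)))])

Minimizing that loss is the same as maximizing the value function above, while the generator is trained to push the same quantity in the opposite direction.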
A stable GAN will have a discriminator loss around 0.5, typically between 0.5 and perhaps as high as 0.7 or 0.8. The generator loss is typically higher and may hover around 1.0, 1.5, 2.0, or even more. GANs are temperamental and sensitive to their hyperparameters, so they need to be tuned carefully.
There are some recommendations and best practices for training a stable deep convolutional GAN (DCGAN) offered on the Machine Learning Mastery site, which include the following (a training-loop sketch applying several of them follows the list):
Downsample Using Strided Convolutions
Upsample Using Strided Convolutions
Use LeakyReLU
Use Batch Normalization
Use Gaussian Weight Initialization
Use Adam Stochastic Gradient Descent
Scale Images to the Range [-1,1]
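Here is a minimal training-loop sketch that applies several of those recommendations, such as scaling the images to [-1, 1], Adam updates, and alternating discriminator/generator steps (the batch size, step count, and logging interval are illustrative, and it assumes the X_train array and the generator, discriminator, and gan models sketched above):

import numpy as np

# Scale the pixel values from [0, 255] to [-1, 1] to match the tanh output
X_train = (X_train - 127.5) / 127.5

BATCH_SIZE = 128
STEPS = 500

for step in range(1, STEPS + 1):
    # --- Train the discriminator on a half batch of real and fake images ---
    idx = np.random.randint(0, X_train.shape[0], BATCH_SIZE // 2)
    real_imgs = X_train[idx]
    noise = np.random.normal(0, 1, (BATCH_SIZE // 2, LATENT_DIM))
    fake_imgs = generator.predict(noise, verbose=0)

    d_loss_real = discriminator.train_on_batch(real_imgs, np.ones((BATCH_SIZE // 2, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_imgs, np.zeros((BATCH_SIZE // 2, 1)))
    d_loss = 0.5 * (d_loss_real + d_loss_fake)

    # --- Train the generator via the combined model (discriminator frozen) ---
    noise = np.random.normal(0, 1, (BATCH_SIZE, LATENT_DIM))
    g_loss = gan.train_on_batch(noise, np.ones((BATCH_SIZE, 1)))

    if step % 10 == 0:
        print(f"Step: {step}/{STEPS} [D loss: {d_loss:.4f}] [G loss: {g_loss:.4f}]")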
Running the GAN, we see that it does a very good job of shaping the noise into pictures resembling the training set, although it looks like it has overfit to the input pictures. More work will be needed to add more variation. That concludes the process of using a GAN to create anime pictures:
Step: 40/500 [G loss: 0.1167]
Step: 50/500 [G loss: 0.1167]
Step: 60/500 [G loss: 0.1167]
Step: 70/500 [G loss: 0.1167]
Step: 80/500 [G loss: 0.1167]
Step: 90/500 [G loss: 0.1167]
Step: 100/500 [G loss: 0.1167]