Pop quiz: What the heck is a GAN? You might not have heard the term, but you’ve likely seen one in action. From generating images of fake celebrities to creating training data for self-driving cars and even recreating suggestive (and weird) nude portraits, GANs are everywhere these days. But what exactly is this bleeding-edge technology, and what can we do with it?
A GAN (Generative Adversarial Network) is a type of AI that borrows an idea from biology, competition, and pits two neural networks against each other so that each one pushes the other to improve. Unlike many other types of AI, GANs don’t just recognize objects; they can create them. One network, the “generator,” produces synthetic images, while the other, the “discriminator,” tries to guess whether a given image is synthetic or real. The discriminator learns from both synthetic and real images and gets better over time. The generator, in turn, learns from the discriminator’s verdicts, adjusting its output so that more of its images get labeled “real.” As the discriminator improves, it challenges the generator to produce ever more realistic synthetic images.
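To make that back-and-forth concrete, here is a deliberately tiny sketch in plain Python. It is not how an image GAN is built. Instead of images it uses single numbers: the “real” data are samples from a normal distribution with mean 4 (an arbitrary stand-in), the generator is the line G(z) = a*z + b applied to random noise z, and the discriminator is a single logistic unit D(x) = sigmoid(w*x + c). The class and method names are hypothetical, and the gradients are worked out by hand. Each training step first nudges the discriminator to score real samples high and fakes low, then nudges the generator so the discriminator scores its fakes closer to “real” (the common non-saturating form of the generator loss).

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def mean(values):
    return sum(values) / len(values)


class ToyGAN:
    """Hypothetical 1-D GAN: generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.a, self.b = 1.0, 0.0  # generator starts out producing N(0, 1) samples
        self.w, self.c = 0.1, 0.0  # discriminator starts out nearly undecided

    def generate(self, z):
        """Generator: map noise values to fake samples."""
        return [self.a * zi + self.b for zi in z]

    def discriminate(self, x):
        """Discriminator: estimated probability that x is real."""
        return sigmoid(self.w * x + self.c)

    def d_loss(self, real, fake):
        """Binary cross-entropy with real samples labeled 1 and fake samples labeled 0."""
        return -(mean([math.log(self.discriminate(x)) for x in real])
                 + mean([math.log(1.0 - self.discriminate(g)) for g in fake]))

    def d_step(self, real, fake, lr):
        """One gradient-descent step on d_loss, with the generator held fixed."""
        # Both gradients are computed before either parameter changes.
        dw = (mean([(self.discriminate(x) - 1.0) * x for x in real])
              + mean([self.discriminate(g) * g for g in fake]))
        dc = (mean([self.discriminate(x) - 1.0 for x in real])
              + mean([self.discriminate(g) for g in fake]))
        self.w -= lr * dw
        self.c -= lr * dc

    def g_loss(self, z):
        """Non-saturating generator loss: mean of -log D(G(z))."""
        return -mean([math.log(self.discriminate(g)) for g in self.generate(z)])

    def g_step(self, z, lr):
        """One gradient-descent step on g_loss, with the discriminator held fixed."""
        # d/dg of -log sigmoid(w*g + c) is (D(g) - 1) * w; dg/da = z and dg/db = 1.
        grads = [(self.discriminate(g) - 1.0) * self.w for g in self.generate(z)]
        da = mean([gr * zi for gr, zi in zip(grads, z)])
        db = mean(grads)
        self.a -= lr * da
        self.b -= lr * db

    def train_step(self, sample_real, batch=64, lr=0.01):
        """Alternate one discriminator update and one generator update; return both losses afterward."""
        real = [sample_real(self.rng) for _ in range(batch)]
        z = [self.rng.gauss(0.0, 1.0) for _ in range(batch)]
        self.d_step(real, self.generate(z), lr)
        z = [self.rng.gauss(0.0, 1.0) for _ in range(batch)]
        self.g_step(z, lr)
        return self.d_loss(real, self.generate(z)), self.g_loss(z)
```

Calling `train_step` repeatedly, with something like `lambda rng: rng.gauss(4.0, 1.25)` as the hypothetical real-data sampler, runs the same alternating game in miniature. It only illustrates the structure of the two updates, not the behavior of a real image GAN, which swaps these one-parameter functions for deep networks and computes gradients automatically.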
During a bite-sized sabbatical from my day job at IDEO CoLab, I dove headfirst into building and exploring the visual potential of GANs. I learned a lot about the differences between how humans and computers see the world, and how an AI’s output is shaped by the data we provide it. As designers, we need to understand the limitations of emerging technologies like GANs; otherwise, we risk creating products that harm, rather than help, humans. Here’s what I created, and what I learned. (Want to learn how to set up your own GAN? Read more here.)
GANs can generate convincing fake pictures of almost anything: faces, cats, and even anime characters. How do they do it? It all depends on the dataset. GANs need to be fed a large set of images (think tens of thousands or more) so they can learn patterns. For example, GANs that produce faces learn that faces tend to be roughly round, with two eyes, one nose, one mouth, and possibly hair on top. They learn pixel-level patterns, too, such as the textures that make up skin, eyes, and hair.
The trick with GANs is finding sets of images large enough and diverse enough for a network to pick out patterns. There are plenty out there, like CelebA (hundreds of thousands of celebrity faces) or LSUN (images of scenes like rooms and buildings), but these were assembled for exactly this kind of research, so training on them tends to showcase how successful GANs can be. I wanted to do the opposite. What if we stretched GANs to their limits so that we could better understand how they work? In this case, we’re using images from The Simpsons.
Some images from our Simpsons dataset.
Once I had my dataset of images from the show, it was time to train the GAN. For those familiar with machine learning, read my step-by-step process here. For everyone else, here’s the gist of it: I installed TensorFlow, an open-source machine learning library, on my computer, which I had souped up with a hefty graphics card originally bought for virtual reality and gaming. I then used a series of tools to reformat and resize the images in my Simpsons dataset.
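The post doesn’t name the tools used for this step, so the following is only a sketch of what “reformat and resize” typically involves, written in plain Python with no image library so it runs anywhere. It assumes each frame has already been decoded into rows of (R, G, B) tuples with 8-bit values. It crops the largest centered square (since TV frames are wider than they are tall), shrinks it with nearest-neighbor sampling, and rescales pixel values to [-1, 1], the range that DCGAN-style generators produce with a tanh output. The function names, the 64×64 target size, and the cropping choice are all hypothetical.

```python
def center_crop_square(img):
    """Hypothetical helper: crop the largest centered square from an image given as a list of rows."""
    h, w = len(img), len(img[0])
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return [row[left:left + side] for row in img[top:top + side]]


def resize_nearest(img, size):
    """Nearest-neighbor resize to size x size: each output pixel copies one source pixel."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)] for y in range(size)]


def to_unit_range(img):
    """Map 8-bit channel values (0..255) to floats in [-1, 1]."""
    return [[tuple(v / 127.5 - 1.0 for v in px) for px in row] for row in img]


def preprocess(img, size=64):
    """Hypothetical pipeline: square crop, then resize, then rescale pixel values."""
    return to_unit_range(resize_nearest(center_crop_square(img), size))
```

A real pipeline would use an image library and a smoother resampling filter, but the output is the same kind of thing: every frame becomes a fixed-size grid of numbers, which is all the network ever sees.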
What training a GAN looks like. To give a sense of how long this process actually takes, this screen capture was sped up 40x.
I trained the network overnight (sometimes over multiple days) and saved examples of images created by the generator network. The video below documents the generator’s learning process. With time, the images began to look more and more like the actual Simpsons.
Scrub through the video to see the network’s progress. At 1:10 (about 13 hours of training), the first outlines of Simpsons eyes appear. At 2:00 (about 23 hours), the color scheme becomes noticeably brighter and figures start to emerge. At 3:00 (about 34 hours), there is less variation as shapes start to solidify and the networks begin to settle on what counts as “real.” From 5:00 (about 57 hours) on, the shapes start to vaguely resemble Simpsons characters, though the results are still too psychedelic and abstract to pass as real Simpsons frames.
A selection of Simpsons-esque images generated by the GAN.
The GAN manages to generate very Simpsons-esque images, but they’re definitely not going to fool the human eye. What’s going on? GANs (and any neural network or machine learning method, for that matter) only learn patterns that are in the training data. As humans, we have a deep and complex understanding of what people and objects should look like, including Simpsons characters and scenes in the show. The GAN, however, doesn’t bring any preexisting knowledge. It has to learn everything from the raw pixel values of the images it sees, and more complex patterns take more examples to learn.
In this case, the GAN easily learns the color palette of The Simpsons and its simple, flat illustration style, and starts to learn the features of each character. With enough training data, it may learn more complex patterns, like complete images of characters, or which characters show up in which settings (like the Simpson family at home, or Mr. Burns at the power plant).
The varied and strange outputs of these networks are a succinct example of both the promise and the risks of building products, services, or systems with machine learning inside. Such systems seem magical, learning patterns from data without being told what to look for, but they’re also limited by that data. Imagine you came into this world yesterday and had to make sense of it all from nothing but still frames of The Simpsons. How would that shape your thinking?
Biases in training data, intended or not, will be reflected in the output of the network, which, if embedded into a product or service, can codify the bias at an unprecedented scale. As designers, we are uniquely positioned to help with this problem. As Kate Crawford, cofounder of the AI Now Institute, has said, the problem is neither purely social nor purely technical: it’s a socio-technical problem. Designers have a responsibility to understand the unique role data plays in machine learning so that we can create networks that equitably serve human needs.