Learn Creative Coding (#99) - GANs: Creating Images from Nothing


Last episode we taught a neural network to translate between image domains. Pix2Pix took edge drawings and turned them into photographs -- cat sketches became furry nightmares, architectural scribbles became building facades. The model learned from paired examples: this sketch maps to this photo. The pairing was tight. Every input had a matching output in the training data, and the model learned to reproduce that mapping for new inputs.

But Pix2Pix is still conditional. It needs your sketch as input. It translates, it doesn't create. The output depends entirely on what you feed in. Without a sketch, there's no image. The generator is a function -- input in, output out.

GANs -- Generative Adversarial Networks -- break that dependency. A GAN generates images from nothing but random noise. No input sketch. No reference photo. No condition of any kind. Feed the generator a vector of random numbers and it produces an image that looks like it belongs in its training set. A GAN trained on cat photos produces cat photos that never existed. A GAN trained on landscapes produces landscapes nobody ever photographed. A GAN trained on faces produces people who were never born.

The images are not collages. They're not filtered versions of training data. They're genuinely new -- synthesized from the statistical patterns the model learned during training. Each random vector maps to a different image. And the space of all possible vectors -- the latent space -- is navigable. You can walk through it, interpolate between points, find directions that correspond to visual properties. Smiling, aging, rotating, changing hair color -- each is a direction in latent space. The latent space IS the creative parameter space, and exploring it is the artwork.

How a GAN works: generator vs discriminator

We touched on GANs briefly in episode 98 when we explained how Pix2Pix works. Let me go deeper here because understanding the adversarial dynamic is key to understanding why GAN outputs look the way they do.

A GAN has two neural networks that train against each other. The generator takes a random vector (typically 128 or 512 random numbers) and produces an image. At the start of training, this image is pure garbage -- random pixel noise. The discriminator takes an image (either a real one from the training set or a fake one from the generator) and outputs a single number: how likely is this image to be real?

// conceptual GAN training loop
// (not runnable -- this happens inside tensorflow/pytorch during training)

// step 1: discriminator training
// show it real images, teach it to say "real" (1.0)
// show it generator output, teach it to say "fake" (0.0)
discriminator.train([
  { image: realPhoto, label: 1.0 },
  { image: generator.generate(randomVector), label: 0.0 }
]);

// step 2: generator training
// generate an image, ask the discriminator to judge it
// the generator's loss = discriminator says "fake"
// so the generator learns to make images the discriminator calls "real"
let fakeImage = generator.generate(randomVector);
let score = discriminator.judge(fakeImage);
generator.adjustWeights(score);  // improve toward higher scores

// repeat millions of times
// the generator gets better at fooling the discriminator
// the discriminator gets better at detecting fakes
// they push each other upward

The training is adversarial -- they're playing a game. The generator wants to fool the discriminator. The discriminator wants to catch fakes. In a healthy training run neither side stays far ahead for long: when one improves, the other is pushed to catch up. After enough rounds, the generator produces images so convincing that the discriminator is reduced to guessing, with its output hovering around 0.5 for real and fake images alike. At that point, the generator has learned the statistical distribution of the training data well enough to sample new images from it.
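To make the two training signals concrete, here is a minimal sketch of the losses in plain JavaScript. It assumes the discriminator outputs a probability between 0 and 1 and uses binary cross-entropy with the common "non-saturating" generator loss. The function names are made up for illustration; real frameworks compute the same thing over batches of tensors.

```javascript
// binary cross-entropy for one prediction p in (0, 1) and a label of 0 or 1
function bce(p, label) {
  let eps = 1e-7;  // keep log() away from 0
  let q = Math.min(1 - eps, Math.max(eps, p));
  return -(label * Math.log(q) + (1 - label) * Math.log(1 - q));
}

// discriminator wants: real -> 1.0, fake -> 0.0
function discriminatorLoss(scoreOnReal, scoreOnFake) {
  return bce(scoreOnReal, 1) + bce(scoreOnFake, 0);
}

// generator wants the discriminator to say "real" about its fakes
function generatorLoss(scoreOnFake) {
  return bce(scoreOnFake, 1);
}

// early in training: the discriminator easily spots the fake
console.log(discriminatorLoss(0.9, 0.1).toFixed(3));  // 0.211 (D is doing well)
console.log(generatorLoss(0.1).toFixed(3));           // 2.303 (G is doing badly)

// later: the discriminator can only guess
console.log(discriminatorLoss(0.5, 0.5).toFixed(3));  // 1.386
console.log(generatorLoss(0.5).toFixed(3));           // 0.693
```

The generator never sees a real photo. Its only feedback is the discriminator's score on its own output, which is why the quality of the discriminator limits the quality of the generator.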

This is fundamentally different from what we've seen before. Style transfer reshapes existing images. Pix2Pix translates between domains. GANs create from scratch. The generator is not transforming an input -- it's mapping a point in random space to a point in image space. That mapping IS the model's learned understanding of what images in the training set look like.

The latent space: where images come from

The random vector you feed the generator is called a latent vector, and the space of all possible latent vectors is called the latent space. For a typical GAN, this is a 512-dimensional space (512 random numbers). Each point in this space maps to a unique image.

This sounds abstract but the important thing is that nearby points produce similar images. A small change to the latent vector produces a small change in the output image. This continuity is what makes the latent space useful for creative work -- you can navigate it smoothly.

// latent space: a 512-dimensional parameter space
// each point produces a unique image

let latentDim = 512;

// generate a random point in latent space
function randomLatent() {
  let v = [];
  for (let i = 0; i < latentDim; i++) {
    v.push(randomGaussian(0, 1));
  }
  return v;
}

// interpolate between two points
function lerpLatent(a, b, t) {
  let result = [];
  for (let i = 0; i < latentDim; i++) {
    result.push(a[i] * (1 - t) + b[i] * t);
  }
  return result;
}

// spherical interpolation (better for high-dimensional spaces)
function slerpLatent(a, b, t) {
  // normalize both vectors
  let magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
  let magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
  let aN = a.map(v => v / magA);
  let bN = b.map(v => v / magB);

  // angle between them
  let dot = aN.reduce((s, v, i) => s + v * bN[i], 0);
  dot = Math.max(-1, Math.min(1, dot));
  let theta = Math.acos(dot);

  if (theta < 0.001) return lerpLatent(a, b, t);

  let sinTheta = Math.sin(theta);
  let wA = Math.sin((1 - t) * theta) / sinTheta;
  let wB = Math.sin(t * theta) / sinTheta;

  let result = [];
  for (let i = 0; i < latentDim; i++) {
    result.push(a[i] * wA + b[i] * wB);
  }
  return result;
}

Two important operations here. Linear interpolation (lerpLatent) walks a straight line between two points. Spherical interpolation (slerpLatent) walks along the surface of a hypersphere, which produces smoother results in high-dimensional spaces because points drawn from a Gaussian distribution tend to cluster on a shell rather than filling the interior. For creative purposes, slerp usually produces better-looking transitions than lerp -- the intermediate images stay sharper and more coherent.
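The "shell" claim is easy to check numerically. The sketch below is plain JavaScript (no p5), with a Box-Muller helper standing in for randomGaussian(). It compares the length of a random 512-dimensional vector with the length of the lerp and slerp midpoints. All helper names are my own.

```javascript
// standard normal sample via Box-Muller (stand-in for p5's randomGaussian)
function gaussian() {
  let u = 1 - Math.random();  // in (0, 1], so log(u) is finite
  let v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function randomVector(dim) {
  return Array.from({ length: dim }, gaussian);
}

function norm(v) {
  return Math.sqrt(v.reduce((s, x) => s + x * x, 0));
}

// straight-line midpoint
function lerpMid(a, b) {
  return a.map((x, i) => 0.5 * x + 0.5 * b[i]);
}

// spherical midpoint (the weights slerp uses at t = 0.5)
function slerpMid(a, b) {
  let dot = a.reduce((s, x, i) => s + x * b[i], 0) / (norm(a) * norm(b));
  let theta = Math.acos(Math.max(-1, Math.min(1, dot)));
  let w = Math.sin(0.5 * theta) / Math.sin(theta);
  return a.map((x, i) => w * x + w * b[i]);
}

let a = randomVector(512);
let b = randomVector(512);
console.log('|a|        ', norm(a).toFixed(1));               // close to sqrt(512), about 22.6
console.log('|lerp mid| ', norm(lerpMid(a, b)).toFixed(1));   // noticeably shorter, around 16
console.log('|slerp mid|', norm(slerpMid(a, b)).toFixed(1));  // back near 22.6
```

The lerp midpoint falls well inside the shell, into a region the generator rarely saw during training. That is a plausible reason why lerp transitions can look washed out in the middle.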

Remember seed-based art from episode 24? We used a single seed number to generate reproducible random output. The latent vector is the same idea scaled up massively. Instead of one seed number controlling all randomness, you have 512 independent numbers, each influencing a different aspect of the generated image. The latent space is the ultimate parameter space for generative art -- higher dimensional than anything we've worked with before, but navigable using the same interpolation techniques.
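To connect the two ideas directly: a whole latent vector can be derived from a single seed, so one number is enough to find a face again. A minimal sketch, assuming a small seedable PRNG (mulberry32) and Box-Muller for the Gaussian samples; seededLatent is a hypothetical helper, not part of any library.

```javascript
// small seedable PRNG (mulberry32); returns floats in [0, 1)
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// build a latent vector that is fully determined by one seed number
function seededLatent(seed, dim) {
  let rand = mulberry32(seed);
  let v = [];
  for (let i = 0; i < dim; i++) {
    // Box-Muller: two uniform samples -> one standard normal sample
    let u1 = 1 - rand();  // shift to (0, 1] so log() stays finite
    let u2 = rand();
    v.push(Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2));
  }
  return v;
}

let first = seededLatent(24, 512);
let again = seededLatent(24, 512);
let other = seededLatent(25, 512);
console.log(first.every((x, i) => x === again[i]));  // true: same seed, same vector
console.log(first.some((x, i) => x !== other[i]));   // true: different seed, different vector
```

Inside a p5 sketch, calling randomSeed() before randomLatent() should give the same reproducibility without the custom PRNG.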

Working with pre-trained GANs via API

Training a GAN from scratch requires a powerful GPU and days or weeks of compute time. But pre-trained GANs are available through APIs that let you generate images from latent vectors or text prompts with a single HTTP request. This is the practical path for creative coding -- use someone else's trained model as your creative engine.

// using Replicate API to generate images from a GAN
// (requires API key -- use your own from replicate.com)

async function generateFromLatent(latentVector) {
  const response = await fetch('https://api.replicate.com/v1/predictions', {
    method: 'POST',
    headers: {
      'Authorization': 'Token YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      version: 'MODEL_VERSION_HERE',
      input: {
        latent_vector: latentVector
      }
    })
  });

  const prediction = await response.json();
  // poll for completion or use webhook
  return prediction;
}

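// generate two random images and interpolate between them
// (await needs an async function unless this runs as an ES module)
async function interpolatePair() {
  let pointA = randomLatent();
  let pointB = randomLatent();

  // 11 frames: both endpoints plus 9 in-between images
  // an integer counter avoids floating-point drift from t += 0.1
  let steps = 10;
  for (let i = 0; i <= steps; i++) {
    let t = i / steps;
    let interpolated = slerpLatent(pointA, pointB, t);
    let image = await generateFromLatent(interpolated);
    // output stays empty until the prediction has finished
    console.log('t=' + t.toFixed(1) + ': ' + image.output);
  }
}

interpolatePair();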

The API approach has latency -- each generation takes a few seconds -- but it gives you access to models too heavy to run comfortably in the browser. StyleGAN2 (NVIDIA's flagship face generator) produces photorealistic 1024x1024 faces. StyleGAN3 removes the aliasing artifacts of its predecessor, the "texture sticking" where fine detail seems glued to the screen instead of the face. These models have tens of millions of parameters and their official implementations expect a CUDA GPU, so running them in JavaScript isn't practical. But you can call them from JavaScript and display the results in your p5 sketch.
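The generateFromLatent() function above returns the prediction right after it is created, before any image exists. Here is a minimal polling sketch, assuming the prediction object carries a status field and a urls.get address as Replicate's predictions API documents. The fetch function is passed in as a parameter (my own choice) so the helper can be tried with a fake fetch; the delay and retry count are arbitrary.

```javascript
// wait for a prediction to finish by polling its status URL
// fetchFn is injected so this can be exercised without a network
async function waitForPrediction(prediction, fetchFn, apiKey, delayMs = 1000, maxTries = 60) {
  let current = prediction;
  for (let i = 0; i < maxTries; i++) {
    if (current.status === 'succeeded') return current;
    if (current.status === 'failed' || current.status === 'canceled') {
      throw new Error('prediction ' + current.status);
    }

    // not done yet: wait a bit, then ask again
    await new Promise(resolve => setTimeout(resolve, delayMs));
    let response = await fetchFn(prediction.urls.get, {
      headers: { 'Authorization': 'Token ' + apiKey }
    });
    current = await response.json();
  }
  throw new Error('gave up waiting for prediction');
}
```

In the browser this would be called as waitForPrediction(await generateFromLatent(v), fetch, 'YOUR_API_KEY'), and the finished object's output field should then hold the result.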

Latent space interpolation: the walk as artwork

The most immediately striking thing you can do with a GAN is a latent space walk -- smoothly interpolating between two random points and watching the generated image morph. Face A transforms into Face B. The transition passes through faces that never existed, each one a unique person generated for a fraction of a second and then gone forever. The in-between is often more interesting than the endpoints.

let images = [];
let currentIdx = 0;
let numSteps = 60;
let pointA, pointB;
let walking = false;

function setup() {
  createCanvas(512, 512);
  background(10, 12, 18);
  textFont('monospace');
}

function startWalk() {
  pointA = randomLatent();
  pointB = randomLatent();
  images = [];
  currentIdx = 0;
  walking = true;
  generateNextStep();
}

async function generateNextStep() {
  if (currentIdx >= numSteps) {
    walking = false;
    return;
  }

  let t = currentIdx / (numSteps - 1);
  let latent = slerpLatent(pointA, pointB, t);

  // callGanApi is a placeholder: it should send the latent to your API
  // and resolve to a loaded p5.Image
  let img = await callGanApi(latent);
  images.push(img);
  currentIdx++;

  generateNextStep();
}

function draw() {
  background(10, 12, 18);

  if (images.length > 0) {
    // cycle through generated images
    let idx = frameCount % images.length;
    image(images[idx], 0, 0, width, height);

    // progress bar
    fill(40);
    noStroke();
    rect(0, height - 4, width, 4);
    fill(180, 140, 80);
    rect(0, height - 4, width * (idx / images.length), 4);
  }

  // status
  fill(140);
  noStroke();
  textSize(10);
  text('images: ' + images.length + '/' + numSteps, 10, 20);
  text('press W to walk', 10, 36);
}

function keyPressed() {
  if (key === 'w') startWalk();
}

Press W and the system generates 60 steps between two random latent points. Each step is a unique image. Played back at p5's default 60fps, the walk takes one second; add frameRate(30) to setup() to stretch it to two. Either way you get a smooth morph between two randomly chosen images. Every walk is different. Every frame is a unique, never-before-seen image.

The creative decisions are: which pairs of points to walk between, how many steps to take (more = smoother but slower), and whether to use linear or spherical interpolation. Some walks pass through mundane intermediate images. Others hit startling transitions -- a face suddenly gains glasses, a landscape shifts from summer to winter, a cat's breed changes mid-morph. You don't control WHAT happens between the endpoints. You discover it.
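One more playback decision: the draw() loop above restarts from frame 0 after the last frame, which produces a visible jump from B back to A. A possible alternative is ping-pong playback, where the walk reverses at each end. pingPongIndex is a hypothetical helper, sketched here in plain JavaScript:

```javascript
// map an ever-increasing frame counter to an index that runs
// 0, 1, ..., n-1, n-2, ..., 1, 0, 1, ... (endpoints are not repeated)
function pingPongIndex(frame, n) {
  if (n <= 1) return 0;
  let period = 2 * (n - 1);
  let pos = frame % period;
  return pos < n ? pos : period - pos;
}

let seq = [];
for (let f = 0; f < 10; f++) seq.push(pingPongIndex(f, 4));
console.log(seq.join(' '));  // 0 1 2 3 2 1 0 1 2 3
```

In draw(), the line that picks the frame would become let idx = pingPongIndex(frameCount, images.length).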

Latent space arithmetic: attributes as directions

Here's where latent space gets properly wild. Specific visual attributes -- smiling, wearing glasses, being blonde, facing left -- correspond to specific directions in latent space. Find the direction for "smiling" and you can add it to any face to make it smile. Find the direction for "glasses" and you can add or remove glasses from any portrait. The attributes are vectors, and they add and subtract like regular math.

// latent space arithmetic
// find the "smile" direction by averaging latent vectors
// of smiling faces and subtracting neutral faces

// conceptual -- requires latent vectors that are already labeled
// (a plain GAN has no encoder: e.g. sample faces and sort them into smiling / neutral)

function computeAttributeDirection(smilingLatents, neutralLatents) {
  let smileAvg = averageLatent(smilingLatents);
  let neutralAvg = averageLatent(neutralLatents);

  // the difference IS the "smile" direction
  let smileDir = [];
  for (let i = 0; i < smileAvg.length; i++) {
    smileDir.push(smileAvg[i] - neutralAvg[i]);
  }
  return smileDir;
}

function averageLatent(latents) {
  let avg = new Array(latents[0].length).fill(0);
  for (let v of latents) {
    for (let i = 0; i < v.length; i++) {
      avg[i] += v[i];
    }
  }
  return avg.map(v => v / latents.length);
}

// apply the attribute: add the direction, scaled by strength
function applyAttribute(latentVector, attributeDir, strength) {
  let result = [];
  for (let i = 0; i < latentVector.length; i++) {
    result.push(latentVector[i] + attributeDir[i] * strength);
  }
  return result;
}

// generate a face, then add smile at different intensities
// smileDirection is the result of computeAttributeDirection() above
let baseFace = randomLatent();
let strengths = [-2, -1, 0, 1, 2];

for (let s of strengths) {
  let modified = applyAttribute(baseFace, smileDirection, s);
  // generate image from modified latent
  // negative = less smile (frown?), positive = more smile
}

The math is simple -- it's vector addition. See where this is going? The interesting part is that it works AT ALL. Nobody told the GAN to organize its latent space so that "smiling" is a consistent direction. The GAN just learned to generate faces, and as a side effect of learning, semantic attributes arranged themselves as linear directions in the space. This is an emergent property. It falls out of the training process naturally.

For creative coding, attribute vectors are creative parameters. They're like sliders in a photo editor but weirder -- you can push them past their natural range. What happens when the "smile" attribute is set to 10x? The face grins impossibly wide. What happens when you combine "smile" + "old" + "glasses" + "blonde"? You get a specific intersection of all those attributes applied to a random base face. The combination space is enormous. Every combination of attribute directions at different strengths produces a different face.
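Combining attributes is just more vector addition. A minimal sketch with a hypothetical combineAttributes helper and made-up 4-dimensional directions (real ones would have as many entries as the model's latent vector):

```javascript
// add several attribute directions to one base latent in a single pass
// edits: array of { dir: number[], strength: number }
function combineAttributes(base, edits) {
  let result = base.slice();
  for (let e of edits) {
    for (let i = 0; i < result.length; i++) {
      result[i] += e.dir[i] * e.strength;
    }
  }
  return result;
}

// toy example: the directions here are invented for illustration
let base = [0.5, -1.0, 0.0, 2.0];
let smileDir = [1, 0, 0, 0];
let glassesDir = [0, 0, 1, 0];

let edited = combineAttributes(base, [
  { dir: smileDir, strength: 2 },
  { dir: glassesDir, strength: -1 }
]);
console.log(edited);  // [ 2.5, -1, -1, 2 ]
```

Because addition is commutative, the order of the edits doesn't matter. Only the strengths do.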

Running a lightweight GAN in the browser

Most serious GAN models are too large for in-browser execution. But smaller models do exist. TensorFlow.js can load compressed GAN models that generate low-resolution images (64x64 or 128x128) at interactive speeds. The quality is lower than StyleGAN but the interactivity is immediate -- no API calls, no latency, generate images as fast as the GPU can go.

let generator;
let currentImage;
let latent;

async function setup() {
  createCanvas(512, 512);

  // load a pre-trained generator model (TensorFlow.js format)
  // in practice, you'd host this yourself or use a public model
  generator = await tf.loadGraphModel('model/generator/model.json');

  latent = tf.randomNormal([1, 128]);
  generateImage();
}

async function generateImage() {
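  // run the generator
  // output is a tensor of shape [1, 64, 64, 3]
  // values in [-1, 1], need to rescale to [0, 255]
  // tf.tidy frees the intermediate tensors (raw output and the +1 step)
  let scaled = tf.tidy(() => generator.predict(latent).add(1).mul(127.5));
  let pixels = await scaled.data();

  // free the last tensor too, so repeated generations don't leak memory
  scaled.dispose();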

  // create p5 image from pixel data
  currentImage = createImage(64, 64);
  currentImage.loadPixels();
  for (let i = 0; i < 64 * 64; i++) {
    currentImage.pixels[i * 4] = pixels[i * 3];
    currentImage.pixels[i * 4 + 1] = pixels[i * 3 + 1];
    currentImage.pixels[i * 4 + 2] = pixels[i * 3 + 2];
    currentImage.pixels[i * 4 + 3] = 255;
  }
  currentImage.updatePixels();
}

function draw() {
  background(10, 12, 18);

  if (currentImage) {
    // upscale the 64x64 to fill canvas -- pixelation is part of the aesthetic
    noSmooth();
    image(currentImage, 0, 0, width, height);
  }

  fill(140);
  noStroke();
  textSize(10);
  textFont('monospace');
  text('press SPACE for new face', 10, height - 10);
}

function keyPressed() {
  if (key === ' ') {
    // new random latent -> new face
    latent.dispose();
    latent = tf.randomNormal([1, 128]);
    generateImage();
  }
}

Press space and a new face appears. Each face is unique -- generated from a random 128-dimensional vector. At 64x64 resolution the faces are blocky and impressionistic rather than photorealistic, which for creative coding purposes is actually more interesting. The low resolution forces a painterly quality. The faces are clearly faces but the details are ambiguous -- is that a shadow or a beard? Is that hair or a hat? The model fills in just enough detail to read as "face" and leaves the rest to your imagination.

The noSmooth() call is important. Without it, p5 interpolates between pixels when scaling up, producing a blurry image. With noSmooth(), you get hard pixel edges that emphasize the grid structure. Each pixel is a deliberate choice by the generator. The pixelation isn't a limitation -- it's the medium.
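The pixel-copy loop shows up in every sketch in this episode, so it can be pulled into a small helper. This is a sketch under the same assumptions as the listing above (flat RGB floats in [-1, 1], row by row); floatsToRgba is my own name for it.

```javascript
// convert generator output (flat RGB floats in [-1, 1]) to RGBA bytes
function floatsToRgba(values, w, h) {
  let rgba = new Uint8ClampedArray(w * h * 4);
  for (let i = 0; i < w * h; i++) {
    for (let c = 0; c < 3; c++) {
      // (v + 1) * 127.5 maps -1 -> 0 and 1 -> 255
      // Uint8ClampedArray rounds and clamps out-of-range values
      rgba[i * 4 + c] = (values[i * 3 + c] + 1) * 127.5;
    }
    rgba[i * 4 + 3] = 255;  // fully opaque
  }
  return rgba;
}

// one pixel, slightly out of range on the red channel
console.log(Array.from(floatsToRgba([1.2, 0.5, -1], 1, 1)));  // [ 255, 191, 0, 255 ]
```

With this version the rescaling happens in JavaScript, so the raw tensor data can be passed in directly. In p5 the result should be usable as currentImage.pixels.set(floatsToRgba(data, 64, 64)) between loadPixels() and updatePixels().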

Real-time latent space exploration

With an in-browser model, you can explore the latent space interactively. Map mouse position to two dimensions of the latent vector and watch the generated face change as you move the cursor.

let generator;
let currentImage;
let baseLatent;
let isGenerating = false;

async function setup() {
  createCanvas(512, 560);
  generator = await tf.loadGraphModel('model/generator/model.json');
  baseLatent = Array.from({ length: 128 }, () => randomGaussian(0, 1));
  requestGeneration();
}

function requestGeneration() {
  if (isGenerating) return;
  isGenerating = true;

  // map mouse to two latent dimensions
  let modified = baseLatent.slice();
  modified[0] = map(mouseX, 0, width, -3, 3);
  modified[1] = map(mouseY, 0, height, -3, 3);

  let tensor = tf.tensor2d([modified]);
  let output = generator.predict(tensor);
  let scaled = output.add(1).mul(127.5);

  scaled.data().then(function(pixels) {
    currentImage = createImage(64, 64);
    currentImage.loadPixels();
    for (let i = 0; i < 64 * 64; i++) {
      currentImage.pixels[i * 4] = pixels[i * 3];
      currentImage.pixels[i * 4 + 1] = pixels[i * 3 + 1];
      currentImage.pixels[i * 4 + 2] = pixels[i * 3 + 2];
      currentImage.pixels[i * 4 + 3] = 255;
    }
    currentImage.updatePixels();

    tensor.dispose();
    output.dispose();
    scaled.dispose();
    isGenerating = false;
  });
}

function draw() {
  background(10, 12, 18);

  if (currentImage) {
    noSmooth();
    image(currentImage, 0, 0, 512, 512);
  }

  // regenerate every third frame
  // (requestGeneration skips the call if one is still running)
  if (frameCount % 3 === 0) {
    requestGeneration();
  }

  // controls
  fill(40);
  noStroke();
  rect(0, 512, 512, 48);
  fill(130, 140, 160);
  textSize(9);
  textFont('monospace');
  text('mouse controls latent dims 0 & 1', 10, 532);
  text('SPACE: new base face | S: save', 10, 546);
}

function keyPressed() {
  if (key === ' ') {
    baseLatent = Array.from({ length: 128 }, () => randomGaussian(0, 1));
  }
  if (key === 's') {
    if (currentImage) save(currentImage, 'gan-face-' + frameCount + '.png');
  }
}

Move your mouse and the face shifts. Move left and maybe the face rotates. Move down and maybe the skin tone changes. What each axis controls isn't predetermined -- it depends on how the GAN organized its latent space during training. You discover the mapping through exploration. Some regions produce coherent faces. Others produce melted, abstract forms where the model's representation breaks down. The boundaries between coherent and incoherent regions are the most creative territory -- faces that are almost right but not quite, features that blend into each other, impossible anatomy that reads as portrait from a distance.

Space gives you a new base face. The mouse axes still control the same two latent dimensions, but now relative to that new base. Each base face gives you a different local neighborhood to explore. Some neighborhoods are boring (all variations look similar). Others are rich (tiny movements produce dramatic changes). Finding the interesting regions is part of the creative process.
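Dimensions 0 and 1 are an arbitrary choice. One variation, sketched here as an assumption rather than as part of the sketch above, is to map the mouse to two random directions through latent space, so every reshuffle gives the cursor a different meaning. The helper names are hypothetical.

```javascript
// standard normal sample via Box-Muller (stand-in for p5's randomGaussian)
function gaussian() {
  let u = 1 - Math.random();  // in (0, 1], so log(u) is finite
  let v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// random unit-length direction in latent space
function randomDirection(dim) {
  let v = Array.from({ length: dim }, gaussian);
  let mag = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map(x => x / mag);
}

// move away from base by u along dirU and by v along dirV
function offsetLatent(base, dirU, dirV, u, v) {
  return base.map((x, i) => x + u * dirU[i] + v * dirV[i]);
}

let demoBase = Array.from({ length: 128 }, gaussian);
let dirU = randomDirection(128);
let dirV = randomDirection(128);
let moved = offsetLatent(demoBase, dirU, dirV, 3, -3);
console.log(moved.length);  // 128
```

In requestGeneration() the two modified[...] assignments would be replaced by one offsetLatent(baseLatent, dirU, dirV, ...) call using the same map() ranges. Two random directions in 128 dimensions are almost always close to perpendicular, so the two mouse axes stay reasonably independent.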

GAN art: the uncanny aesthetic

GAN-generated images have a distinctive aesthetic that's recognizable even when the quality is high. Early GANs produced obvious artifacts -- grid patterns, checkerboard textures, symmetric distortions. Modern GANs (StyleGAN2/3) minimized most of these, but a subtle uncanniness remains. Backgrounds melt into foreheads. Earrings don't match. Text in the background is gibberish. Hair intersects with clothing in physically impossible ways.

This uncanniness is the medium's signature. Just like oil painting has visible brushstrokes and woodcuts have parallel lines, GAN art has its own visual fingerprint: the slightly-too-smooth skin, the asymmetric accessories, the melting boundaries between figure and ground.

// embracing GAN artifacts as creative material
// generate many images, filter for the "interesting" ones
// (glitchy, uncanny, surreal) rather than the "good" ones

let gallery = [];
let gallerySize = 16;
let currentPage = 0;

async function generateGallery() {
  gallery = [];
  for (let i = 0; i < gallerySize; i++) {
    let latent = randomLatent();
    let img = await generateFromModel(latent);
    gallery.push({
      image: img,
      latent: latent,
      score: 0  // you rate them manually
    });
  }
}

function draw() {
  background(10, 12, 18);

  // 4x4 grid of generated images
  let cols = 4;
  let cellW = width / cols;
  let cellH = (height - 40) / cols;

  for (let i = 0; i < gallery.length; i++) {
    let col = i % cols;
    let row = Math.floor(i / cols);
    let x = col * cellW;
    let y = row * cellH;

    if (gallery[i].image) {
      image(gallery[i].image, x + 2, y + 2, cellW - 4, cellH - 4);
    }

    // highlight scored images
    if (gallery[i].score > 0) {
      noFill();
      stroke(180, 140, 60);
      strokeWeight(2);
      rect(x + 1, y + 1, cellW - 2, cellH - 2);
    }
  }

  fill(120);
  noStroke();
  textSize(9);
  textFont('monospace');
  text('click to select favorites | R: regenerate | S: save selections', 10, height - 10);
}

function mousePressed() {
  let cols = 4;
  let cellW = width / cols;
  let cellH = (height - 40) / cols;
  let col = Math.floor(mouseX / cellW);
  let row = Math.floor(mouseY / cellH);

  // ignore clicks outside the 4x4 grid (e.g. on the status line)
  if (col < 0 || col >= cols || row < 0 || row >= cols) return;

  let idx = row * cols + col;
  if (idx < gallery.length) {
    gallery[idx].score = gallery[idx].score > 0 ? 0 : 1;
  }
}

This is curation as creative practice. Generate a batch of 16 faces. Browse them. Pick the ones that resonate -- not the most realistic ones, but the ones with the most interesting artifacts, the most evocative distortions, the most surreal quality. The curation IS the art. You're selecting from the model's output space based on aesthetic judgment that no algorithm can replicate. The GAN provides the raw material. You provide the taste.
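Since every gallery entry keeps its latent vector, a favorite can be stored as numbers rather than pixels and regenerated later. A minimal sketch of that idea, with hypothetical exportFavorites / importFavorites helpers operating on the gallery structure from the listing above:

```javascript
// collect the latent vectors of all selected images as a JSON string
function exportFavorites(gallery) {
  let favorites = gallery
    .filter(item => item.score > 0)
    .map(item => ({ latent: item.latent, score: item.score }));
  return JSON.stringify(favorites);
}

// read them back; each latent can be fed to the generator again
function importFavorites(json) {
  return JSON.parse(json).map(item => item.latent);
}

// tiny stand-in gallery with 2-dimensional latents
let demoGallery = [
  { image: null, latent: [0.1, -0.2], score: 1 },
  { image: null, latent: [0.3, 0.4], score: 0 }
];
let json = exportFavorites(demoGallery);
console.log(json);                   // [{"latent":[0.1,-0.2],"score":1}]
console.log(importFavorites(json));  // [ [ 0.1, -0.2 ] ]
```

Saved latents are also starting points: two favorites make a natural pair of endpoints for a latent walk.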

Artists like Robbie Barrat, Helena Sarin, and Mario Klingemann have built careers on this practice. Barrat's AI-generated nude portraits have sold at auction. Sarin's botanical GANs produce impossible flowers with organic, almost biological quality. Klingemann's neurographic portraits explore the boundary between portrait and abstraction. Each artist uses GANs as medium, not tool. The distinctive uncanniness of the output IS their artistic signature.

Feeding creative coding to GANs

Everything we've built in this series can interact with GANs. Our generative systems produce images. GANs consume and produce images. The pipeline works in both directions.

Direction 1: use code output as GAN input. Draw a noise field, particle traces, or L-system branches (episode 54), encode them as a sketch, and feed them to Pix2Pix (episode 98). The code generates the sketch. The GAN interprets the sketch. The output is a hybrid of algorithmic and neural generation.

Direction 2: use GAN output as creative coding input. Generate a face, load it into p5, and decompose it into particles. Apply our erosion simulation (episode 57) to a GAN landscape. Run our wave simulation (episode 59) using GAN colors. The GAN provides texture and form. Our code provides motion and transformation.

// direction 2: GAN output as raw material for particle decomposition

let ganImage;
let particles = [];

function setup() {
  createCanvas(512, 512);
  // assume ganImage is loaded from API or pre-generated
  loadImage('gan-face.png', function(img) {
    ganImage = img;
    decomposeIntoParticles();
  });
}

function decomposeIntoParticles() {
  ganImage.loadPixels();
  let step = 4;

  for (let y = 0; y < ganImage.height; y += step) {
    for (let x = 0; x < ganImage.width; x += step) {
      let idx = (y * ganImage.width + x) * 4;
      let r = ganImage.pixels[idx];
      let g = ganImage.pixels[idx + 1];
      let b = ganImage.pixels[idx + 2];

      // skip very dark pixels
      if (r + g + b < 30) continue;

      particles.push({
        x: map(x, 0, ganImage.width, 0, width),
        y: map(y, 0, ganImage.height, 0, height),
        homeX: map(x, 0, ganImage.width, 0, width),
        homeY: map(y, 0, ganImage.height, 0, height),
        vx: 0,
        vy: 0,
        r: r, g: g, b: b,
        size: step * 0.8
      });
    }
  }
}

function draw() {
  background(10, 12, 18, 25);

  for (let p of particles) {
    // drift away from home based on mouse distance
    let d = dist(mouseX, mouseY, p.x, p.y);
    if (d < 100) {
      let angle = atan2(p.y - mouseY, p.x - mouseX);
      p.vx += cos(angle) * 2;
      p.vy += sin(angle) * 2;
    }

    // spring back toward home position
    p.vx += (p.homeX - p.x) * 0.02;
    p.vy += (p.homeY - p.y) * 0.02;

    // friction
    p.vx *= 0.92;
    p.vy *= 0.92;

    p.x += p.vx;
    p.y += p.vy;

    noStroke();
    fill(p.r, p.g, p.b, 180);
    circle(p.x, p.y, p.size);
  }
}

A GAN-generated face, decomposed into colored particles. Move your mouse over the face and the particles scatter, revealing the void behind the portrait. Move away and they drift back into formation, the face reassembling itself. The image exists only as long as you let it. It was never a real person, and now it's not even a coherent image -- just a cloud of colored dots that temporarily arrange themselves into something that looks like someone.

The metaphor writes itself, honestly :-). A face that never existed, rendered as particles that scatter at the slightest disturbance. The impermanence of generated identity. The fragility of synthetic personhood. It works as an interactive piece because the interaction (scattering and reforming) mirrors the conceptual content (a face that exists only as statistical pattern, not as person).

Mode collapse and happy accidents

GANs don't always work perfectly. One common failure mode is mode collapse -- the generator discovers that one specific output fools the discriminator consistently, so it produces that same output (or minor variations of it) for every input. Instead of generating diverse faces, it generates the same face over and over.

For practical ML, mode collapse is a bug. For creative coding, it can be a feature. A GAN that only generates slight variations of one face is producing a series of portraits of the same nonexistent person from different angles, with different expressions, in different lighting. It's a character study of nobody. The repetition with variation has a hypnotic quality -- familiar but never quite the same.

// visualizing mode collapse: N generations in a grid
// if the GAN has collapsed, all images will look similar
// which is its own kind of aesthetic

let gridImages = [];
let cols = 6;
let rows = 4;

async function generateGrid() {
  gridImages = [];

  for (let i = 0; i < cols * rows; i++) {
    let latent = randomLatent();
    let img = await generateFromModel(latent);
    gridImages.push(img);
  }
}

function draw() {
  background(10, 12, 18);

  let cellW = width / cols;
  let cellH = height / rows;

  for (let i = 0; i < gridImages.length; i++) {
    let c = i % cols;
    let r = Math.floor(i / cols);

    if (gridImages[i]) {
      image(gridImages[i], c * cellW, r * cellH, cellW, cellH);
    }
  }
}
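The grid gives a visual check. A rough numeric one is the average pixel difference between all pairs of generated images: the lower it is, the more the outputs resemble each other. This is only a crude sketch with hypothetical helper names, and it works on plain pixel arrays (in p5 those would come from img.pixels after img.loadPixels()).

```javascript
// mean absolute difference between two equally sized pixel arrays (0..255)
function pixelDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / a.length;
}

// average distance over all pairs of images; low values hint at collapse
function diversityScore(pixelArrays) {
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < pixelArrays.length; i++) {
    for (let j = i + 1; j < pixelArrays.length; j++) {
      total += pixelDistance(pixelArrays[i], pixelArrays[j]);
      pairs++;
    }
  }
  return pairs > 0 ? total / pairs : 0;
}

// toy data: three near-identical "images" vs three very different ones
let collapsed = [[100, 100, 100], [101, 100, 99], [100, 102, 100]];
let varied = [[0, 0, 0], [255, 255, 255], [255, 0, 128]];
console.log(diversityScore(collapsed).toFixed(2));  // 0.89
console.log(diversityScore(varied).toFixed(2));     // 170.00
```

There is no universal threshold. The number is mostly useful for comparing two models, or two training checkpoints of the same model, on the same number of samples.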

Another interesting failure mode: artifacts as texture. Some GAN architectures produce characteristic grid-like or checkerboard patterns, especially during early training or near the edges of the latent space. These patterns are technical failures but they have a distinctive visual quality -- almost like digital weaving or moire interference. Some artists deliberately use undertrained or poorly configured GANs specifically to harvest these artifacts as textural material for collage and compositing.

The creative exercise: latent space portrait gallery

Allez, time to put it together. A portrait gallery that generates faces, lets you walk between them in latent space, and saves your favorites. The latent space walk IS the exhibition -- each frame on the wall is a person who never existed, generated at one point on a continuous path through high-dimensional space.

let portraits = [];
let walkFrames = [];
let currentWalkIdx = 0;
let displayMode = 'gallery';  // 'gallery' or 'walk'

// pre-generate a set of anchor points
let anchors = [];
let numAnchors = 8;

function setup() {
  createCanvas(800, 500);
  textFont('monospace');

  // create anchor points in latent space
  for (let i = 0; i < numAnchors; i++) {
    anchors.push(randomLatent());
  }
}

async function generatePortraits() {
  portraits = [];
  for (let i = 0; i < numAnchors; i++) {
    let img = await generateFromModel(anchors[i]);
    portraits.push({
      image: img,
      latent: anchors[i],
      idx: i
    });
  }
  displayMode = 'gallery';
}

async function walkBetween(idxA, idxB) {
  walkFrames = [];
  let steps = 30;

  for (let s = 0; s <= steps; s++) {
    let t = s / steps;
    let latent = slerpLatent(anchors[idxA], anchors[idxB], t);
    let img = await generateFromModel(latent);
    walkFrames.push(img);
  }

  currentWalkIdx = 0;
  displayMode = 'walk';
}

function draw() {
  background(10, 12, 18);

  if (displayMode === 'gallery') {
    drawGallery();
  } else {
    drawWalk();
  }
}

function drawGallery() {
  let cols = 4;
  let cellW = width / cols;
  let cellH = (height - 40) / 2;

  for (let i = 0; i < portraits.length; i++) {
    let c = i % cols;
    let r = Math.floor(i / cols);

    if (portraits[i] && portraits[i].image) {
      image(portraits[i].image, c * cellW + 4, r * cellH + 4,
            cellW - 8, cellH - 8);
    }

    // label
    fill(120);
    noStroke();
    textSize(8);
    text('#' + i, c * cellW + 8, r * cellH + 16);
  }

  fill(130);
  textSize(9);
  text('G: generate | click two portraits to walk between them', 10, height - 10);
}

function drawWalk() {
  if (walkFrames.length === 0) return;

  currentWalkIdx = frameCount % walkFrames.length;
  let img = walkFrames[currentWalkIdx];

  if (img) {
    image(img, 0, 0, width, height - 40);
  }

  // progress
  fill(40);
  noStroke();
  rect(0, height - 40, width, 40);

  fill(180, 140, 60);
  let progress = currentWalkIdx / walkFrames.length;
  rect(0, height - 40, width * progress, 4);

  fill(130, 140, 160);
  textSize(9);
  text('frame ' + currentWalkIdx + '/' + walkFrames.length +
       ' | ESC: back to gallery', 10, height - 14);
}

function keyPressed() {
  if (key === 'g') generatePortraits();
  if (keyCode === ESCAPE) displayMode = 'gallery';
}

Generate 8 portraits. Each one is a person who never existed, pinned to a specific point in latent space. Click two portraits (this needs a mousePressed() handler that calls walkBetween()) and the system generates a walk between them -- 31 frames, both endpoints plus 29 in-between faces, morphing smoothly from one to the other. The walk loops: it plays from the first portrait to the second and then jumps back to the start, showing a continuous transformation between two identities. Each intermediate frame is its own unique nonexistent person, alive for one frame and gone. The gallery is a museum of impossible people. The walk between them is the art.
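The listing above leaves out the click handling that its status line mentions. Here is a minimal sketch of how it could look, assuming the same 4x2 grid and 40-pixel status bar as drawGallery(). cellIndexAt and the selected array are my own additions; mousePressed() relies on the sketch's globals (displayMode, portraits, walkBetween) and p5's mouseX, mouseY, width and height.

```javascript
let selected = [];  // indices of portraits clicked so far

// which grid cell contains (mx, my)? returns -1 outside the grid
function cellIndexAt(mx, my, gridW, gridH, cols, rows) {
  if (mx < 0 || my < 0 || mx >= gridW || my >= gridH) return -1;
  let c = Math.floor(mx / (gridW / cols));
  let r = Math.floor(my / (gridH / rows));
  return r * cols + c;
}

// p5 calls this on every click
function mousePressed() {
  if (displayMode !== 'gallery') return;
  let idx = cellIndexAt(mouseX, mouseY, width, height - 40, 4, 2);
  if (idx < 0 || idx >= portraits.length) return;
  if (selected.includes(idx)) return;  // ignore a second click on the same portrait

  selected.push(idx);
  if (selected.length === 2) {
    walkBetween(selected[0], selected[1]);
    selected = [];
  }
}

// the hit test on its own, for an 800x500 canvas (grid area 800x460)
console.log(cellIndexAt(10, 10, 800, 460, 4, 2));    // 0  (top-left portrait)
console.log(cellIndexAt(790, 450, 800, 460, 4, 2));  // 7  (bottom-right portrait)
console.log(cellIndexAt(400, 480, 800, 460, 4, 2));  // -1 (status bar)
```

Keeping the hit test separate from the p5 callback makes it easy to reuse for the 4x4 curation grid from earlier by passing cols = 4 and rows = 4.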

Ethical and conceptual territory

GANs generate images of things that don't exist, and that raises questions worth thinking about. Faces especially. A GAN trained on photos of real people produces new "people" based on those real people's faces. The training data subjects didn't consent to having their likenesses used to generate synthetic faces. The generated faces inherit whatever biases exist in the training data -- demographic distributions, beauty standards, lighting conditions that favor certain skin tones.

For creative coding, these aren't just theoretical issues. If you build an installation that generates faces and displays them publicly, you're presenting synthetic people to an audience. Some viewers will connect emotionally with those faces. Some won't realize they're fake. The ability to generate photorealistic people who don't exist is a genuinely new thing in human history, and using it in art means engaging with what that new capability means.

This isn't a reason not to use GANs. It's a reason to use them thoughtfully. Know what the training data looks like. Acknowledge the synthetic nature of the output. Consider whether your use case benefits from photorealism or whether a more obviously artificial aesthetic serves the concept better. The low-resolution, pixelated, clearly-synthetic output of a small browser GAN might be more honest than a photorealistic API output pretending to be a photograph.

Where does this lead?

GANs are one approach to image generation but not the only one. Diffusion models (the approach behind Stable Diffusion, DALL-E 2 and 3, and similar tools like Midjourney) have largely overtaken GANs for image quality in the last few years. They work differently -- instead of adversarial training, they learn to gradually remove noise from an image of pure random noise until a clean image emerges. The results are generally higher quality and more controllable than GAN output, especially for text-to-image generation.

But GANs gave us the latent space concept, and that's applicable everywhere. Whether you use a GAN, a diffusion model, or a future architecture that hasn't been invented yet, the idea of a continuous parameter space where nearby points produce similar outputs and specific directions correspond to semantic attributes -- that framework is fundamental. We saw it with seeds in episode 24. We see it here at massive scale. And we'll keep seeing it as ML models get integrated deeper into creative tools.

Next episode we'll look at training custom models on your own data -- taking the pre-trained models from this arc and fine-tuning them for your specific creative vision. Instead of using someone else's cat generator or face generator, you train on your own image collection. The model learns YOUR visual language and generates more of it.

't Komt erop neer... (it comes down to this)

GANs generate images from nothing but random noise. Two networks train against each other: the generator creates images from random vectors, the discriminator judges real vs fake. After enough adversarial training, the generator produces images the discriminator can't distinguish from real photographs. Unlike Pix2Pix (which translates from a sketch) or style transfer (which reshapes an existing image), GANs create genuinely new images from scratch
The latent space is the space of all possible random vectors that feed the generator. Each point maps to a unique image. Nearby points produce similar images. This continuity makes the space navigable -- you can interpolate between points, walk along paths, and explore neighborhoods. Spherical interpolation (slerp) produces smoother transitions than linear interpolation in high-dimensional spaces
Latent space interpolation -- smoothly walking between two random points -- produces morph sequences where every intermediate frame is a unique, never-before-seen image. The transition from Face A to Face B passes through faces that exist only for a single frame. The in-between is often more interesting than the endpoints. The walk itself is the artwork
Attribute directions: specific visual properties (smiling, glasses, age, hair color) correspond to directions in latent space. Find the direction by averaging the latent vectors of images with the attribute and subtracting the average of those without. Then add or subtract that direction from any image's latent vector to modify the attribute. This works because well-trained GANs tend to organize many semantic properties along roughly linear directions during training -- an emergent property, not a designed feature
Pre-trained models like StyleGAN2/3 produce photorealistic high-resolution images but require API access (too large for browser). Smaller models run in TensorFlow.js at lower resolution (64x64) with interactive speed. The pixelated quality of small models has its own aesthetic -- impressionistic, painterly, obviously synthetic
GAN output becomes raw material for creative coding: decompose generated images into particles, apply physics simulations, feed them through our noise and erosion systems. In the other direction, use generative code output (noise fields, L-systems, particle traces) as input for Pix2Pix. The pipeline works both ways -- code as input for neural networks, neural output as input for code
Mode collapse (generator producing the same image repeatedly) and training artifacts (checkerboard patterns, grid textures) are ML failures but creative opportunities. Repetition with variation creates portrait series. Artifact textures become digital weaving. Some artists deliberately use undertrained GANs to harvest these visual qualities
The curation workflow -- generate many images, select the interesting ones based on aesthetic judgment -- is a creative practice in its own right. Artists like Robbie Barrat, Helena Sarin, and Mario Klingemann built careers curating GAN output. The selection IS the art. The GAN provides raw material. The artist provides taste
GAN-generated faces raise ethical questions: training data consent, demographic bias, the novelty of synthetic people who don't exist. Low-resolution obviously-synthetic output may be more appropriate than photorealistic output depending on context. Using GANs thoughtfully means knowing the training data, acknowledging synthesis, and considering whether your aesthetic choices serve your conceptual goals
Diffusion models have largely overtaken GANs for image quality, but the latent space concept is universal. Continuous parameter spaces where nearby points produce similar outputs and semantic attributes map to linear directions -- that framework applies regardless of the underlying architecture
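
Two of the points above, slerp and attribute directions, boil down to a few lines of vector arithmetic. Here is a compact sketch in plain JavaScript, assuming latent vectors are ordinary arrays of numbers. The names slerpVec, meanVec, attributeDirection and applyDirection are hypothetical helpers for illustration, not functions from any library or from the sketches in this episode.

```javascript
// Spherical interpolation between latent vectors a and b, t in [0, 1].
function slerpVec(a, b, t) {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norms = Math.hypot(...a) * Math.hypot(...b);
  // Clamp so rounding can't push the cosine outside [-1, 1].
  const cosOmega = Math.min(1, Math.max(-1, dot / norms));
  const omega = Math.acos(cosOmega);
  const sinOmega = Math.sin(omega);
  // Nearly parallel (or degenerate) vectors: fall back to plain lerp.
  if (!(sinOmega >= 1e-6)) return a.map((v, i) => v + (b[i] - v) * t);
  const wa = Math.sin((1 - t) * omega) / sinOmega;
  const wb = Math.sin(t * omega) / sinOmega;
  return a.map((v, i) => wa * v + wb * b[i]);
}

// Element-wise mean of a list of equal-length vectors.
function meanVec(vectors) {
  const sum = new Array(vectors[0].length).fill(0);
  for (const v of vectors) v.forEach((x, i) => { sum[i] += x; });
  return sum.map((x) => x / vectors.length);
}

// Direction = mean(latents with attribute) - mean(latents without).
function attributeDirection(withAttr, withoutAttr) {
  const mw = meanVec(withAttr);
  const mo = meanVec(withoutAttr);
  return mw.map((x, i) => x - mo[i]);
}

// Move a latent vector along a direction; negative strength reverses it.
function applyDirection(z, direction, strength) {
  return z.map((x, i) => x + strength * direction[i]);
}
```

How far you can push strength before the image falls apart depends on the model, so treat it as a dial to explore rather than a fixed value.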

Eight episodes into the ML arc. Classification watches (episode 92). Body tracking follows (93-95). Classification goes deep (96). Style transfer paints (97). Pix2Pix translates (98). And now GANs generate from nothing. Each episode the network takes more creative control -- from observer to interpreter to translator to creator. The progression mirrors the history of the field itself. And we haven't talked about training your own models yet.

Sallukes! Thanks for reading.

@femdev
