Lab 5 · Deep Learning · Autonomous Perception · Simulation

Deep Learning & the Waymo Driver:
From Pixels to Predictions

In this lab you will run a real deep learning image classifier, test it on driving scenarios, discover exactly where it fails — and then use that experience to design a simulation brief the way Waymo's engineers actually do.

Google Colab · Claude as coding partner · No prior coding required · ~90 minutes

What you will build

This lab has three parts that build on each other. You start with concepts, move to hands-on deep learning, and finish by connecting what you learned back to Waymo's real engineering decisions.

Part 1
~25 min · No code
Analyze the Waymo World Model scenarios and classify what kinds of AI failures they represent
Part 2
~40 min · Python in Colab
Run a pre-trained image classifier on driving images — then deliberately break it on edge cases
Part 3
~25 min · Python + analysis
Analyze real Waymo driving motion data, find the long tail, and write a simulation design brief
Before you start
Read Chapter 5 before doing this lab. Part 1 requires familiarity with the Waymo World Model blog, Part 2 uses concepts from Sections 4–5, and Part 3 uses concepts from Sections 7–9. The lab will make a lot more sense if the chapter is fresh.
1
The World Model Scenario Analysis · No code required
~25 minutes · Individual or pairs

What you are doing and why

In Chapter 5 you learned that Waymo's World Model generates simulations of rare and dangerous scenarios — tornadoes, elephants on the road, wrong-way trucks — because those situations almost never appear in real driving data. But not all rare scenarios are rare for the same reason. In this part you will read the World Model blog post, watch the simulation examples, and classify each scenario by the type of AI problem it represents.

This is the same analytical work Waymo's engineers do when they decide what to simulate. They don't just generate random weird scenarios — they ask: what categories of failure is our model most likely to have, and why? Your job is to develop that same diagnostic instinct.

1
Read the Waymo World Model blog post

Open the Waymo World Model blog post and scroll through all the simulation examples. Watch at least 5–6 of the scenario videos. You are looking at the output of the World Model — realistic simulated environments Waymo uses to train its perception and planning systems.

As you read, keep one question in mind: why would a model trained only on normal driving fail in each of these situations?

2
Complete the scenario analysis table

For each scenario below, fill in the two empty columns. Use your own words — there are no single right answers, but your reasoning should connect back to the chapter concepts (long-tail problem, perception, training data, etc.).

Failure type options: Rare visual object · Unusual weather/lighting · Dangerous vehicle behavior · Unexpected scene geometry · Novel human behavior · Sensor limitation

| Scenario (from the blog) | Failure type | Why would a model trained on normal data fail here? |
| --- | --- | --- |
| Driving on the Golden Gate Bridge covered in light snow | Your answer | Your answer |
| Encountering a tornado | Your answer | Your answer |
| Driving behind a vehicle with a bookshelf precariously on top | Your answer | Your answer |
| Encounter with an elephant on the road | Your answer | Your answer |
| A malfunctioning truck facing the wrong way, blocking the road | Your answer | Your answer |
| A suburban cul-de-sac submerged in flood water with floating furniture | Your answer | Your answer |
| A pedestrian dressed as a T-rex | Your answer | Your answer |
| Driving out of a raging fire | Your answer | Your answer |
3
Reflection question

Look at the pattern across your table. Which failure type shows up most often? What does that tell you about the biggest gap in a training dataset built from real driving data? Write 2–3 sentences.

What you just did
You performed a failure mode analysis — the same kind of thinking Waymo's engineers do before deciding what to simulate. The World Model isn't generating random scenarios; it's targeting the specific categories of situation where a deep learning model is most likely to fail because those situations are underrepresented in real driving data. Naming the failure mode is the first step to fixing it.
2
Build and Break a Deep Learning Image Classifier · Python in Google Colab
~40 minutes · Claude writes the code, you run and interpret

What you are doing and why

In Chapter 5 you learned that Waymo's perception system uses convolutional neural networks to identify objects from camera images. In this part you will run an actual pre-trained CNN — the same family of model Waymo uses — and test it on driving-relevant images. You will see it work confidently on normal scenarios, then deliberately test it on edge cases to observe exactly where and how it fails.

This is the hands-on version of the long-tail problem. You are not just reading about it — you are watching a real deep learning model be confidently wrong on images it has never seen.

Setup — do this first
Open Google Colab (colab.research.google.com), sign in with your Google account, and create a new notebook (File → New notebook). You will paste the code blocks below into cells and run them with the ▶ button or Shift+Enter. Each cell takes 5–30 seconds to run. You do not need to understand the code — your job is to read the outputs and interpret what they mean.

Step 1 — Install and load the model ~3 min

Paste this into your first Colab cell and run it. It installs the libraries and loads a pre-trained deep learning model called MobileNetV2. This model was trained on 1.2 million images across 1,000 categories — it already knows what cars, pedestrians, traffic signs, and hundreds of other objects look like.

Cell 1 — Install and load
# Install required libraries
!pip install tensorflow pillow requests --quiet

import tensorflow as tf
import numpy as np
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
import json

# Load MobileNetV2 — a compact but powerful CNN pre-trained on 1.2M images
# This is the same class of model used in real-world perception systems
model = tf.keras.applications.MobileNetV2(weights='imagenet')
decode = tf.keras.applications.mobilenet_v2.decode_predictions
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input

print("✓ Model loaded. MobileNetV2 is ready — 1,000 object categories.")
print(f"Total model parameters: {model.count_params():,}")
💬 If you get an error
Copy the error message and paste it to Claude: "I'm running a Colab lab for my AI in Business class and got this error. Can you help me fix it?" Claude can diagnose most Colab errors in seconds.

Step 2 — Build the classifier function ~1 min

Paste this into a new cell. It creates a function that takes any image URL, runs it through the neural network, and returns the top 5 predictions with confidence scores.

Cell 2 — Classifier function
def classify_image(url, label=""):
    """
    Takes an image URL, runs it through MobileNetV2,
    and prints the top 5 predictions with confidence scores.
    This is exactly what a perception system does with each camera frame.
    """
    # Fetch the image
    response = requests.get(url, timeout=10)
    img = Image.open(BytesIO(response.content)).convert('RGB')
    img_resized = img.resize((224, 224))

    # Convert to array and preprocess for the model
    arr = np.array(img_resized)
    arr = np.expand_dims(arr, axis=0)
    arr = preprocess(arr.astype(np.float32))

    # Run the neural network — this is the "inference" step
    predictions = model.predict(arr, verbose=0)
    top5 = decode(predictions, top=5)[0]

    # Display results
    plt.figure(figsize=(10, 3))
    plt.subplot(1, 2, 1)
    plt.imshow(img_resized)
    plt.title(label if label else "Input image", fontsize=10)
    plt.axis('off')

    plt.subplot(1, 2, 2)
    labels_out = [p[1].replace('_', ' ') for p in top5]
    scores = [p[2] for p in top5]
    colors = ['#0078ff' if i == 0 else '#94afc8' for i in range(5)]
    plt.barh(labels_out[::-1], scores[::-1], color=colors[::-1])
    plt.xlabel('Confidence score')
    plt.title('Top 5 predictions', fontsize=10)
    plt.xlim(0, 1)
    plt.tight_layout()
    plt.show()

    print(f"\n📊 Top prediction: {top5[0][1]} ({top5[0][2]:.1%} confident)")
    return top5

print("✓ classify_image() ready to use.")
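
A note on those confidence scores before you start testing: they are softmax probabilities, meaning the network's 1,000 raw output scores are rescaled to be positive and sum to 1. The optional toy cell below (made-up numbers, not real model output) shows the consequence that matters for this lab: a confused model still reports *something*, because it must spread 100% of its belief across the categories it knows — it has no way to say "I don't know."

```python
import numpy as np

# Toy softmax with 3 made-up raw scores ("logits") — illustrative only,
# not output from MobileNetV2.
logits = np.array([2.0, 1.0, 0.1])
probs = np.exp(logits) / np.exp(logits).sum()

print("Probabilities:", probs.round(3))
# The scores always sum to 1 — confidence is relative, not absolute
print("Sum:", round(probs.sum(), 6))
```

This is why an edge-case image never produces an empty prediction list — the model simply reassigns its belief to whichever of its 1,000 categories looks least unlike the input.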

Step 3 — Test on normal driving scenarios ~10 min

Run each of these cells. These are everyday driving scenes — the kind that make up the vast majority of any real training dataset. Record the top prediction and confidence score in your observation table below.

Cell 3a — A normal street scene
# Placeholder URL — this is actually a Wikimedia photo of an ant, so the
# first run just proves the pipeline works (expect a confident "ant").
# Replace the URL with a real street scene — ask Claude for one (prompt below).
classify_image(
    "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/640px-Camponotus_flavomarginatus_ant.jpg",
    "Normal street scene"
)
💬 Ask Claude for image URLs
Tell Claude: "I need public-domain image URLs for a Google Colab lab testing a deep learning classifier. Give me direct .jpg URLs for: (1) a normal city street with cars, (2) a pedestrian crossing, (3) a stop sign, (4) a car at night, (5) a road in heavy fog, (6) a flooded street, (7) a construction zone with unusual equipment, (8) someone in an unusual costume on a sidewalk." Claude will give you working Wikimedia or similar URLs you can paste directly into the cells.
Why we use Claude for the image URLs
Finding public-domain images with stable direct URLs is tedious. This is exactly the kind of task Claude handles well — use it as your lab partner throughout. The goal is for you to focus on interpreting what the model does, not on hunting down image links.

Step 4 — Record your observations

Fill this table as you run each image through the classifier. You'll use it for your analysis in Step 5.

| Image type | Top prediction | Confidence | Correct? (Y/N) | Your notes |
| --- | --- | --- | --- | --- |
| Normal city street | | | | |
| Pedestrian crossing | | | | |
| Stop sign | | | | |
| Car at night | | | | |
| Road in heavy fog | | | | |
| Flooded street | | | | |
| Construction zone | | | | |
| Person in unusual costume | | | | |
| Your own edge case #1 | | | | |
| Your own edge case #2 | | | | |

Step 5 — Find your own edge cases ~10 min

Now it's your turn to be creative. Try to find two images that you think will confuse the classifier — scenarios relevant to driving that the model should get wrong. Add them to the last two rows of your table.

Cell 4 — Test your own edge case
# Replace the URL with your own edge case image
# Hypothesis: I think this will confuse the model because...
classify_image(
    "YOUR_IMAGE_URL_HERE",
    "My edge case: [describe it]"
)
💬 Claude prompt to try
Ask Claude: "I'm testing a deep learning image classifier for a class lab. What are 5 types of images that would be important for a self-driving car to handle but that a general-purpose classifier trained on everyday photos would probably get wrong? For each one, explain why it would be hard."

Step 6 — Analyze your results

Look back at your observation table and answer these questions in writing (3–4 sentences each):

  1. Where did confidence drop? Compare confidence scores between the normal scenarios and the edge cases. What pattern do you see?
  2. What does this mean for Waymo? If this classifier were running inside a self-driving car, what would happen in the scenarios where it got the lowest confidence or worst predictions? Who might be harmed?
  3. The long-tail connection. Look at your edge cases. Are they rare in real driving data? How does your experience here illustrate why Waymo needs the World Model?
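
If you want to put a number on question 1 rather than eyeball it, this optional cell computes the average confidence drop. The values below are placeholders — replace them with the confidence scores from your own observation table (as decimals between 0 and 1).

```python
import numpy as np

# Placeholder scores — substitute YOUR observation-table values
normal_confidences = [0.91, 0.87, 0.95, 0.78]   # e.g. street, crossing, sign, night
edge_confidences   = [0.42, 0.31, 0.55, 0.18]   # e.g. fog, flood, construction, costume

normal_mean = np.mean(normal_confidences)
edge_mean = np.mean(edge_confidences)

print(f"Average confidence on normal scenarios: {normal_mean:.1%}")
print(f"Average confidence on edge cases:       {edge_mean:.1%}")
print(f"Confidence drop on the long tail:       {normal_mean - edge_mean:.1%}")
```

A large drop is the long-tail problem in one number: the model is systematically less sure exactly where being sure matters most.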
What you just did
You ran a real convolutional neural network — MobileNetV2 has roughly 3.5 million parameters (Cell 1 printed the exact count) and was trained on a dataset of 1.2 million labeled images. The confidence scores you observed are exactly the kind of signal a perception system produces. When confidence is high and correct, the system can act. When confidence is low or wrong, the system either slows down, asks for a human override, or — in a poorly designed system — acts incorrectly anyway. You have just seen the long-tail problem from the inside.
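
The "act vs. slow down" decision described above can be sketched as a simple confidence gate. The threshold value and function below are illustrative — this is not Waymo's actual decision logic, just the shape of the idea:

```python
def act_on_prediction(label, confidence, threshold=0.6):
    """Illustrative confidence gate — NOT Waymo's real decision logic.
    A system downstream of perception only acts on predictions it trusts."""
    if confidence >= threshold:
        return f"ACT: treat detected object as '{label}'"
    return "CAUTION: low confidence — slow down, rely on other sensors"

print(act_on_prediction("stop sign", 0.93))
print(act_on_prediction("street sign", 0.22))
```

Notice what the gate cannot fix: a model that is *confidently wrong* (high score, wrong label) sails straight through. That failure mode is exactly what your edge-case tests were probing.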
3
Analyze Driving Motion Data & Write a Simulation Brief · Python + Analysis
~25 minutes · Claude-assisted Python

What you are doing and why

In Parts 1 and 2 you worked with images — the perception layer of the car's AI system. In this part you shift to motion data: the trajectories of vehicles, how fast they were moving, how they accelerated and turned. This is the data that feeds Waymo's motion prediction system — the system that answers "where is that car going next?"

You will load a curated sample of real driving data, visualize the distribution of driving behaviors, identify what's common and what's rare, and then write a simulation design brief — a short document specifying what scenarios Waymo should simulate to fill the gaps you found. This is exactly the kind of analysis that motivates building a World Model.

About the data
This lab uses a small synthetic sample of driving motion statistics, generated directly in the notebook and modeled on the Waymo Open Dataset's publicly documented scenario distributions. It captures the key statistical properties of real Waymo driving data (speed distributions, acceleration patterns, scenario frequencies) in a form that loads instantly in Colab without requiring Google Cloud authentication. The full Waymo Open Dataset is available at waymo.com/open for researchers who want to work with the complete data.

Step 1 — Load the driving data ~3 min

Cell 5 — Load and preview data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# Generate a realistic synthetic driving dataset
# Based on documented Waymo Open Dataset scenario distributions
np.random.seed(42)
n = 5000

# Driving speeds follow a realistic urban distribution
speed = np.concatenate([
    np.random.normal(12, 3, 3200),   # Low-speed urban (most common)
    np.random.normal(35, 5, 1200),   # Arterial roads
    np.random.normal(60, 8, 450),    # Highway driving
    np.random.normal(0, 2, 150),     # Stopped / near-stopped
])[:n]
speed = np.clip(speed, 0, 120)

# Acceleration — mostly gentle, occasionally sharp
accel = np.concatenate([
    np.random.normal(0, 1.2, 4600),  # Normal gentle acceleration
    np.random.normal(0, 4.5, 350),   # Moderate braking/accel events
    np.random.normal(0, 9.0, 50),    # Hard braking (rare)
])[:n]

# Scenario types — realistic frequency distribution
scenario_types = np.random.choice(
    ['Normal urban', 'Intersection', 'Lane change',
     'Pedestrian nearby', 'Cyclist nearby',
     'Emergency vehicle', 'Construction zone',
     'Adverse weather', 'Night driving',
     'Wrong-way vehicle', 'Animal in road',
     'Road debris', 'Flooded road'],
    p=[0.38, 0.22, 0.14, 0.09, 0.06,
       0.03, 0.03, 0.02, 0.02,
       0.003, 0.002, 0.004, 0.001],  # must sum to exactly 1.0
    size=n
)

# Time of day
time_of_day = np.random.choice(
    ['Day', 'Night', 'Dawn/Dusk'],
    p=[0.72, 0.19, 0.09], size=n
)

df = pd.DataFrame({
    'speed_mph': speed,
    'acceleration_mps2': accel,
    'scenario_type': scenario_types,
    'time_of_day': time_of_day
})

print(f"Dataset loaded: {len(df):,} driving scenarios")
print(f"Columns: {list(df.columns)}")
df.head(10)

Step 2 — Visualize the speed distribution ~5 min

Cell 6 — Speed distribution
# How fast is the car going across all 5,000 scenarios?
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Speed histogram
axes[0].hist(df['speed_mph'], bins=40, color='#0078ff', alpha=0.8, edgecolor='white')
axes[0].axvline(df['speed_mph'].mean(), color='#00e89d', linestyle='--',
                linewidth=2, label=f'Mean: {df["speed_mph"].mean():.1f} mph')
axes[0].set_xlabel('Speed (mph)')
axes[0].set_ylabel('Number of scenarios')
axes[0].set_title('Speed Distribution Across All Scenarios')
axes[0].legend()

# Acceleration distribution
axes[1].hist(df['acceleration_mps2'], bins=40, color='#005fcc', alpha=0.8, edgecolor='white')
axes[1].axvline(-5, color='#f59e0b', linestyle='--', linewidth=1.5, label='Hard braking threshold')
axes[1].axvline(5, color='#f59e0b', linestyle='--', linewidth=1.5)
axes[1].set_xlabel('Acceleration (m/s²) — negative = braking')
axes[1].set_ylabel('Number of scenarios')
axes[1].set_title('Acceleration Distribution')
axes[1].legend()

plt.tight_layout()
plt.savefig('speed_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

hard_braking = (df['acceleration_mps2'] < -5).sum()
print(f"\nHard braking events: {hard_braking} ({hard_braking/len(df):.1%} of all scenarios)")

Step 3 — Expose the long tail ~8 min

Cell 7 — Scenario frequency: finding the long tail
# Count how often each scenario type appears
scenario_counts = df['scenario_type'].value_counts()
scenario_pct = (scenario_counts / len(df) * 100).round(2)

# Plot — this is the long tail made visible
fig, ax = plt.subplots(figsize=(12, 5))

colors = []
for scenario in scenario_counts.index:
    pct = scenario_pct[scenario]
    if pct >= 10:
        colors.append('#0078ff')   # Common — well represented
    elif pct >= 2:
        colors.append('#94afc8')   # Moderate
    else:
        colors.append('#f59e0b')   # Rare — the long tail

bars = ax.barh(scenario_counts.index[::-1], scenario_counts.values[::-1], color=colors[::-1])
ax.set_xlabel('Number of scenarios in dataset')
ax.set_title('Scenario Frequency Distribution — The Long Tail of Driving Data')

# Add percentage labels
for i, (count, pct) in enumerate(zip(scenario_counts.values[::-1], scenario_pct.values[::-1])):
    ax.text(count + 20, i, f'{pct:.1f}%', va='center', fontsize=10)

# Legend
patches = [
    mpatches.Patch(color='#0078ff', label='Common (≥10%) — well trained'),
    mpatches.Patch(color='#94afc8', label='Moderate (2-10%)'),
    mpatches.Patch(color='#f59e0b', label='Rare (<2%) — the long tail')
]
ax.legend(handles=patches, loc='lower right')
plt.tight_layout()
plt.savefig('long_tail.png', dpi=150, bbox_inches='tight')
plt.show()

# Summary statistics
print("\n=== LONG TAIL SUMMARY ===")
rare = scenario_pct[scenario_pct < 2]
common = scenario_pct[scenario_pct >= 10]
print(f"Common scenarios (>=10%): {len(common)} types, {common.sum():.1f}% of all data")
print(f"Rare scenarios (<2%):    {len(rare)} types, {rare.sum():.1f}% of all data")
print(f"\nRarest scenario: '{rare.index[-1]}' appears only {rare.values[-1]:.2f}% of the time")
print(f"  → In 5,000 scenarios, that's only {int(rare.values[-1]/100*5000)} examples")
print("\nThink: how well can a model learn to handle something "
      "it's seen only a handful of times during training?")

Step 4 — Write your simulation design brief ~10 min

You have now seen the long tail in real data. Your final task is to write a one-page simulation brief — the kind of document an engineer would write before tasking the World Model to generate scenarios.

Simulation Design Brief — write in your notebook or a Google Doc
  1. Gap analysis. Based on your long-tail chart, which 3 scenario types are most dangerously underrepresented? For each one, explain why a model with very few training examples of that scenario would be likely to fail, and what the consequences of that failure might be in the real world.
  2. Scenario specifications. For each of your 3 underrepresented scenarios, write a specific simulation prompt — the kind of text description an engineer might type into a World Model. Be concrete: location, conditions, objects present, what makes it challenging. Example format: "Downtown intersection at night in heavy rain, pedestrian in dark clothing stepping off a curb between parked delivery trucks, oncoming vehicle running a yellow light."
  3. Counterfactual value. Pick one of your 3 scenarios. Describe a counterfactual test you would run in simulation: the same scenario played out twice — once where the car responds correctly and once where it doesn't. What would you measure to determine which response was safer? Why can this only be done in simulation, not with real-world data?
  4. Business case. If you were presenting this simulation brief to Waymo's leadership, what is the business argument for investing in generating these scenarios? Frame it in terms of safety record, liability, public trust, and regulatory approval — not just technical accuracy.
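To put a number on your gap analysis (item 1), this optional cell estimates how many simulated examples would lift a rare scenario type up to a target share of the training mix. The 2% target and the counts below are illustrative — substitute your own numbers from `df['scenario_type'].value_counts()`.

```python
import math

def simulated_examples_needed(count, total, target_share=0.02):
    """How many simulated examples lift one scenario to target_share?
    Solves (count + x) / (total + x) = target_share for x. Illustrative only —
    treats each scenario type independently."""
    if count / total >= target_share:
        return 0
    return math.ceil((target_share * total - count) / (1 - target_share))

# Illustrative counts — swap in your own value_counts() output
total = 5000
for name, count in [('Wrong-way vehicle', 15), ('Flooded road', 5)]:
    need = simulated_examples_needed(count, total)
    print(f"{name}: {count}/{total} real → generate ~{need} simulated examples")
```

Numbers like these turn a brief from "we should simulate more floods" into a concrete, sized engineering request — which is what makes it fundable.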
💬 Claude prompt for your brief
Ask Claude: "I'm writing a simulation design brief for a class assignment about autonomous driving AI. My long-tail analysis found that [paste your rarest scenarios] appear very rarely in a real driving dataset. Help me write a specific, realistic simulation scenario prompt for [one of those scenarios] that a Waymo engineer might use to generate training data."
What you just did
You performed exactly the analysis that motivates Waymo's World Model investment. The gap between common and rare scenarios in real driving data is not just a technical problem — it is a business risk. A model that fails on rare scenarios will eventually produce a failure in the real world, and that failure will be public, recorded, and potentially fatal. The simulation brief is how engineers translate that risk into a prioritized engineering investment.

Lab deliverables

Submit the following to complete Lab 5. Your instructor will specify the exact submission format.

  1. Part 1 — your completed scenario analysis table and the 2–3 sentence reflection.
  2. Part 2 — your observation table and written answers to the three analysis questions in Step 6.
  3. Part 3 — your charts (speed_distribution.png and long_tail.png) and your one-page simulation design brief.

The thread connecting all three parts
Part 1 asked you to name the kinds of scenarios a model trained on normal data would fail on. Part 2 let you watch a real model fail on exactly those kinds of scenarios. Part 3 showed you, in real data, why those scenarios are so underrepresented — and asked you to design a solution. That arc from concept → experience → analysis is the same arc Waymo's engineers went through when they decided to build the World Model.