In this lab you will run a real deep learning image classifier, test it on driving scenarios, discover exactly where it fails — and then use that experience to design a simulation brief the way Waymo's engineers actually do.
This lab has three parts that build on each other. You start with concepts, move to hands-on deep learning, and finish by connecting what you learned back to Waymo's real engineering decisions.
In Chapter 5 you learned that Waymo's World Model generates simulations of rare and dangerous scenarios — tornadoes, elephants on the road, wrong-way trucks — because those situations almost never appear in real driving data. But not all rare scenarios are rare for the same reason. In this part you will read the World Model blog post, watch the simulation examples, and classify each scenario by the type of AI problem it represents.
This is the same analytical work Waymo's engineers do when they decide what to simulate. They don't just generate random weird scenarios — they ask: what categories of failure is our model most likely to have, and why? Your job is to develop that same diagnostic instinct.
Open the Waymo World Model blog post and scroll through all the simulation examples. Watch at least 5–6 of the scenario videos. You are looking at the output of the World Model — realistic simulated environments Waymo uses to train its perception and planning systems.
As you read, keep one question in mind: why would a model trained only on normal driving fail in each of these situations?
For each scenario below, fill in the two empty columns. Use your own words — there are no single right answers, but your reasoning should connect back to the chapter concepts (long-tail problem, perception, training data, etc.).
Failure type options: Rare visual object · Unusual weather/lighting · Dangerous vehicle behavior · Unexpected scene geometry · Novel human behavior · Sensor limitation
Look at the pattern across your table. Which failure type shows up most often? What does that tell you about the biggest gap in a training dataset built from real driving data? Write 2–3 sentences.
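To put rough numbers on why these gaps exist, here is a small back-of-envelope sketch. The event rates below are illustrative assumptions for this exercise, not Waymo figures: they show how few examples of a rare scenario even a huge dataset contains.

```python
# Back-of-envelope: how often does a rare scenario appear in driving data?
# Event rates are illustrative assumptions, NOT real Waymo statistics.
rates = {
    "Pedestrian crossing": 1 / 100,        # assumed: ~1 in 100 segments
    "Emergency vehicle":   1 / 1_000,
    "Wrong-way vehicle":   1 / 100_000,
    "Animal on highway":   1 / 1_000_000,
}

n_segments = 1_000_000  # size of a hypothetical training dataset

for name, p in rates.items():
    expected = p * n_segments                    # expected number of examples
    p_at_least_one = 1 - (1 - p) ** n_segments   # chance of seeing even one
    print(f"{name:20s} expected examples: {expected:10,.0f} "
          f"(P(at least one) = {p_at_least_one:.3f})")
```

Even a million-segment dataset contains only a handful of the rarest events, which is exactly the gap simulation is meant to fill.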
In Chapter 5 you learned that Waymo's perception system uses convolutional neural networks to identify objects from camera images. In this part you will run an actual pre-trained CNN — the same family of model Waymo uses — and test it on driving-relevant images. You will see it work confidently on normal scenarios, then deliberately test it on edge cases to observe exactly where and how it fails.
This is the hands-on version of the long-tail problem. You are not just reading about it; you are watching a real deep learning model be confidently wrong on kinds of images it has never seen.
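One reason a classifier can be "confidently wrong" is mechanical: its final softmax layer always distributes 100% of its confidence across the categories it knows, so an image of something entirely outside its training data still gets a top prediction. A minimal sketch in plain NumPy (separate from the lab's MobileNetV2 code) makes this visible:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that always sum to 1."""
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)  # stand-in for a network's raw outputs
probs = softmax(logits)

print(f"Probabilities sum to: {probs.sum():.4f}")
print(f"Top 'prediction' confidence: {probs.max():.2%}")
# Even for meaningless inputs the model must pick a winner:
# there is no built-in "I don't know" category.
```

Keep this in mind as you test edge cases below: a high confidence score means "most likely of the 1,000 known options," not "I recognize this."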
Paste this into your first Colab cell and run it. It installs the libraries and loads a pre-trained deep learning model called MobileNetV2. This model was trained on 1.2 million images across 1,000 categories — it already knows what cars, pedestrians, traffic signs, and hundreds of other objects look like.
```python
# Install required libraries
!pip install tensorflow pillow requests --quiet

import tensorflow as tf
import numpy as np
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt

# Load MobileNetV2 — a compact but powerful CNN pre-trained on 1.2M images
# This is the same class of model used in real-world perception systems
model = tf.keras.applications.MobileNetV2(weights='imagenet')
decode = tf.keras.applications.mobilenet_v2.decode_predictions
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input

print("✓ Model loaded. MobileNetV2 is ready — 1,000 object categories.")
print(f"Total model parameters: {model.count_params():,}")
```
Paste this into a new cell. It creates a function that takes any image URL, runs it through the neural network, and returns the top 5 predictions with confidence scores.
```python
def classify_image(url, label=""):
    """
    Takes an image URL, runs it through MobileNetV2, and prints
    the top 5 predictions with confidence scores.
    This is exactly what a perception system does with each camera frame.
    """
    # Fetch the image
    response = requests.get(url, timeout=10)
    img = Image.open(BytesIO(response.content)).convert('RGB')
    img_resized = img.resize((224, 224))

    # Convert to array and preprocess for the model
    arr = np.array(img_resized)
    arr = np.expand_dims(arr, axis=0)
    arr = preprocess(arr.astype(np.float32))

    # Run the neural network — this is the "inference" step
    predictions = model.predict(arr, verbose=0)
    top5 = decode(predictions, top=5)[0]

    # Display results
    plt.figure(figsize=(10, 3))
    plt.subplot(1, 2, 1)
    plt.imshow(img_resized)
    plt.title(label if label else "Input image", fontsize=10)
    plt.axis('off')

    plt.subplot(1, 2, 2)
    labels_out = [p[1].replace('_', ' ') for p in top5]
    scores = [p[2] for p in top5]
    colors = ['#0078ff' if i == 0 else '#94afc8' for i in range(5)]
    plt.barh(labels_out[::-1], scores[::-1], color=colors[::-1])
    plt.xlabel('Confidence score')
    plt.title('Top 5 predictions', fontsize=10)
    plt.xlim(0, 1)
    plt.tight_layout()
    plt.show()

    print(f"\n📊 Top prediction: {top5[0][1]} ({top5[0][2]:.1%} confident)")
    return top5

print("✓ classify_image() ready to use.")
```
Run the cell below once for each image type in the table, swapping in an appropriate image URL each time. These are everyday driving scenes — the kind that make up the vast majority of any real training dataset. Record the top prediction and confidence score in your observation table below.
```python
# A typical urban street — what Waymo sees thousands of times per day
# The URL below is only a placeholder (a sample ant image) —
# replace it with a street scene image URL of your choice
classify_image(
    "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/640px-Camponotus_flavomarginatus_ant.jpg",
    "Normal street scene"
)
```
Fill this table as you run each image through the classifier. You'll use it for your analysis in Step 5.
| Image type | Top prediction | Confidence | Correct? (Y/N) | Your notes |
|---|---|---|---|---|
| Normal city street | — | — | — | — |
| Pedestrian crossing | — | — | — | — |
| Stop sign | — | — | — | — |
| Car at night | — | — | — | — |
| Road in heavy fog | — | — | — | — |
| Flooded street | — | — | — | — |
| Construction zone | — | — | — | — |
| Person in unusual costume | — | — | — | — |
| Your own edge case #1 | — | — | — | — |
| Your own edge case #2 | — | — | — | — |
Now it's your turn to be creative. Try to find two images that you think will confuse the classifier — scenarios relevant to driving that the model should get wrong. Add them to the last two rows of your table.
```python
# Replace the URL with your own edge case image
# Hypothesis: I think this will confuse the model because...
classify_image(
    "YOUR_IMAGE_URL_HERE",
    "My edge case: [describe it]"
)
```
Look back at your observation table and answer these questions in writing (3–4 sentences each):
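If you recorded your table electronically, a small pandas sketch like this can sharpen the comparison between normal scenes and edge cases. The numbers below are hypothetical placeholders, not expected results — substitute your own observations:

```python
import pandas as pd

# Hypothetical placeholder results — replace with your own observations
results = pd.DataFrame({
    "image_type": ["Normal city street", "Stop sign", "Road in heavy fog",
                   "Flooded street", "Person in unusual costume"],
    "is_edge_case": [False, False, True, True, True],
    "confidence": [0.91, 0.88, 0.43, 0.37, 0.29],
    "correct": [True, True, False, False, False],
})

# Compare average confidence and accuracy on normal vs. edge-case images
summary = results.groupby("is_edge_case").agg(
    mean_confidence=("confidence", "mean"),
    accuracy=("correct", "mean"),
)
print(summary)
```

A large gap between the two rows is the long-tail problem in miniature: the model is both less accurate and (often) still fairly confident on scenarios its training data barely covered.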
In Parts 1 and 2 you worked with images — the perception layer of the car's AI system. In this part you shift to motion data: the trajectories of vehicles, how fast they were moving, how they accelerated and turned. This is the data that feeds Waymo's motion prediction system — the system that answers "where is that car going next?"
You will load a curated sample of real driving data, visualize the distribution of driving behaviors, identify what's common and what's rare, and then write a simulation design brief — a short document specifying what scenarios Waymo should simulate to fill the gaps you found. This is exactly the kind of analysis that motivates building a World Model.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# Generate a realistic synthetic driving dataset
# Based on documented Waymo Open Dataset scenario distributions
np.random.seed(42)
n = 5000

# Driving speeds follow a realistic urban distribution
speed = np.concatenate([
    np.random.normal(12, 3, 3200),   # Low-speed urban (most common)
    np.random.normal(35, 5, 1200),   # Arterial roads
    np.random.normal(60, 8, 450),    # Highway driving
    np.random.normal(0, 2, 150),     # Stopped / near-stopped
])[:n]
speed = np.clip(speed, 0, 120)

# Acceleration — mostly gentle, occasionally sharp
accel = np.concatenate([
    np.random.normal(0, 1.2, 4600),  # Normal gentle acceleration
    np.random.normal(0, 4.5, 350),   # Moderate braking/accel events
    np.random.normal(0, 9.0, 50),    # Hard braking (rare)
])[:n]

# Scenario types — realistic frequency distribution
# (probabilities must sum to exactly 1.0 or np.random.choice raises an error)
scenario_types = np.random.choice(
    ['Normal urban', 'Intersection', 'Lane change', 'Pedestrian nearby',
     'Cyclist nearby', 'Emergency vehicle', 'Construction zone',
     'Adverse weather', 'Night driving', 'Wrong-way vehicle',
     'Animal in road', 'Road debris', 'Flooded road'],
    p=[0.38, 0.22, 0.14, 0.09, 0.06, 0.03, 0.03,
       0.02, 0.02, 0.003, 0.002, 0.004, 0.001],
    size=n
)

# Time of day
time_of_day = np.random.choice(
    ['Day', 'Night', 'Dawn/Dusk'],
    p=[0.72, 0.19, 0.09],
    size=n
)

df = pd.DataFrame({
    'speed_mph': speed,
    'acceleration_mps2': accel,
    'scenario_type': scenario_types,
    'time_of_day': time_of_day
})

print(f"Dataset loaded: {len(df):,} driving scenarios")
print(f"Columns: {list(df.columns)}")
df.head(10)
```
```python
# How fast is the car going across all 5,000 scenarios?
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Speed histogram
axes[0].hist(df['speed_mph'], bins=40, color='#0078ff', alpha=0.8,
             edgecolor='white')
axes[0].axvline(df['speed_mph'].mean(), color='#00e89d', linestyle='--',
                linewidth=2, label=f'Mean: {df["speed_mph"].mean():.1f} mph')
axes[0].set_xlabel('Speed (mph)')
axes[0].set_ylabel('Number of scenarios')
axes[0].set_title('Speed Distribution Across All Scenarios')
axes[0].legend()

# Acceleration distribution
axes[1].hist(df['acceleration_mps2'], bins=40, color='#005fcc', alpha=0.8,
             edgecolor='white')
axes[1].axvline(-5, color='#f59e0b', linestyle='--', linewidth=1.5,
                label='Hard braking threshold')
axes[1].axvline(5, color='#f59e0b', linestyle='--', linewidth=1.5)
axes[1].set_xlabel('Acceleration (m/s²) — negative = braking')
axes[1].set_ylabel('Number of scenarios')
axes[1].set_title('Acceleration Distribution')
axes[1].legend()

plt.tight_layout()
plt.savefig('speed_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

hard_braking = (df['acceleration_mps2'] < -5).sum()
print(f"\nHard braking events: {hard_braking} "
      f"({hard_braking/len(df):.1%} of all scenarios)")
```
```python
# Count how often each scenario type appears
scenario_counts = df['scenario_type'].value_counts()
scenario_pct = (scenario_counts / len(df) * 100).round(2)

# Plot — this is the long tail made visible
fig, ax = plt.subplots(figsize=(12, 5))
colors = []
for scenario in scenario_counts.index:
    pct = scenario_pct[scenario]
    if pct >= 10:
        colors.append('#0078ff')   # Common — well represented
    elif pct >= 2:
        colors.append('#94afc8')   # Moderate
    else:
        colors.append('#f59e0b')   # Rare — the long tail

bars = ax.barh(scenario_counts.index[::-1], scenario_counts.values[::-1],
               color=colors[::-1])
ax.set_xlabel('Number of scenarios in dataset')
ax.set_title('Scenario Frequency Distribution — The Long Tail of Driving Data')

# Add percentage labels
for i, (count, pct) in enumerate(zip(scenario_counts.values[::-1],
                                     scenario_pct.values[::-1])):
    ax.text(count + 20, i, f'{pct:.1f}%', va='center', fontsize=10)

# Legend
patches = [
    mpatches.Patch(color='#0078ff', label='Common (≥10%) — well trained'),
    mpatches.Patch(color='#94afc8', label='Moderate (2–10%)'),
    mpatches.Patch(color='#f59e0b', label='Rare (<2%) — the long tail')
]
ax.legend(handles=patches, loc='lower right')
plt.tight_layout()
plt.savefig('long_tail.png', dpi=150, bbox_inches='tight')
plt.show()

# Summary statistics
print("\n=== LONG TAIL SUMMARY ===")
rare = scenario_pct[scenario_pct < 2]
common = scenario_pct[scenario_pct >= 10]
print(f"Common scenarios (>=10%): {len(common)} types, "
      f"{common.sum():.1f}% of all data")
print(f"Rare scenarios (<2%): {len(rare)} types, "
      f"{rare.sum():.1f}% of all data")
print(f"\nRarest scenario: '{rare.index[-1]}' appears only "
      f"{rare.values[-1]:.2f}% of the time")
print(f"  → In 5,000 scenarios, that's only "
      f"{int(rare.values[-1]/100*5000)} examples")
print("\nThink: how well can a model learn to handle something it's seen "
      "only a handful of times during training?")
```
You have now seen the long tail in real data. Your final task is to write a one-page simulation brief — the kind of document an engineer would write before tasking the World Model to generate scenarios.
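To ground your brief in numbers, you can estimate how many simulated scenarios each rare type would need. The sketch below is one simple approach, not Waymo's actual method: the 2% target share is an assumption, the counts are approximations of the Part 3 dataset, and the arithmetic ignores that adding scenarios grows the total.

```python
import pandas as pd

# Approximate scenario counts from the Part 3 dataset (illustrative)
counts = pd.Series({
    "Normal urban": 1900, "Intersection": 1100, "Lane change": 700,
    "Emergency vehicle": 150, "Wrong-way vehicle": 15,
    "Animal in road": 10, "Flooded road": 5,
})

target_share = 0.02  # assumed minimum share per scenario after augmentation
total = counts.sum()

for name, c in counts.items():
    share = c / total
    if share < target_share:
        # Scenarios to synthesize so this type reaches the target share
        # (approximate: ignores that new scenarios increase the total)
        needed = int(round(target_share * total - c))
        print(f"Simulate ~{needed} extra '{name}' scenarios "
              f"(currently {share:.2%} of data)")
```

Numbers like these make a simulation brief concrete: instead of "simulate more flooded roads," you can say roughly how many and why.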
Submit the following to complete Lab 5. Your instructor will specify the exact submission format.