Chapter 5 · Deep Learning · Computer Vision · Autonomous Vehicles

How Machines Learn to See:
Deep Learning & the Waymo Driver

From a game of Go that shocked the world to a car that navigates San Francisco without a human — this chapter is the story of deep learning, told through the technology trying to replace the most dangerous thing most people do every day.

Company: Waymo (Alphabet)
Industry: Autonomous Vehicles / Transportation
Core concept: Deep learning, computer vision, simulation

Level: Upper-division undergraduate
Topics: Deep learning, CNNs, reinforcement learning, simulation & synthetic data
Concepts introduced: 16 key terms

Primary sources: This chapter draws on The Waymo Driver Handbook: Perception (Waymo Blog, 2021), The Waymo World Model (Waymo Blog, 2026), Waymo software recall / school buses (NPR, December 2025), Do Waymo vehicles need more driving etiquette? (NPR, December 2025), and publicly available research from Waymo Research.

Contents
Origin Story
1. AlphaGo and the Moment Everything Changed
2. What Deep Learning Actually Is
The AI Factory
3. Step 1 — Data: Sensors, Miles, and the Long-Tail Problem
4. Step 2 — Model: Teaching Machines to See
5. Step 3 — Prediction: A 3D Picture of the World
6. Step 4 — Decision: Learning to Drive
7. Step 5 — Value: A Public Health Breakthrough — or a Job Killer?
8. The AI Factory at Waymo
9. Deep Learning as Competitive Advantage
Implications
10. Beyond the Road: Deep Learning and Physical AI
11. Responsible AI: What Happens When the Car Is Wrong?
12. Waymo in the Wild: What Actually Goes Wrong
13. Summary Table & Discussion Questions

1 AlphaGo and the Moment Everything Changed

Background: DeepMind and the "Apollo Project" of AI

In 2010, a neuroscientist and chess prodigy named Demis Hassabis co-founded a small London AI lab called DeepMind with a mission that sounded almost absurdly ambitious: build general-purpose AI by having machines learn to master complex tasks without being explicitly programmed. The company was sometimes called the "Apollo project" of artificial intelligence — a moonshot with a clear destination but no guarantee of arrival. In 2014, Google acquired DeepMind for roughly $500 million, giving the lab the compute and resources to pursue that ambition at scale.

DeepMind's early strategy was to use games as a testing ground. Games have clear rules and measurable outcomes — perfect environments for an AI to learn from trial and error. Their first milestone was a system that learned to play dozens of classic Atari video games at superhuman level, starting from nothing but raw pixel input and a score. But Hassabis had a bigger target in mind: Go.

Why Go Was the Holy Grail

Go is a board game invented in China more than 2,500 years ago. It is played on a 19×19 grid: players take turns placing black or white stones, and the goal is to surround more territory than your opponent. The rules fit on one page. The game itself is incomprehensibly complex — there are more possible board positions in Go than atoms in the observable universe. Chess, by comparison, had already been conquered: computers had beaten the best human players since 1997. But Go was considered different in kind. Experts believed mastering it required human intuition, pattern recognition, and something like aesthetic judgment — a feel for the board that could not be reduced to calculation. Go was the holy grail of AI: the game that supposedly required a mind.

A Go board mid-game, with black and white stones arranged across the 19×19 grid
Figure: A mid-game Go board. The 19×19 grid contains more possible board positions than atoms in the observable universe. Unlike chess, Go cannot be mastered by brute-force calculation — it requires pattern recognition and strategic judgment at a scale that no human programmer can encode by hand. This made it the defining challenge for deep learning.
Why Go, not chess?
Chess computers work by evaluating positions — they look ahead many moves, score each resulting position, and pick the best path. That works because there are a manageable number of positions to evaluate. Go has roughly 10^170 possible board states. No computer can look them all up. A Go-playing program has to develop something more like intuition — a feel for which positions are promising and which are not, without evaluating every possibility. That's exactly what deep learning provides: the ability to recognize patterns in complex, messy data without being given explicit rules about what to look for.

The Match: March 2016

In March 2016, DeepMind's AlphaGo faced Lee Sedol — then considered the world's best player — in a five-game match held in Seoul and broadcast globally. Over 200 million people watched. Most expected Lee Sedol to win; he had predicted a 5-0 sweep. AlphaGo won the first three games — and it was in Game 2 that the match produced its defining moment: Move 37.

The Move That Changed Everything
Move 37 is the moment the room went quiet. Midway through Game 2, AlphaGo placed a stone on the fifth line near the right side of the board — a position so unusual that professional commentators initially assumed it was a mistake. Human Go players virtually never play in that area at that stage of a game; the move violated conventional strategic wisdom accumulated over thousands of years of competitive play. One analyst walked off the broadcast to compose himself. AlphaGo's own team looked confused. Lee Sedol stood up, left the room, and took 15 minutes to think. He never recovered. AlphaGo won that game, and Move 37 turned out to be the decisive turning point — a move no human player had conceived of, generated not by following rules but by learning patterns across millions of self-played games. DeepMind later calculated that a human player would choose Move 37 in that position roughly once in 10,000 tries. AlphaGo had not been programmed to be creative. It just was.

The final score was AlphaGo 4, Lee Sedol 1. Sedol's single win — Game 4 — came via a counterattack that exploited a surprising weakness in AlphaGo's play under pressure. That one loss was actually crucial data for Hassabis and his team: it revealed that AlphaGo, despite its brilliance, could be destabilized by sufficiently novel situations. The system that had stunned the world still had architectural vulnerabilities.

What the loss revealed
Lee Sedol's Game 4 victory wasn't a fluke. He found a move — later called "the divine move" by the Go community — that sent AlphaGo into a kind of confusion, making a string of weak plays it would normally never attempt. The lesson: AlphaGo had learned to play brilliantly within the distribution of situations it had seen, but an unexpected enough position could push it outside that distribution. This is a pattern that shows up throughout AI deployment — systems that perform at superhuman levels in normal conditions can behave erratically when pushed into territory they haven't encountered before. You'll see this problem again in Section 3 when we discuss Waymo's long-tail problem.

Why does any of this matter for a business course? Because AlphaGo was not a rules-based program. It did not have a lookup table of Go positions or a set of if-then instructions. It learned to play Go by playing millions of games against itself, using a technique called deep learning combined with reinforcement learning. Its ability to play Go was entirely learned from experience — nobody programmed it to be creative. DeepMind later built a successor called AlphaGo Zero that skipped human game data entirely and learned only by playing itself from scratch; it surpassed the version that beat Lee Sedol in three days. The architecture that made all of this possible — the deep neural network — is the same family of technology powering facial recognition, medical imaging, fraud detection, content moderation, recommendation systems, and, as the rest of this chapter will show, self-driving cars.

2 What Deep Learning Actually Is

AlphaGo's ability to beat the world's best Go player — and Waymo's ability to navigate a city street — rest on the same underlying technology: deep learning. But what actually is it? The word "deep" refers to the depth of a specific kind of structure: a neural network with many layers. To understand what that means, start from the beginning.

Key Concept
Neural networks
A neural network is a computational structure loosely inspired by how biological brains process information. It consists of layers of simple units called neurons or nodes. Each neuron looks at what it receives from the previous layer, does a small calculation, and passes a result forward to the next layer. The first layer receives raw input — say, the pixel values of a camera image. The last layer produces an output — "this is a stop sign" or "this is a pedestrian." The layers in between are called hidden layers — they are where the network gradually builds up its understanding of the problem. A deep neural network simply has many of these hidden layers, sometimes dozens or hundreds. The depth is what gives deep learning its power and its name.
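The layer-by-layer flow described above can be sketched in a few lines of Python. This is a toy illustration, not any production system: the layer sizes, the ReLU activation, and the random starting weights are all assumptions chosen for clarity.

```python
import numpy as np

def relu(x):
    # Simple activation: pass positive values through, zero out negatives.
    return np.maximum(0, x)

def forward(x, layers):
    """Push an input vector through each layer in turn.

    Each layer is a (weights, bias) pair; the output of one layer
    becomes the input of the next -- the "hidden layers" of the text.
    """
    for weights, bias in layers:
        x = relu(weights @ x + bias)
    return x

rng = np.random.default_rng(0)

# A tiny untrained network: 4 inputs (think: 4 pixel values),
# two hidden layers of 8 neurons each, and 2 outputs
# ("stop sign" vs "not stop sign"). Before training, the weights
# are random, so the scores are meaningless -- that is the point.
sizes = [4, 8, 8, 2]
layers = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

pixels = np.array([0.9, 0.1, 0.8, 0.2])  # made-up raw input
scores = forward(pixels, layers)
print(scores.shape)  # one score per output class
```

The network here is useless until its weights are adjusted by training, which is exactly what the next Key Concept box describes.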

Here is the key insight that makes neural networks useful: the way each layer processes information is learned from data, not set by a programmer. Before training, a neural network knows nothing — its starting point is essentially random. You feed it thousands or millions of examples (images labeled "stop sign" or "not stop sign," games of Go labeled "won" or "lost"), and the network gradually adjusts itself until it gets better at producing the right answer. Nobody programs the network to look for round shapes or red colors. It figures out what matters entirely on its own, from the examples.

Key Concept
How a neural network learns
Think of each connection inside a neural network as a dial that can be turned up or down. A large modern network has billions of these dials. Before training, they are all set randomly — the network is useless. Training is the process of repeatedly showing the network a labeled example, checking whether its answer was right, and nudging every relevant dial slightly in the direction of a better answer. Do this millions of times across millions of examples, and the dials gradually settle into a configuration where the network is reliably good at the task. The technical name for this dial-adjustment process is backpropagation, but the intuition is simple: when the network gets something wrong, the system traces back through the layers to figure out which dials were most responsible and adjusts them. What is remarkable is that nobody tells the network what to look for — the patterns it learns to recognize emerge entirely from the data.
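The dial-nudging loop can be made concrete with a toy model. The sketch below trains a single-layer classifier on made-up 2-D data; the cluster positions, learning rate, and step count are illustrative assumptions, and the full backpropagation through many layers is omitted for brevity — this shows only the core idea of nudging weights toward better answers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy task: learn to separate two clusters of 2-D points.
# Label 1 points cluster around (+1, +1); label 0 around (-1, -1).
X = np.vstack([rng.normal(+1, 0.5, (100, 2)),
               rng.normal(-1, 0.5, (100, 2))])
y = np.array([1.0] * 100 + [0.0] * 100)

w = np.zeros(2)  # the "dials" -- they start knowing nothing
b = 0.0
lr = 0.1         # how far to nudge each dial per step

for step in range(500):
    # Forward pass: current guesses (a probability between 0 and 1).
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # Backward pass: how wrong was each guess, and which dial is responsible?
    error = p - y
    w -= lr * (X.T @ error) / len(y)  # nudge every weight toward better answers
    b -= lr * error.mean()

# Evaluate with the final dial settings.
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
accuracy = np.mean((p > 0.5) == (y == 1))
print(f"accuracy after training: {accuracy:.2f}")
```

Nobody told the model which direction separates the clusters; the repeated nudges discovered it from the labeled examples, which is the intuition behind backpropagation at any scale.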

This is the paradigm shift that separates deep learning from earlier AI. Before deep learning dominated, building an AI system typically meant writing explicit rules: "if the object is roughly circular and red, it might be a stop sign." Deep learning replaced that with: "show the system a million images of stop signs and let it figure out what matters." That shift — from hand-written rules to learning from examples — is why deep learning spread so fast and so far. It turns out that for vision, language, speech, and games, learning from examples dramatically outperforms any set of rules a human expert can write.

A deep neural network: an input layer of pixel values, three hidden layers, and an output layer producing a classification. "Deep" = many hidden layers; each layer learns increasingly abstract features from the layer before it.
Figure 1: A deep neural network. Raw inputs (e.g., pixel values from a camera) enter on the left. Each hidden layer transforms those values into increasingly abstract representations — edges, then shapes, then objects. The output layer produces the final classification. No human programs what each layer looks for; the network learns it from data during training.

Three things converged around 2012 to make deep learning explode from a research curiosity into the dominant force in AI. First, massive labeled datasets — the internet had generated millions of tagged images, text, audio, and video. Second, cheap parallel computing — graphics processing units (GPUs), originally built for video games, turned out to be ideal for the matrix math that neural networks require. Third, algorithmic improvements — researchers discovered training tricks that made deeper networks stable and practical. When those three things combined, neural networks went from an interesting-but-marginal technique to the approach that broke every record in image recognition, speech recognition, and game-playing — often by stunning margins.

The 2012 turning point
In the 2012 ImageNet competition — a benchmark where systems compete to correctly label images from a set of 1,000 categories — a deep neural network called AlexNet reduced the error rate by nearly half compared to the previous year's best non-deep-learning system. The gap was so large that within two years, every top competitor was using deep learning. That moment is now considered the unofficial start of the modern AI era. Waymo, which had been running as the Google Self-Driving Car project since 2009, was well-positioned to absorb this shift immediately.

3 Step 1 — Data: Sensors, Miles, and the Long-Tail Problem

Every deep learning system starts with data — and for Waymo, data means two things happening simultaneously. There is the real-time stream: every vehicle on the road processes millions of sensor readings per second, from 29 cameras, multiple lidar units, and radar arrays. And there is the historical record: over 200 million miles of real-world driving, every moment of which produced labeled training examples that Waymo's models learned from. No competitor can buy that history. It was built mile by mile, city by city, starting in 2009.

But real-world miles have a fundamental problem, and it is the same problem that AlphaGo's successor solved by abandoning human game data entirely: the distribution of what actually happens is not the distribution you need to train on.

Key Concept
The long-tail problem in safety-critical AI
In any large dataset of real-world events, a small number of common scenarios account for most of the data, while an enormous number of rare scenarios together account for very little. This distribution — heavy at the common end, trailing off into uncommon events — is called a long tail. For autonomous driving, the long tail is disproportionately dangerous: the scenarios that appear least in training data are often the ones where a mistake is most catastrophic. Most of driving is boring — normal lanes, normal intersections, normal weather. The data that Waymo's cars generate in abundance is exactly the data it doesn't need most urgently. The data it needs most — near-misses, severe weather, animals in the road, wrong-way drivers — appears rarely or not at all. You could drive for 10,000 years and still not accumulate enough examples of a driver going the wrong way on a freeway to train a reliable response model.
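A quick simulation makes the long tail tangible. Everything here is hypothetical: the 10,000 scenario types, the power-law exponent, and the mileage are invented numbers chosen only to show the shape of the problem, not to match Waymo's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 10,000 distinct driving scenario types whose
# frequency follows a power law (scenario 0 is by far the most common).
n_types = 10_000
weights = 1.0 / np.arange(1, n_types + 1) ** 1.1  # heavy head, long tail
probs = weights / weights.sum()

# Simulate 100,000 miles of driving, one scenario observed per mile.
miles = rng.choice(n_types, size=100_000, p=probs)

counts = np.bincount(miles, minlength=n_types)
top_10_share = counts[np.argsort(counts)[-10:]].sum() / len(miles)
never_seen = int(np.sum(counts == 0))

print(f"share of miles from the 10 most common scenarios: {top_10_share:.0%}")
print(f"scenario types never observed at all: {never_seen}")
```

Even in this mild toy version, a handful of scenario types dominate the data while thousands of rare types are never observed once — and those unseen types are exactly what a safety-critical model most needs to learn.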

Sensors: what the car actually sees

Before the data problem can be solved, the data has to be collected. Waymo uses three types of sensors in parallel, each capturing a different dimension of the world. Cameras produce rich visual images — color, texture, the text on a sign, the posture of a pedestrian. Lidar (Light Detection and Ranging) fires laser pulses in 360 degrees and measures how long they take to bounce back, producing precise 3D point clouds at distances up to 300 meters, regardless of lighting. Radar detects the velocity of objects even through fog or heavy rain when cameras and lidar struggle.

Why lidar matters
A camera, like a human eye, loses depth information — you can't tell from a photograph alone whether a pedestrian is 10 meters away or 30. Lidar gives you that depth precisely. Think of it as the camera giving you "what" and the lidar giving you "where." Waymo uses lidar as one of its core competitive advantages — systems that rely primarily on cameras face significantly harder perception challenges in poor lighting or adverse weather. This is the central disagreement between Waymo and Tesla, which argues that human drivers navigate with cameras alone, so cameras should be sufficient for machines too.

Simulation: generating the data you can't collect

The solution to the long-tail problem is simulation — generating synthetic training data for scenarios too rare, too dangerous, or simply impossible to observe at scale in the real world. Waymo runs more than 20 billion simulated miles per year, vastly more than any other autonomous vehicle developer. These are not video-game approximations; they are high-fidelity digital reproductions of real streets, with physically accurate sensor models, accurate weather, and realistic simulated agents.

Key Concept
Simulation and synthetic data
Synthetic data is training data generated by a computer rather than collected from the real world. For autonomous driving, a simulation environment can produce labeled training examples for any scenario engineers want: extreme weather, unusual objects in the road, dangerous driver behavior. The critical requirement is that the simulation be realistic enough that a model trained inside it actually performs well when deployed on real streets. If the simulation is too simple — if real streets look or behave differently from the simulation — the model learns to handle the simulation well but fails in reality. This gap is called the sim-to-real gap, and closing it is one of the hardest problems in building AI for physical environments.
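A toy generator shows why synthetic data is attractive: the simulator knows the ground truth by construction, and engineers control the scenario mix instead of accepting the real-world distribution. Every field, category, and probability below is invented for illustration; a real simulator models physics and sensors, not dictionaries.

```python
import random

def synthetic_scenario(rng):
    """Generate one labeled synthetic training scenario (toy stand-in).

    The 'label' comes for free because the generator knows exactly
    what it placed in the scene -- no human annotation needed.
    """
    hazard = rng.choice(["none", "wrong-way driver", "animal in road", "debris"])
    return {
        "weather": rng.choice(["clear", "rain", "fog", "snow"]),
        "visibility_m": rng.uniform(20, 300),
        "hazard": hazard,
        # Ground-truth label, known by construction.
        "label": "hazard" if hazard != "none" else "normal",
    }

rng = random.Random(42)
# Note the deliberate oversampling: hazards appear in roughly 3 of 4
# scenarios here, vastly more often than on real roads. Controlling the
# distribution is the whole point of synthetic data.
batch = [synthetic_scenario(rng) for _ in range(1000)]
hazard_share = sum(s["label"] == "hazard" for s in batch) / len(batch)
print(f"hazard share in synthetic batch: {hazard_share:.0%}")
```

The catch, as the text notes next, is the sim-to-real gap: a model trained only on generated scenes may key on details of the generator rather than of reality.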

The Waymo World Model: simulating what never happened

In February 2026, Waymo announced a step beyond traditional simulation: the Waymo World Model, built on Google DeepMind's Genie 3 foundation model. Earlier simulation systems were reconstructive — they rebuilt existing reality from sensor recordings. The World Model is generative — it can create scenes from scratch, based on a description or a language prompt. An engineer can type "heavy snow on the Golden Gate Bridge at night, with a cyclist approaching from the wrong direction" and get back a realistic multi-sensor simulation of exactly that scenario.

Key Concept
World models in AI
A world model is an AI system that has learned a deep enough understanding of how the world works — how objects move, how light changes, how physical events unfold — that it can generate realistic new scenarios it has never directly seen. The Waymo World Model does this for driving: it can create a realistic simulation of a tornado, a flooded street, or a wrong-way driver from scratch, because it has learned enough about how the physical world behaves to fill in those details convincingly. Engineers can also run counterfactuals — replay the same scenario with the car making different decisions, and evaluate which response was safest. That is something that can never be done with real-world data: you cannot rewind a near-miss and try again.
The simulation flywheel: real-world fleet data (200M+ miles) feeds the Waymo World Model (built on Genie 3), which generates rare and extreme scenarios on demand; 20B+ simulated miles of synthetic training data then retrain and improve the perception models.
Figure: The simulation flywheel. Real-world fleet data trains the World Model's understanding of driving physics. The World Model generates synthetic training data for rare and extreme scenarios. That data retrains Waymo's models — completing a loop that produces a safer car before it ever encounters dangerous situations on real roads.
The sim-to-real gap: still unsolved
No matter how realistic a simulation becomes, there is always a gap between the simulated world and the real one. A model trained on synthetic data may learn to rely on subtle visual cues that exist in the simulation but look slightly different in reality — and those differences can cause unexpected failures. The Waymo team argues that building the World Model on Genie 3 — trained on an enormous set of real-world video — gives the simulation a richer baseline. But "better" is not "solved." Deploying models trained on synthetic data still requires extensive real-world testing before they go into live vehicles.

4 Step 2 — Model: Teaching Machines to See

The data — real and synthetic — feeds a stack of deep learning models whose job is to understand the world around the vehicle. The most fundamental of these is the perception model: the system that converts raw sensor readings into a structured understanding of what is present, where it is, and what it is likely to do next.

A human driver glances at an intersection and instantly understands: red light, two stopped cars, cyclist on the right, pedestrian about to step off the curb, wet road. That recognition is effortless — the product of decades of visual learning. For a computer, every inference has to be built from scratch. The camera produces a grid of pixel values. Nothing in that grid is labeled "cyclist." The model has to learn, from training data, what patterns correspond to which real-world objects — at what distances, in what lighting, from what angles, in what motion.

Convolutional neural networks: how machines learn to see

Key Concept
Convolutional neural networks (CNNs)
A CNN is a type of neural network designed specifically for images. Its defining feature is the convolutional layer: instead of connecting every neuron to every pixel (computationally impossible for large images), it applies small learned filters — called kernels — that slide across the image and detect local patterns. Early layers detect edges and gradients. Middle layers detect shapes and textures. Later layers detect whole objects. Nobody programs what each layer should look for — the network learns it from training data. For driving, CNNs trained on millions of labeled images can detect pedestrians, vehicles, traffic lights, lane markings, and hundreds of other categories in real time.
CNN hierarchy: raw camera pixels flow through layers that detect edges and gradients, then shapes and textures, then whole objects — pedestrian, car, cyclist.
Figure: CNN hierarchy. Raw camera pixels enter on the left. Each layer learns to detect increasingly complex patterns — edges first, then shapes, then whole objects. The output classifies what the model sees. No human programs what each layer looks for; the network learns it from labeled training examples.
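The sliding-filter operation at the heart of a convolutional layer can be written directly. The sketch below hard-codes a classic Sobel-style edge-detection kernel as an assumption; in a trained CNN the kernel values would be learned from data, and real layers run on GPUs rather than Python loops.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter across an image (no padding, stride 1).

    This is the core operation of a convolutional layer: at each
    position, multiply the window by the kernel and sum the result.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny "image": dark on the left (0.0), bright on the right (1.0).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Sobel-style vertical-edge kernel: responds where brightness
# changes from left to right, and is silent on uniform regions.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

edges = convolve2d(image, kernel)
print(edges)  # strong responses only at the dark-to-bright boundary
```

A single filter finds one kind of pattern; a convolutional layer applies dozens of learned filters in parallel, and deeper layers combine their outputs into shapes and objects.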

Sensor fusion: combining what each sensor knows

Key Concept
Sensor fusion
Sensor fusion is the process of combining data from multiple different sensors into a single, more reliable understanding of the environment than any single sensor could provide. At Waymo, cameras detect rich visual features but lack depth; lidar provides precise 3D geometry but no color; radar detects velocity through rain and fog. The deep learning system learns to combine all three — associating a camera detection of a pedestrian with the lidar return that confirms their 3D position, then using radar to track their velocity across frames. Each sensor's weakness is covered by another's strength.
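A minimal sketch of the association step described above, with every class name, field, and threshold invented for illustration: match detections that point in roughly the same direction, then let each sensor contribute what it knows best.

```python
from dataclasses import dataclass

@dataclass
class CameraDetection:
    label: str          # what the object looks like -- but no depth
    bearing_deg: float  # direction relative to the vehicle

@dataclass
class LidarReturn:
    bearing_deg: float
    distance_m: float   # precise 3-D range, but no idea *what* it is

@dataclass
class RadarTrack:
    bearing_deg: float
    velocity_mps: float # relative speed, robust in rain and fog

def fuse(camera, lidar, radar, max_bearing_gap=2.0):
    """Toy sensor fusion: associate detections by bearing, then
    combine each sensor's unique contribution into one object."""
    fused = []
    for cam in camera:
        near = lambda x: abs(x.bearing_deg - cam.bearing_deg) <= max_bearing_gap
        li = next((l for l in lidar if near(l)), None)
        ra = next((r for r in radar if near(r)), None)
        fused.append({
            "label": cam.label,                               # from camera
            "distance_m": li.distance_m if li else None,      # from lidar
            "velocity_mps": ra.velocity_mps if ra else None,  # from radar
        })
    return fused

objects = fuse(
    camera=[CameraDetection("pedestrian", 10.0)],
    lidar=[LidarReturn(9.5, 18.2)],
    radar=[RadarTrack(10.4, -1.1)],
)
print(objects[0])
```

Real fusion pipelines associate detections probabilistically in 3-D and over time, but the division of labor is the same: the camera supplies "what," lidar supplies "where," radar supplies "how fast."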

Motion prediction: where is everything going?

Knowing where every object is right now is not enough — the planning system needs to know where everything will be in the next two, five, ten seconds. A dedicated motion prediction model takes the classified, located objects from the perception system and forecasts their likely future trajectories. Because the future is uncertain — a pedestrian at the curb might step into the road or might not — the model produces a probability distribution over possible futures, not a single prediction. The planning system then has to make decisions that are safe across the full range of what might happen next.
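The planner's obligation to respect the whole distribution, not just the most likely future, can be sketched with invented numbers. The futures, probabilities, and safety margin below are hypothetical.

```python
# One pedestrian at the curb, three possible futures. Each entry:
# (probability, description, closest distance to our planned path, meters)
futures = [
    (0.70, "stays on curb",        8.0),
    (0.25, "steps into crosswalk", 2.5),
    (0.05, "jaywalks diagonally",  0.8),
]

SAFE_MARGIN_M = 1.5

def plan_is_safe(futures, margin=SAFE_MARGIN_M):
    # A plan is rejected if ANY plausible future violates the margin --
    # even a 5% branch. Expected-value reasoning is not enough here.
    return all(dist >= margin for _, _, dist in futures)

def collision_risk(futures, margin=SAFE_MARGIN_M):
    # Probability mass on futures that would violate the margin.
    return sum(p for p, _, dist in futures if dist < margin)

print(plan_is_safe(futures))
print(f"risk mass: {collision_risk(futures):.0%}")
```

Here the most likely future is perfectly safe, yet the plan fails: the low-probability jaywalking branch comes too close, so the planner must slow down or reroute until every branch clears the margin.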

5 Step 3 — Prediction: A 3D Picture of the World

The output of the model stack — the perception system's combined, fused understanding of everything around the vehicle — is what Waymo engineers call the world representation: a live, continuously updated 3D map of the environment that the planning system can act on.

Key Concept
Perception in autonomous driving
Perception is the process of converting raw sensor data — camera images, lidar point clouds, radar reflections — into a structured understanding of the environment. A complete perception output tells the planning system: what objects are present (car, pedestrian, cyclist, sign), where they are in 3D space, how big they are, how fast they are moving, in what direction, and where they are likely to go next. This is the prediction step in Waymo's factory: it is what the deep learning models produce, and it is what the decision system consumes. Perception is the hardest part of autonomous driving — it is the problem that separates a self-driving car from a car with GPS.

This representation is refreshed continuously, many times each second. Every cycle, the system re-reads incoming sensor data, updates its understanding of each tracked object, generates fresh trajectory distributions, and hands a new world representation to the planner. The planner is making decisions — speed, lane, steering — based on information that is milliseconds old. The latency of this pipeline is itself a safety parameter: a system that takes too long to update its world representation is operating on stale information, which becomes increasingly dangerous at highway speeds.
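A back-of-the-envelope calculation shows why latency is a safety parameter. The speeds and latencies below are illustrative, not measurements of any real system.

```python
def stale_distance_m(speed_kmh, latency_ms):
    """Meters traveled while the planner works from an outdated world picture."""
    speed_mps = speed_kmh / 3.6          # convert km/h to m/s
    return speed_mps * (latency_ms / 1000.0)

# At roughly highway speed, every extra millisecond of pipeline
# latency is distance covered on stale information.
for latency in (50, 100, 300):
    d = stale_distance_m(speed_kmh=105, latency_ms=latency)
    print(f"{latency:>4} ms of latency at 105 km/h -> {d:.1f} m on stale data")
```

A tenth of a second of lag at highway speed means the car has moved roughly three meters since its world picture was last true, which is why perception pipelines are engineered for latency as aggressively as for accuracy.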

6 Step 4 — Decision: Learning to Drive

We opened this chapter with AlphaGo. Now we can close the loop.

AlphaGo did not learn Go by studying human moves. It learned by playing millions of games against itself, updating its strategy based on whether it won or lost. That approach — learning from rewards and penalties rather than labeled examples — is called reinforcement learning. Waymo uses the same approach for its planning system: not to identify objects (that is the perception system's job), but to learn how to drive.

Key Concept
Reinforcement learning (RL)
Reinforcement learning is a way of teaching an AI system through trial, error, and feedback — rather than by showing it labeled examples of correct answers. The system tries something, gets a signal saying whether the outcome was good or bad, and gradually adjusts its behavior toward the things that earned good signals. In AlphaGo's case, the feedback was simply whether the game was won or lost. The system played millions of games, learned which moves led to wins, and eventually developed strategies no human expert had discovered. Waymo uses the same approach for its planning system — the feedback signal is whether the resulting driving behavior was safe, smooth, and legal. The environment in which this learning happens is the World Model simulation, which provides the practice environment that makes it safe to try millions of decisions before any of them happen on a real road.
AlphaGo (2016) — learning through self-play: played millions of games against itself; each game produced a win/loss signal; RL updated the policy to favor moves that led to wins. Result: strategies no human had discovered, beating the world's best player.

Waymo Driver (2020s) — learning through simulation: drives billions of simulated miles; each scenario produces a safety/comfort signal; RL updates the policy to favor driving behaviors that are safe, smooth, and legal. Result: robust responses to rare and dangerous scenarios before encountering them on real roads.
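The parallel above can be sketched with a toy reinforcement-learning loop. Everything here is hypothetical — the three candidate behaviors, the reward model standing in for a simulator's safety signal, and the simple explore-then-exploit strategy. It shows the shape of RL, not Waymo's planner.

```python
import random

# One-decision "merge" scenario with three candidate behaviors.
ACTIONS = ["yield", "merge_assertively", "stop_and_wait"]

def simulate(action, rng):
    """Hypothetical simulator: returns a noisy safety/smoothness reward."""
    base = {"yield": 0.6, "merge_assertively": 0.8, "stop_and_wait": 0.2}[action]
    return base + rng.gauss(0, 0.1)  # noisy outcome, like real scenarios

rng = random.Random(0)
q = {a: 0.0 for a in ACTIONS}  # estimated value of each behavior
lr = 0.1

for episode in range(2000):
    # Explore occasionally; otherwise pick the best-known behavior.
    if rng.random() < 0.1:
        action = rng.choice(ACTIONS)
    else:
        action = max(q, key=q.get)
    reward = simulate(action, rng)
    # Nudge the estimate toward what the simulator just reported.
    q[action] += lr * (reward - q[action])

best = max(q, key=q.get)
print(best, {a: round(v, 2) for a, v in q.items()})
```

No one labeled any action "correct." The policy discovered the highest-reward behavior purely from trial, error, and the feedback signal — the same loop, at a vastly larger scale, that produced Move 37.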

The connection between AlphaGo and Waymo is not just analogical. The team that built AlphaGo — Google DeepMind — is the same team that built Genie 3, the foundation model underlying the Waymo World Model. The research lineage is direct: a technique developed to play an ancient board game is now being applied to one of the hardest engineering problems in the world. This is a common pattern in deep learning — fundamental research on games or language produces architectures that transfer to applied problems nobody originally intended.

The specification problem
Reinforcement learning optimizes for whatever outcome you tell it to optimize for. But specifying "drive safely and comfortably" as a clear, measurable objective is harder than it sounds. A system that minimizes travel time might cut corners on safety margins. A system that avoids all risk might stop in the middle of traffic rather than make a difficult merge. Early Waymo vehicles were famously polite — they always yielded, always waited. This created its own problems: overly deferential cars disrupt traffic flow, and other drivers learn to exploit a car that will always wait. In 2025, Waymo updated its vehicles to be "more confidently assertive" — which shortly resulted in a police officer pulling over a Waymo for an illegal U-turn. Defining good driving as a clear specification for a machine turns out to be genuinely hard.

7 Step 5 — Value: A Public Health Breakthrough — or a Job Killer?

Every AI Factory eventually has to answer the same question: what is this actually worth, and to whom? For Waymo, the answer depends entirely on who you ask — and the gap between those answers is wide enough that people have started setting cars on fire.

The case for value: a trauma surgeon's view

Dr. Jonathan Slotkin is a trauma surgeon who has spent his career watching people die from car crashes. In December 2025, he wrote an op-ed in the New York Times after reviewing Waymo's 100-million-mile safety dataset. His conclusion: autonomous vehicles are not primarily a technology story. They are a public health breakthrough.

"How do we let the equivalent of one plane full of people — more than 100 lives — be lost every single day as the cost of driving? If this was a disease, we would have declared war." — Dr. Jonathan Slotkin, trauma surgeon

What the data showed him was a greater than 90% reduction in the most serious types of crashes — pedestrians struck, T-bone collisions at intersections — the injuries he sees most often in the trauma bay. Waymo's own data reports its vehicles are 3.5 times safer than human drivers in injury-producing crashes. More than 38,000 Americans die in car crashes every year. A technology 3.5 times safer than human drivers, deployed at scale, would prevent tens of thousands of deaths annually — more than most medical breakthroughs ever achieve.

→ NPR: Why one trauma doctor sees self-driving cars as a public health breakthrough (December 2025)

The case against: whose value, exactly?

The people most affected by Waymo's expansion are not reading trauma surgery statistics. They are professional drivers — rideshare drivers, taxi drivers, delivery workers — who are watching their livelihoods be automated away in real time. There are approximately 4.4 million professional driving jobs in the United States. A Pew Research Center survey found that 85% of Americans believe the rollout of driverless cars will lead to job losses. They are not wrong.

Organized labor has responded. Uber and Lyft drivers rallied in San Francisco demanding regulations on autonomous vehicles. In Seattle, chants of "Waymo? Hell no!" echoed outside a building where Waymo lobbyists were hosting a private party. Boston's city council held a four-hour hearing and passed legislation effectively banning driverless vehicles without a human present. New York's Governor Hochul withdrew a robotaxi proposal entirely after taxi driver opposition — her spokesperson cited "insufficient stakeholder support," which translated roughly to: the Teamsters have votes. In London, the App Drivers and Couriers Union declared a "state of emergency" as Waymo prepared to launch there, warning that up to 100,000 licensed private hire drivers could face displacement.

The argument that doesn't get resolved
Waymo's defenders point out that the same argument — "this technology will destroy jobs" — was made about elevators, ATMs, automatic looms, and self-checkout machines. In each case, the economy eventually created new jobs elsewhere, even if not for the same workers. Waymo's critics point out that "eventually" is not a plan, and that the workers displaced are real people with real families right now, not economic abstractions. Both of these things are true simultaneously. This is not a debate with a clean answer — it is the central tension of AI at scale, and it will recur in every chapter that follows.

When the tension turns physical

In February 2024, a Waymo in San Francisco's Chinatown was surrounded by a crowd during Lunar New Year celebrations. Someone threw a lit firework inside. The car burned. In June 2025, during protests against ICE immigration raids in downtown Los Angeles, protesters summoned Waymo vehicles using the app, then smashed their windows, spray-painted anti-ICE slogans, and set five cars on fire. Witnesses reported that protesters called them "spy cars" — a reference to the fact that Waymo vehicles collect data that can be shared with law enforcement.

There is something particularly revealing about how the fires happened: the cars were programmed not to hit pedestrians. Surrounded by a crowd with no escape route, they could not move. They were, as one observer put it, "sitting ducks." The cars' core safety feature — their absolute refusal to endanger pedestrians — made them defenseless.

→ TIME: Why Waymos Have Been Vandalized by Protesters (June 2025)

The value of any technology is not just what it produces — it is who captures that value and who bears the cost. Waymo's deep learning stack may save tens of thousands of lives. That is real value. But if that value flows to Alphabet shareholders while the cost falls on millions of workers with no transition plan, the political response will reflect that imbalance. The cars on fire in Los Angeles are not a technology story. They are an economics story.

8 The AI Factory at Waymo

We have now seen the full stack: sensor data feeding deep learning models, models producing a real-time prediction of the world, a planning system using reinforcement learning to decide how to drive, and value that is real but deeply contested. The AI Factory framework gives us a way to see how those pieces connect as a business system — the same lens we applied to EveryCure, Netflix, Spotify, and Uber.

  • 200M+ autonomous miles driven
  • 20B+ simulated miles
  • 450K+ weekly paid rides (2025)
  • 3.5× safer than human drivers (injury crashes)
  • 29 cameras per vehicle
Step · What happens · Deep learning's role
Data · Cameras, lidar, and radar produce millions of data points per second; 200M+ real-world miles and 20B+ simulated miles form the training set · Raw sensor data is the input to every model; historical and synthetic miles are what the models learned from
Model · CNNs classify and locate objects in camera images; sensor fusion combines lidar and radar; motion prediction models forecast where each object will go · CNNs learned from millions of labeled examples; sensor fusion learned to combine modalities; all models retrained on synthetic data from the World Model
Prediction · A live 3D world representation: every object classified, located, sized, and given a probability distribution over its future trajectories · The output of the deep learning perception stack — what the planning system receives as input, updated thousands of times per second
Decision · The planning system chooses speed, lane position, and steering — decisions that must be safe, comfortable, and legal simultaneously · Increasingly learned via reinforcement learning in the World Model simulation, not hand-coded rules
Value · 3.5× safer than human drivers; potential to eliminate car crashes as a leading cause of US death — but contested by workers facing displacement and communities questioning who captures the benefit · Deep learning is what makes the entire value proposition possible — and also what makes the scale of disruption possible

9 Deep Learning as Competitive Advantage

Waymo has been public about its technology and has published extensive research. Its architectures are not secret. So where does the competitive advantage actually come from?

  • Data at scale, over time. Waymo has been collecting real-world driving data since 2009. Every near-miss, every rare weather event, every unusual pedestrian in that 200-million-mile dataset is a training signal competitors cannot buy.
  • The World Model as a data-generation engine. A competitor who has driven fewer real miles starts with a weaker foundation for their world model, which produces lower-quality synthetic data, which produces a less capable model. The compounding is structural.
  • Hardware-software co-design. Waymo designs its own sensors, chips, and models — all optimized to work together. This kind of vertical integration is hard to replicate quickly.
  • Safety as trust. The 3.5× safety record is a commercial and regulatory asset. The deep learning quality underlying it is the primary input — but trust, once lost, is very hard to rebuild.
"The companies best positioned to benefit from deep learning are often those that already invested in earlier generations of ML — because the infrastructure, the data pipelines, the culture of experimentation, and the institutional knowledge about what works compound over time."

10 Beyond the Road: Deep Learning and the Future of Physical AI

Waymo is one application of a broader shift that researchers and engineers sometimes call physical AI — deep learning systems that don't just process text or images on a screen, but perceive the real world and act in it. The same stack described in this chapter — sensor-based perception, motion prediction, and behavior learned from data — is now being applied to humanoid robots, surgical assistants, warehouse automation, and home robotics. Understanding where Waymo fits in that larger picture helps clarify both the opportunity and the risk.

Key Concept
Imitation learning
A technique in which a robot or AI system learns a behavior by observing a human (or another agent) perform it, then generalizes that behavior to its own body and context. Unlike reinforcement learning — where the agent learns by trial and error in an environment — imitation learning starts from demonstrated examples. The challenge is transfer: the robot's body, sensors, and physical constraints are different from the human's, so it can't simply copy movements. It has to abstract the underlying goal from the demonstration and re-derive how to achieve it with its own capabilities.
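The core idea can be made concrete with a deliberately tiny sketch of behavioral cloning, the simplest form of imitation learning: fit a policy to demonstrated (observation, action) pairs, then apply it to observations that were never demonstrated. Every number and name below is invented for illustration — a real system (Waymo's or the Swiss team's) would learn a far richer mapping with a deep network over sensor features, not a one-parameter line.

```python
# Toy behavioral cloning: learn a steering policy from demonstrated
# (observation, action) pairs, then generalize to unseen observations.
# All values are illustrative; this is not any production system's code.

demos = [
    # (distance_to_lane_center_m, steering_angle_deg) from a demonstrator
    (-0.50, +8.0),
    (-0.25, +4.0),
    ( 0.00,  0.0),
    (+0.25, -4.0),
    (+0.50, -8.0),
]

def fit_linear_policy(demos):
    """Least-squares fit of action = w * observation (one learned parameter).
    A deep network would replace this line with millions of parameters."""
    numerator = sum(x * y for x, y in demos)
    denominator = sum(x * x for x, _ in demos)
    return numerator / denominator

w = fit_linear_policy(demos)

def policy(observation):
    """Apply the learned mapping to a new, never-demonstrated observation."""
    return w * observation

print(round(policy(0.10), 2))  # → -1.6 (drifting right, steer slightly left)
```

The key property — and the limitation the chapter describes — is visible even here: the policy generalizes smoothly between demonstrations, but nothing guarantees sensible behavior far outside the range the demonstrator ever covered.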

In April 2026, NPR reported on research published in Science Robotics by Swiss scientists who demonstrated a significant step forward in this direction. Their system allowed robots to watch a human perform a task — picking up a ball and tossing it into a container — and then reproduce that task while automatically compensating for the robot's own physical differences. Crucially, the robots could also self-correct, and transfer learned skills to other robots with different designs.

The tennis metaphor
Researcher Sthithpragya Gupta describes the core challenge this way: an instructor can demonstrate a tennis backhand, and a human student will eventually learn it. But when conditions change — the opponent moves, the light shifts — a human can adapt fluidly. Robots, trained on a fixed demonstration, historically could not. The new approach uses machine learning to let robots adjust their movements based on their own physical capabilities, not just copy a recorded motion. This is the same structural problem Waymo faces with the long tail: a system trained on what it has seen struggles precisely when reality diverges from that training distribution.

The parallel to Waymo is direct. Waymo's World Model extends the car's experience through simulation, allowing it to encounter scenarios it hasn't seen in the real world. Imitation learning for robots attempts something similar: instead of programming explicit movements for every task, you let the robot derive behavior from observation and generalize it. Both approaches are trying to solve the same fundamental problem — how do you build a system that handles the full range of situations the world will throw at it, when you can never enumerate every situation in advance?

New capabilities, new risks
The same NPR report notes that self-improving robots raise immediate questions for AI safety researchers. If a robot can adjust and improve its own behavior autonomously, what constraints ensure it won't be directed — or drift — toward harmful actions? The researchers included safety protocols designed to prevent robots from hurting people, and they acknowledge the risk openly. As AI philosopher Susan Schneider notes, the absence of consciousness doesn't resolve the safety question — it may actually sharpen it. A system that has no inner experience also has no values, no reluctance, and no instinct to refuse. The constraints have to be entirely external, built into the system by design. This is not a hypothetical for the distant future; it is an engineering and governance problem being worked on right now.

Read the full NPR story →

For business students, the takeaway is not that robots are about to replace everything — the researchers themselves estimate we are years away from reliable home robots. The takeaway is that the deep learning capabilities described throughout this chapter are not confined to self-driving cars. They are a general-purpose technology that is moving, systematically, into any domain where a machine needs to perceive the physical world and act in it. The companies and managers who understand the underlying logic — what this technology can do, where it fails, and what governance it requires — will be better positioned to evaluate these developments as they arrive.

11 Responsible AI: What Happens When the Car Is Wrong?

The competitive advantages above are real — but they are built on a system that, like every deep learning system, makes mistakes. The safety record Waymo cites is genuine and impressive, but it sits alongside a growing public record of specific failures that tell a more complicated story. The question for this section is not whether Waymo's models ever misclassify an object or make a wrong decision — they do. The question is: what happens when they do, who is responsible, and what does that mean for the companies and managers deploying AI in physical environments?

The opacity problem

Deep neural networks are black boxes. A CNN that classifies a stop sign as a speed-limit sign because of an unusual shadow can rarely tell you why it made that mistake. The internal representations learned during training are not human-interpretable — they are distributed across billions of parameters in ways that resist simple explanation. This creates a real challenge for safety validation: how do you certify that a system is safe when you can't fully explain why it makes the decisions it does?

Waymo's approach is to rely on empirical safety records rather than theoretical guarantees. Instead of proving that the system will always behave correctly, the company demonstrates that it has behaved correctly across hundreds of millions of real miles and tens of billions of simulated miles, with a resulting safety record better than human drivers'. This is a pragmatic answer — but it is not the same as understanding the system, and it does not predict how the system will behave in genuinely novel situations.

The long-tail failure mode

Deep learning systems learn from the examples they were trained on. By construction, they are less reliable on situations they have never encountered. The Waymo World Model is a direct response to this problem — it extends what the car has "seen" by generating synthetic examples of rare scenarios. But a generative model can only simulate what its own experience has prepared it to imagine. A truly novel event — something that has never appeared in any video the model was trained on — could still produce a failure. This is not a hypothetical concern; it is the defining hard problem of deploying AI in open-ended physical environments, and it is not fully solved.
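One common engineering response to this problem can be sketched in a few lines: measure how far a new input sits from anything in the training distribution, and route sufficiently unfamiliar inputs to a conservative fallback rather than trusting the model. The feature vectors and threshold below are hypothetical; production systems use learned embeddings and much more sophisticated uncertainty estimates, but the logic is the same.

```python
import math

# Feature vectors summarizing scenes the model was trained on (invented
# numbers standing in for learned scene embeddings).
training_scenes = [(0.90, 0.10), (0.80, 0.20), (0.85, 0.15), (0.20, 0.90)]

def nearest_distance(scene, dataset):
    """Distance from a new scene to its closest training example."""
    return min(math.dist(scene, d) for d in dataset)

def perceive(scene, threshold=0.5):
    """Trust the learned model on familiar inputs; flag novel ones."""
    if nearest_distance(scene, training_scenes) > threshold:
        return "FALLBACK"   # conservative behavior / remote-operator review
    return "MODEL"          # input resembles the training distribution

print(perceive((0.82, 0.18)))  # → MODEL (close to training data)
print(perceive((0.10, 0.10)))  # → FALLBACK (unlike anything seen)
```

Note what this does and does not buy you: the system can recognize that an input is unfamiliar, but it still has no idea what the unfamiliar input actually is — which is why the fallback has to be conservative by design.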

Liability and accountability

When a Waymo vehicle is involved in a collision, the liability questions are genuinely new. There is no human driver to blame. Waymo, the manufacturer, and potentially the software engineers whose models made the decision all enter the legal picture in ways that existing law was not designed to address. Several U.S. states have passed autonomous vehicle legislation, but the legal framework is still evolving. For business students, this is not an abstract legal question — it is a material risk for any company deploying AI in physical, safety-critical environments.

When the algorithm meets the real world: two recent cases

Abstract discussions of AI risk become concrete quickly when you look at what has actually happened with Waymo on real streets. Two stories from December 2025 illustrate the gap between a model that works well on average and a system that is ready for everything.

Case 1: The school bus recall (NPR, December 2025)
In fall 2025, multiple Waymo vehicles were caught on video driving around stopped school buses — a serious safety failure, since children may be crossing the road when a school bus has its stop arm deployed. The Austin Independent School District documented 19 such incidents with Waymo vehicles; in one case, a Waymo drove past only moments after a student had crossed in front of it, while the student was still in the road. Waymo filed a voluntary software recall with the National Highway Traffic Safety Administration. No injuries occurred — but the episode is a direct illustration of the long-tail problem in practice. A stopped school bus with a crossing arm is a relatively rare scenario. The model had not been adequately calibrated for it, and the gap only became visible once the fleet accumulated enough miles in cities where school buses are common.

Read the full NPR story →

Case 2: Teaching a car to drive like a human — the assertiveness update (NPR, December 2025)
Early Waymo vehicles were famously polite — they always yielded at four-way stops, always waited for a gap rather than asserting themselves in traffic. This created its own problems: overly deferential vehicles disrupt traffic flow, and other drivers learn to exploit an autonomous car that will always wait. In 2025, Waymo deliberately updated its vehicles to be "more confidently assertive." The result: vehicles were observed not signaling before lane changes and making what turned out to be an illegal U-turn in San Bruno — a police officer on patrol actually pulled the Waymo over. With no driver present, the window rolled down and a remote operator came on the speaker, apologized, and promised to look into it. Waymo's senior director of product management acknowledged that overly passive cars are themselves disruptive to traffic. This is a genuine design dilemma with no clean answer: a car that follows every rule precisely may be safer by one metric and worse by another. Defining "good driving" as a clear specification for an AI system turns out to be harder than it looks.

Read the full NPR story →

Together these two cases illustrate a theme that runs through every responsible AI discussion in this course: the gap between a system that performs well on average and one that is trustworthy across the full range of situations it will actually encounter. The school bus case is about the long tail — a rare scenario the model wasn't ready for. The assertiveness case is about specification — what does "correct behavior" even mean for a machine driving in human traffic? Both are business problems as much as technical ones. They require ongoing human judgment, public accountability, and a willingness to issue a recall when the model falls short.

When the goal and the values aren't the same thing
Reinforcement learning optimizes for whatever outcome you tell it to optimize for. But specifying "drive safely and comfortably" as a clear, measurable objective is harder than it sounds. A system that minimizes travel time might cut corners on safety margins. A system that avoids all risk might stop in the middle of traffic rather than make a difficult merge. The gap between the outcome you measured in testing and the values you actually care about in the full range of real-world situations is one of the deepest challenges in deploying AI for physical decisions. It requires ongoing human judgment — not just at the design stage, but throughout the life of the system.

12 Waymo in the Wild: What Actually Goes Wrong

The NPR stories above were serious. But Waymo's public record also includes a growing collection of incidents that are, depending on your perspective, funny, alarming, or both — and all of them raise genuine questions about what it means to put an AI system in a context it wasn't fully designed for. These are not cherry-picked failures; they represent a real category of problem that engineers working on physical AI systems face constantly: the world is stranger than your training data.

🔄
"Why is this thing going in circles?"
In late 2024, a passenger named Mike Johns hailed a Waymo for a ride to the airport. Instead, the car began circling a parking lot, lap after lap. Johns called Waymo support from the back seat: "Why is this thing going in a circle? I'm getting dizzy. Has this been hacked? What's going on? I feel like I'm in the movies." He had a plane to catch. The incident — captured on video and shared on LinkedIn — went nationally viral. Waymo identified a software glitch. No injuries, significant embarrassment.
📌 The lesson: A bug that is merely inconvenient in a phone app can strand a person with a flight to catch. Physical AI failures have physical consequences, even when nobody gets hurt.
📣
A parking lot full of Waymos honking at each other all night
In August 2024, a San Francisco resident named Randol White noticed his neighborhood was filling with the sound of car horns — day and night. The source: a Waymo staging lot nearby, where a fleet of autonomous vehicles had developed a software glitch that caused them to confuse each other and begin honking. They continued until Waymo patched the bug. "I could not be more cranky today," said a neighbor. Waymo acknowledged the issue and said it was working on a fix.
📌 The lesson: Autonomous systems interact with each other in ways that weren't anticipated. A behavior that is correct for a single vehicle (honk when confused) becomes a problem at fleet scale.
🚓
Pulled over near a DUI checkpoint — no one to ticket
In September 2024, a Phoenix police officer on patrol noticed a Waymo making an illegal U-turn near a DUI checkpoint. The officer activated lights and pulled the car over. There was no driver. The window rolled down, a remote Waymo operator came on the speaker, and the operator apologized. Under state law at the time, there was no one to ticket. The incident was widely shared, including by Elon Musk, who called it "straight out of the Silicon Valley show."
📌 The lesson: Existing legal and enforcement infrastructure assumes a human driver is responsible. Autonomous vehicles create genuine gaps in accountability that laws haven't caught up to yet.
🪄
A man in the trunk
In 2025, a woman in Los Angeles summoned a Waymo for her daughter. When the car arrived, she discovered a man had hidden himself in the trunk. The encounter — filmed and posted to TikTok — shows the woman confronting him ("Why are you in the trunk?") while the man claims he is stuck. Police were called. Waymo called it "unacceptable" and said it would review its protocols. The incident raised a question the perception engineers almost certainly never modeled: what happens when a person hides in a compartment the car's sensors don't monitor?
📌 The lesson: AI systems are designed around anticipated use cases. When reality produces scenarios outside that design envelope — not just rare driving events, but human behavior that wasn't considered — the system has no answer.

These incidents have something important in common: none of them involved a failure of the core deep learning perception system. The car wasn't confused about whether the parking lot wall was a pedestrian. The honking wasn't caused by a misidentified object. The man in the trunk wasn't a sensor failure. They were failures of system design — the broader set of decisions about how an AI-powered product behaves in the full messiness of the real world. Deep learning makes the car able to drive. It does not automatically make the whole system ready for everything.

Growing pains or deeper signal?
It is worth noting that Waymo's overall safety record — 91% fewer serious-injury crashes than human drivers — remains strong despite these incidents. The question for a business student is not whether these problems disqualify the technology, but what they reveal about the gap between "the model works" and "the product is ready." Every one of these failures is a product design problem as much as an AI problem. And all of them became public, went viral, and shaped public trust — which is ultimately what determines whether the technology gets to keep operating.

13 Summary Table & Discussion Questions

The AI Factory model: Waymo mapped

Step · Real-world driving · Simulation & training
Data · Cameras, lidar, radar producing millions of data points per second per vehicle · Real fleet miles + Waymo World Model generating synthetic scenarios
Model · Vision models classify and locate every object; motion prediction models anticipate where each will go next · Deep learning models retrained on synthetic + real data; driving behavior learned through trial-and-error in simulation
Prediction · 3D environmental map: objects classified, located, with predicted trajectories · Safety evaluation: did the model handle the simulated scenario correctly?
Decision · Planning system selects speed, lane, steering — in real time, dozens of times per second · Policy update: reinforce behaviors that led to safe, comfortable outcomes
Value · Passenger arrives safely; 3.5× safer than human drivers in injury crashes · Continuously improving model that handles more scenarios, more reliably

Key vocabulary introduced in this chapter

Neural network
A computational structure of layered mathematical units that learns to transform inputs into outputs by adjusting its parameters based on training examples
Deep learning
Machine learning using neural networks with many layers — "deep" refers to the number of hidden layers, each building more abstract representations from the one before
Training
The process of improving a neural network by repeatedly showing it labeled examples and adjusting its internal settings — billions of tiny dials — in the direction of the correct answer, until the network reliably gets things right
Convolutional neural network (CNN)
A neural network architecture specialized for images, using learned filters that detect local patterns and build up to full object recognition layer by layer
Perception
The process of converting raw sensor data into a structured understanding of the environment — what objects are present, where they are, and how they are moving
Sensor fusion
Combining data from multiple sensors (camera, lidar, radar) so each sensor's weakness is covered by another's strength
Motion prediction
A deep learning model that forecasts the likely future trajectories of every object in the environment, producing a probability distribution over possible futures
Reinforcement learning (RL)
A learning approach where an agent takes actions in an environment and learns a policy that maximizes cumulative reward — used by AlphaGo for game moves and by Waymo for driving behavior
The long-tail problem
In safety-critical AI, the rare scenarios that appear least in training data are often the most dangerous — and deep learning systems are least reliable precisely where reliability matters most
Synthetic data
Training data generated by a computer simulation rather than collected from the real world — essential for teaching autonomous vehicles how to handle scenarios that are too rare or dangerous to observe at scale
Sim-to-real transfer
The degree to which a model trained on synthetic data actually performs well when deployed in the real world — closing this gap is one of the central challenges of simulation-based training
World model
An AI system that has learned an internal representation of how the world works and can generate realistic new scenarios — the Waymo World Model produces driving simulations of events that never occurred
Foundation model
A large model pre-trained on vast, general data that can be adapted (fine-tuned) for specific tasks — Genie 3 is a foundation model that Waymo adapted for autonomous driving simulation
Transfer learning
Applying knowledge learned in one domain (Genie 3 trained on general video) to a different but related domain (Waymo's driving simulation) — one of deep learning's most powerful properties
Counterfactual simulation
Replaying the same scenario with different decisions to evaluate which action is safest — possible only in simulation, not in real-world data
Hardware-software co-design
Designing sensors, chips, and AI models together so each is optimized for the others — a source of competitive advantage that is difficult to replicate

Discussion questions

These work well as written assignments or in-class discussion prompts.

  1. What AlphaGo actually proved. AlphaGo's victory over Lee Sedol was widely reported as a milestone for AI. But what exactly did it demonstrate — and what did it not demonstrate? A Go champion is not a self-driving car, and a self-driving car is not a general-purpose AI. What are the limits of the analogy, and what should a business professional take away from the 2016 match?
  2. The rules vs. learning tradeoff. Earlier AI systems for driving were built around explicit rules: "if the light is red, stop." Deep learning replaces rule-writing with learning from data. What does that tradeoff mean in practice for a safety-critical system? When would you want explicit rules, and when would you want a learned model?
  3. Who is responsible when the car is wrong? When a human driver causes a crash, the accountability framework is established law. When a Waymo vehicle is involved in a collision, the liability picture is genuinely unclear. How should responsibility be distributed between the vehicle manufacturer, the software team, the company deploying the service, and the regulator who permitted it? Does your answer change if the crash involved a long-tail scenario the model had never seen?
  4. The data moat question. Waymo has 200 million real-world miles and a structural advantage in training data. A new entrant has neither. Is the data moat in autonomous driving permanent, or can a well-funded competitor close the gap? What would it take?
  5. The school bus and the U-turn. In December 2025, Waymo issued a software recall after its vehicles repeatedly drove past stopped school buses, and separately updated its driving style to be "more assertive" — resulting in a police officer pulling over a driverless car for an illegal U-turn. What do these two incidents reveal about the gap between "the model works" and "the product is ready"?
  6. Who captures the value? A trauma surgeon reviewed Waymo's safety data and concluded that autonomous vehicles could eliminate car crashes as a leading cause of death in the United States. Rideshare drivers and taxi unions are calling it an existential threat to their livelihoods. Both are right. How should society think about a technology that saves tens of thousands of lives but displaces millions of workers? Who should decide — and how?

MIS 432 · AI in Business · Case Study · For classroom discussion purposes.

MIS 432 · AI in Business · Western Washington University · College of Business and Economics
