From a board game no computer was supposed to win to a car that navigates San Francisco without a human — this chapter is the story of deep learning, told through the technology trying to replace the most dangerous thing most people do every day: driving.
A business student's guide to the technology that changed everything
Primary sources: This chapter draws on The Waymo Driver Handbook: Perception (Waymo Blog, 2021), The Waymo World Model (Waymo Blog, 2026), Waymo software recall / school buses (NPR, December 2025), Do Waymo vehicles need more driving etiquette? (NPR, December 2025), and publicly available research from Waymo Research.
In 2010, a neuroscientist and chess prodigy named Demis Hassabis co-founded a small London AI lab called DeepMind with a mission that sounded almost absurdly ambitious: build general-purpose AI by having machines learn to master complex tasks without being explicitly programmed. The company was sometimes called the "Apollo project" of artificial intelligence — a moonshot with a clear destination but no guarantee of arrival. In 2014, Google acquired DeepMind for roughly $500 million, giving the lab the compute and resources to pursue that ambition at scale.
DeepMind's early strategy was to use games as a testing ground. Games have clear rules and measurable outcomes — perfect environments for an AI to learn from trial and error. Their first milestone was a system that learned to play dozens of classic Atari video games at superhuman level, starting from nothing but raw pixel input and a score. But Hassabis had a bigger target in mind: Go.
Go is a board game invented in China more than 2,500 years ago. It is played on a 19×19 grid: players take turns placing black or white stones, and the goal is to surround more territory than your opponent. The rules fit on one page. The game itself is incomprehensibly complex — there are more possible board positions in Go than atoms in the observable universe. Chess, by comparison, had already been conquered: IBM's Deep Blue beat world champion Garry Kasparov in 1997. But Go was considered different in kind. Experts believed mastering it required human intuition, pattern recognition, and something like aesthetic judgment — a feel for the board that could not be reduced to calculation. Go was the holy grail of AI: the game that supposedly required a mind.
In March 2016, DeepMind's AlphaGo faced Lee Sedol — then considered the world's best player — in a five-game match held in Seoul and broadcast globally. Over 200 million people watched. Most expected Lee Sedol to win; he had predicted a 5-0 sweep. Instead, AlphaGo took the first three games — and in Game 2 it played Move 37, a move so unexpected that commentators initially assumed it was a mistake. It was not. AlphaGo itself estimated the odds of a human professional playing that move at roughly 1 in 10,000; it went on to help win the game and became the defining moment of the match.
The final score was AlphaGo 4, Lee Sedol 1. Sedol's single win — Game 4 — came via a counterattack that exploited a surprising weakness in AlphaGo's play under pressure. That one loss was actually crucial data for Hassabis and his team: it revealed that AlphaGo, despite its brilliance, could be destabilized by sufficiently novel situations. The system that had stunned the world still had architectural vulnerabilities.
Why does any of this matter for a business course? Because AlphaGo was not a rules-based program. It did not have a lookup table of Go positions or a set of if-then instructions. It learned to play Go by playing millions of games against itself, using a technique called deep learning combined with reinforcement learning. Its ability to play Go was entirely learned from experience — nobody programmed it to be creative. DeepMind later built a successor called AlphaGo Zero that skipped human game data entirely and learned only by playing itself from scratch; it surpassed the version that beat Lee Sedol in three days. The architecture that made all of this possible — the deep neural network — is the same family of technology powering facial recognition, medical imaging, fraud detection, content moderation, recommendation systems, and, as the rest of this chapter will show, self-driving cars.
AlphaGo's ability to beat the world's best Go player — and Waymo's ability to navigate a city street — rest on the same underlying technology: deep learning. But what actually is it? The word "deep" refers to the depth of a specific kind of structure: a neural network with many layers. To understand what that means, start from the beginning.
Here is the key insight that makes neural networks useful: the way each layer processes information is learned from data, not set by a programmer. Before training, a neural network knows nothing — its starting point is essentially random. You feed it thousands or millions of examples (images labeled "stop sign" or "not stop sign," games of Go labeled "won" or "lost"), and the network gradually adjusts itself until it gets better at producing the right answer. Nobody programs the network to look for round shapes or red colors. It figures out what matters entirely on its own, from the examples.
This is the paradigm shift that separates deep learning from earlier AI. Before deep learning dominated, building an AI system typically meant writing explicit rules: "if the object is roughly circular and red, it might be a stop sign." Deep learning replaced that with: "show the system a million images of stop signs and let it figure out what matters." That shift — from hand-written rules to learning from examples — is why deep learning spread so fast and so far. It turns out that for vision, language, speech, and games, learning from examples dramatically outperforms any set of rules a human expert can write.
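That shift from rules to examples can be made concrete with a toy sketch. The code below trains a single artificial "neuron" by gradient descent to separate two classes from labeled examples. Everything here is illustrative: the features "redness" and "roundness" are hand-picked for clarity, whereas a real deep network learns its own features from raw pixels, and production networks have millions of parameters rather than three.

```python
import math
import random

random.seed(0)

# Toy training set: each example is (redness, roundness) in [0, 1],
# labeled 1 for "stop sign" and 0 for "not stop sign". The feature names
# are illustrative -- a real network learns its own features from pixels.
data = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.95, 0.85), 1),
        ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.15, 0.3), 0)]

# Before training the model knows nothing: weights start near-random.
w1, w2, b = random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1), 0.0

def predict(x):
    """A single 'neuron': weighted sum squashed into (0, 1) by a sigmoid."""
    z = w1 * x[0] + w2 * x[1] + b
    return 1 / (1 + math.exp(-z))

# Training: nudge the weights toward the correct answer, example by example.
for _ in range(2000):
    for x, y in data:
        err = predict(x) - y        # how wrong the current guess is
        w1 -= 0.5 * err * x[0]      # gradient-descent update
        w2 -= 0.5 * err * x[1]
        b  -= 0.5 * err

print(predict((0.9, 0.9)))  # close to 1: "stop sign"
print(predict((0.1, 0.1)))  # close to 0: "not stop sign"
```

Notice what is absent: no rule ever says "look for red" or "look for round." The weights end up encoding that distinction purely because the labeled examples reward it.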
Three things converged around 2012 to make deep learning explode from a research curiosity into the dominant force in AI. First, massive labeled datasets — the internet had generated millions of tagged images, text, audio, and video. Second, cheap parallel computing — graphics processing units (GPUs), originally built for video games, turned out to be ideal for the matrix math that neural networks require. Third, algorithmic improvements — researchers discovered training tricks that made deeper networks stable and practical. When those three things combined, neural networks went from an interesting-but-marginal technique to the approach that broke every record in image recognition, speech recognition, and game-playing — often by stunning margins.
Every deep learning system starts with data — and for Waymo, data means two things happening simultaneously. There is the real-time stream: every vehicle on the road processes millions of sensor readings per second, from 29 cameras, multiple lidar units, and radar arrays. And there is the historical record: over 200 million miles of real-world driving, every moment of which produced labeled training examples that Waymo's models learned from. No competitor can buy that history. It was built mile by mile, city by city, starting in 2009.
But real-world miles have a fundamental problem, and it is the same problem that AlphaGo's successor solved by abandoning human game data entirely: the distribution of what actually happens is not the distribution you need to train on.
Before the data problem can be solved, the data has to be collected. Waymo uses three types of sensors in parallel, each capturing a different dimension of the world. Cameras produce rich visual images — color, texture, the text on a sign, the posture of a pedestrian. Lidar (Light Detection and Ranging) fires laser pulses in 360 degrees and measures how long they take to bounce back, producing precise 3D point clouds at distances up to 300 meters, regardless of lighting. Radar detects the velocity of objects even through fog or heavy rain when cameras and lidar struggle.
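What sensor fusion accomplishes can be sketched in a few lines. The class names, fields, and numbers below are invented for illustration — Waymo's actual data structures are unpublished and far richer — but the core idea holds: each modality contributes the dimension it measures best.

```python
from dataclasses import dataclass

# Hypothetical, deliberately simplified records for each sensor modality.

@dataclass
class CameraDetection:
    label: str           # what the vision model thinks it sees
    confidence: float    # classifier confidence in [0, 1]

@dataclass
class LidarReturn:
    x: float             # 3D position in meters, vehicle frame
    y: float
    z: float

@dataclass
class RadarReturn:
    velocity_mps: float  # radial velocity; works even in fog or heavy rain

@dataclass
class FusedObject:
    label: str
    position: tuple
    velocity_mps: float
    confidence: float

def fuse(cam: CameraDetection, lidar: LidarReturn, radar: RadarReturn) -> FusedObject:
    """Combine one detection per modality: camera supplies identity,
    lidar supplies precise 3D position, radar supplies velocity."""
    return FusedObject(
        label=cam.label,
        position=(lidar.x, lidar.y, lidar.z),
        velocity_mps=radar.velocity_mps,
        confidence=cam.confidence,
    )

obj = fuse(CameraDetection("cyclist", 0.97),
           LidarReturn(12.0, -3.5, 0.4),
           RadarReturn(4.2))
print(obj.label, obj.position, obj.velocity_mps)
```

The design point is redundancy: if one modality degrades (cameras at night, lidar in fog), the fused object still carries usable information from the others.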
The solution to the long-tail problem is simulation — generating synthetic training data for scenarios too rare, too dangerous, or simply impossible to observe at scale in the real world. Waymo runs more than 20 billion simulated miles per year, vastly more than any other autonomous vehicle developer. These are not video-game approximations; they are high-fidelity digital reproductions of real streets, with physically accurate sensor models, varied weather conditions, and realistic simulated agents.
In February 2026, Waymo announced a step beyond traditional simulation: the Waymo World Model, built on Google DeepMind's Genie 3 foundation model. Earlier simulation systems were reconstructive — they rebuilt existing reality from sensor recordings. The World Model is generative — it can create scenes from scratch, based on a description or a language prompt. An engineer can type "heavy snow on the Golden Gate Bridge at night, with a cyclist approaching from the wrong direction" and get back a realistic multi-sensor simulation of exactly that scenario.
The data — real and synthetic — feeds a stack of deep learning models whose job is to understand the world around the vehicle. The most fundamental of these is the perception model: the system that converts raw sensor readings into a structured understanding of what is present, where it is, and what it is likely to do next.
A human driver glances at an intersection and instantly understands: red light, two stopped cars, cyclist on the right, pedestrian about to step off the curb, wet road. That recognition is effortless — the product of decades of visual learning. For a computer, every inference has to be built from scratch. The camera produces a grid of pixel values. Nothing in that grid is labeled "cyclist." The model has to learn, from training data, what patterns correspond to which real-world objects — at what distances, in what lighting, from what angles, in what motion.
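One building block behind this kind of visual learning is the convolution: a small filter slid across the image, producing a strong response wherever the local pixel pattern matches it. In the sketch below the filter is hand-written for clarity as a vertical-edge detector; in a trained CNN, values like these are learned from labeled examples, and thousands of such filters are stacked in layers.

```python
# A 5x5 "image": a dark region on the left, a bright region on the right,
# so there is a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]

# A 3x3 filter that responds to dark-to-bright vertical transitions.
# In a trained CNN, numbers like these are LEARNED, not hand-written.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

def convolve(img, k):
    """Slide the 3x3 filter over the image; each output value measures
    how strongly the local patch matches the filter's pattern."""
    out = []
    for i in range(len(img) - 2):
        row = []
        for j in range(len(img[0]) - 2):
            total = sum(img[i + di][j + dj] * k[di][dj]
                        for di in range(3) for dj in range(3))
            row.append(total)
        out.append(row)
    return out

response = convolve(image, kernel)
for row in response:
    print(row)  # each row: [3, 3, 0] -- strong response where the edge sits
```

Early CNN layers learn filters like this one (edges, colors, textures); deeper layers combine those responses into wheels, limbs, signs — and eventually "cyclist."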
Knowing where every object is right now is not enough — the planning system needs to know where everything will be in the next two, five, ten seconds. A dedicated motion prediction model takes the classified, located objects from the perception system and forecasts their likely future trajectories. Because the future is uncertain — a pedestrian at the curb might step into the road or might not — the model produces a probability distribution over possible futures, not a single prediction. The planning system then has to make decisions that are safe across the full range of what might happen next.
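The "safe across the full range of futures" idea can be sketched as follows. All numbers, and the clearance rule itself, are invented for illustration; the point is that this toy planner treats every predicted future as a constraint, not just the most probable one.

```python
# Hypothetical output of a motion-prediction model for one pedestrian:
# a probability distribution over futures, not a single guess. Each path
# lists the pedestrian's lateral distance (meters) from the car's lane
# over the next three seconds.
futures = [
    {"prob": 0.7, "path": [2.0, 2.0, 2.0]},  # stays on the curb
    {"prob": 0.3, "path": [2.0, 1.0, 0.0]},  # steps into the road
]

SAFE_CLEARANCE_PER_MPS = 0.5  # toy rule: faster driving needs more clearance

def plan_speed(predicted_futures, candidate_speeds_mps):
    """Return the fastest candidate speed that stays safe under EVERY
    predicted future -- not just the most likely one."""
    for speed in sorted(candidate_speeds_mps, reverse=True):
        required = SAFE_CLEARANCE_PER_MPS * speed
        if all(min(f["path"]) >= required for f in predicted_futures):
            return speed
    return 0.0  # no candidate is safe under all futures: stop and wait

# With the "steps into the road" future in play, the only safe choice is 0.
print(plan_speed(futures, [1, 2, 3, 4]))      # 0.0
# If prediction were certain the pedestrian stays put, 4 m/s would be fine.
print(plan_speed(futures[:1], [1, 2, 3, 4]))  # 4
```

The contrast between the two calls is the whole lesson: uncertainty about the future, not the current position, is what forces the cautious decision.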
The output of the model stack — the perception system's combined, fused understanding of everything around the vehicle — is what Waymo engineers call the world representation: a live, continuously updated 3D map of the environment that the planning system can act on.
This representation is refreshed continuously, dozens of times per second. Every cycle, the system re-reads incoming sensor data, updates its understanding of each tracked object, generates fresh trajectory distributions, and hands a new world representation to the planner. The planner is making decisions — speed, lane, steering — based on information that is milliseconds old. The latency of this pipeline is itself a safety parameter: a system that takes too long to update its world representation is operating on stale information, which becomes increasingly dangerous at highway speeds.
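The arithmetic behind that latency concern is simple: the car keeps moving while the pipeline computes. The speeds and latencies below are illustrative numbers, not Waymo's actual figures.

```python
# How far does the car travel on a "stale" world representation?
def blind_distance_m(speed_mps, latency_s):
    """Distance covered before the planner sees fresh information."""
    return speed_mps * latency_s

CITY_MPS = 13.4     # roughly 30 mph
HIGHWAY_MPS = 29.1  # roughly 65 mph

for latency_s in (0.01, 0.1, 0.5):
    print(f"{latency_s:>5} s latency: "
          f"{blind_distance_m(CITY_MPS, latency_s):5.2f} m in the city, "
          f"{blind_distance_m(HIGHWAY_MPS, latency_s):5.2f} m on the highway")
```

At a tenth of a second of latency the car covers almost three meters of highway on old information — which is why latency is engineered as a safety budget, not a mere performance metric.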
We opened this chapter with AlphaGo. Now we can close the loop.
AlphaGo did not learn Go by studying human moves. It learned by playing millions of games against itself, updating its strategy based on whether it won or lost. That approach — learning from rewards and penalties rather than labeled examples — is called reinforcement learning. Waymo uses the same approach for its planning system: not to identify objects (that is the perception system's job), but to learn how to drive.
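Reinforcement learning in miniature looks like this — a deliberately tiny toy, nothing like Waymo's or DeepMind's actual training setups. An agent on a five-cell "road" is rewarded only for reaching the goal; nobody ever tells it which action is correct, yet it learns that moving forward beats standing still.

```python
import random

random.seed(1)

# A toy 1-D world: the agent starts at position 0 and must reach
# position 4 (the goal). Actions: 0 = stay, 1 = move forward.
# Reward: +1 on reaching the goal, 0 otherwise -- the agent is never
# told HOW to behave, only whether the outcome was good.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # value of each action per state
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(200):
    s = 0
    while s != GOAL:
        # Explore occasionally; otherwise act greedily on current estimates.
        if random.random() < epsilon:
            a = random.randint(0, 1)
        else:
            a = Q[s].index(max(Q[s]))
        s_next = min(s + a, GOAL)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: nudge the estimate toward reward + future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# After training, "move forward" is valued above "stay" in every state.
print([q.index(max(q)) for q in Q[:GOAL]])  # -> [1, 1, 1, 1]
```

Scale this idea up — richer states, richer actions, rewards for safe and comfortable driving, billions of simulated episodes — and you have the family of techniques behind both AlphaGo's self-play and learned driving policies.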
The connection between AlphaGo and Waymo is not just analogical. The team that built AlphaGo — Google DeepMind — is the same team that built Genie 3, the foundation model underlying the Waymo World Model. The research lineage is direct: a technique developed to play an ancient board game is now being applied to one of the hardest engineering problems in the world. This is a common pattern in deep learning — fundamental research on games or language produces architectures that transfer to applied problems nobody originally intended.
Every AI Factory eventually has to answer the same question: what is this actually worth, and to whom? For Waymo, the answer depends entirely on who you ask — and the gap between those answers is wide enough that people have started setting cars on fire.
Dr. Jonathan Slotkin is a trauma surgeon who has spent his career watching people die from car crashes. In December 2025, he wrote an op-ed in the New York Times after reviewing Waymo's 100-million-mile safety dataset. His conclusion: autonomous vehicles are not primarily a technology story. They are a public health breakthrough.
What the data showed him was a greater than 90% reduction in the most serious types of crashes — pedestrians struck, T-bone collisions at intersections — the injuries he sees most often in the trauma bay. Waymo's own data reports its vehicles are 3.5 times safer than human drivers in injury-producing crashes. More than 38,000 Americans die in car crashes every year. A technology 3.5 times safer than human drivers, deployed at scale, would prevent tens of thousands of deaths annually — more than most medical breakthroughs ever achieve.
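The chapter's arithmetic can be checked directly, using its own numbers and one simplifying reading: that "3.5× safer" means the same driving exposure would produce 1/3.5 of today's fatalities.

```python
# Back-of-envelope check of the safety claim, using the chapter's figures.
ANNUAL_US_CRASH_DEATHS = 38_000
SAFETY_FACTOR = 3.5

deaths_if_all_avs = ANNUAL_US_CRASH_DEATHS / SAFETY_FACTOR
deaths_prevented = ANNUAL_US_CRASH_DEATHS - deaths_if_all_avs

print(round(deaths_if_all_avs))  # ~10,857 deaths instead of 38,000
print(round(deaths_prevented))   # ~27,143 -- "tens of thousands" holds up
```

The real-world number would depend on deployment scale, crash-type mix, and how the safety factor generalizes beyond Waymo's current operating areas — but the order of magnitude in the claim is internally consistent.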
→ NPR: Why one trauma doctor sees self-driving cars as a public health breakthrough (December 2025)
The people most affected by Waymo's expansion are not reading trauma surgery statistics. They are professional drivers — rideshare drivers, taxi drivers, delivery workers — who are watching their livelihoods be automated away in real time. There are approximately 4.4 million professional driving jobs in the United States. A Pew Research Center survey found that 85% of Americans believe the rollout of driverless cars will lead to job losses. They are not wrong.
Organized labor has responded. Uber and Lyft drivers rallied in San Francisco demanding regulations on autonomous vehicles. In Seattle, chants of "Waymo? Hell no!" echoed outside a building where Waymo lobbyists were hosting a private party. Boston's city council held a four-hour hearing and passed legislation effectively banning driverless vehicles without a human present. New York's Governor Hochul withdrew a robotaxi proposal entirely after taxi driver opposition — her spokesperson cited "insufficient stakeholder support," which translated roughly to: the Teamsters have votes. In London, the App Drivers and Couriers Union declared a "state of emergency" as Waymo prepared to launch there, warning that up to 100,000 licensed private hire drivers could face displacement.
In February 2024, a Waymo in San Francisco's Chinatown was surrounded by a crowd during Lunar New Year celebrations. Someone threw a lit firework inside. The car burned. In June 2025, during protests against ICE immigration raids in downtown Los Angeles, protesters summoned Waymo vehicles using the app, then smashed their windows, spray-painted anti-ICE slogans, and set five cars on fire. Witnesses reported that protesters called them "spy cars" — a reference to the fact that Waymo vehicles collect data that can be shared with law enforcement.
There is something particularly revealing about how the fires happened: the cars were programmed not to hit pedestrians. Surrounded by a crowd with no escape route, they could not move. They were, as one observer put it, "sitting ducks." The cars' core safety feature — their absolute refusal to endanger pedestrians — made them defenseless.
→ TIME: Why Waymos Have Been Vandalized by Protesters (June 2025)
The value of any technology is not just what it produces — it is who captures that value and who bears the cost. Waymo's deep learning stack may save tens of thousands of lives. That is real value. But if that value flows to Alphabet shareholders while the cost falls on millions of workers with no transition plan, the political response will reflect that imbalance. The cars on fire in Los Angeles are not a technology story. They are an economics story.
We have now seen the full stack: sensor data feeding deep learning models, models producing a real-time prediction of the world, a planning system using reinforcement learning to decide how to drive, and value that is real but deeply contested. The AI Factory framework gives us a way to see how those pieces connect as a business system — the same lens we applied to EveryCure, Netflix, Spotify, and Uber.
| Step | What happens | Deep learning's role |
|---|---|---|
| Data | Cameras, lidar, and radar produce millions of data points per second; 200M+ real-world miles and 20B+ simulated miles form the training set | Raw sensor data is the input to every model; historical and synthetic miles are what the models learned from |
| Model | CNNs classify and locate objects in camera images; sensor fusion combines lidar and radar; motion prediction models forecast where each object will go | CNNs learned from millions of labeled examples; sensor fusion learned to combine modalities; all models retrained on synthetic data from the World Model |
| Prediction | A live 3D world representation: every object classified, located, sized, and given a probability distribution over its future trajectories | The output of the deep learning perception stack — what the planning system receives as input, refreshed dozens of times per second |
| Decision | The planning system chooses speed, lane position, and steering — decisions that must be safe, comfortable, and legal simultaneously | Increasingly learned via reinforcement learning in the World Model simulation, not hand-coded rules |
| Value | 3.5× safer than human drivers; potential to eliminate car crashes as a leading cause of US death — but contested by workers facing displacement and communities questioning who captures the benefit | Deep learning is what makes the entire value proposition possible — and also what makes the scale of disruption possible |
Waymo has been public about its technology and has published extensive research. Its architectures are not secret. So where does the competitive advantage actually come from?
Waymo is one application of a broader shift that researchers and engineers sometimes call physical AI — deep learning systems that don't just process text or images on a screen, but perceive the real world and act in it. The same stack described in this chapter — sensor-based perception, motion prediction, and behavior learned from data — is now being applied to humanoid robots, surgical assistants, warehouse automation, and home robotics. Understanding where Waymo fits in that larger picture helps clarify both the opportunity and the risk.
In April 2026, NPR reported on research published in Science Robotics by Swiss scientists who demonstrated a significant step forward in this direction. Their system allowed robots to watch a human perform a task — picking up a ball and tossing it into a container — and then reproduce that task while automatically compensating for the robot's own physical differences. Crucially, the robots could also self-correct, and transfer learned skills to other robots with different designs.
The parallel to Waymo is direct. Waymo's World Model extends the car's experience through simulation, allowing it to encounter scenarios it hasn't seen in the real world. Imitation learning for robots attempts something similar: instead of programming explicit movements for every task, you let the robot derive behavior from observation and generalize it. Both approaches are trying to solve the same fundamental problem — how do you build a system that handles the full range of situations the world will throw at it, when you can never enumerate every situation in advance?
For business students, the takeaway is not that robots are about to replace everything — the researchers themselves estimate we are years away from reliable home robots. The takeaway is that the deep learning capabilities described throughout this chapter are not confined to self-driving cars. They are a general-purpose technology that is moving, systematically, into any domain where a machine needs to perceive the physical world and act in it. The companies and managers who understand the underlying logic — what this technology can do, where it fails, and what governance it requires — will be better positioned to evaluate these developments as they arrive.
The competitive advantages above are real — but they are built on a system that, like every deep learning system, makes mistakes. The safety record Waymo cites is genuine and impressive, but it sits alongside a growing public record of specific failures that tell a more complicated story. The question for this section is not whether Waymo's models ever misclassify an object or make a wrong decision — they do. The question is: what happens when they do, who is responsible, and what does that mean for the companies and managers deploying AI in physical environments?
Deep neural networks are black boxes. A CNN that classifies a stop sign as a speed-limit sign because of an unusual shadow can rarely tell you why it made that mistake. The internal representations learned during training are not human-interpretable — they are distributed across billions of parameters in ways that resist simple explanation. This creates a real challenge for safety validation: how do you certify that a system is safe when you can't fully explain why it makes the decisions it does?
Waymo's approach is to rely on empirical safety records rather than theoretical guarantees. Instead of proving that the system will always behave correctly, they demonstrate that it has behaved correctly across hundreds of millions of miles and tens of billions of simulated miles, with a resulting safety record better than human drivers. This is a pragmatic answer — but it is not the same as understanding the system, and it does not predict how the system will behave in genuinely novel situations it has never encountered before.
Deep learning systems learn from the examples they were trained on. By construction, they are less reliable on situations they have never encountered. The Waymo World Model is a direct response to this problem — it extends what the car has "seen" by generating synthetic examples of rare scenarios. But a generative model can only simulate what its own experience has prepared it to imagine. A truly novel event — something that has never appeared in any video the model was trained on — could still produce a failure. This is not a hypothetical concern; it is the defining hard problem of deploying AI in open-ended physical environments, and it is not fully solved.
When a Waymo vehicle is involved in a collision, the liability questions are genuinely new. There is no human driver to blame. Waymo, the manufacturer, and potentially the software engineers whose models made the decision all enter the legal picture in ways that existing law was not designed to address. Several U.S. states have passed autonomous vehicle legislation, but the legal framework is still evolving. For business students, this is not an abstract legal question — it is a material risk for any company deploying AI in physical, safety-critical environments.
Abstract discussions of AI risk become concrete quickly when you look at what has actually happened with Waymo on real streets. Two stories from December 2025 illustrate the gap between a model that works well on average and a system that is ready for everything.
Together these two cases illustrate a theme that runs through every responsible AI discussion in this course: the gap between a system that performs well on average and one that is trustworthy across the full range of situations it will actually encounter. The school bus case is about the long tail — a rare scenario the model wasn't ready for. The assertiveness case is about specification — what does "correct behavior" even mean for a machine driving in human traffic? Both are business problems as much as technical ones. They require ongoing human judgment, public accountability, and a willingness to issue a recall when the model falls short.
The NPR stories above were serious. But Waymo's public record also includes a growing collection of incidents that are, depending on your perspective, funny, alarming, or both — and all of them raise genuine questions about what it means to put an AI system in a context it wasn't fully designed for. These are not cherry-picked failures; they represent a real category of problem that engineers working on physical AI systems face constantly: the world is stranger than your training data.
These incidents have something important in common: none of them involved a failure of the core deep learning perception system. The car wasn't confused about whether the parking lot wall was a pedestrian. The honking wasn't caused by a misidentified object. The man in the trunk wasn't a sensor failure. They were failures of system design — the broader set of decisions about how an AI-powered product behaves in the full messiness of the real world. Deep learning makes the car able to drive. It does not automatically make the whole system ready for everything.
| Step | Real-world driving | Simulation & training |
|---|---|---|
| Data | Cameras, lidar, radar producing millions of data points per second per vehicle | Real fleet miles + Waymo World Model generating synthetic scenarios |
| Model | Vision models classify and locate every object; motion prediction models anticipate where each will go next | Deep learning models retrained on synthetic + real data; driving behavior learned through trial-and-error in simulation |
| Prediction | 3D environmental map: objects classified, located, with predicted trajectories | Safety evaluation: did the model handle the simulated scenario correctly? |
| Decision | Planning system selects speed, lane, steering — in real time, dozens of times per second | Policy update: reinforce behaviors that led to safe, comfortable outcomes |
| Value | Passenger arrives safely; 3.5× safer than human drivers in injury crashes | Continuously improving model that handles more scenarios, more reliably |
These work well as written assignments or in-class discussion prompts.
MIS 432 · AI in Business · Case Study · For classroom discussion purposes.