You will build the same forecasting system Uber uses to set prices and move drivers. Then you will stress-test it, watch it drive automated decisions, feel it break from the inside as a driver, and decide what that means for the people whose livelihoods depend on it.
Built collaboratively with Claude. Every chart, simulator, and game in this lab was built by your instructor working with Claude — describing what was needed, evaluating the output, and iterating until it was right. That is the same workflow you will use in the Colab steps. Pay attention to what it produced. That capability is available to you too.
Imagine you are a data analyst at Uber. Your manager has just asked you to build a demand forecast for this Friday night — there's a Death Cab for Cutie, Sleater-Kinney, and ODESZA homecoming concert at WWU and operations needs to know how many drivers to deploy at 10pm when it lets out. Before you can build that forecast, you need data. Specifically, you need historical Friday night trip data so the model has something to learn from.
To make the Friday night forecast, you need two things from that historical data:
1. Trend. Has Friday night demand been growing over time? Over the past two years, Friday night trips in your city have risen steadily from around 280 to around 380 trips per Friday night. That's Uber's market expanding: more drivers, more riders, more trips every month. Your model needs to know this so it doesn't predict next Friday based on what Friday looked like two years ago.
2. Seasonality. How does Friday night demand vary by time of year? Summers run about 20% busier as activity picks up. Late December drops sharply. New Year's Eve produces a massive spike. And WWU's academic calendar matters too — when students are on campus, Friday nights are busier. These cycles repeat on a schedule, which means a model trained on enough history can learn to anticipate them — including a concert night in the middle of the school year.
The chart below shows what that historical Friday night data looks like. Take a minute to read it before running the prompt — you will be generating this same chart in Colab.
For this step, you will ask Claude to simulate that historical Friday night dataset. Yes, a real analyst would do exactly this — before building any model, they would pull the data, plot it, and stare at it. You cannot forecast what you do not understand. Here is exactly what you are pulling and why each piece matters for the concert forecast:
Once you have the data in Colab, you will plot it so you can see these patterns yourself before handing it to a model. That is what a real analyst does — the chart is not decoration, it is how you catch problems in the data before they become problems in the forecast.
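If you want a preview of what that prompt should produce, here is a minimal sketch of a synthetic dataset with the patterns described above: 104 Fridays, a steady trend from roughly 280 to 380 trips, a ~20% summer lift, and a late-December dip. All numbers, seeds, and column names are illustrative, not the exact output Claude will generate for you.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
weeks = np.arange(1, 105)                        # 104 Fridays, about 2 years
trend = np.linspace(280, 380, weeks.size)        # market growth: ~280 -> ~380 trips
week_of_year = (weeks - 1) % 52 + 1
seasonal = np.ones(weeks.size)
seasonal[(week_of_year >= 23) & (week_of_year <= 35)] = 1.20   # summers ~20% busier
seasonal[week_of_year >= 51] = 0.75                            # late-December drop
trips = trend * seasonal + rng.normal(0, 15, weeks.size)       # week-to-week noise

df = pd.DataFrame({"week": weeks, "trips": trips.round().astype(int)})
print(df.head())
```

In Colab, `df.plot(x="week", y="trips")` gives you the chart, and you should be able to see the trend and both seasonal effects by eye before any model touches the data.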
You have the data. Before you use it to predict this Friday's concert night, you need to answer a harder question: can your model actually predict anything? You cannot know that by pointing it at the future — the future hasn't happened yet. So instead, you do something clever: you pretend part of the past is the future.
This is called backtesting. Here is how it works. You take your two years of historical Friday night data and split it in two. You give the model the first 18 months — weeks 1 through 78 — and tell it to learn everything it can from that period: the trend, the seasonal patterns, the school-year effect. Then you hide the last 6 months — weeks 79 through 104 — and ask the model to predict them, as if those weeks hadn't happened yet.
You already know what actually happened in those 26 weeks. So you can compare the model's predictions directly against reality. If the predictions are close, the model has earned your trust. If they're not, you know something is wrong before you rely on it for the concert forecast. A real analyst never skips this step — you do not hand an untested model to an operations manager.
Remember, your boss is still waiting for an answer about Friday night. You are not ready to give it yet — but this step is how you get there.
In the prompt below, you will ask Claude to take the same dataset you built in Step 1 and split it into two parts. Weeks 1–78 are your training window: roughly January 2023 through July 2024, about 18 months of actual Friday night history. That is what the model learns from. Weeks 79–104 are your test window: roughly August 2024 through January 2025, the last 6 months of your dataset. The model never sees this period during training. Once it has learned from weeks 1–78, you ask it to predict weeks 79–104 as if they haven't happened yet. Then you compare those predictions against what actually happened. That comparison is your backtest, and it tells you whether this model is good enough to trust for the concert forecast.
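The mechanics of that split take only a few lines. This sketch fakes the "actual" history and uses a plain least-squares trend line as a stand-in for the model you will build in Colab, but the backtest logic is the same: learn from weeks 1–78, predict weeks 79–104, score against reality.

```python
import numpy as np

rng = np.random.default_rng(1)
weeks = np.arange(1, 105)
actual = np.linspace(280, 380, 104) + rng.normal(0, 15, 104)   # synthetic history

train_x, test_x = weeks[:78], weeks[78:]                 # weeks 1-78 vs 79-104
slope, intercept = np.polyfit(train_x, actual[:78], 1)   # learn from the training window
pred = slope * test_x + intercept                        # predict the hidden 26 weeks

mae = np.abs(pred - actual[78:]).mean()                  # average miss, in trips
print(f"Backtest MAE: {mae:.1f} trips")
```

If that MAE is small relative to a typical Friday's trip count, the model has earned some trust; if it is large, you found out before your boss did.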
Your backtest results just came in. The model tracked the test period well — the error was within an acceptable range for operational planning. You're ready to use it.
Then your boss emails you:
You reply: "Backtest looked good — MAE was within range. Running the Friday night forecast now, will have a number for you within the hour."
That number is what this step produces. You will take the validated model from Step 2, give it the features for this specific Friday (a school-year week, not summer, not NYE, week 105 in the sequence) and ask it to predict how many trips to expect at 10pm when the show lets out. The model will return a predicted trip count and a range, and from those you will calculate exactly how many drivers your boss needs to deploy.
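To make the driver math concrete, here is a hypothetical back-of-envelope version of that last calculation. The peak-hour trip count is a made-up example value, and the ±15% buffer and 3 trips per driver-hour are the lab's illustrative assumptions, not real Uber figures.

```python
import math

peak_hour_trips = 150          # predicted trips in the 10pm hour (example value)
trips_per_driver_hour = 3      # each driver completes ~3 trips per hour
buffer = 0.15                  # plan for the top of the ±15% prediction interval

worst_case = peak_hour_trips * (1 + buffer)
drivers_needed = math.ceil(worst_case / trips_per_driver_hour)
print(drivers_needed)          # -> 58 drivers for the post-concert rush
```

Note the `ceil`: you cannot deploy half a driver, and rounding down is how riders end up stranded.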
Here is what you are asking Claude to do in the prompt below:
You have a forecast. You have a driver number. You are not ready to send it yet.
A real Uber analyst would not hand a single forecast to their operations manager and call it done. Before committing, they would ask: what if my key assumptions are wrong? They would take the same model and deliberately change the inputs — push the assumptions to their edges — and see how much the recommendation moves. If the driver number barely changes when you tweak the assumptions, the forecast is robust and you can send it with confidence. If it swings wildly, you have a fragile forecast and your boss needs to know that before they act on it.
This is called stress-testing, and it is standard practice on any serious forecasting team. At Uber, analysts run scenario models before every major event — not because they expect the worst case, but because they need to know what the worst case looks like so operations can plan for it. The question is never "will our forecast be exactly right?" It is "how wrong could we be, and can we still handle it?"
For the WWU concert forecast, you made four key assumptions: 25 drivers typically online, each handling ~3 trips per hour, a ±15% prediction interval, and a crowd that behaves like a normal Friday. In this step you will stress-test two of those assumptions directly — changing them one at a time and watching what happens to the driver recommendation and surge call. By the end, you will know the range of outcomes you are actually committing to when you send that memo.
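Under the hood, this kind of stress test is simple arithmetic: hold everything fixed, move one assumption, recompute, repeat. A minimal sketch, using the same illustrative numbers as the driver calculation above rather than real figures:

```python
import math

def drivers_needed(peak_hour_trips, trips_per_driver_hour=3, buffer=0.15):
    """Drivers required to cover the worst case of the prediction interval."""
    return math.ceil(peak_hour_trips * (1 + buffer) / trips_per_driver_hour)

for tph in (2, 3, 4):                    # what if drivers are slower or faster?
    print(f"{tph} trips/driver-hour -> {drivers_needed(150, tph)} drivers")
for buf in (0.10, 0.15, 0.25):           # what if the interval is wider?
    print(f"±{buf:.0%} interval -> {drivers_needed(150, buffer=buf)} drivers")
```

If the recommendation barely moves across these runs, your forecast is robust; if it swings from 44 to 87 drivers, that range belongs in the memo.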
Instead of writing code for this step, use the simulator below. It comes preloaded with the forecast model you built in Step 3: the same baseline of ~850 predicted Friday night trips, 25 drivers online, and 3 trips per driver per hour. Those are the exact assumptions from your Step 3 memo. Now change them and watch what happens to the recommendation in real time.
You just sent your boss a driver recommendation based on your forecast. But here is what you did not see happen on Uber's side: the moment your predicted trip count exists, it flows directly into Uber's pricing engine. No human reviews it. No one decides whether this is the right moment to surge. The algorithm reads the forecast, compares demand to available supply, calculates a multiplier, and pushes it to every rider's screen in milliseconds.
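The core of that pricing logic can be sketched in a few lines. This is a deliberately simplified, hypothetical version (real surge algorithms weigh many more signals), but it shows the shape: forecast demand in, multiplier out, no human in the loop.

```python
def surge_multiplier(predicted_demand, drivers_online, trips_per_driver=3, cap=4.0):
    """Map a demand/supply ratio to a price multiplier (simplified sketch)."""
    capacity = drivers_online * trips_per_driver   # trips the fleet can absorb per hour
    ratio = predicted_demand / capacity
    if ratio <= 1.0:
        return 1.0                                 # supply covers demand: no surge
    return round(min(ratio, cap), 1)               # surge, capped at 4.0x

print(surge_multiplier(240, 25))   # concert-hour demand vs a normal fleet -> 3.2
```

Notice what is missing: there is no step where anyone asks whether the forecast feeding `predicted_demand` is still trustworthy.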
You were the analyst who built the forecast. Now flip to the other side and watch what the algorithm does with it. The simulator below loads four scenarios — starting with the Death Cab for Cutie, Sleater-Kinney, and ODESZA concert you just forecasted, then three others that get progressively harder for the algorithm to handle. Each time, you decide: trust it, override it, or pause it. The outcomes are not equal.
Friday night went well. The Death Cab for Cutie, Sleater-Kinney, and ODESZA show let out at 10pm and the forecast held up. Drivers were in the right place. Riders got cars in under 8 minutes. Surge fired at 3.2×, cleared the market, and came back down by 11:30pm. Your operations manager sent you a two-word Slack message: "nice work."
That was six weeks ago. Life went on. Things changed — then they changed again.
Since then, WWU announced a 12% drop in fall enrollment — fewer students on campus means quieter Friday nights. A new apartment complex opened on the north side of Bellingham, shifting where riders are actually coming from. And last week it came out that Lyft quietly launched in Whatcom County, pulling about 15% of your regular drivers onto their platform.
Your model doesn't know any of this. It is still predicting Friday nights based on the two years of history you trained it on. It passed the backtest. It worked on concert night. But the world it learned from is not the world it is predicting in anymore.
This is called distribution shift — and it is one of the most common reasons AI systems quietly fail after a successful launch. The model is not broken. It is just wrong, in ways that are invisible unless someone is watching. Every prediction it makes is based on a world that no longer exists. And every automated decision downstream — surge pricing, driver incentives, server provisioning — is based on those wrong predictions.
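You can reproduce the failure mode in miniature. In the sketch below, a model keeps predicting the pre-shift level of 350 trips after real demand quietly drops 15%. The model itself never changes; only the world does. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
actual = np.full(30, 350.0)            # 30 Fridays at the old demand level
actual[15:] *= 0.85                    # week 16: the world shifts, demand falls 15%
actual += rng.normal(0, 5, 30)         # normal week-to-week noise

model_pred = np.full(30, 350.0)        # the model still predicts the old world
mae_before = np.abs(model_pred[:15] - actual[:15]).mean()
mae_after = np.abs(model_pred[15:] - actual[15:]).mean()
print(f"MAE before shift: {mae_before:.1f}  after: {mae_after:.1f}")
```

The error jumps roughly tenfold, yet every individual prediction still looks like a plausible Friday night number, which is exactly why nobody notices without monitoring.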
The simulator below lets you feel this happen. Pick a scenario, drag the slider to choose when the world changes, and watch the gap open between what the model predicts and what actually happens. That gap is not just an error number — it is the cost, in real decisions, of a model that nobody told the world had changed.
The simulator shows three numbers as you drag the slider. Here is what they mean before you start:
MAE (Mean Absolute Error) is the average size of the model's mistakes, measured in trips. If your model predicts 320 trips on a Friday night but 355 actually show up, that's an error of 35 trips. Do that across 26 weeks and average the gaps; that's your MAE. An MAE of 20 on a normal Bellingham Friday is fine. An MAE of 20 on concert night, when you need precision to staff drivers correctly, is a problem.
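The calculation itself is one line. Using the 320-versus-355 example above plus a few made-up weeks:

```python
import numpy as np

predicted = np.array([320, 300, 310, 295])   # model's Friday-night predictions
actual = np.array([355, 310, 340, 330])      # what actually happened
mae = np.abs(predicted - actual).mean()      # average absolute miss, in trips
print(mae)   # -> 27.5
```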
In Step 6 you watched distribution shift happen in a simulator — a chart, some numbers, an error multiplier climbing. Now you are going to feel it from the inside.
You have spent this lab as the analyst: pulling data, building the model, backtesting it, stress-testing it, sending the memo. Every decision you made was at a desk, looking at charts. Step 7 puts you on the other side. You are now a driver in Bellingham on a Friday night. The algorithm tells you where demand is high. Your job is to decide whether to follow it — and to notice when something feels off.
Every surge multiplier you see in this game was calculated by a forecasting system running the same logic as the model you built in Steps 1–3 — it reads historical Friday night trip patterns, predicts where demand will be, and sets a price automatically. No human approved each number. That is what happens when the forecast triggers the decision instantly, at scale.
You will play 6 rounds. Each round you see a situation — a time of night, conditions in the city, what the algorithm is predicting — and you choose where to drive. Then you see what you earned, and what your rider experienced.
Rounds 1–3 are stable. The forecast is accurate. Learn the rhythm. In rounds 4–6, something changes in the world, and the forecast does not know. You will not be told when it happens. Pay attention.
This is the final step — and it is the one that makes everything real. You are going to publish a webpage that shows a recruiter what you actually built. Not a summary of the lab. Not Claude's words. Your work, your outputs, your analysis.
Think about what that means. A recruiter who clicks your GitHub link has 60 seconds. They want to see: can this person take a real business problem, build something with AI tools, and explain what they found? Your page needs to answer that with evidence — your actual chart, your actual forecast numbers, your actual concert night memo, your own written analysis connecting all of it.
Before you send the prompt below, fill it in. Copy the entire prompt, paste your actual content from Colab where indicated, then send the whole thing to Claude in a fresh conversation. Claude will come back with a finished page.
Here is what to collect before you start:
Submit two things: your GitHub Pages URL and a Word or PDF document with your answers to every reflection question below.
https://[username].github.io/uber-ai-lab