Lab 4 · Uber · Forecasting · Surge Pricing · Responsible AI

Build a Demand Forecasting Model
& Decide When to Trust the Algorithm

You will build the same forecasting system Uber uses to set prices and move drivers. Then you will stress-test it, watch it drive automated decisions, feel it break from the inside as a driver, and decide what that means for the people whose livelihoods depend on it.

Built collaboratively with Claude. Every chart, simulator, and game in this lab was built by your instructor working with Claude — describing what was needed, evaluating the output, and iterating until it was right. That is the same workflow you will use in the Colab steps. Pay attention to what it produced. That capability is available to you too.

📈 Step 1 — Explore the Data
🔭 Step 2 — Build & Backtest
📅 Step 3 — Friday Night Forecast
🔀 Step 4 — Stress-Test It
Step 5 — The Algorithm Decides
⚠️ Step 6 — Break the Model
🎮 Step 7 — Drive for Uber (Game)
🌐 Step 8 — Publish to GitHub
Step 1 — Explore the Data
AI Factory stage: Data — see trend and seasonality in raw trip history

Imagine you are a data analyst at Uber. Your manager has just asked you to build a demand forecast for this Friday night — there's a Death Cab for Cutie, Sleater-Kinney, and ODESZA homecoming concert at WWU and operations needs to know how many drivers to deploy at 10pm when it lets out. Before you can build that forecast, you need data. Specifically, you need historical Friday night trip data so the model has something to learn from.

To make the Friday night forecast, you need two things from that historical data:

1. Trend. Has Friday night demand been growing over time? Over the past two years, Friday night trips in your city have risen steadily from around 280 trips to around 380 per Friday night. That's Uber's market expanding — more drivers, more riders, more trips every month. Your model needs to know this so it doesn't predict next Friday based on what Friday looked like two years ago.

2. Seasonality. How does Friday night demand vary by time of year? Summers run about 20% busier as activity picks up. Late December drops sharply. New Year's Eve produces a massive spike. And WWU's academic calendar matters too — when students are on campus, Friday nights are busier. These cycles repeat on a schedule, which means a model trained on enough history can learn to anticipate them — including a concert night in the middle of the school year.

The chart below shows what that historical Friday night data looks like. Take a minute to read it before running the prompt — you will be generating this same chart in Colab.

Friday Night Uber Trips Near WWU Campus (Simulated, Weeks 1–104)

For this step, you will ask Claude to simulate that historical Friday night dataset. Yes, a real analyst would do exactly this — before building any model, they would pull the data, plot it, and stare at it. You cannot forecast what you do not understand. Here is exactly what you are pulling and why each piece matters for the concert forecast:

  • 2-year Friday night trip counts — 104 weeks of how many Uber trips were taken on Friday nights near WWU campus. This is the core signal the model will learn from.
  • Trend — the steady growth from ~280 to ~380 Friday night trips over two years. Without this, the model predicts next Friday based on what Friday looked like two years ago and consistently undershoots.
  • Seasonal pattern — summers run ~20% busier, late December drops, New Year's Eve spikes hard. The model needs to know these cycles so it doesn't confuse a slow winter Friday with a slow concert Friday.
  • School-year effect — WWU's academic calendar (Sept–May) makes Friday nights ~15% busier than summer Fridays. A concert in the middle of the school year sits on top of an already-elevated baseline — the model needs to account for that.

Once you have the data in Colab, you will plot it so you can see these patterns yourself before handing it to a model. That is what a real analyst does — the chart is not decoration, it is how you catch problems in the data before they become problems in the forecast.
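If you want to sanity-check what Claude produces, the simulation boils down to a trend line plus a few multipliers. Here is a minimal numpy/pandas sketch — the multipliers, noise level, and date logic are illustrative assumptions, not Uber figures:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
weeks = pd.date_range("2023-01-02", periods=104, freq="W-MON")
month = weeks.month

# Trend: the market grows from ~280 to ~380 trips over two years
trend = np.linspace(280, 380, 104)

# Seasonal multipliers (illustrative): summer +20%, school year +15%
is_summer = np.isin(month, [6, 7, 8])
trips = trend * np.where(is_summer, 1.20, 1.15)

# Late-December slump and a New Year's Eve spike (rough approximations)
late_dec = (month == 12) & (weeks.day >= 15)
nye = (month == 12) & (weeks.day >= 25)
trips = np.where(late_dec, trips * 0.6, trips)
trips = np.where(nye, trips * 2.0, trips)

# Week-to-week noise so the data looks real rather than smooth
trips = trips + rng.normal(0, 12, size=104)

df = pd.DataFrame({"week": weeks,
                   "friday_night_trips": trips.round().astype(int)})
print(df["friday_night_trips"].describe())
```

Claude's version will handle the academic calendar more carefully and add the plot, but the shape of the code should look roughly like this.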

Prompt 1 — paste into Claude to generate and explore your dataset
I am a student in MIS 432: AI in Business at Western Washington University. We are studying how Uber uses demand forecasting. I need to build a forecast for Friday night Uber demand near WWU's campus — specifically for a night when a sold-out show lets out at 10pm. Before I can build that forecast, I need to pull and explore two years of historical Friday night trip data. Please write complete Python code for Google Colab that does the following:

1. Simulate two years of historical Friday night Uber trip data (104 weeks) near WWU campus with:
   - A realistic upward trend (growing from ~280 Friday night trips to ~380 over 2 years as Uber's market expands in Bellingham)
   - Seasonality: summer Fridays 20% busier, late December drops sharply, a spike every New Year's Eve
   - A school-year effect: Fridays during WWU's academic calendar (Sept–May) run ~15% busier than summer Fridays
   - Store as a pandas DataFrame with columns: week (date, starting 2023-01-02) and friday_night_trips (integer)

2. Plot the data as a clean line chart titled "Friday Night Uber Trips Near WWU — 2 Years of History" with the X axis labeled "Week" and the Y axis labeled "Friday Night Trips". Mark the start of each summer with a vertical dashed line and annotate the New Year's Eve spike.

3. Print the first and last 10 rows, then print a summary: min, max, and average Friday night trip count, and which week had the highest demand.

Add a plain-English comment above each section explaining what it does and why. Do not build any forecast model yet — this step is data exploration only.
Step 1 Reflection — answer two questions
Now that you have the data plotted in Colab, answer these two questions:

1. Look at your chart. Which of the four data signals — trend, seasonality, school-year effect, or the NYE spike — would matter most for predicting a Friday night after a WWU concert? Why that one over the others?

2. What would happen to your forecast if you had skipped this step and just handed raw trip counts directly to a model without understanding what was in the data first? Name one specific thing you would have missed.
Step 2 — Backtest the Model
AI Factory stage: Model — prove the model works before trusting it with the future

You have the data. Before you use it to predict this Friday's concert night, you need to answer a harder question: can your model actually predict anything? You cannot know that by pointing it at the future — the future hasn't happened yet. So instead, you do something clever: you pretend part of the past is the future.

This is called backtesting. Here is how it works. You take your two years of historical Friday night data and split it in two. You give the model the first 18 months — weeks 1 through 78 — and tell it to learn everything it can from that period: the trend, the seasonal patterns, the school-year effect. Then you hide the last 6 months — weeks 79 through 104 — and ask the model to predict them, as if those weeks hadn't happened yet.

You already know what actually happened in those 26 weeks. So you can compare the model's predictions directly against reality. If the predictions are close, the model has earned your trust. If they're not, you know something is wrong before you rely on it for the concert forecast. A real analyst never skips this step — you do not hand an untested model to an operations manager.

Why cut off at week 78?
The 78/26 split gives the model 18 months to learn from and 6 months to be tested on. That 6-month test window is long enough to include seasonal variation — the model has to predict summer Fridays, fall school-year Fridays, and holiday Fridays it never saw during training. If it handles all of those reasonably well, it's ready for the concert prediction.

Remember, your boss is still waiting for an answer about Friday night. You are not ready to give it yet — but this step is how you get there.

In the prompt below, you will ask Claude to take the same dataset you built in Step 1 and split it into two parts. Weeks 1–78 are your training window — roughly January 2023 through June 2024, about 18 months of actual Friday night history. That is what the model learns from. Weeks 79–104 are your test window — roughly July through December 2024, the last 6 months of your dataset. The model never sees this period during training. Once it has learned from weeks 1–78, you ask it to predict weeks 79–104 as if they haven't happened yet. Then you compare those predictions against what actually happened. That comparison is your backtest — and it tells you whether this model is good enough to trust for the concert forecast.

Prompt 2 — paste into Claude to build and backtest your model
I have the historical Friday night Uber trip dataset from Step 1 — a DataFrame called df with columns: week (datetime) and friday_night_trips (integer), 104 weeks of data. I need to backtest a forecasting model before I can trust it with a real prediction. Please add Python code that does the following:

Step A — Feature engineering. Add these columns to df:
- week_num: integer index of each week (1 through 104)
- is_summer: 1 if the week falls in June/July/August, 0 otherwise
- is_nye: 1 if the week contains December 31st, 0 otherwise
- is_school_year: 1 if the week falls in WWU's academic calendar (Sept–May), 0 otherwise
Add a plain-English comment explaining why each feature matters for predicting Friday night demand.

Step B — Train/test split:
- Weeks 1–78 are the training set (18 months the model learns from)
- Weeks 79–104 are the test set (6 months the model has never seen — this is the backtest window)
- Print a plain-English explanation of what this split means and why we do it this way

Step C — Train and evaluate:
- Train a linear regression model on the training set using all four features
- Generate predictions for the test set (weeks 79–104)
- Calculate the Mean Absolute Error (MAE) and print it as: "On average, the model's Friday night trip predictions were off by X trips"
- Print whether this error level is acceptable for operational planning (e.g. does it matter if we're off by 50 trips vs 500?)

Step D — Visualize the backtest. Plot the full 104 weeks showing:
- Solid cyan line: actual Friday night trips (all 104 weeks)
- Dashed orange line: model predictions for weeks 79–104 only
- Shaded ±15% band around the predictions
- Vertical line at week 78 labeled "Model trained on everything left of this line"
- Title: "Friday Night Demand — Backtest Results"

Add plain-English comments throughout. No ARIMA, no LSTM — linear regression only.
What you are looking for in the output
The most important thing on your chart is how closely the dashed orange line tracks the solid cyan line in weeks 79–104. Those are weeks the model never trained on — it is making genuine predictions about a period it never saw. Where the lines stay close, the model understood the pattern. Where they diverge, the model missed something. Pay particular attention to the school-year ramp-up in September and any event spikes — those are the same kinds of signals you will be relying on for the concert forecast in Step 3.

What is MAE? MAE stands for Mean Absolute Error — it is simply the average size of the model's mistakes across the test period. If the MAE is 25, it means that on a typical Friday night in the test window, the model's prediction was off by about 25 trips. Whether that is acceptable depends entirely on what you are using the forecast for. Being off by 25 trips on a normal Bellingham Friday is manageable. Being off by 25 on a show night where precision matters is a much bigger problem. Always read the MAE in context, not in isolation.
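The split-and-score logic is small enough to sketch. This numpy-only version uses stand-in features and data (the seasonal bumps and noise level are made up for illustration); Claude's code will use your real df and scikit-learn, but the backtest mechanics are identical:

```python
import numpy as np

rng = np.random.default_rng(0)
week_num = np.arange(1, 105)

# Rough stand-ins for the real calendar features from Step 2
is_summer = ((week_num - 1) % 52 >= 22) & ((week_num - 1) % 52 < 35)
is_school = ~is_summer
is_nye = (week_num % 52 == 0)

# Stand-in data: trend + seasonal bumps + noise (illustrative numbers)
trips = (280 + week_num + 50 * is_summer + 30 * is_school
         + 150 * is_nye + rng.normal(0, 10, size=104))

# Design matrix: intercept plus the four features
X = np.column_stack([np.ones(104), week_num,
                     is_summer, is_nye, is_school]).astype(float)

# Backtest split: fit on weeks 1-78, predict the hidden weeks 79-104
coef, *_ = np.linalg.lstsq(X[:78], trips[:78], rcond=None)
preds = X[78:] @ coef

# MAE: the average size of the model's mistakes on the test window
mae = np.mean(np.abs(trips[78:] - preds))
print(f"On average, the model's predictions were off by {mae:.0f} trips")
```

The key line is the split: the coefficients are fit on the first 78 rows only, so every prediction for weeks 79–104 is a genuine out-of-sample guess.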
Step 2 Reflection — answer both
Q1 — Reading the backtest. Look at your chart. Find a week in the test period where the model's prediction was noticeably off from what actually happened. What do you think caused that gap — was it a signal the model didn't have, a pattern it couldn't have learned from 18 months of data, or something else?

Q2 — Why backtest at all? Why can't you just skip straight to predicting this Friday's concert night? What would you not know about your model if you skipped the backtest — and what could go wrong as a result?
Step 3 — Make the Friday Night Forecast
The model passed the backtest — now answer your boss's question

Your backtest results just came in. The model tracked the test period well — the error was within an acceptable range for operational planning. You're ready to use it.

Then your boss emails you:

From: Operations Manager — Thursday 9:14am
"Hey — is the forecast ready? Concert's tomorrow night. I need to know how many drivers to put on near campus at 10pm and whether we're activating surge. Let me know ASAP."

You reply: "Backtest looked good — MAE was within range. Running the Friday night forecast now, will have a number for you within the hour."

That number is what this step produces. You will take the validated model from Step 2, give it the features for this specific Friday — a school-year week, not summer, not NYE, week 105 in the sequence — and ask it to predict how many trips to expect at 10pm when the show lets out. The model will return a predicted trip count and a range, and from those you will calculate exactly how many drivers your boss needs to deploy.

Here is what you are asking Claude to do in the prompt below:

  • Build the concert Friday data row — a single row with the right feature values for this specific night (school year, not summer, week 105)
  • Run the prediction — use the trained model to output a trip count plus a ±15% range (lower and upper bound)
  • Calculate the driver recommendation — convert the trip count into a specific number of drivers needed, using a realistic trips-per-driver assumption
  • Make the surge call — based on predicted demand vs. likely driver supply, recommend whether to activate surge pricing
  • Print an operations memo — formatted so you could paste it directly into your reply to your boss
Assumptions baked into this model
Every forecast is only as good as its assumptions. Before you trust the output, you should know exactly what this model is assuming — because if any of these are wrong, the prediction could be significantly off.

1 driver handles ~3 trips per hour. This is an average. In a post-concert surge with short rides, one driver might handle 5. In bad traffic, maybe 2. The real number depends on conditions you don't know yet.

25 drivers are typically online on a Friday night in Bellingham. This is a baseline assumption. If it's raining, drivers tend to come online. If there's a competing event elsewhere in town, they might be pulled away. Your surge recommendation is sensitive to this number.

The ±15% prediction interval is fixed. The real uncertainty is higher on unusual nights like a concert. The model learned from normal Fridays — a sold-out show is outside that range of experience.

The show crowd behaves like a typical Friday crowd. They may not. A large event can produce a demand spike concentrated in a short window (10–11pm) rather than spread across the night. The model doesn't know that.
Prompt 3 — paste into Claude to generate the concert night forecast
My backtest in Step 2 showed the model is reliable — the MAE was acceptable for operational planning. Now I need to use it to answer a real business question.

Scenario: it is Thursday morning. There is a Death Cab for Cutie, Sleater-Kinney, and ODESZA homecoming concert at WWU tomorrow (Friday) night. The show ends at 10pm and the crowd will be looking for rides. My operations manager needs to know: how many drivers should we have online near campus at 10pm, and should we activate surge pricing? Please add Python code that does the following:

Step A — Build the concert Friday feature row. Create a single-row DataFrame for the upcoming concert Friday with these values:
- week_num: 105 (the next week after our dataset ends)
- is_summer: 0 (it's during the school year)
- is_nye: 0 (not New Year's Eve)
- is_school_year: 1 (WWU is in session)
Print a plain-English explanation of what each value means and why it matters.

Step B — Generate the prediction. Use the trained model from Step 2 to predict friday_night_trips for the concert Friday. Calculate the lower bound (predicted × 0.85) and upper bound (predicted × 1.15). Print: "Predicted Friday night trips: X (range: Y to Z)"

Step C — Calculate the driver recommendation. Assume each driver handles approximately 3 trips per hour during the post-concert rush. Calculate how many drivers are needed to cover the predicted demand, the lower bound, and the upper bound. Print these three numbers clearly — the operations manager needs to decide how many drivers to put on.

Step D — Make the surge pricing call. Assume Uber has approximately 25 drivers typically online in the Bellingham/WWU area on a Friday night. If predicted demand requires more than 25 drivers, recommend activating surge pricing. Print a clear YES or NO on surge, with a one-sentence reason.

Step E — Print an operations memo. Format the output as a short memo the analyst could send to their manager, including: predicted trips, driver range (low to high), surge recommendation, and one sentence on what could push demand above the upper bound.

Add plain-English comments throughout.
What makes this different from the backtest
In the backtest, you already knew the right answers — you were checking whether the model could find them. Here, nobody knows the right answer yet. The prediction interval matters more now than it did in Step 2: the lower bound is the minimum you should staff for, and the upper bound is what happens if demand runs hot. An operations manager who only plans for the middle number and ignores the upper bound will get caught short on a night like this. Your driver recommendation should be based on the upper bound — it is always cheaper to have a driver who isn't needed than a rider who can't get a car.
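The driver and surge arithmetic in Steps B–D reduces to a few lines. Here is a sketch using a hypothetical predicted trip count of 850 (your model's number will differ) and the lab's two stated assumptions:

```python
import math

# Hypothetical prediction for concert Friday; your model's output will differ
predicted_trips = 850
lower, upper = predicted_trips * 0.85, predicted_trips * 1.15

TRIPS_PER_DRIVER_PER_HOUR = 3   # assumption from the lab
DRIVERS_TYPICALLY_ONLINE = 25   # assumption from the lab

# Round up: you cannot deploy a fraction of a driver
drivers_needed = {
    "low": math.ceil(lower / TRIPS_PER_DRIVER_PER_HOUR),
    "base": math.ceil(predicted_trips / TRIPS_PER_DRIVER_PER_HOUR),
    "high": math.ceil(upper / TRIPS_PER_DRIVER_PER_HOUR),
}

# Surge call: demand needs more drivers than are typically online
surge = "YES" if drivers_needed["base"] > DRIVERS_TYPICALLY_ONLINE else "NO"
print(f"Drivers needed: {drivers_needed['low']}-{drivers_needed['high']} "
      f"(base {drivers_needed['base']}); surge: {surge}")
```

Notice that staffing to the upper bound, as the text above recommends, means planning for the "high" number, not the base.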
Step 3 Reflection — answer both
Q1 — Read the output. Look at the operations memo your code printed. What is the predicted trip count, the driver recommendation, and the surge call? Then answer: which of the four assumptions listed above concerns you most — and if that assumption turned out to be wrong, how would it change the recommendation?

Q2 — Forecasting and its limits. Your model produced a specific number. But a forecast is not a fact — it is a structured guess built on historical patterns and assumptions. Given what you now know about how this model was built, what would you tell your boss alongside the driver number? What should they know about the confidence level of this forecast before they act on it?
Step 4 — Stress-Test the Forecast
What happens when one thing you assumed turns out to be wrong?

You have a forecast. You have a driver number. You are not ready to send it yet.

A real Uber analyst would not hand a single forecast to their operations manager and call it done. Before committing, they would ask: what if my key assumptions are wrong? They would take the same model and deliberately change the inputs — push the assumptions to their edges — and see how much the recommendation moves. If the driver number barely changes when you tweak the assumptions, the forecast is robust and you can send it with confidence. If it swings wildly, you have a fragile forecast and your boss needs to know that before they act on it.

This is called stress-testing, and it is standard practice on any serious forecasting team. At Uber, analysts run scenario models before every major event — not because they expect the worst case, but because they need to know what the worst case looks like so operations can plan for it. The question is never "will our forecast be exactly right?" It is "how wrong could we be, and can we still handle it?"

For the WWU concert forecast, you made four key assumptions: 25 drivers typically online, each handling ~3 trips per hour, a ±15% prediction interval, and a crowd that behaves like a normal Friday. In this step you will stress-test two of those assumptions directly — changing them one at a time and watching what happens to the driver recommendation and surge call. By the end, you will know the range of outcomes you are actually committing to when you send that memo.

Why change one assumption at a time?
If you change everything at once, you cannot tell which assumption drove the change in the output. Real analysts isolate variables — change one thing, hold everything else constant, observe the effect. This is the same logic as a controlled experiment. It also makes it much easier to explain to your boss: "if driver availability drops to 15, the surge recommendation flips — here's why."

Instead of writing code for this step, use the simulator below. It has the same forecast model you built in Step 3 already built in — the same baseline of ~850 predicted Friday night trips, 25 drivers online, and 3 trips per driver per hour. Those are the exact assumptions from your Step 3 memo. Now change them and watch what happens to the recommendation in real time.

How the stress-test simulator works
  • Drivers online at 10pm — how many Uber drivers you expect to be active near campus when the show lets out. Your Step 3 baseline was 25.
  • Trips per driver per hour — how efficiently each driver can move through the demand. 3 is realistic on a normal night, but traffic after a sold-out show could push this lower.
  • Demand scenario — shift between the lower bound (−15%), your base prediction, and the upper bound (+15%) from your forecast interval.
  • As you move any slider, the driver capacity, drivers needed, and surge recommendation all update instantly. Watch for the moment the SURGE indicator at the top right flips from blue to red — that is the exact threshold where your assumptions tip the decision.
WWU Concert Forecast — Stress-Test Simulator
Adjust assumptions and watch the recommendation change in real time
[Interactive simulator — controls and readouts:]
  • Drivers online at 10pm: slider from 10 (low) to 60 (high), baseline 25
  • Trips per driver per hour: slider from 1.5 (traffic) to 5.0 (fast), baseline 3.0
  • Demand scenario: Low (−15%) · Base (850 predicted trips) · High (+15%)
  • Readouts: predicted trips, driver capacity (drivers × trips/hr), drivers needed, and a SURGE indicator — surge activates when demand exceeds supply by more than 20%
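That 20% margin is the simulator's entire decision rule. A minimal sketch of the stated threshold (the margin is the lab's number, not an Uber figure):

```python
def surge_recommendation(predicted_trips: float,
                         drivers_online: int,
                         trips_per_driver_per_hour: float) -> bool:
    """Return True when demand exceeds driver capacity by more than 20%.

    Mirrors the simulator's stated rule; the 20% margin is this lab's
    threshold, not a published Uber parameter.
    """
    capacity = drivers_online * trips_per_driver_per_hour
    return predicted_trips > capacity * 1.20

# Step 3 baseline: 850 trips vs. 25 drivers x 3 trips/hr = 75 capacity
print(surge_recommendation(850, 25, 3.0))   # True: surge fires
# With far more drivers online, the call flips: 250 x 3 x 1.2 = 900 > 850
print(surge_recommendation(850, 250, 3.0))  # False: no surge
```

Dragging the sliders is equivalent to sweeping these three arguments and watching for the point where the return value flips.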
Step 4 Reflection — answer both
Q1 — What did the stress test reveal? Did the surge call stay the same across all scenarios, or did it flip? Which assumption had the bigger effect on the recommendation — driver availability or demand level? What does that tell you about where the real risk is on Friday night?

Q2 — Now send the memo. You have the forecast and the stress test. Write 3–4 sentences to your operations manager: give them the driver recommendation, the surge call, and — based on what you learned in the stress test — tell them the one condition that would change that recommendation. Be specific with the numbers.
Step 5 — The Algorithm Decides
AI Factory stage: Prediction → Decision — the prediction-decision gap lands here

You just sent your boss a driver recommendation based on your forecast. But here is what you did not see happen on Uber's side: the moment your predicted trip count exists, it flows directly into Uber's pricing engine. No human reviews it. No one decides whether this is the right moment to surge. The algorithm reads the forecast, compares demand to available supply, calculates a multiplier, and pushes it to every rider's screen in milliseconds.

You were the analyst who built the forecast. Now flip to the other side and watch what the algorithm does with it. The simulator below loads four scenarios — starting with the Death Cab for Cutie, Sleater-Kinney, and ODESZA concert you just forecasted, then three others that get progressively harder for the algorithm to handle. Each time, you decide: trust it, override it, or pause it. The outcomes are not equal.

Uber Surge Engine
Real-time pricing simulation
Monitoring
How to use this simulator
1. Pick a scenario below — start with the Concert to see the algorithm work, end with the Earthquake to see where it breaks down.
2. Watch the Demand vs. Supply bars update and read the surge multiplier the algorithm sets.
3. Make your decision — Trust, Override, or Pause — and see what happens.
[Interactive simulator — default state: demand 40 vs. supply 38, no unusual events, surge 1.0× (normal pricing). Load a scenario, read the multiplier the algorithm sets, and make your decision: Trust, Override, or Pause.]
Step 5 Reflection — answer both using the simulator above
Q1 — When should you trust it? Load the Concert Ends scenario and choose “Trust the Algorithm,” then load the Earthquake scenario and choose “Trust the Algorithm” again. The algorithm is doing exactly the same thing both times — yet one outcome feels right and one feels wrong. What is the difference? Is it the algorithm that changed, the context, or your expectations of what the algorithm should be responsible for?

Q2 — The human in the loop. You just saw four scenarios where the algorithm made automatic decisions with no human review. Pick the one scenario where you think a human should have been in the loop before surge fired — and write one specific rule Uber could add to their system to make that happen. Be concrete: what condition would trigger the rule, and what would it do?
Step 6 — Break the Model
AI Factory stage: Value — distribution shift, model decay, and monitoring

Friday night went well. The Death Cab for Cutie, Sleater-Kinney, and ODESZA show let out at 10pm and the forecast held up. Drivers were in the right place. Riders got cars in under 8 minutes. Surge fired at 3.2×, cleared the market, and came back down by 11:30pm. Your operations manager sent you a two-word Slack message: "nice work."

That was six weeks ago. Life went on. Things changed — then they changed again.

Since then, WWU announced a 12% drop in fall enrollment — fewer students on campus means quieter Friday nights. A new apartment complex opened on the north side of Bellingham, shifting where riders are actually coming from. And last week it came out that Lyft quietly launched in Whatcom County, pulling about 15% of your regular drivers onto their platform.

Your model doesn't know any of this. It is still predicting Friday nights based on the two years of history you trained it on. It passed the backtest. It worked on concert night. But the world it learned from is not the world it is predicting in anymore.

This is called distribution shift — and it is one of the most common reasons AI systems quietly fail after a successful launch. The model is not broken. It is just wrong, in ways that are invisible unless someone is watching. Every prediction it makes is based on a world that no longer exists. And every automated decision downstream — surge pricing, driver incentives, server provisioning — is based on those wrong predictions.

The simulator below lets you feel this happen. Pick a scenario, drag the slider to choose when the world changes, and watch the gap open between what the model predicts and what actually happens. That gap is not just an error number — it is the cost, in real decisions, of a model that nobody told the world had changed.


MAE (Mean Absolute Error) is the average size of the model's mistakes, measured in trips. If your model predicts 320 trips on a Friday night but 355 actually show up, that's an error of 35 trips. Do that across 26 weeks and average the gaps — that's your MAE. An MAE of 20 on a normal Bellingham Friday is fine. An MAE of 20 on concert night, when you need precision to staff drivers correctly, is a problem.

The simulator tracks three numbers as the world shifts. Here is what each one means:

  • MAE Before Shift — how accurate the model was before the world changed. This is your baseline, roughly what you saw in the backtest in Step 2.
  • MAE After Shift — how accurate the model is once the world has moved on and the model hasn't. Watch this number climb as the gap between prediction and reality widens week by week.
  • Error Multiplier — how many times worse the model has gotten. A 4× multiplier means it is now making mistakes four times larger than before the shift. In practice: if your pre-shift MAE was 20 trips and the multiplier hits 4×, the model is off by ~80 trips per Friday — the difference between having enough drivers and having a surge crisis.
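The three readouts reduce to a few lines of arithmetic. Here is a sketch with hypothetical numbers — a 12% demand drop arriving at week 10, with the noise level and the alert interpretation (error growing more than 20% over baseline) as assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
weeks = np.arange(1, 27)            # 26 post-launch Fridays
model_pred = 380 + weeks            # the model keeps extrapolating the old trend

# The world shifts at week 10: e.g. an enrollment drop cuts demand ~12%
shift_week = 10
actual = np.where(weeks < shift_week,
                  model_pred + rng.normal(0, 10, size=26),
                  model_pred * 0.88 + rng.normal(0, 10, size=26))

# Compare prediction to reality on each side of the shift
err = np.abs(actual - model_pred)
mae_before = err[weeks < shift_week].mean()
mae_after = err[weeks >= shift_week].mean()

# A simple monitoring check: flag when error grows >20% over baseline
alert_fired = mae_after > 1.2 * mae_before

print(f"MAE before: {mae_before:.0f}  after: {mae_after:.0f}  "
      f"multiplier: {mae_after / mae_before:.1f}x  alert: {alert_fired}")
```

The model's code never changed; only the world did. That is why the error climbs even though nothing is "broken".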
[Interactive simulator — choose what changes in the world (default scenario: 🦠 Pandemic Lockdown), drag the slider to set when the shift happens, and watch three readouts: MAE Before Shift (trips/week), MAE After Shift (trips/week), and Error Multiplier (× worse after shift). A monitoring alert fires at the 20% error threshold.]
Step 6 Reflection — answer both using the simulator
Q1 — Pick a scenario and trace the damage. Run one of the three scenarios — WWU enrollment drop, new apartments, or Lyft entering Bellingham — and drag the slider so the world changes early. Look at the Error Multiplier and the weeks before the monitoring alert fires. That window is real: Uber is still making driver bonuses, surge pricing, and staffing decisions based on a model that is quietly wrong. Using the specific scenario you chose, describe in 2–3 sentences what distribution shift actually means here — what changed in the real world, why the model doesn't know, and what a Bellingham driver or rider would experience during those undetected weeks.

Q2 — AI Governance. The monitoring alert in the simulator is just a threshold check — someone had to decide what that threshold should be, build the check, set a policy for who reviews it when it fires, and give that person authority to pause the model. Each of those is a governance decision, not a code decision. Based on what you saw, write 2–3 sentences on what an AI governance policy for this forecasting model should include — who is accountable when the model runs wrong for weeks undetected, and what rule would you put in place to prevent that from happening silently?
Step 7 — Drive for Uber
🎮 Interactive game — feel the forecast from both sides, as a driver and a rider

In Step 6 you watched distribution shift happen in a simulator — a chart, some numbers, an error multiplier climbing. Now you are going to feel it from the inside.

You have spent this lab as the analyst: pulling data, building the model, backtesting it, stress-testing it, sending the memo. Every decision you made was at a desk, looking at charts. Step 7 puts you on the other side. You are now a driver in Bellingham on a Friday night. The algorithm tells you where demand is high. Your job is to decide whether to follow it — and to notice when something feels off.

Every surge multiplier you see in this game was calculated by a forecasting system running the same logic as the model you built in Steps 1–3 — it reads historical Friday night trip patterns, predicts where demand will be, and sets a price automatically. No human approved each number. That is what happens when the forecast triggers the decision instantly, at scale.

You will play 6 rounds. Each round you see a situation — a time of night, conditions in the city, what the algorithm is predicting — and you choose where to drive. Then you see what you earned, and what your rider experienced.

Rounds 1–3 are stable. The forecast is accurate. Learn the rhythm. Rounds 4–6 — something changes in the world, and the forecast does not know. You will not be told when it happens. Pay attention.

🚗
You are driving for Uber in Bellingham, WA
It is a Friday evening near WWU campus. Your shift starts now. The algorithm will tell you where demand is high — your job is to decide whether to follow it.
Step 7 Reflection — answer both after completing all 6 rounds
Q1 — Distribution shift from the inside. At what point during rounds 4–6 did something feel off? What tipped you off — the multiplier, your earnings, the rider outcome, or something else? Connect what you experienced as a driver to what you saw in the Step 6 simulator: the gap between the cyan line and the dashed line has a real human cost. In 2–3 sentences, describe what that cost looked like from inside the game — for you as a driver, and for the riders waiting in Cordata.

Q2 — Driver transparency and the algorithm. In rounds 4–6 your earnings dropped and the app never explained why. You just followed the algorithm and got less. In 2022, Uber rolled out Upfront Fares — a pricing system in which an undisclosed algorithm determines driver pay in many cities using factors Uber does not reveal, meaning two drivers doing the same trip can earn different amounts with no explanation. Based on what you experienced in this game: why does algorithmic transparency matter to drivers specifically? And if drivers cannot trust or understand the algorithm that controls their income, what happens to Uber's supply side — and ultimately to the forecast model you built in Steps 1–3?
8
Publish to GitHub
Put everything you built into one shareable HTML page, live on the web

This is the final step — and it is the one that makes everything real. You are going to publish a webpage that shows a recruiter what you actually built. Not a summary of the lab. Not Claude's words. Your work, your outputs, your analysis.

Think about what that means. A recruiter who clicks your GitHub link has 60 seconds. They want to see: can this person take a real business problem, build something with AI tools, and explain what they found? Your page needs to answer that with evidence — your actual chart, your actual forecast numbers, your actual concert night memo, your own written analysis connecting all of it.

🔍 Pause and notice — you just did AI Prototyping
Before you build the final page, stop and notice what you have done over the past several hours. You described what you wanted in natural language. Claude wrote the code. You ran it, looked at the output, decided what to change, and asked Claude to change it. You iterated your way to a working forecast, a backtest, a surge recommendation, and a distribution shift analysis — in one afternoon — without writing most of the code yourself. This workflow is called AI Prototyping and it is exactly how Uber's product teams now turn rough ideas into working products in hours rather than weeks. The cost of making an idea tangible has dropped to nearly zero. You are not learning to replace engineers. You are learning to do the thing that used to be too expensive to even try.

Before you send the prompt below, fill it in. Copy the entire prompt, paste your actual content from Colab where indicated, then send the whole thing to Claude in a fresh conversation. It will come back as a finished page.

Here is what to collect before you start:

  • Your backtest chart description — from Step 2, look at your chart and write 2–3 sentences describing what you see: where the model's predictions tracked reality well, and where they diverged. You cannot paste an image directly into the prompt, so describe it in words — Claude will recreate a chart in the same style.
  • Your concert night memo — from Step 3, copy the exact plain-English operations memo your code printed. It should include the predicted trip count, the driver recommendation, and the surge call.
  • Your stress-test result — from Step 4, look at the summary table your code printed. Find the specific row where the surge recommendation flips from NO to YES (or YES to NO). Copy that row, then copy the plain-English conclusion paragraph your code printed at the bottom — it usually starts with something like "What the stress test revealed..." Paste both.
  • Your Step 8 reflection — write it now, before you send the prompt. It goes in last.
Prompt 8 — paste into a fresh Claude conversation to build your page
I am a student in MIS 432: AI in Business at Western Washington University. I just completed a lab building Uber's demand forecasting system from scratch. I need you to build a complete, self-contained HTML page I can publish to GitHub Pages as a portfolio piece. The page should showcase my actual work from the lab — not a generic summary. Here is everything to include:

SECTION 1 — THE BUSINESS PROBLEM (you write this)
One punchy paragraph explaining why demand forecasting is the foundation of Uber's entire business. No jargon. Direct. A recruiter who has never heard of Uber's AI system should understand it in 30 seconds.

SECTION 2 — THE DATA AND THE MODEL (I will provide my chart)
Write 3–4 sentences explaining what I did in Steps 1 and 2: I pulled two years of historical Friday night trip data near WWU campus in Bellingham, identified trend and seasonality, and used it to train a forecasting model. Then insert this placeholder where I will paste my backtest chart description:
[PASTE YOUR BACKTEST CHART DESCRIPTION HERE — 2-3 sentences describing where the model tracked reality and where it diverged]
Below the chart, write one sentence: "The dashed line is what the model predicted. The solid line is what actually happened. The gap between them is the model's error — and the honest measure of how much to trust it."

SECTION 3 — THE CONCERT NIGHT PREDICTION (my actual memo)
Write one sentence introducing this section: "Here is the actual operations recommendation the model produced for the Death Cab for Cutie, Sleater-Kinney, and ODESZA homecoming concert at WWU." Then insert this placeholder where I will paste my memo:
[PASTE YOUR CONCERT NIGHT MEMO HERE — copy the exact text your Step 3 code printed]

SECTION 4 — STRESS-TESTING THE FORECAST (my actual finding)
Write one sentence: "Before sending that memo, I stress-tested the key assumptions to understand how fragile the recommendation was." Then insert this placeholder:
[PASTE YOUR STRESS TEST RESULT HERE — the flip row from your table and the conclusion paragraph your code printed]
Below it, write 2 sentences about what stress-testing means in practice — why a real analyst runs it before committing to a recommendation.

SECTION 5 — WHEN THE ALGORITHM DECIDES (you write this)
2–3 sentences explaining the prediction-decision gap: how Uber's algorithm takes the forecast number and automatically sets the surge multiplier with no human review, and what that means when the model is right vs. when it is wrong (earthquake scenario).

SECTION 6 — WHEN THE MODEL BREAKS (you write this)
2–3 sentences explaining distribution shift using the Bellingham scenario: six weeks after the concert, WWU enrollment dropped, new apartments opened in Cordata, Lyft entered the market — and the model had no idea. Drivers earned less. Riders waited longer or switched apps. Nobody told them why.

SECTION 7 — MY ANALYSIS (my reflection answer)
Insert this placeholder where I will paste my Step 8 reflection:
[PASTE YOUR STEP 8 REFLECTION HERE — write it before sending this prompt]

DESIGN:
- Uber brand colors: black (#000000) background, cyan (#1fbad6) accent, white text
- Clean professional layout, nav bar: "MIS 432 · AI in Business · Western Washington University"
- Each section scannable in 10 seconds with a clear header
- All CSS inline — no external files or CDN dependencies
- Single .html file ready to upload to GitHub Pages

Generate the complete HTML file with all placeholders clearly marked so I know exactly where to paste my content.
Step 8 Reflection — write this before building your page
You have now seen Uber's AI system from every angle — analyst, builder, stress-tester, observer, and driver. In one paragraph, describe how the steps connect: how a decision made at the data stage in Step 1 ultimately affects what a rider pays and what a driver earns six weeks later. Then add one sentence: if you were building this forecasting system for a real Bellingham business, what is the one thing you would add that this model didn't have?
✅ What to Submit

Submit two things: your GitHub Pages URL and a Word or PDF document with your answers to every reflection question below.

← Chapter 4: Uber Is AI Chapter 5: Waymo →