Lab 3 · Spotify · Recommendation Systems

Build a Spotify-Style
Recommendation System

You will build a working recommendation engine step by step, practicing the same prompt-engineering discipline Spotify used to generate 1.4 billion personalized reports for its 2025 Wrapped. Along the way you will engineer features from raw data, discover hidden listener clusters, and see how the same dataset that powers recommendations also generates your own Wrapped.

Tools: Claude · Google Colab · Live Demo · GitHub Pages · Spotify (optional)
🎵 What you build: Recommendation engine
👥 Technique: Collaborative filtering
⚙️ You engineer: Features from raw data
📊 You discover: Taste clusters (unsupervised)
⚠️ You also break it: Training-serving gap
🎁 You generate: Your own Wrapped
🌐 You publish: GitHub Pages site
The through-line
Every step in this lab maps to a step in Spotify's real AI Factory: Data → Model → Prediction → Decision → Value. Pay attention to how the same dataset powers both the recommendation engine and the Wrapped summary at the end — that connection is the central business lesson of this case.
You will also be doing what Spotify's engineers did
Every prompt you paste into Claude in this lab is an act of prompt engineering — the same discipline Spotify used to generate 1.4 billion personalized Wrapped reports in 2025. They spent three months iterating on prompts, running evaluations, and layering in human review to get reliable, on-brand outputs at scale. You will go through the same loop: write a prompt, see the output, refine it, and run it again. The difference is scale. The practice is identical.
Data → Model → Prediction → Decision → Value
Step 01 · Understand the Problem
AI Factory stage: framing the business challenge

Before touching any code, use Claude to build your intuition. Open claude.ai in a new tab and paste the first prompt below. Read Claude's response carefully — the concepts it explains here will show up in every step that follows. Then paste the second prompt to test yourself.

Prompt 1 of 2 — paste this into Claude first
I am a student in MIS 432: AI in Business at Western Washington University. We are studying how Spotify built its recommendation system. Please explain the following in plain English — no jargon, no code: 1. What problem is Spotify actually solving with recommendations? Why can't they just show everyone the same popular songs? 2. What is collaborative filtering and how does it work? Use an everyday analogy. 3. Spotify tracks implicit signals — skips, replays, session time — rather than asking users to rate songs. Why is that more useful? 4. What is the training-serving gap, and why is it dangerous specifically because it fails silently rather than crashing? Keep each answer to 3-4 sentences.
Prompt 2 of 2 — paste this into the same Claude conversation after reading the response
Now quiz me on those four concepts. Ask me one question at a time, wait for my answer, tell me if I got it right and what I missed, then move on to the next question. Don't move on until I can explain each concept correctly in my own words.
Step 01 Reflection — write 2–3 sentences
After completing the quiz with Claude, report your score: how many questions did you get right on the first try, and which concept was hardest to explain in your own words? Then answer this: Spotify has 751 million users — why can't they just hire enough humans to personally recommend music to everyone, and what does that tell you about why AI exists in this business context?
Step 02 · Build the Dataset
AI Factory stage: Data

Every recommendation system starts with behavioral data. In this step you will create the raw material — simulated listening history for 10 users across 15 real songs. Open claude.ai and paste the prompt below. Claude will generate the code and explain what it does. Then open Google Colab, create a new notebook, and paste the code in to run it and see the output.

What you are building
A table where each row is one listening event: a user, a song, and an implicit engagement score between 0 (skipped immediately) and 1 (replayed multiple times). This is the same type of data Spotify collects every time you interact with the app — except Spotify's table has billions of rows.
Feature engineering — you are already doing it
Notice that the raw event “user skipped this song” has already been transformed into a number: 0.1. The raw event “user replayed this song three times” becomes 0.9. That transformation — turning a messy real-world action into a clean, structured number a model can learn from — is called feature engineering. It happens before any algorithm runs, and its quality determines how well the whole system works.
Paste this into Claude — then copy the code Claude generates into Colab and run it
I am building a simplified Spotify-style recommendation system in Python for MIS 432: AI in Business at Western Washington University. Please write beginner-friendly Python code that creates a simulated listening history dataset. Use these exact 15 real songs: - "Blinding Lights" by The Weeknd (Pop) - "HUMBLE." by Kendrick Lamar (Hip-Hop) - "Levitating" by Dua Lipa (Pop) - "bad guy" by Billie Eilish (Indie Pop) - "God's Plan" by Drake (Hip-Hop) - "As It Was" by Harry Styles (Pop) - "MONTERO" by Lil Nas X (Hip-Hop) - "drivers license" by Olivia Rodrigo (Pop) - "Watermelon Sugar" by Harry Styles (Pop) - "Peaches" by Justin Bieber (R&B) - "Industry Baby" by Lil Nas X (Hip-Hop) - "good 4 u" by Olivia Rodrigo (Pop Rock) - "Leave The Door Open" by Bruno Mars (R&B) - "STAY" by The Kid LAROI and Justin Bieber (Pop) - "Heat Waves" by Glass Animals (Indie) Create 10 users. Each row should have: user_id, song_name, artist, genre, engagement_score (0.0-1.0 where 0.1 = skipped, 0.5 = played once, 0.9 = replayed). Make sure some users share similar taste so collaborative filtering will work. Store in a pandas DataFrame and print it. Give me the complete code in one block. Add a short plain-English comment above each section explaining what it does.
► What the code looks like — click to peek
# Define the song catalog
songs = [
    {"name": "Blinding Lights", "artist": "The Weeknd", "genre": "Pop"},
    # ... 14 more songs
]

# Simulate user behavior
# 0.1 = skipped, 0.5 = played once, 0.9 = replayed
listening_history = []
for user_id in range(1, 11):
    for song in songs:
        score = simulate_engagement(user_id, song)
        listening_history.append({
            "user_id": user_id,
            "song": song["name"],
            "engagement_score": score
        })

# Store as a DataFrame and print
df = pd.DataFrame(listening_history)
print(df.head(20))
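The peek above calls a simulate_engagement helper it never defines. Here is one possible sketch — the function name comes from the peek, but the taste groups and score probabilities are assumptions, and Claude's generated version will differ:

```python
import random

# Assumption: split users into two taste groups so that some users share
# similar taste — the property collaborative filtering needs to work.
TASTE_GROUPS = {
    1: {"Pop", "Pop Rock", "Indie Pop", "Indie"},  # odd-numbered users
    0: {"Hip-Hop", "R&B"},                          # even-numbered users
}

def simulate_engagement(user_id, song):
    """Return 0.9 (replayed), 0.5 (played once), or 0.1 (skipped)."""
    # Seed from the user and song so every run produces the same dataset
    random.seed(user_id * 1000 + sum(ord(c) for c in song["name"]))
    if song["genre"] in TASTE_GROUPS[user_id % 2]:
        return random.choice([0.9, 0.9, 0.9, 0.5])  # loved genre: mostly replays
    return random.choice([0.1, 0.1, 0.1, 0.5])      # other genres: mostly skips

print(simulate_engagement(1, {"name": "Blinding Lights", "genre": "Pop"}))
```

Seeding the random generator from the user and song makes the simulated data reproducible, which matters once later steps build on the same DataFrame.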
Step 02 Reflection — write 2–3 sentences
Look at your dataset. Each row represents one person's reaction to one song — captured as a number between 0 and 1. Spotify collects this same kind of data for 751 million users across 100 million songs. What does that scale tell you about why this behavioral data is so valuable, and why a new competitor would struggle to replicate it?
Step 03 · Feature Engineering
AI Factory stage: Data → turning raw behavior into something a model can use

You have raw behavioral data — numbers between 0 and 1 representing how each user engaged with each song. But raw data is not the same as useful data. Before any model can learn from it, those raw interactions need to be transformed into features: structured, meaningful summaries that describe a person's taste rather than just listing individual events.

What feature engineering actually means
Raw data: "user_01 played HUMBLE. with score 0.1, played Blinding Lights with score 0.9, played God's Plan with score 0.1." That is three separate rows. Feature engineering collapses those rows into meaningful summaries: skip rate in Hip-Hop = 67%, replay rate in Pop = 90%, top genre = Pop with affinity score 0.93. Now the model has one row per user that describes their taste — not three disconnected events. That transformation is feature engineering, and its quality often matters more than which algorithm you choose.
Paste this into Claude — then copy the code into Colab and run it
I have a listening history dataset from my previous step. It has columns: user_id, song, artist, genre, engagement_score (0.1 = skipped, 0.5 = played once, 0.9 = replayed). Please add Python code that engineers these three new features for each user, and explain each one in a plain-English comment: 1. skip_rate — the proportion of songs this user skipped (engagement_score = 0.1) out of all songs they interacted with 2. replay_rate — the proportion of songs this user replayed (engagement_score = 0.9) out of all songs they interacted with 3. top_genre_affinity — for each user, find the genre where their average engagement score is highest, then record that average score as a number After engineering the features, create a new DataFrame called user_profiles where each row is one user and the columns are: user_id, skip_rate, replay_rate, top_genre, top_genre_affinity. Print it clearly. Give me all previous code plus this new code in one complete block. Add a comment above each new section explaining what it does and why, as if explaining to someone who has never coded before.
► What feature engineering looks like — click to peek
# Calculate skip rate per user
# Skip rate = proportion of songs with engagement score of 0.1
skip_counts = df[df['engagement_score'] == 0.1].groupby('user_id').size()
skip_rate = (skip_counts / df.groupby('user_id').size()).fillna(0)

# Calculate replay rate per user
# Replay rate = proportion of songs with engagement score of 0.9
# fillna(0): a user with no skips or replays has no row in the counts
replay_counts = df[df['engagement_score'] == 0.9].groupby('user_id').size()
replay_rate = (replay_counts / df.groupby('user_id').size()).fillna(0)

# Find each user's top genre by average engagement score
genre_affinity = df.groupby(['user_id', 'genre'])['engagement_score'].mean()
top_genre = genre_affinity.groupby('user_id').idxmax().apply(lambda x: x[1])
top_genre_affinity = genre_affinity.groupby('user_id').max()

# Combine into a user profile table — one row per user
user_profiles = pd.DataFrame({
    'skip_rate': skip_rate,
    'replay_rate': replay_rate,
    'top_genre': top_genre,
    'top_genre_affinity': top_genre_affinity
}).reset_index()
print(user_profiles)
Step 03 Reflection — write 2–3 sentences
Look at your user_profiles table. Two users might have very similar individual song scores but very different skip rates and replay rates — which tells a completely different story about their listening habits. Why does collapsing raw events into engineered features give a model a better picture of a user's taste than looking at individual song scores row by row?
Where your data goes from here
Raw behavioral event (Step 2)
user_01 | Blinding Lights | 0.9
user_id · song · engagement_score
⚙ Feature Engineering (Step 3)
skip_rate replay_rate top_genre_affinity
one row per user · describes taste, not events
Clustering (Step 4)
Groups users by taste profile.
Axes: skip_rate vs replay_rate.
No labels given.
Recommender (Step 5)
Pivots raw scores into a user × song matrix. Cosine similarity on behavior — not features.
The same raw data feeds two separate models. Feature engineering is used for clustering. Raw scores are used for the recommender. Both choices are deliberate.
Figure 1. Data flow from raw behavioral events through feature engineering to two downstream models.
Step 04 · Clustering & Unsupervised Learning
AI Factory stage: Model → finding hidden structure with no labels

Now that each user has a feature profile, you can ask a question Spotify asks constantly: which users are naturally similar to each other? This is where clustering comes in — and where one of the most important ideas in machine learning becomes visible. The algorithm finds groups of similar users without you ever telling it what those groups should be.

Supervised vs. unsupervised learning
Most machine learning you will encounter in business is supervised learning — the model learns from labeled examples. You give it data where you already know the answer ("this engagement score means the user liked the song") and it learns to predict that answer for new cases. Unsupervised learning is different. There are no labels. You give the model data and ask it to find hidden structure on its own — groups, patterns, relationships that nobody defined in advance. Clustering is the most common form of unsupervised learning. Spotify never told its system "here is what a hip-hop fan looks like." The system found those groups itself, from behavior alone.
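It helps to see the mechanical version of clustering before doing it by eye. This is a minimal sketch, assuming a made-up user_profiles table and scikit-learn's KMeans — substitute the table your Step 3 code produced:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up user profiles (assumption) — replace with your real user_profiles
user_profiles = pd.DataFrame({
    "user_id":     [1, 2, 3, 4, 5, 6],
    "skip_rate":   [0.60, 0.55, 0.10, 0.05, 0.30, 0.35],
    "replay_rate": [0.20, 0.25, 0.65, 0.70, 0.40, 0.35],
})

# Fit k-means on the two behavioral features alone — no labels are supplied
features = user_profiles[["skip_rate", "replay_rate"]]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
user_profiles["cluster"] = kmeans.fit_predict(features)

# The cluster ids (0, 1, 2) mean nothing until a human inspects and names them
print(user_profiles.sort_values("cluster"))
```

The only human inputs are the features and the number of clusters; the groupings themselves emerge from the data, which is exactly the sense in which the learning is unsupervised.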
Prompt 1 of 2 — paste this into Claude to build your intuition first
I am a student in MIS 432: AI in Business at Western Washington University. We are studying how Spotify uses clustering and unsupervised learning. Please explain the following using only plain English and everyday analogies — no code, no technical jargon: 1. What is clustering? Use a concrete real-world analogy that has nothing to do with music or technology. 2. What makes it "unsupervised"? What is the model NOT being given that makes it different from the collaborative filtering we are building in this lab? 3. Spotify has a cluster of listeners it calls "late-night chill" — but no human ever created that label. How did that name emerge, and what does that tell us about what unsupervised learning can discover that humans might miss? Keep each answer to 3-4 sentences.
Prompt 2 of 2 — paste this into the same Claude conversation after reading the response
Here is the user_profiles table I built in my previous step. Each row is one user with their skip_rate, replay_rate, top_genre, and top_genre_affinity: [paste your user_profiles table output from Colab here] Without running any algorithm — just by reading these numbers — what three natural groups do you see among these users? For each group, give it a descriptive name (like "casual pop listener" or "devoted hip-hop fan") and explain what pattern in the data led you to that grouping. Then explain: this is essentially what a clustering algorithm does automatically — for 751 million users simultaneously. What would that scale of discovery mean strategically for Spotify?
The key insight
Claude just grouped your users by reading their feature profiles — the same logic a clustering algorithm applies at scale. Notice that nobody told Claude (or the algorithm) what the groups should be called, how many there should be, or what characteristics define each one. The groups emerged from the data. That is what unsupervised means: the structure was always there, hidden in the numbers. The algorithm just surfaced it.
Now see it — the taste space your feature engineering created
[Interactive chart: Listener Taste Clusters. Each dot is one user, positioned by their engineered features. X axis: skip_rate (fewer to more skips). Y axis: replay_rate (fewer to more replays). Legend: Selective Listener · Enthusiastic Replayer · Casual Explorer. Click any dot to explore that user.]
These positions come directly from the user_profiles table your feature engineering produced. The clusters were not defined — they emerged from where users naturally landed in this space.
Figure 2. Listener Taste Clusters — 9 users plotted by skip_rate (X) and replay_rate (Y). Clusters emerged from behavior, not genre labels.

Each dot's position is determined by two engineered features: skip_rate (X axis) and replay_rate (Y axis). Nobody drew these clusters or named them. The algorithm placed each user at their coordinates and the groups formed on their own — because users with similar listening behavior naturally landed near each other.

Notice something important: the clusters are behavioral, not genre-based. The green Selective Listeners (top-left) skip a lot and replay selectively — they include a Pop fan (user_01), a Hip-Hop fan (user_04), and an Indie Pop fan (user_03). What they share is how they listen, not what they listen to. The purple Enthusiastic Replayers (user_08, user_09) barely skip anything and replay over 60% of songs — one loves Pop Rock, the other Hip-Hop. Again, behavior groups them, not genre.

Click any dot to trace the chain: raw listening events → engineered features → position in taste space → cluster membership. This is the same chain Spotify runs for 751 million users. The only difference is scale.

Step 04 Reflection — write 2–3 sentences on each part
Part A — Claude's groupings: When you pasted your user_profiles table into Claude and asked it to find natural groups, what three clusters did it identify, and what features in the data led it to each one? How closely did Claude's groupings match what you can see in the Listener Taste Clusters chart above?

Part B — What the chart reveals: Look at the green Selective Listeners cluster. It contains a Pop fan, a Hip-Hop fan, and an Indie Pop fan — three very different genre tastes, but they all landed in the same cluster. What does that tell you about what the algorithm actually discovered? And why is grouping people by how they listen potentially more useful for Spotify than grouping them by what genre they prefer?

Part C — Scale and strategy: Spotify discovers these listener communities without ever asking a single user "what kind of music fan are you?" What does it mean strategically that a company can learn this much about its customers purely from behavioral data — and what are the implications for privacy and trust?
Step 05 · Build the Recommendation Engine
AI Factory stage: Model + Prediction

Now you add the algorithm. Before you paste the prompt, take a minute to understand what collaborative filtering is actually doing — the code will make a lot more sense once you see the logic.

Quick reminder: what is collaborative filtering?
Collaborative filtering is the idea that people with similar taste in the past will have similar taste in the future. The system never listens to a song or reads its lyrics — it only looks at human behavior. Who listened to what, and how much did they engage? That pattern of behavior is enough to make surprisingly good recommendations.
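Cosine similarity is simple enough to compute by hand before you trust a library for it. A minimal sketch with three made-up engagement vectors (the numbers are illustrative assumptions, not your dataset):

```python
import numpy as np

# Three users' engagement rows over the same three songs (made-up values)
user_a = np.array([0.9, 0.9, 0.1])  # loves the first two, skipped the third
user_b = np.array([0.9, 0.5, 0.1])  # similar taste, slightly less keen
user_c = np.array([0.1, 0.1, 0.9])  # opposite taste

def cosine(u, v):
    """cos(theta) = (u . v) / (|u| * |v|) — 1.0 means identical direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(round(cosine(user_a, user_b), 3))  # close to 1.0: similar listeners
print(round(cosine(user_a, user_c), 3))  # much lower: different taste
```

Notice that cosine similarity compares the direction of two rows, not their magnitude: a cautious listener and an enthusiastic one who favor the same songs still score as similar.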
How Collaborative Filtering works — four steps
STEP 01
Build a matrix
Users on one axis, songs on the other. Each cell is an engagement score — 0.9 = replayed, 0.5 = played, 0.1 = skipped, 0 = not heard. This is your Step 2 dataset, pivoted into a grid.
User          | Blinding Lights | Levitating | As It Was | STAY | Peaches | Heat Waves | HUMBLE. | God's Plan
user_01 ←you  | 0.9 | 0.9 | 0.9 | 0.9 | 0.1 | 0.5 | 0.1 | 0.1
user_02       | 0.9 | 0.9 | 0.5 | 0.5 | 0.9 | 0.5 | 0.1 | 0.1
user_03       | 0.1 | 0.1 | 0.1 | 0.1 | 0.5 | 0.1 | 0.9 | 0.9
user_04       | 0.1 | 0.1 | 0.5 | 0.1 | 0.1 | 0.5 | 0.9 | 0.9
Showing 4 of 10 users · 8 of 15 songs for clarity
STEP 02
Find similar users
Compare user_01's row against every other user. Users whose rows look most similar have similar taste — measured with cosine similarity. Score closer to 1.0 = more similar.
user_10: 0.89 ✓
user_02: 0.87 ✓
user_06: 0.84 ✓
user_03: 0.35
Top 3 similar users selected (✓) · user_03 is too different to be useful
STEP 03
Identify gaps
Look at what those similar users loved that user_01 hasn't heard yet. Those unheard songs are the candidates for recommendation.
user_01 has heard
Blinding Lights Levitating As It Was STAY drivers license
gaps — not yet heard
Peaches bad guy Leave The Door Open HUMBLE. God's Plan
STEP 04
Rank & surface
Score each candidate based on how much the similar users loved it. The higher the score, the stronger the recommendation.
1. Peaches · Justin Bieber · R&B · score 0.763
2. bad guy · Billie Eilish · Indie Pop · score 0.507
3. Leave The Door Open · Bruno Mars · R&B · score 0.496
4. HUMBLE. · Kendrick Lamar · Hip-Hop · score 0.230
5. God's Plan · Drake · Hip-Hop · score 0.100
Score closer to 0.9 = similar users replayed it · closer to 0.1 = they only skipped it
Notice what the algorithm never does: it never listens to the songs or analyzes their genre. It only looks at human behavior — which is exactly what makes it dependent on having lots of users.
Figure 3. How collaborative filtering works — four steps from user-song matrix to ranked recommendations.
Paste this into Claude — then copy the code into Colab and run it
Using the listening history dataset from my previous step, please add code that builds a collaborative filtering recommendation engine. The code should: 1. Create a user-song matrix where rows are users, columns are songs, and cells are engagement scores (0 if not listened) 2. Calculate similarity between users using cosine similarity — add a plain English comment explaining what cosine similarity measures 3. For User 1, find the 3 most similar users 4. Recommend the top 5 songs that similar users loved but User 1 has not heard yet 5. Print the recommendations clearly with song name, artist, genre, and predicted score Give me all previous code plus this new code in one complete block. Add a comment above each new section explaining what it does and why, as if explaining to someone who has never coded before.
► What the algorithm looks like — click to peek
# Build the user-song matrix
# Rows = users, columns = songs, values = engagement scores
user_song_matrix = df.pivot_table(
    index='user_id', columns='song',
    values='engagement_score', fill_value=0
)

# Calculate cosine similarity
# Measures the angle between two users' rows.
# Users with similar taste point in the same direction — score near 1.0
similarity = cosine_similarity(user_song_matrix)
sim_df = pd.DataFrame(similarity,
    index=user_song_matrix.index,
    columns=user_song_matrix.index)

# Find the 3 users most similar to User 1
# (position 0 is User 1's similarity to themselves, always 1.0, so skip it)
similar_users = sim_df[1].sort_values(ascending=False)[1:4]

# Recommend songs those users loved that User 1 hasn't heard
recommendations = get_recommendations(
    target_user=1, similar_users=similar_users,
    matrix=user_song_matrix
)
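The peek ends by calling get_recommendations without showing it. One plausible sketch (the similarity-weighted scoring is an assumption — Claude's generated version may rank candidates differently):

```python
import pandas as pd

def get_recommendations(target_user, similar_users, matrix, top_n=5):
    """Score unheard songs by the similarity-weighted engagement of similar users."""
    heard = matrix.loc[target_user]
    unheard_songs = heard[heard == 0].index              # 0 = never listened

    # Weight each similar user's scores by how similar they are to the target
    weights = similar_users / similar_users.sum()
    neighbor_scores = matrix.loc[similar_users.index, unheard_songs]
    predicted = neighbor_scores.mul(weights, axis=0).sum()

    return predicted.sort_values(ascending=False).head(top_n)

# Tiny illustration (made-up matrix): user 1 has only heard song "B"
matrix = pd.DataFrame(
    {"A": [0.0, 0.9, 0.9], "B": [0.9, 0.5, 0.1], "C": [0.0, 0.1, 0.9]},
    index=[1, 2, 3],
)
similar_users = pd.Series({2: 0.8, 3: 0.2})  # similarity of users 2 and 3 to user 1
print(get_recommendations(1, similar_users, matrix, top_n=2))
```

Song "A" scores 0.9 because both neighbors loved it; "C" scores lower because the more-similar neighbor skipped it. That is the whole ranking logic in miniature.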
Step 05 Reflection — write 2–3 sentences
Your system just recommended songs to User 1 without knowing anything about what those songs actually sound like — it only looked at who else listened to them. Why does that work as a way to find good recommendations, and what does it tell you about why Spotify needs hundreds of millions of users for its system to be accurate?
Step 06 · Build the Live System — Then Try the Demo
AI Factory stage: Decision

You have built the algorithm in Python. Now you are going to build a live interactive version of it — a web app with Skip, Play, and Replay buttons and real-time recommendations. You will do this by prompting Claude to build it for you. Once you have your version working, compare it against the demo below to see how it stacks up.

Why build it yourself first?
If you see the demo first, building your own version just feels like copying. If you build first, the demo becomes a benchmark — and the moment you compare the two is where the real learning happens. What does your version do the same? What does it do differently? Why?
Step 1 — paste this into Claude to build your own interactive system
I am a student in MIS 432: AI in Business at Western Washington University. I have just built a collaborative filtering recommendation engine in Python. Now I want to build an interactive web version of it. Please build me a complete, single HTML file that: 1. Shows a catalog of these 15 songs with Skip, Play, and Replay buttons for each: - "Blinding Lights" by The Weeknd (Pop) - "HUMBLE." by Kendrick Lamar (Hip-Hop) - "Levitating" by Dua Lipa (Pop) - "bad guy" by Billie Eilish (Indie Pop) - "God's Plan" by Drake (Hip-Hop) - "As It Was" by Harry Styles (Pop) - "MONTERO" by Lil Nas X (Hip-Hop) - "drivers license" by Olivia Rodrigo (Pop) - "Watermelon Sugar" by Harry Styles (Pop) - "Peaches" by Justin Bieber (R&B) - "Industry Baby" by Lil Nas X (Hip-Hop) - "good 4 u" by Olivia Rodrigo (Pop Rock) - "Leave The Door Open" by Bruno Mars (R&B) - "STAY" by The Kid LAROI and Justin Bieber (Pop) - "Heat Waves" by Glass Animals (Indie) 2. Records implicit engagement scores as the user interacts: 0.9 for replay, 0.5 for play, 0.1 for skip 3. 
Uses collaborative filtering and cosine similarity to generate real-time recommendations based on these 9 pre-defined users and their engagement scores: user_01: As It Was=0.9, Blinding Lights=0.9, God's Plan=0.1, HUMBLE.=0.1, Heat Waves=0.5, Industry Baby=0.1, Leave The Door Open=0.1, Levitating=0.9, MONTERO=0.5, Peaches=0.1, STAY=0.9, Watermelon Sugar=0.9, bad guy=0.1, drivers license=0.9, good 4 u=0.5 user_02: As It Was=0.5, Blinding Lights=0.9, God's Plan=0.1, HUMBLE.=0.1, Heat Waves=0.5, Industry Baby=0.1, Leave The Door Open=0.1, Levitating=0.9, MONTERO=0.1, Peaches=0.9, STAY=0.5, Watermelon Sugar=0.9, bad guy=0.5, drivers license=0.9, good 4 u=0.1 user_03: As It Was=0.1, Blinding Lights=0.1, God's Plan=0.9, HUMBLE.=0.9, Heat Waves=0.1, Industry Baby=0.9, Leave The Door Open=0.1, Levitating=0.1, MONTERO=0.5, Peaches=0.5, STAY=0.1, Watermelon Sugar=0.5, bad guy=0.9, drivers license=0.1, good 4 u=0.1 user_04: As It Was=0.5, Blinding Lights=0.1, God's Plan=0.9, HUMBLE.=0.9, Heat Waves=0.5, Industry Baby=0.9, Leave The Door Open=0.1, Levitating=0.1, MONTERO=0.9, Peaches=0.1, STAY=0.1, Watermelon Sugar=0.9, bad guy=0.5, drivers license=0.1, good 4 u=0.1 user_05: As It Was=0.9, Blinding Lights=0.9, God's Plan=0.9, HUMBLE.=0.5, Heat Waves=0.5, Industry Baby=0.1, Leave The Door Open=0.5, Levitating=0.5, MONTERO=0.1, Peaches=0.9, STAY=0.5, Watermelon Sugar=0.5, bad guy=0.5, drivers license=0.1, good 4 u=0.5 user_06: As It Was=0.9, Blinding Lights=0.9, God's Plan=0.1, HUMBLE.=0.5, Heat Waves=0.1, Industry Baby=0.1, Leave The Door Open=0.9, Levitating=0.9, MONTERO=0.1, Peaches=0.9, STAY=0.9, Watermelon Sugar=0.5, bad guy=0.1, drivers license=0.9, good 4 u=0.9 user_07: As It Was=0.5, Blinding Lights=0.1, God's Plan=0.1, HUMBLE.=0.5, Heat Waves=0.9, Industry Baby=0.1, Leave The Door Open=0.1, Levitating=0.9, MONTERO=0.9, Peaches=0.5, STAY=0.5, Watermelon Sugar=0.9, bad guy=0.9, drivers license=0.5, good 4 u=0.5 user_08: As It Was=0.5, Blinding Lights=0.9, God's Plan=0.9, 
HUMBLE.=0.1, Heat Waves=0.1, Industry Baby=0.9, Leave The Door Open=0.5, Levitating=0.9, MONTERO=0.1, Peaches=0.9, STAY=0.9, Watermelon Sugar=0.9, bad guy=0.5, drivers license=0.9, good 4 u=0.9 user_09: As It Was=0.5, Blinding Lights=0.9, God's Plan=0.9, HUMBLE.=0.9, Heat Waves=0.1, Industry Baby=0.9, Leave The Door Open=0.5, Levitating=0.9, MONTERO=0.9, Peaches=0.5, STAY=0.9, Watermelon Sugar=0.9, bad guy=0.1, drivers license=0.9, good 4 u=0.1 4. Shows a live data log of signals being collected as the user interacts 5. Uses Spotify-style design: dark background (#191414), green accents (#1DB954) Give me the complete HTML in one block. Add plain-English comments explaining each section.
Step 2 — open your HTML file and test it
Save the HTML file Claude generates to your desktop. Open it in your browser. Interact with at least 8 songs and watch the recommendations update in real time. Does it work? If anything breaks, paste the error back to Claude and say "fix this."
Step 3 — now compare against the demo below
Use the live demo below and interact with the same songs in the same order. Do you get the same recommendations? If not — why not? What did you do differently in your prompt that changed the behavior?
[Live demo: "MIS 432 Recommendation Lab." A song catalog with Skip, Play, and Replay buttons for each track; running counts of songs played, skipped, and replayed (the implicit signals); a "Recommended for you" panel that updates as you interact; a live data log; and a "Your Wrapped" generator. Your interactions train the recommender in real time. Demo mode works without a Spotify login.]
Notice the explore/exploit tradeoff
As you interact, notice how your recommendations change. If you only replay the same type of song, the system exploits what it knows about you — narrowing in on familiar territory. If you skip around and try different things, it explores — broadening your recommendations. This tension between familiarity and discovery is one of the most important design decisions in any recommendation system. Spotify's ranking layer manages it deliberately for 751 million people simultaneously.
Figure 4. Live recommendation system — collaborative filtering and Wrapped running in real time on your interactions.
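The explore/exploit tension described above has a classic minimal form called epsilon-greedy. This is not Spotify's actual ranking layer — just a sketch of the dial, with made-up predicted scores:

```python
import random

def pick_song(predicted_scores, epsilon=0.2):
    """Epsilon-greedy: usually exploit the best-known song, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(list(predicted_scores))        # explore: any song
    return max(predicted_scores, key=predicted_scores.get)  # exploit: best known

scores = {"Peaches": 0.76, "bad guy": 0.51, "Heat Waves": 0.30}
random.seed(42)
picks = [pick_song(scores, epsilon=0.2) for _ in range(1000)]
print(picks.count("Peaches") / 1000)  # mostly the top song, with some exploration
```

Turning epsilon up broadens discovery at the cost of familiarity; turning it down does the reverse. A real ranking layer manages that dial per user, per session.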
Step 06 Reflection — write 2–3 sentences
Watch the live data log as you interact. Every click — skip, play, replay — is being recorded as a number and fed into the algorithm. Spotify calls these implicit signals. Why is this type of behavioral data more valuable than simply asking users to rate songs on a scale of 1 to 5?
Step 07 · Generate Your Wrapped
AI Factory stage: Value

You are going to add a Wrapped feature directly to the interactive system you built in Step 6 — no new data, no separate pipeline. The same engagement scores that power your recommendations will now generate a personalized year-in-review, right inside the same HTML file.

The central insight of this step
The Wrapped you are about to build uses the exact same data — the same clicks, the same engagement scores — that already powers your recommendation engine. Spotify does not build a separate system for Wrapped. It is the same infrastructure, repurposed. That is why it costs Spotify almost nothing to produce, and why it generated 600 million social media posts in 2023.
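To make "same infrastructure, repurposed" concrete, here is a sketch that derives a Wrapped-style summary from a table shaped like your Step 2 listening history (the rows below are made up; the column names follow Step 2):

```python
import pandas as pd

# Same shape as the Step 2 listening history (made-up rows for illustration)
df = pd.DataFrame([
    {"user_id": 1, "song": "Blinding Lights", "artist": "The Weeknd",
     "genre": "Pop", "engagement_score": 0.9},
    {"user_id": 1, "song": "Levitating", "artist": "Dua Lipa",
     "genre": "Pop", "engagement_score": 0.9},
    {"user_id": 1, "song": "As It Was", "artist": "Harry Styles",
     "genre": "Pop", "engagement_score": 0.5},
    {"user_id": 1, "song": "HUMBLE.", "artist": "Kendrick Lamar",
     "genre": "Hip-Hop", "engagement_score": 0.1},
])

def wrapped(df, user_id, top_n=3):
    """Build a Wrapped summary from the same table the recommender uses."""
    mine = df[df["user_id"] == user_id]
    return {
        "top_songs": mine.nlargest(top_n, "engagement_score")["song"].tolist(),
        "top_artist": mine.groupby("artist")["engagement_score"].mean().idxmax(),
        "top_genre": mine.groupby("genre")["engagement_score"].mean().idxmax(),
        "total_interactions": len(mine),
    }

print(wrapped(df, user_id=1))
```

No new data was collected and no new pipeline was built — the summary is a handful of group-bys over the table the recommender already reads.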
You are doing what Spotify’s engineers did
In 2025, Spotify used prompt engineering to generate 1.4 billion personalized Wrapped reports. Their team spent three months iterating on prompts — designing a system prompt to define the creative contract (tone, safety rules, data-driven storytelling) and a user prompt to supply each listener’s specific context. The prompt you are about to paste into Claude is your version of that system. Think carefully about what you ask for — the quality of your output depends entirely on the quality of your instructions.
In the same Claude conversation from Step 6, send this follow-up
Based on the Spotify recommendation engine you just built, can you add a Wrapped feature to it? Specifically: 1. Add a "Generate my Wrapped" button that becomes active after the user has interacted with at least 5 songs 2. When clicked, it should compute and display — using only the engagement data already being collected in the page — a Wrapped summary that includes: - Top 3 songs (highest engagement scores from this session) - Top artist (artist with the most high-engagement songs) - Top genre - Total interactions this session - A fun personalized line like "You were in the top X% of [artist] listeners this session" 3. Display the Wrapped output in a styled panel inside the existing interface, using the same Spotify dark theme (#191414, #1DB954) 4. Add a small note inside the Wrapped output that says: "This Wrapped was generated from the exact same data log that powered your recommendations above." Return the complete updated HTML file in one block.
Step 07 Reflection — write 2–3 sentences
Spotify Wrapped is essentially a marketing campaign that costs almost nothing extra to produce because it runs on a data pipeline that already exists. Can you think of another company in any industry that could do something similar — turning data they already collect into a product or experience their customers would want to share?
Step 08 · Publish Your Work on GitHub Pages
Turn what you built into a live portfolio piece

You have built a working recommendation system, demonstrated a real AI failure mode, and generated a Wrapped. Now use Claude to turn all of that into a polished live website. The site should do three things: tell the story of what you built and why it matters, showcase a working version of your recommendation system that visitors can actually use, and present your Wrapped output.

Claude will generate a complete HTML file. You copy it, paste it into GitHub as index.html, and you have a live site.

Why this matters
Most business students can say they “learned about AI” in a class. A live website where visitors can actually interact with a recommendation system you built — and see a Wrapped summary generated from real behavioral data — is something completely different. This is your deliverable and something you can be genuinely proud of.
A note on Spotify Connect
The prompt below uses your professor’s Spotify Developer app Client ID. The Connect Spotify button will work on the lab page hosted on the course site. For it to work on your own GitHub Pages site, your URL needs to be registered in the developer app. You have two options: (1) Email your professor your GitHub Pages URL (e.g. https://yourname.github.io/spotify-recommendation-lab) and they will add it to the developer app for you. (2) Create your own free Spotify Developer account at developer.spotify.com, register a new app, add your GitHub Pages URL as a Redirect URI, and replace the Client ID in your HTML with yours. Either way, demo mode works for everyone regardless — you only need this step if you want real Spotify audio playback on your personal site.
Paste this into Claude — then copy the HTML it generates
I am a student in MIS 432: AI in Business at Western Washington University's College of Business and Economics. I have just completed a lab where I built a simplified Spotify-style recommendation system from scratch using Python and collaborative filtering. Here is what I built and learned:

1. I created a simulated listening history dataset for 10 users across 15 real songs, where each row captured an implicit engagement signal — a number between 0 (skipped) and 1 (replayed multiple times)
2. I did feature engineering — transforming raw listening events into structured engagement scores that a machine learning model could actually use. I learned that the quality of your features determines the quality of your model.
3. I built a collaborative filtering recommendation engine that creates a user-song matrix, calculates cosine similarity between users, and recommends songs that similar users loved but the target user hasn't heard yet
4. I learned about unsupervised learning through clustering — how Spotify groups listeners into taste clusters without anyone labeling the data, and how those hidden patterns in behavior drive personalization at scale
5. I explored what happens when AI systems fail silently — and why AI governance is a technology management problem, not just a technical one
6. I generated a Wrapped-style year-in-review summary for a user using the exact same dataset that powered the recommendations — demonstrating that Spotify Wrapped costs almost nothing to produce because it runs on the same infrastructure

Please create a single, complete HTML file for GitHub Pages that does ALL THREE of the following:

PART 1 — THE BUSINESS STORY
- Explain what Spotify's business problem was: 100 million songs, 751 million users, and the need to make personalization the core strategy
- Explain what a recommendation engine is and why it matters for businesses, in plain English
- Walk through the key concepts I learned: implicit signals, feature engineering, collaborative filtering, unsupervised learning and taste clustering, embeddings, the training-serving gap, AI governance, and the data moat
- Explain why Wrapped costs almost nothing to produce and what that tells us about data infrastructure as a business asset
- End with 3 key business lessons from this project

PART 2 — THE LIVE RECOMMENDATION SYSTEM
This section must feel like a real, polished Spotify-quality product. Build it to this exact specification:
- A song catalog with these exact 15 songs, each with Skip, Play, and Replay buttons: "Blinding Lights" by The Weeknd (Pop), "HUMBLE." by Kendrick Lamar (Hip-Hop), "Levitating" by Dua Lipa (Pop), "bad guy" by Billie Eilish (Indie Pop), "God's Plan" by Drake (Hip-Hop), "As It Was" by Harry Styles (Pop), "MONTERO" by Lil Nas X (Hip-Hop), "drivers license" by Olivia Rodrigo (Pop), "Watermelon Sugar" by Harry Styles (Pop), "Peaches" by Justin Bieber (R&B), "Industry Baby" by Lil Nas X (Hip-Hop), "good 4 u" by Olivia Rodrigo (Pop Rock), "Leave The Door Open" by Bruno Mars (R&B), "STAY" by The Kid LAROI & Justin Bieber (Pop), "Heat Waves" by Glass Animals (Indie)
- Engagement scores: 0.9 for replay, 0.5 for play, 0.1 for skip
- Collaborative filtering using cosine similarity against these 9 pre-defined users:
  user_01: As It Was=0.9, Blinding Lights=0.9, God's Plan=0.1, HUMBLE.=0.1, Heat Waves=0.5, Industry Baby=0.1, Leave The Door Open=0.1, Levitating=0.9, MONTERO=0.5, Peaches=0.1, STAY=0.9, Watermelon Sugar=0.9, bad guy=0.1, drivers license=0.9, good 4 u=0.5
  user_02: As It Was=0.5, Blinding Lights=0.9, God's Plan=0.1, HUMBLE.=0.1, Heat Waves=0.5, Industry Baby=0.1, Leave The Door Open=0.1, Levitating=0.9, MONTERO=0.1, Peaches=0.9, STAY=0.5, Watermelon Sugar=0.9, bad guy=0.5, drivers license=0.9, good 4 u=0.1
  user_03: As It Was=0.1, Blinding Lights=0.1, God's Plan=0.9, HUMBLE.=0.9, Heat Waves=0.1, Industry Baby=0.9, Leave The Door Open=0.1, Levitating=0.1, MONTERO=0.5, Peaches=0.5, STAY=0.1, Watermelon Sugar=0.5, bad guy=0.9, drivers license=0.1, good 4 u=0.1
  user_04: As It Was=0.5, Blinding Lights=0.1, God's Plan=0.9, HUMBLE.=0.9, Heat Waves=0.5, Industry Baby=0.9, Leave The Door Open=0.1, Levitating=0.1, MONTERO=0.9, Peaches=0.1, STAY=0.1, Watermelon Sugar=0.9, bad guy=0.5, drivers license=0.1, good 4 u=0.1
  user_05: As It Was=0.9, Blinding Lights=0.9, God's Plan=0.9, HUMBLE.=0.5, Heat Waves=0.5, Industry Baby=0.1, Leave The Door Open=0.5, Levitating=0.5, MONTERO=0.1, Peaches=0.9, STAY=0.5, Watermelon Sugar=0.5, bad guy=0.5, drivers license=0.1, good 4 u=0.5
  user_06: As It Was=0.9, Blinding Lights=0.9, God's Plan=0.1, HUMBLE.=0.5, Heat Waves=0.1, Industry Baby=0.1, Leave The Door Open=0.9, Levitating=0.9, MONTERO=0.1, Peaches=0.9, STAY=0.9, Watermelon Sugar=0.5, bad guy=0.1, drivers license=0.9, good 4 u=0.9
  user_07: As It Was=0.5, Blinding Lights=0.1, God's Plan=0.1, HUMBLE.=0.5, Heat Waves=0.9, Industry Baby=0.1, Leave The Door Open=0.1, Levitating=0.9, MONTERO=0.9, Peaches=0.5, STAY=0.5, Watermelon Sugar=0.9, bad guy=0.9, drivers license=0.5, good 4 u=0.5
  user_08: As It Was=0.5, Blinding Lights=0.9, God's Plan=0.9, HUMBLE.=0.1, Heat Waves=0.1, Industry Baby=0.9, Leave The Door Open=0.5, Levitating=0.9, MONTERO=0.1, Peaches=0.9, STAY=0.9, Watermelon Sugar=0.9, bad guy=0.5, drivers license=0.9, good 4 u=0.9
  user_09: As It Was=0.5, Blinding Lights=0.9, God's Plan=0.9, HUMBLE.=0.9, Heat Waves=0.1, Industry Baby=0.9, Leave The Door Open=0.5, Levitating=0.9, MONTERO=0.9, Peaches=0.5, STAY=0.9, Watermelon Sugar=0.9, bad guy=0.1, drivers license=0.9, good 4 u=0.1
- A Spotify Connect button that lets users with Spotify Premium log in using OAuth PKCE flow with this client ID: b586d81f528c4bc29c12141188543ec1 — when connected, clicking Play or Replay should stream the actual song through Spotify's Web Playback SDK. Users without Premium use demo mode automatically.
- A playbar at the bottom showing the currently playing song with Play, Pause, and Stop controls
- A live data log showing implicit signals being collected as visitors interact
- A "Generate my Wrapped" button that produces a personalized summary from the same behavioral data
- A signal counter showing total plays, skips, and replays

PART 3 — DESIGN
- Use Spotify-style design: dark background (#191414), green accents (#1DB954), clean and professional
- The recommendation system must feel like a real product — polished, not a class project
- Two-column layout for the system: song catalog on the left, recommendations and data log on the right
- Be written for a potential employer who has no technical background — focus on business insight, not code
- Include a note that this was completed as part of MIS 432: AI in Business at Western Washington University

Give me the complete HTML in one block that I can paste directly into GitHub as index.html.
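The collaborative filtering logic this prompt asks Claude to build in the browser is the same logic you wrote in Python earlier in the lab. If you want to sanity-check what Claude generates, here is a minimal Python sketch of that logic. It uses two of the nine pre-defined users, abbreviated to five songs for readability, and a hypothetical visitor profile; the full site uses all 15 songs and all nine users.

```python
from math import sqrt

# Engagement scores from the lab: replay = 0.9, play = 0.5, skip = 0.1.
# Two of the nine pre-defined users, trimmed to five songs for readability.
catalog = ["As It Was", "Blinding Lights", "God's Plan", "HUMBLE.", "Levitating"]
users = {
    "user_01": {"As It Was": 0.9, "Blinding Lights": 0.9, "God's Plan": 0.1,
                "HUMBLE.": 0.1, "Levitating": 0.9},
    "user_03": {"As It Was": 0.1, "Blinding Lights": 0.1, "God's Plan": 0.9,
                "HUMBLE.": 0.9, "Levitating": 0.1},
}

def cosine(a, b):
    """Cosine similarity between two users, over songs both have scores for."""
    shared = [s for s in catalog if s in a and s in b]
    dot = sum(a[s] * b[s] for s in shared)
    norm_a = sqrt(sum(a[s] ** 2 for s in shared))
    norm_b = sqrt(sum(b[s] ** 2 for s in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(visitor, n=2):
    """Weight each stored user's scores by similarity to the visitor,
    then rank the songs the visitor has not interacted with yet."""
    sims = {u: cosine(visitor, profile) for u, profile in users.items()}
    total = sum(sims.values())
    scores = {}
    for song in catalog:
        if song in visitor:
            continue  # never recommend something already heard
        weighted = sum(sims[u] * users[u].get(song, 0) for u in users)
        scores[song] = weighted / total if total else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:n]

# A hypothetical visitor who replayed two pop tracks and skipped a hip-hop one:
visitor = {"Blinding Lights": 0.9, "Levitating": 0.9, "HUMBLE.": 0.1}
print(recommend(visitor))  # → ['As It Was', "God's Plan"]
```

The visitor's pattern matches user_01 almost perfectly, so user_01's favorite unheard song ("As It Was") tops the list, which is exactly the behavior you should see on your published site.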

Publish on GitHub

  1. Go to github.com and create a new repository named spotify-recommendation-lab
  2. Click Add file → Create new file
  3. Name it index.html and paste the HTML Claude generated
  4. Click Commit changes
  5. Go to Settings → Pages → Deploy from branch → main and save
  6. Wait 1–2 minutes, then your site will be live at https://yourusername.github.io/spotify-recommendation-lab
Want to change something on your site?
Don't regenerate the whole thing. Go back to Claude and ask it to change specific sections — for example "can you make the header darker" or "can you add a section about what I found most surprising." Claude can edit targeted parts of the HTML without touching the rest. This is a much more efficient way to work.
Step 08 Reflection — write 2–3 sentences
Look at the website Claude generated. You described what you built in plain English and Claude turned it into a polished, employer-facing site. What does that tell you about how AI changes what non-technical business students can produce — and what skills become more valuable as a result?
9
AI Governance: The Management Problem
AI Factory stage: keeping the system working

You just built a working recommendation system. Spotify built one too — and then watched it quietly fail for four months without anyone noticing. No crash. No error message. Just subtly worse recommendations, invisible on any dashboard.

What happened at Spotify
When Spotify moved its podcast recommendation model from batch processing to real-time processing, a small discrepancy crept in between how training data and live data were processed. Batch processing had computed recommendations overnight in large chunks; real-time processing had to score recommendations on the fly for each user. That shift introduced subtle differences in how the data was handled. The model kept running and podcast recommendations kept appearing; they were just quietly worse than they should have been. Nobody noticed for four months. Spotify's real fix wasn't just technical: they built monitoring systems, unified their data pipeline, and assigned people to own it. That's AI governance.
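To make the failure mode concrete, here is a minimal Python sketch of a training-serving gap, with hypothetical numbers. A toy "model" learns a decision threshold on per-user normalized engagement (the batch pipeline's feature), but the real-time serving path forgets the normalization step. Both calls succeed, nothing crashes, and a dashboard tracking uptime or request counts would show nothing wrong; the recommendations are just silently different from what the model was trained to produce.

```python
def normalize(scores):
    """Per-user normalization, as computed in the overnight batch pipeline."""
    total = sum(scores.values())
    return {item: s / total for item, s in scores.items()}

# Hypothetical raw engagement signals for one user.
raw = {"podcast_a": 0.9, "podcast_b": 0.5, "podcast_c": 0.1}

# Toy "model": during training it learned that normalized engagement
# above 0.4 predicts a recommendation worth showing.
LEARNED_THRESHOLD = 0.4

def model(features):
    return [item for item, score in features.items() if score > LEARNED_THRESHOLD]

# Training-style input (normalized, as the model was trained on):
print(model(normalize(raw)))  # → ['podcast_a']

# Serving-style input (raw — the real-time path skipped normalization).
# No error is raised; the model just recommends more than it should.
print(model(raw))             # → ['podcast_a', 'podcast_b']
```

Catching this requires monitoring the *distribution* of features and outputs, not just whether the service is up, which is why Spotify's fix was organizational as much as technical.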
Step 09 Reflection — write a few sentences
Based on what you read in the case study and what you just built: why is AI governance fundamentally a technology management question — not just “management” in the general sense? What specific technical realities (model drift, training-serving gaps, silent failures, data pipeline dependencies) create the need for management decisions that a non-technical manager would be poorly equipped to make alone? Be specific about what organizational decisions, roles, or processes would need to exist — and why understanding the technology is inseparable from governing it well.
10
Connect Back to the Business Case
AI Factory stage: the full loop

Map what you built to Spotify's real system and to the AI Factory model from the case study.

Data
Model
Prediction
Decision
Value

For each step above, write one sentence describing: (a) what you built in this lab that corresponds to it, and (b) what Spotify does at that step at real scale.

Step 10 Final Reflection — write a few sentences
Your system used 10 simulated users. Spotify uses 751 million real ones. Identify one thing that works fine at your scale that would break or become dangerous at Spotify's scale — and explain how Spotify would need to address it. Think about the data, the algorithm, the training-serving gap, or governance. Use specific concepts from the case study in your answer.
What to Submit
← Back to Case Study ← Back to Course Home