You will build a working recommendation engine step by step, entirely through prompt engineering: a small-scale version of the system Spotify used to generate 1.4 billion personalized reports for the 2025 Wrapped. Along the way you will engineer features from raw data, discover hidden listener clusters, and see how the same dataset that powers recommendations also generates your own Wrapped.
Before touching any code, use Claude to build your intuition. Open claude.ai in a new tab and paste the first prompt below. Read Claude's response carefully — the concepts it explains here will show up in every step that follows. Then paste the second prompt to test yourself.
Every recommendation system starts with behavioral data. In this step you will create the raw material — simulated listening history for 10 users across 15 real songs. Open claude.ai and paste the prompt below. Claude will generate the code and explain what it does. Then open Google Colab, create a new notebook, and paste the code in to run it and see the output.
```python
import pandas as pd

# Define the song catalog
songs = [
    {"name": "Blinding Lights", "artist": "The Weeknd", "genre": "Pop"},
    # ... 14 more songs
]

# Simulate user behavior
# 0.1 = skipped, 0.5 = played once, 0.9 = replayed
listening_history = []
for user_id in range(1, 11):
    for song in songs:
        score = simulate_engagement(user_id, song)  # helper Claude generates for you
        listening_history.append({
            "user_id": user_id,
            "song": song["name"],
            "genre": song["genre"],  # carried over so later steps can group by genre
            "engagement_score": score,
        })

# Store as a DataFrame and print
df = pd.DataFrame(listening_history)
print(df.head(20))
```
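The snippet leaves `simulate_engagement` undefined; Claude will write its own version when you paste the prompt. If you want to run the cell right away, here is one hedged sketch of what such a helper might look like. The genre preferences and score probabilities are invented for illustration; the only requirement is that it returns 0.1, 0.5, or 0.9.

```python
import random

def simulate_engagement(user_id, song):
    """Return one of the three engagement levels for a user-song pair."""
    # Seed with the pair so each user's history is reproducible across runs.
    rng = random.Random(f"{user_id}-{song['name']}")
    # Give each user a preferred genre so taste clusters can emerge later.
    preferred = ["Pop", "Hip-Hop", "Indie Pop"][user_id % 3]
    if song["genre"] == preferred:
        return rng.choice([0.5, 0.9, 0.9])   # mostly played or replayed
    return rng.choice([0.1, 0.1, 0.5])       # mostly skipped
```

Because the random generator is seeded per user-song pair, rerunning the notebook produces the same listening history every time.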
You have raw behavioral data — numbers between 0 and 1 representing how each user engaged with each song. But raw data is not the same as useful data. Before any model can learn from it, those raw interactions need to be transformed into features: structured, meaningful summaries that describe a person's taste rather than just listing individual events.
```python
# Total songs heard per user (denominator for both rates)
total_plays = df.groupby('user_id').size()

# Calculate skip rate per user
# Skip rate = proportion of songs with engagement score of 0.1
skip_rate = df[df['engagement_score'] == 0.1].groupby('user_id').size() / total_plays
skip_rate = skip_rate.fillna(0)  # users who never skip would otherwise be NaN

# Calculate replay rate per user
# Replay rate = proportion of songs with engagement score of 0.9
replay_rate = df[df['engagement_score'] == 0.9].groupby('user_id').size() / total_plays
replay_rate = replay_rate.fillna(0)

# Find each user's top genre by average engagement score
# (requires the 'genre' column carried over from the song catalog)
genre_affinity = df.groupby(['user_id', 'genre'])['engagement_score'].mean()
top_genre = genre_affinity.groupby('user_id').idxmax().apply(lambda x: x[1])
top_genre_affinity = genre_affinity.groupby('user_id').max()

# Combine into a user profile table — one row per user
user_profiles = pd.DataFrame({
    'skip_rate': skip_rate,
    'replay_rate': replay_rate,
    'top_genre': top_genre,
    'top_genre_affinity': top_genre_affinity,
}).reset_index()
print(user_profiles)
```
Now that each user has a feature profile, you can ask a question Spotify asks constantly: which users are naturally similar to each other? This is where clustering comes in — and where one of the most important ideas in machine learning becomes visible. The algorithm finds groups of similar users without you ever telling it what those groups should be.
Each dot's position is determined by two engineered features: skip_rate (X axis) and replay_rate (Y axis). Nobody drew these clusters or named them. The algorithm placed each user at their coordinates and the groups formed on their own — because users with similar listening behavior naturally landed near each other.
Notice something important: the clusters are behavioral, not genre-based. The green Selective Listeners (top-left) skip a lot and replay selectively — they include a Pop fan (user_01), a Hip-Hop fan (user_04), and an Indie Pop fan (user_03). What they share is how they listen, not what they listen to. The purple Enthusiastic Replayers (user_08, user_09) barely skip anything and replay over 60% of songs — one loves Pop Rock, the other Hip-Hop. Again, behavior groups them, not genre.
Click any dot to trace the chain: raw listening events → engineered features → position in taste space → cluster membership. This is the same chain Spotify runs for 751 million users. The only difference is scale.
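The grouping the scatterplot shows can be reproduced with scikit-learn's k-means. This is a sketch, not the demo's exact code: the `user_profiles` values below are toy stand-ins for the engineered features from the earlier step, and the cluster labels k-means returns are just numbers; names like "Selective Listeners" are added by a human afterward.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy profiles standing in for the engineered features from Step 3.
user_profiles = pd.DataFrame({
    "user_id":     [1, 2, 3, 4, 8, 9],
    "skip_rate":   [0.60, 0.55, 0.60, 0.50, 0.05, 0.10],
    "replay_rate": [0.20, 0.25, 0.30, 0.20, 0.70, 0.65],
})

# K-means places each user at their (skip_rate, replay_rate) coordinates
# and iteratively moves k cluster centers toward the densest groups.
# It is never told what the groups mean, only how many to find.
features = user_profiles[["skip_rate", "replay_rate"]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
user_profiles["cluster"] = kmeans.fit_predict(features)

print(user_profiles)
```

With these toy values, users 8 and 9 (low skip, high replay) land in one cluster and users 1 through 4 in the other, mirroring the Enthusiastic Replayers and Selective Listeners in the demo.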
Now you add the algorithm. Before you paste the prompt, take a minute to understand what collaborative filtering is actually doing — the code will make a lot more sense once you see the logic.
| User | Blinding Lights | Levitating | As It Was | STAY | Peaches | Heat Waves | HUMBLE. | God's Plan |
|---|---|---|---|---|---|---|---|---|
| user_01 ←you | 0.9 | 0.9 | 0.9 | 0.9 | 0.1 | 0.5 | 0.1 | 0.1 |
| user_02 | 0.9 | 0.9 | 0.5 | 0.5 | 0.9 | 0.5 | 0.1 | 0.1 |
| user_03 | 0.1 | 0.1 | 0.1 | 0.1 | 0.5 | 0.1 | 0.9 | 0.9 |
| user_04 | 0.1 | 0.1 | 0.5 | 0.1 | 0.1 | 0.5 | 0.9 | 0.9 |
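You can check the logic by hand before running the real code. This sketch computes cosine similarity directly on the first three rows of the table above, using only the standard library (the full notebook will use scikit-learn instead):

```python
import math

def cosine(a, b):
    """Angle-based similarity: 1.0 means identical taste direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rows copied from the table above.
user_01 = [0.9, 0.9, 0.9, 0.9, 0.1, 0.5, 0.1, 0.1]
user_02 = [0.9, 0.9, 0.5, 0.5, 0.9, 0.5, 0.1, 0.1]
user_03 = [0.1, 0.1, 0.1, 0.1, 0.5, 0.1, 0.9, 0.9]

print(round(cosine(user_01, user_02), 2))  # 0.86 -> similar taste
print(round(cosine(user_01, user_03), 2))  # 0.25 -> very different taste
```

User 2, who also loves the pop tracks, scores far higher than user 3, who replays the hip-hop tracks you skip. That single number is what decides whose favorites get recommended to you.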
```python
from sklearn.metrics.pairwise import cosine_similarity

# Build the user-song matrix
# Rows = users, columns = songs, values = engagement scores
user_song_matrix = df.pivot_table(
    index='user_id',
    columns='song',
    values='engagement_score',
    fill_value=0
)

# Calculate cosine similarity
# Measures the angle between two users' rows.
# Users with similar taste point in the same direction — score near 1.0
similarity = cosine_similarity(user_song_matrix)
sim_df = pd.DataFrame(similarity,
                      index=user_song_matrix.index,
                      columns=user_song_matrix.index)

# Find the 3 users most similar to User 1
# ([1:4] skips position 0, User 1's perfect match with themselves)
similar_users = sim_df[1].sort_values(ascending=False)[1:4]

# Recommend songs those users loved that User 1 hasn't heard
# (get_recommendations is a helper Claude generates for you)
recommendations = get_recommendations(
    target_user=1,
    similar_users=similar_users,
    matrix=user_song_matrix
)
```
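The snippet leaves `get_recommendations` undefined; Claude will write its own version. As a reference point, here is one hedged sketch of what such a helper might look like: it treats a score of 0 in the matrix as "never heard" (that is what `fill_value=0` encodes) and weights each neighbor's scores by their similarity to the target user.

```python
import pandas as pd

def get_recommendations(target_user, similar_users, matrix, top_n=3):
    """Recommend songs that similar users loved but the target hasn't heard."""
    target_row = matrix.loc[target_user]
    unheard = target_row[target_row == 0].index  # fill_value=0 marks unheard songs

    # Weight each neighbor's engagement scores by how similar they are.
    scores = pd.Series(0.0, index=unheard)
    for user, similarity in similar_users.items():
        scores += matrix.loc[user, unheard] * similarity

    return scores.sort_values(ascending=False).head(top_n)
```

A song a 0.9-similar neighbor replayed therefore outranks the same song loved by a 0.4-similar neighbor, which is the core bet of collaborative filtering: trust the people who listen like you.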
You have built the algorithm in Python. Now you are going to build a live interactive version of it — a web app with Skip, Play, and Replay buttons and real-time recommendations. You will do this by prompting Claude to build it for you. Once you have your version working, compare it against the demo below to see how it stacks up.
You are going to add a Wrapped feature directly to the interactive system you built in Step 6 — no new data, no separate pipeline. The same engagement scores that power your recommendations will now generate a personalized year-in-review, right inside the same HTML file.
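Inside the HTML file the Wrapped logic will be JavaScript, but the computation is the same one you can prototype in Python first. This sketch (with a toy stand-in for the Step 2 DataFrame) shows how a handful of Wrapped-style stats fall straight out of the engagement scores; the exact stats your version surfaces are up to you.

```python
import pandas as pd

# Toy listening history standing in for the Step 2 DataFrame.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "song": ["Blinding Lights", "Levitating", "Peaches", "HUMBLE."],
    "engagement_score": [0.9, 0.5, 0.5, 0.1],
})

def wrapped_summary(df, user_id):
    """Derive year-in-review stats from the same scores that drive recommendations."""
    user = df[df["user_id"] == user_id]
    replays = user[user["engagement_score"] == 0.9]
    return {
        "top_song": user.sort_values("engagement_score", ascending=False).iloc[0]["song"],
        "replay_count": len(replays),
        "skip_rate": round((user["engagement_score"] == 0.1).mean(), 2),
    }

print(wrapped_summary(df, 1))
```

No new data is collected at any point: the top song is just the highest engagement score, and the replay count is a filter on the scores you already have.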
You have built a working recommendation system, demonstrated a real AI failure mode, and generated a Wrapped. Now use Claude to turn all of that into a polished live website. The site should do three things: tell the story of what you built and why it matters, showcase a working version of your recommendation system that visitors can actually use, and present your Wrapped output.
Claude will generate a complete HTML file. Copy it, commit it to a GitHub repository as index.html, turn on GitHub Pages in the repository settings, and you have a live site.
You just built a working recommendation system. Spotify built one too — and then watched it quietly fail for four months without anyone noticing. No crash. No error message. Just subtly worse recommendations, invisible on any dashboard.
Map what you built to Spotify's real system and to the AI Factory model from the case study.
For each step above, write one sentence describing: (a) what you built in this lab that corresponds to it, and (b) what Spotify does at that step at real scale.