Recsys Music App

A full-stack music recommendation system built during AIM Labs (AI@MIT), exploring content-based similarity using deep learning on audio features.

What I Built

The system recommends songs based on their acoustic properties rather than on the listening-history signals used in collaborative filtering. Given a query track, it finds sonically similar music by comparing learned embeddings of audio content.

How It Works

Audio Processing: Tracks are converted to mel spectrograms, a time-frequency representation whose frequency axis follows the mel scale, which approximates how humans perceive pitch.
Embedding Model: A Siamese network learns to map spectrograms to a dense embedding space where similar-sounding tracks are close together.
Similarity Search: At query time, the query track's embedding is looked up in a FAISS index for fast approximate nearest-neighbor search.
Web Interface: Users can search for songs, play previews, and explore recommendations through a clean web UI.
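
To make the similarity-search step concrete, here is a minimal sketch of what a nearest-neighbor lookup computes. It is an exact brute-force scan in plain Python, not FAISS, and it assumes cosine similarity as the metric (the project may have used L2 distance instead). The `catalog` dict, the track ids, and the 3-dimensional vectors are hypothetical stand-ins for the real 128-dimensional embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_similar(query, catalog, k=2):
    """Return the k track ids whose embeddings are most cosine-similar to `query`.

    `catalog` is a hypothetical dict mapping track id -> embedding (list of floats).
    This is an exact O(N) scan; an approximate index trades some accuracy for speed.
    """
    q = l2_normalize(query)
    scored = []
    for track_id, emb in catalog.items():
        e = l2_normalize(emb)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, track_id))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [track_id for _, track_id in scored[:k]]


# Toy catalog with made-up embeddings (illustrative only).
catalog = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.9, 0.1, 0.0],
    "track_c": [0.0, 1.0, 0.0],
    "track_d": [-1.0, 0.0, 0.0],
}

print(top_k_similar([1.0, 0.02, 0.0], catalog, k=2))  # ['track_a', 'track_b']
```

An approximate index answers the same question without scanning every track, which is what keeps query latency manageable as the catalog grows.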

Technical Details

Model Architecture

The Siamese network is trained with a contrastive loss on pairs of tracks. Key details:

Same-artist tracks as positive pairs (assuming stylistic similarity)
Random tracks as negative pairs
CNN backbone processing 3-second spectrogram windows
Output: 128-dimensional embedding vector
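
For readers unfamiliar with contrastive loss, the sketch below shows one common formulation: positive pairs are penalized by their squared distance, and negative pairs are penalized only when they sit closer than a margin. The exact variant used in this project is not specified here, so the Euclidean distance, the `margin=1.0` default, and the absence of a 1/2 scaling factor are all assumptions.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, is_positive, margin=1.0):
    """Contrastive loss for a single pair of embeddings.

    is_positive=True  (e.g. same artist): pull together, loss = d^2
    is_positive=False (e.g. random pair): push apart,   loss = max(0, margin - d)^2
    The margin value is an assumed hyperparameter, not taken from the project.
    """
    d = euclidean_distance(emb_a, emb_b)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy 2-D embeddings (illustrative only; the real ones are 128-dimensional).
anchor = [0.0, 0.0]
near = [0.3, 0.4]   # distance 0.5 from the anchor
far = [3.0, 4.0]    # distance 5.0 from the anchor

print(contrastive_loss(anchor, near, is_positive=True))   # small: pair is already close
print(contrastive_loss(anchor, near, is_positive=False))  # nonzero: negative inside the margin
print(contrastive_loss(anchor, far, is_positive=False))   # zero: negative already beyond the margin
```

The margin is what stops the model from pushing negatives apart indefinitely: once a negative pair is farther apart than the margin, it contributes no gradient.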

Training required careful data augmentation (time stretching, pitch shifting) to prevent the model from learning superficial correlations.
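
The 3-second windows fed to the CNN backbone can be cut from a full-track spectrogram roughly as sketched below. The sample rate (22050 Hz) and hop length (512 samples) are assumed defaults rather than values from this project, and the function names are hypothetical. The spectrogram is represented as a time-major list of frames to keep the example dependency-free.

```python
def frames_per_window(window_seconds=3.0, sample_rate=22050, hop_length=512):
    """Number of spectrogram frames that span one window (rounded down)."""
    return int(window_seconds * sample_rate / hop_length)


def split_into_windows(spectrogram, window_frames, stride_frames=None):
    """Split a time-major spectrogram (list of frames) into fixed-length windows.

    With stride_frames=None the windows do not overlap. Trailing frames that
    cannot fill a complete window are dropped.
    """
    if stride_frames is None:
        stride_frames = window_frames
    windows = []
    last_start = len(spectrogram) - window_frames
    for start in range(0, last_start + 1, stride_frames):
        windows.append(spectrogram[start:start + window_frames])
    return windows


# Fake spectrogram: 300 frames, each with 4 mel bins (illustrative sizes only).
spectrogram = [[float(t)] * 4 for t in range(300)]

n_frames = frames_per_window()  # 129 frames under the assumed settings
windows = split_into_windows(spectrogram, n_frames)
print(n_frames, len(windows))
```

Passing a smaller `stride_frames` produces overlapping windows, which yields more training examples per track at the cost of more correlated samples.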

Backend Service

FastAPI handles the recommendation API:

Endpoints for search, recommendation, and track metadata
Background task queue for processing new uploads
Connection pooling for database and FAISS index access
Deployed on Cloud Run for auto-scaling

Frontend

Next.js frontend with:

Audio player with waveform visualization
Responsive grid layout for recommendations
Server-side rendering for initial load performance

Results / Learnings

The system achieved reasonable recommendation quality on held-out data, though evaluating music similarity is inherently subjective.
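
The metric behind this evaluation is not stated here. As a hedged illustration of how held-out quality is often measured for recommenders, the sketch below computes precision@k: the fraction of the top-k recommendations that count as relevant. What counts as "relevant" is itself an assumption; a same-artist label is one possible proxy, with the subjectivity caveat above still applying. The track ids are hypothetical.

```python
def precision_at_k(recommended_ids, relevant_ids, k):
    """Fraction of the top-k recommended tracks that appear in the relevant set.

    recommended_ids: ranked list of track ids, best first.
    relevant_ids: set of track ids treated as correct for this query
                  (the relevance definition is an assumption, e.g. same artist).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended_ids[:k]
    hits = sum(1 for track_id in top_k if track_id in relevant_ids)
    return hits / k


# Made-up example: 4 ranked recommendations, 3 held-out relevant tracks.
recommended = ["t1", "t2", "t3", "t4"]
relevant = {"t1", "t3", "t9"}

print(precision_at_k(recommended, relevant, k=4))  # 0.5
```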

Key learnings:

Mel spectrograms capture meaningful acoustic features, but lose some information (lyrics, cultural context)
FAISS makes embedding search practical at scale by providing fast approximate nearest-neighbor queries over large track collections
User testing revealed that "similar" means different things to different people—some want genre similarity, others want mood or tempo
The cold-start problem is real—content-based methods help with new tracks but miss social signals