Recsys Music App
A full-stack music recommendation system built during AIM Labs (AI@MIT), exploring content-based similarity using deep learning on audio features.
What I Built
The system recommends similar songs based on acoustic properties rather than collaborative filtering. Given a query track, it finds sonically similar music by comparing learned embeddings of audio content.
How It Works
- Audio Processing: Tracks are converted to mel spectrograms, a time-frequency representation that captures the acoustic characteristics humans perceive.
- Embedding Model: A Siamese network learns to map spectrograms to a dense embedding space where similar-sounding tracks are close together.
- Similarity Search: At query time, the query track's embedding is looked up in a FAISS index for fast approximate nearest-neighbor search.
- Web Interface: Users can search for songs, play previews, and explore recommendations through a clean web UI.
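The similarity-search step can be sketched in a few lines. This is an illustrative brute-force cosine search in plain NumPy standing in for the FAISS index (which does the same lookup approximately, at much larger scale); `build_index` and `query` are hypothetical helper names, not part of the actual codebase.

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    # Normalize rows so a dot product equals cosine similarity
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

def query(index: np.ndarray, emb: np.ndarray, k: int = 5) -> np.ndarray:
    q = emb / max(np.linalg.norm(emb), 1e-12)
    sims = index @ q                 # cosine similarity to every track
    return np.argsort(-sims)[:k]     # indices of the k most similar tracks

# Toy catalog: 100 tracks, each a 128-dimensional embedding
rng = np.random.default_rng(0)
catalog = rng.normal(size=(100, 128))
index = build_index(catalog)
neighbors = query(index, catalog[7], k=5)
```

A query track is always its own nearest neighbor, which is a cheap sanity check for any embedding index.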
Technical Details
Model Architecture
The Siamese network uses a contrastive loss function trained on pairs of tracks:
- Same-artist tracks as positive pairs (assuming stylistic similarity)
- Random tracks as negative pairs
- CNN backbone processing 3-second spectrogram windows
- Output: 128-dimensional embedding vector
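The contrastive objective above can be written down directly. This is a minimal NumPy sketch of the standard contrastive loss (the real model computes it inside a deep-learning framework during training); the margin value here is illustrative.

```python
import numpy as np

def contrastive_loss(z1: np.ndarray, z2: np.ndarray,
                     y: np.ndarray, margin: float = 1.0) -> float:
    """Contrastive loss over a batch of embedding pairs.

    y[i] = 1 for a positive (same-artist) pair, 0 for a negative pair.
    Positives are pulled together; negatives are pushed past the margin.
    """
    d = np.linalg.norm(z1 - z2, axis=1)                # distance per pair
    pos = y * d**2                                     # distant positives cost
    neg = (1 - y) * np.maximum(0.0, margin - d)**2     # close negatives cost
    return float(np.mean(pos + neg))

# An identical positive pair incurs zero loss
z = np.ones((1, 128))
loss_same = contrastive_loss(z, z, np.array([1.0]))
```

Negative pairs already farther apart than the margin also contribute nothing, which is what lets the model focus capacity on the hard cases near the boundary.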
Training required careful data augmentation (time stretching, pitch shifting) to prevent the model from learning superficial correlations.
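As one example of the augmentations mentioned, time stretching can be approximated by resampling spectrogram frames. This naive frame-interpolation version is only an illustration; a production pipeline would typically stretch the waveform itself (e.g. with a phase vocoder) before computing the spectrogram.

```python
import numpy as np

def time_stretch(spec: np.ndarray, rate: float) -> np.ndarray:
    """Naively time-stretch a (mels, frames) spectrogram.

    rate > 1 speeds the clip up (fewer frames); rate < 1 slows it down.
    """
    n_frames = spec.shape[1]
    new_n = max(1, int(round(n_frames / rate)))
    old_t = np.arange(n_frames)
    new_t = np.linspace(0, n_frames - 1, new_n)
    # Linearly interpolate each mel band along the time axis
    return np.stack([np.interp(new_t, old_t, band) for band in spec])

spec = np.random.default_rng(1).random((128, 300))   # toy 3-second window
fast = time_stretch(spec, rate=1.5)                  # ~2/3 as many frames
```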
Backend Service
FastAPI handles the recommendation API:
- Endpoints for search, recommendation, and track metadata
- Background task queue for processing new uploads
- Connection pooling for database and FAISS index access
- Deployed on Cloud Run for auto-scaling
Frontend
Next.js frontend with:
- Audio player with waveform visualization
- Responsive grid layout for recommendations
- Server-side rendering for initial load performance
Results / Learnings
The system achieved reasonable recommendation quality on held-out data, though evaluating music similarity is inherently subjective.
Key learnings:
- Mel spectrograms capture meaningful acoustic features, but lose some information (lyrics, cultural context)
- FAISS makes embedding search practical at scale—fast approximate nearest-neighbor queries on large track collections
- User testing revealed that "similar" means different things to different people—some want genre similarity, others want mood or tempo
- The cold-start problem is real—content-based methods help with new tracks but miss social signals