Recsys Music App

Web-based music similarity and recommendations using a Siamese network over mel spectrograms.

2023-12-15
Python · PyTorch · FastAPI · Next.js · FAISS · Cloud Run

A full-stack music recommendation system built during AIM Labs (AI@MIT), exploring content-based similarity using deep learning on audio features.

What I Built

The system recommends similar songs from their acoustic properties instead of relying on collaborative filtering. Given a query track, it finds sonically similar music by comparing learned embeddings of audio content.

How It Works

  1. Audio Processing: Tracks are converted to mel spectrograms, a time-frequency representation whose frequency axis follows the mel scale, which approximates how humans perceive pitch.

  2. Embedding Model: A Siamese network learns to map spectrograms to a dense embedding space where similar-sounding tracks are close together.

  3. Similarity Search: At query time, the query track's embedding is searched against a FAISS index, which returns its approximate nearest neighbors quickly.

  4. Web Interface: Users can search for songs, play previews, and explore recommendations through a clean web UI.
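To make the audio-processing step concrete, here is a minimal NumPy sketch of a log-mel spectrogram. The sample rate, FFT size, hop length, and number of mel bands are illustrative assumptions, not the project's actual settings, and a real pipeline would more likely use an audio library than hand-rolled filters.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style mel scale (one common convention).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=64):
    # Slice the waveform into overlapping Hann-windowed frames.
    n_frames = 1 + (len(wave) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(n_fft)
    # Power spectrum per frame, then pool FFT bins into mel bands.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(mel + 1e-10)  # decibel scale, shape (n_mels, n_frames)

# A 3-second 440 Hz test tone stands in for a real audio window.
sr = 22050
t = np.arange(3 * sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec = mel_spectrogram(tone, sr=sr)
```

With these assumed settings, a 3-second window becomes a 64 x 255 array that a CNN can treat like a single-channel image.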

Technical Details

Model Architecture

The Siamese network is trained with a contrastive loss on pairs of tracks:

  • Same-artist tracks as positive pairs (assuming stylistic similarity)
  • Random tracks as negative pairs
  • CNN backbone processing 3-second spectrogram windows
  • Output: 128-dimensional embedding vector
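As a rough illustration of the pairing scheme and loss, the sketch below builds labeled pairs and evaluates one common form of the contrastive loss (constant factors dropped). The helper names, the 50/50 positive/negative split, and the margin of 1.0 are assumptions for the example. It is written in NumPy for brevity, not in the project's PyTorch training code.

```python
import random
import numpy as np

def make_pairs(tracks_by_artist, n_pairs, rng):
    """Return (track_a, track_b, label) triples; label 1 means same artist."""
    artists = [a for a, ts in tracks_by_artist.items() if len(ts) >= 2]
    all_tracks = [(a, t) for a, ts in tracks_by_artist.items() for t in ts]
    pairs = []
    for _ in range(n_pairs):
        if rng.random() < 0.5:
            # Positive pair: two distinct tracks by one artist.
            a, b = rng.sample(tracks_by_artist[rng.choice(artists)], 2)
            pairs.append((a, b, 1))
        else:
            # Random pair. Assumption: label it by artist so an accidental
            # same-artist draw is not mislabeled as a negative.
            (art_a, a), (art_b, b) = rng.sample(all_tracks, 2)
            pairs.append((a, b, int(art_a == art_b)))
    return pairs

def contrastive_loss(emb_a, emb_b, labels, margin=1.0):
    """Pull positives together; push negatives at least `margin` apart."""
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    pos = labels * d ** 2
    neg = (1 - labels) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pos + neg))

rng = random.Random(0)
catalog = {"artist_a": ["a1", "a2", "a3"], "artist_b": ["b1", "b2"], "artist_c": ["c1"]}
pairs = make_pairs(catalog, n_pairs=8, rng=rng)

# Toy 2-D embeddings: one positive pair at distance 1.0, one negative at 0.5.
emb_a = np.array([[0.0, 0.0], [0.0, 0.0]])
emb_b = np.array([[0.6, 0.8], [0.3, 0.4]])
labels = np.array([1, 0])
loss = contrastive_loss(emb_a, emb_b, labels, margin=1.0)
```

Negatives that are already farther apart than the margin contribute zero loss, so training effort concentrates on pairs the model still confuses.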

Training required careful data augmentation (time stretching, pitch shifting) to prevent the model from learning superficial correlations.
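The two augmentations can be approximated directly on a mel spectrogram, as in the sketch below. This is a crude spectrogram-domain stand-in and an assumption on my part: the project may well have augmented the raw waveform instead, and shifting mel bands only roughly corresponds to a true pitch shift. Function names and parameters are hypothetical.

```python
import numpy as np

def time_stretch(spec, rate):
    """Resample the time axis by linear interpolation; rate > 1 shortens the clip."""
    n_mels, n_frames = spec.shape
    new_frames = max(1, int(round(n_frames / rate)))
    src = np.linspace(0.0, n_frames - 1, new_frames)
    cols = np.arange(n_frames)
    return np.stack([np.interp(src, cols, row) for row in spec])

def pitch_shift(spec, bins):
    """Shift mel bands up (bins > 0) or down (bins < 0).

    Vacated bands are filled with the spectrogram's minimum value.
    """
    out = np.full_like(spec, spec.min())
    if bins > 0:
        out[bins:] = spec[:-bins]
    elif bins < 0:
        out[:bins] = spec[-bins:]
    else:
        out[:] = spec
    return out

# Random values stand in for a real (n_mels, n_frames) spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((64, 255))
stretched = time_stretch(spec, rate=1.25)   # fewer frames, same bands
shifted = pitch_shift(spec, bins=2)         # same shape, content moved up 2 bands
```

Because the stretched clip changes length, a training loop would still need to crop or pad it back to the fixed window size the CNN expects.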

Backend Service

FastAPI handles the recommendation API:

  • Endpoints for search, recommendation, and track metadata
  • Background task queue for processing new uploads
  • Connection pooling for database and FAISS index access
  • Deployed on Cloud Run for auto-scaling

Frontend

Next.js frontend with:

  • Audio player with waveform visualization
  • Responsive grid layout for recommendations
  • Server-side rendering for initial load performance

Results / Learnings

The system achieved reasonable recommendation quality on held-out data, though evaluating music similarity is inherently subjective.

Key learnings:

  • Mel spectrograms capture meaningful acoustic features, but lose some information (lyrics, cultural context)
  • FAISS makes embedding search practical at scale—fast approximate nearest-neighbor queries on large track collections
  • User testing revealed that "similar" means different things to different people—some want genre similarity, others want mood or tempo
  • The cold-start problem is real—content-based methods help with new tracks but miss social signals
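For reference, the lookup that FAISS accelerates can be written as an exact brute-force search in NumPy, sketched below. This is the baseline that approximate indexes trade a little accuracy against; FAISS's `IndexFlatIP` performs the same exact inner-product search. The catalog size, the random embeddings, and the choice of cosine similarity are assumptions, since the index type used in the project is not stated.

```python
import numpy as np

def normalize(x):
    # Unit-length rows make inner product equal to cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def top_k(index_vectors, query, k=5):
    """Return the ids and scores of the k most similar catalog vectors."""
    scores = index_vectors @ query
    best = np.argpartition(-scores, k - 1)[:k]   # unordered top k
    best = best[np.argsort(-scores[best])]       # sort those k by score
    return best, scores[best]

# Random 128-dimensional vectors stand in for learned track embeddings.
rng = np.random.default_rng(0)
catalog = normalize(rng.standard_normal((10_000, 128)).astype(np.float32))

query = catalog[42]                # query with a track already in the catalog
ids, sims = top_k(catalog, query, k=5)
```

The brute-force version scores every track for every query, which is the linear cost that an approximate index avoids on large collections.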