[EXTRA] Music Search API
A complete Spring Boot application for semantic music search using OpenAI embeddings and Pinecone vector database.
✅ Implementation Status
All components from the original specification have been successfully implemented:
✅ Maven Project Structure - Complete with all dependencies
✅ Track Model - Comprehensive data model with all music attributes
✅ Configuration Classes - OpenAI and Pinecone configurations
✅ CSV Loader Service - Robust CSV parsing with error handling
✅ Embedding Service - OpenAI integration with caching
✅ Pinecone Service - Vector operations and similarity search
✅ Index Controller - CSV indexing with progress tracking
✅ Search Controller - Semantic search with suggestions
✅ Application Properties - Complete configuration setup
✅ Sample Data - Sample CSV with popular tracks
🚀 Quick Start
1. Prerequisites
Java 21
Maven 3.6+
OpenAI API Key
Pinecone Account and Index
2. Environment Variables
export OPENAI_API_KEY=sk-your-openai-api-key
export PINECONE_API_KEY=your-pinecone-api-key
export PINECONE_ENVIRONMENT=gcp-starter
export PINECONE_INDEX_NAME=music-tracks
3. Build and Run
# Build the application
mvn clean package
# Run the application
mvn spring-boot:run
# Or run the JAR
java -jar target/music-search-0.0.1-SNAPSHOT.jar
The API will be available at http://localhost:8080
📡 API Endpoints
Index Management
Index CSV Data
POST /api/index?csvFileName=tracks_sample.csv
Response:
{
"status": "success",
"tracksLoaded": 10,
"embeddingsGenerated": 10,
"tracksIndexed": 10,
"processingTimeMs": 15420,
"message": "Successfully indexed 10 tracks in 15.42 seconds"
}
Get Index Status
GET /api/index/status
Clear Embedding Cache
DELETE /api/index/cache
Health Check
GET /api/index/health
Search Operations
Semantic Search
POST /api/search
Content-Type: application/json
{
"query": "energetic pop music for dancing",
"topK": 5
}
Response:
{
"status": "success",
"query": "energetic pop music for dancing",
"matches": [
{
"id": "6f807x0ima9a1j3VPbc7VN",
"score": 0.91,
"metadata": {
"track_name": "I Don't Care (with Justin Bieber) - Loud Luxury Remix",
"track_artist": "Ed Sheeran",
"playlist_genre": "pop",
"playlist_subgenre": "dance pop",
"energy": 0.916,
"tempo": 122.036
}
}
],
"totalResults": 1,
"searchTimeMs": 245
}
Search Suggestions
GET /api/search/suggestions?q=pop
Search Health Check
GET /api/search/health
🎵 Sample Search Queries
Try these example searches:
"upbeat dance music" - Finds high-energy, danceable tracks
"relaxing acoustic songs" - Finds calm, acoustic tracks
"workout motivation" - Finds high-tempo, energetic tracks
"romantic ballads" - Finds emotional, slow-tempo tracks
"party anthems" - Finds upbeat, danceable party music
🏗️ Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ CSV Loader      │───▶│ Embedding       │───▶│ Pinecone        │
│                 │    │ Service         │    │ Service         │
│ - Parse CSV     │    │                 │    │                 │
│ - Validate data │    │ - OpenAI API    │    │ - Vector ops    │
│ - Error handling│    │ - Caching       │    │ - Similarity    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                       ┌─────────────────┐
                       │ Controllers     │
                       │                 │
                       │ - Index API     │
                       │ - Search API    │
                       └─────────────────┘
🔧 Configuration
OpenAI Settings
Model: text-embedding-3-small
Timeout: 60 seconds (configurable)
Caching: Enabled with 1000 entry limit
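The 1000-entry cache limit could be enforced along the following lines. This is a minimal sketch, not the application's actual cache: the class name `EmbeddingCache`, the least-recently-used eviction policy, and the `getOrCompute` method are all assumptions made for illustration, since the settings above state only that caching is enabled with a size limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache for embeddings, keyed by the input text. */
final class EmbeddingCache {
    private final int maxEntries;
    private final Map<String, float[]> entries;

    EmbeddingCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order (an assumed policy).
        this.entries = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > EmbeddingCache.this.maxEntries;
            }
        };
    }

    /** Returns the cached vector, or computes and stores it on a miss. */
    synchronized float[] getOrCompute(String text, Function<String, float[]> embedder) {
        float[] cached = entries.get(text);
        if (cached != null) {
            return cached;
        }
        float[] fresh = embedder.apply(text);
        entries.put(text, fresh);
        return fresh;
    }

    synchronized int size() {
        return entries.size();
    }

    /** Counterpart of a cache-clearing endpoint such as DELETE /api/index/cache. */
    synchronized void clear() {
        entries.clear();
    }
}
```

With `new EmbeddingCache(1000)`, repeated queries for the same text would reach the embedding function only once until the entry is evicted or the cache is cleared.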
Pinecone Settings
Environment: Configurable (gcp-starter, aws, etc.)
Index Name: Configurable
Vector Dimensions: 1536 (the default output size of text-embedding-3-small)
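One way to fail fast on a missing Pinecone setting is to validate the environment at startup. The sketch below is illustrative only: the record `PineconeSettings` and its methods are hypothetical names, not the application's configuration classes. The variable names and the 1536 dimension come from this README.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the Pinecone settings listed above. */
record PineconeSettings(String apiKey, String environment, String indexName, int dimensions) {

    /** Vector size stated in this README. */
    static final int EXPECTED_DIMENSIONS = 1536;

    /** Builds the settings from an environment map, reporting every missing variable at once. */
    static PineconeSettings fromEnv(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : List.of("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "PINECONE_INDEX_NAME")) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing environment variables: " + missing);
        }
        return new PineconeSettings(
                env.get("PINECONE_API_KEY"),
                env.get("PINECONE_ENVIRONMENT"),
                env.get("PINECONE_INDEX_NAME"),
                EXPECTED_DIMENSIONS);
    }

    /** Rejects vectors whose length does not match the index dimension. */
    void requireMatchingDimension(float[] vector) {
        if (vector.length != dimensions) {
            throw new IllegalArgumentException(
                    "Expected " + dimensions + " dimensions but got " + vector.length);
        }
    }
}
```

A caller would typically pass `System.getenv()` to `fromEnv`.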
CSV Processing
Batch Size: 100 tracks per batch
Error Handling: Continues processing on individual errors
Field Mapping: Automatic based on header names
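The three CSV behaviours above (header-based field mapping, continuing past bad rows, and batching) can be sketched as follows. This is a simplified illustration rather than the actual loader: `CsvTrackLoader`, `Track`, and `LoadResult` are hypothetical names, the field subset is taken from the metadata shown in the search response, and the naive comma split assumes no quoted fields containing commas, which a real CSV parser would have to handle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CsvTrackLoader {

    /** Hypothetical subset of track fields, named after the response metadata keys. */
    record Track(String name, String artist, String genre, double energy, double tempo) {}

    record LoadResult(List<Track> tracks, int skippedRows) {}

    static LoadResult load(List<String> lines) {
        if (lines.isEmpty()) {
            return new LoadResult(List.of(), 0);
        }
        // Map header name -> column index so the column order does not matter.
        String[] header = lines.get(0).split(",", -1);
        Map<String, Integer> columns = new HashMap<>();
        for (int i = 0; i < header.length; i++) {
            columns.put(header[i].trim(), i);
        }

        List<Track> tracks = new ArrayList<>();
        int skipped = 0;
        for (String line : lines.subList(1, lines.size())) {
            try {
                String[] cells = line.split(",", -1); // assumption: no quoted commas
                tracks.add(new Track(
                        cell(cells, columns, "track_name"),
                        cell(cells, columns, "track_artist"),
                        cell(cells, columns, "playlist_genre"),
                        Double.parseDouble(cell(cells, columns, "energy")),
                        Double.parseDouble(cell(cells, columns, "tempo"))));
            } catch (RuntimeException e) {
                skipped++; // a bad row is counted and skipped, not fatal
            }
        }
        return new LoadResult(tracks, skipped);
    }

    private static String cell(String[] cells, Map<String, Integer> columns, String name) {
        Integer index = columns.get(name);
        if (index == null || index >= cells.length) {
            throw new IllegalArgumentException("Missing field: " + name);
        }
        return cells[index].trim();
    }

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            result.add(items.subList(start, Math.min(start + batchSize, items.size())));
        }
        return result;
    }
}
```

Calling `CsvTrackLoader.batches(result.tracks(), 100)` would then yield the groups of up to 100 tracks described above.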
📊 Features
✅ Implemented Features
Semantic Search: Natural language music queries
Batch Processing: Efficient CSV indexing
Caching: Embedding result caching
Error Handling: Robust error recovery
Health Checks: Service monitoring endpoints
CORS Support: Cross-origin requests
Logging: Comprehensive logging
Configuration: Environment-based config
🔄 Data Flow
Load CSV → Parse tracks from CSV file
Generate Embeddings → Create vector representations
Store in Pinecone → Index vectors with metadata
Search Query → Generate query embedding
Find Similar → Cosine similarity search
Return Results → Formatted track information
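To make the "Find Similar" step concrete, here is a small in-memory sketch of cosine similarity and top-K ranking. In the application this search runs inside Pinecone, so the code below only illustrates the scoring idea. The class `SimilaritySearch`, the record `Match`, and the brute-force scan are assumptions for illustration, not the service's implementation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SimilaritySearch {

    /** Hypothetical result type: a vector id with its similarity score. */
    record Match(String id, double score) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every stored vector against the query and keeps the k best. */
    static List<Match> topK(float[] query, Map<String, float[]> index, int k) {
        return index.entrySet().stream()
                .map(e -> new Match(e.getKey(), cosine(query, e.getValue())))
                .sorted(Comparator.comparingDouble((Match m) -> m.score()).reversed())
                .limit(k)
                .toList();
    }
}
```

The `topK` parameter in the search request plays the same role as `k` here: it caps how many of the highest-scoring matches are returned.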
🧪 Testing
Manual Testing
Start the application
Index the sample data:
POST /api/index?csvFileName=tracks_sample.csv
Search for music:
POST /api/search with various queries
Check status:
GET /api/index/status
GET /api/index/status
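The manual steps above could also be scripted with the JDK's built-in HTTP client. The sketch below only builds the three requests, using the endpoints and JSON fields documented in this README. The class `ApiRequests` and its method names are hypothetical, and the JSON escaping is deliberately minimal (backslashes and double quotes only).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class ApiRequests {

    /** POST /api/index?csvFileName=... with an empty body. */
    static HttpRequest indexCsv(String baseUrl, String csvFileName) {
        String encoded = URLEncoder.encode(csvFileName, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/index?csvFileName=" + encoded))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** POST /api/search with a JSON body carrying "query" and "topK". */
    static HttpRequest search(String baseUrl, String query, int topK) {
        String body = "{\"query\": \"" + escapeJson(query) + "\", \"topK\": " + topK + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** GET /api/index/status. */
    static HttpRequest indexStatus(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/index/status"))
                .GET()
                .build();
    }

    /** Minimal escaping for illustration; control characters are not handled. */
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Each request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running instance at http://localhost:8080.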
Sample Queries to Test
# High energy dance music
curl -X POST http://localhost:8080/api/search \
-H "Content-Type: application/json" \
-d '{"query": "high energy dance music", "topK": 3}'
# Relaxing acoustic songs
curl -X POST http://localhost:8080/api/search \
-H "Content-Type: application/json" \
-d '{"query": "relaxing acoustic songs", "topK": 3}'
# Workout motivation
curl -X POST http://localhost:8080/api/search \
-H "Content-Type: application/json" \
-d '{"query": "workout motivation music", "topK": 3}'
📝 Notes
The application uses OpenAI's text-embedding-3-small model to generate embeddings
Pinecone handles the vector similarity search
The system is designed to handle large music catalogs efficiently
All API endpoints include comprehensive error handling
The application supports both development and production configurations
🚀 Next Steps
Set up your API keys in environment variables
Create your Pinecone index with 1536 dimensions
Add your music CSV file to the resources directory
Test the API endpoints with sample queries
Deploy to production with proper monitoring
The implementation is complete and ready for use! 🎉