🎵 Music Search API

A complete Spring Boot application for semantic music search using OpenAI embeddings and the Pinecone vector database.

✅ Implementation Status

All components from the original specification have been successfully implemented:

  • Maven Project Structure - Complete with all dependencies

  • Track Model - Comprehensive data model with all music attributes

  • Configuration Classes - OpenAI and Pinecone configurations

  • CSV Loader Service - Robust CSV parsing with error handling

  • Embedding Service - OpenAI integration with caching

  • Pinecone Service - Vector operations and similarity search

  • Index Controller - CSV indexing with progress tracking

  • Search Controller - Semantic search with suggestions

  • Application Properties - Complete configuration setup

  • Sample Data - Sample CSV with popular tracks

🚀 Quick Start

1. Prerequisites

  • Java 21

  • Maven 3.6+

  • OpenAI API Key

  • Pinecone Account and Index

2. Environment Variables

3. Build and Run

The API will be available at http://localhost:8080
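The README does not list the environment variable names, so the ones below are assumptions. This minimal fail-fast check, written as a hypothetical stand-alone class, confirms that the keys are present and non-blank before you start the application. Adjust REQUIRED to the names your configuration actually reads.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical fail-fast check for required environment variables.
// The variable names below are assumptions, not taken from the project.
final class EnvCheck {
    static final List<String> REQUIRED = List.of(
            "OPENAI_API_KEY",      // assumed name for the OpenAI key
            "PINECONE_API_KEY",    // assumed name for the Pinecone key
            "PINECONE_INDEX_NAME"  // assumed name for the target index
    );

    /** Returns the required names that are absent or blank in the given environment. */
    static List<String> missing(Map<String, String> env, List<String> required) {
        return required.stream()
                .filter(name -> {
                    String value = env.get(name);
                    return value == null || value.isBlank();
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> absent = missing(System.getenv(), REQUIRED);
        if (!absent.isEmpty()) {
            System.err.println("Missing environment variables: " + absent);
            System.exit(1);
        }
        System.out.println("All required environment variables are set.");
    }
}
```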

📡 API Endpoints

Index Management

Index CSV Data

Response:

Get Index Status

Clear Embedding Cache

Health Check
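Only two index endpoint paths survive in this README: POST /api/index?csvFileName=... and GET /api/index/status, both used under Manual Testing. The sketch below builds those two requests with the JDK's java.net.http client. The class and method names are hypothetical, and the response is returned raw because its JSON shape is not documented here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the two index endpoints whose paths appear in this README.
final class IndexClient {
    static final String BASE_URL = "http://localhost:8080";

    /** POST /api/index?csvFileName=... asks the service to index a CSV file. */
    static HttpRequest indexRequest(String baseUrl, String csvFileName) {
        String query = "csvFileName=" + URLEncoder.encode(csvFileName, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/index?" + query))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** GET /api/index/status reports the current indexing status. */
    static HttpRequest statusRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/index/status"))
                .GET()
                .build();
    }

    /** Sends a request and returns the status code followed by the raw response body. */
    static String send(HttpClient client, HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }
}
```

For example, passing HttpClient.newHttpClient() and IndexClient.indexRequest(IndexClient.BASE_URL, "tracks_sample.csv") to IndexClient.send indexes the sample file, and the same call with statusRequest polls progress.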

Search Operations

Response:

Search Suggestions

Search Health Check
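A search request can be sketched the same way. POST /api/search is the path used under Manual Testing, but the JSON body fields query and topK are assumptions that this README does not confirm, so check them against the Search Controller's request type before relying on this hypothetical client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical POST /api/search client. The JSON field names "query" and "topK"
// are assumptions; match them to the Search Controller's actual request type.
final class SearchClient {

    /** Builds the JSON body {"query": ..., "topK": ...} with minimal string escaping. */
    static String body(String query, int topK) {
        return "{\"query\":" + jsonString(query) + ",\"topK\":" + topK + "}";
    }

    static HttpRequest searchRequest(String baseUrl, String query, int topK) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(query, topK)))
                .build();
    }

    /** Sends the request and returns the raw response body. */
    static String send(HttpClient client, HttpRequest request) throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Quotes a string as a JSON literal, escaping quotes, backslashes and control characters. */
    private static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

Any of the sample queries, such as "upbeat dance music", can be passed as the query argument.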

🎵 Sample Search Queries

Try these example searches:

  1. "upbeat dance music" - Finds high-energy, danceable tracks

  2. "relaxing acoustic songs" - Finds calm, acoustic tracks

  3. "workout motivation" - Finds high-tempo, energetic tracks

  4. "romantic ballads" - Finds emotional, slow-tempo tracks

  5. "party anthems" - Finds upbeat, danceable party music

🏗️ Architecture

🔧 Configuration

OpenAI Settings

  • Model: text-embedding-3-small

  • Timeout: 60 seconds (configurable)

  • Caching: Enabled with 1000 entry limit
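The README states that embeddings are cached with a 1000-entry limit but not how. One standard-library way to get that behaviour is a least-recently-used map. The sketch below is an assumption, not the project's Embedding Service, and the embed function stands in for whatever code calls OpenAI.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded LRU cache for embeddings, using only the JDK.
final class EmbeddingCache {
    private final int maxEntries;
    private final Map<String, float[]> entries;

    EmbeddingCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true keeps the least recently used entry first in iteration order.
        this.entries = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > EmbeddingCache.this.maxEntries;
            }
        };
    }

    /** Returns the cached vector for text, computing and storing it on a miss. */
    synchronized float[] get(String text, Function<String, float[]> embed) {
        return entries.computeIfAbsent(text, embed);
    }

    synchronized int size() {
        return entries.size();
    }

    synchronized boolean contains(String text) {
        return entries.containsKey(text);
    }
}
```

new EmbeddingCache(1000) matches the documented limit. Because get is synchronized, a miss holds the lock while the embedding call runs, which serializes concurrent misses; a caching library such as Caffeine can load different keys concurrently, at the cost of an extra dependency.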

Pinecone Settings

  • Environment: Configurable (gcp-starter, aws, etc.)

  • Index Name: Configurable

  • Vector Dimensions: 1536 (the default output size of text-embedding-3-small)
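Pinecone rejects upserts whose vectors do not match the index dimension, so checking vector lengths before each batch surfaces the problem earlier and with a clearer message. The VectorRecord type below is a hypothetical stand-in, not the Pinecone Java client's class.

```java
import java.util.List;
import java.util.Map;

// Hypothetical pre-upsert check: every vector must match the index dimension.
final class VectorValidation {
    static final int INDEX_DIMENSION = 1536; // default output size of text-embedding-3-small

    /** Hypothetical record shape: track id, embedding values, and string metadata. */
    record VectorRecord(String id, float[] values, Map<String, String> metadata) {}

    /** Throws if any record's vector length differs from the expected dimension. */
    static void requireDimension(List<VectorRecord> records, int expected) {
        for (VectorRecord r : records) {
            if (r.values().length != expected) {
                throw new IllegalArgumentException(
                        "Vector " + r.id() + " has " + r.values().length
                                + " dimensions, index expects " + expected);
            }
        }
    }
}
```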

CSV Processing

  • Batch Size: 100 tracks per batch

  • Error Handling: Continues processing on individual errors

  • Field Mapping: Automatic based on header names
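The three CSV rules above (header-based field mapping, continuing past bad rows, batches of 100) can be sketched without any library. This is a simplified, hypothetical illustration rather than the project's CSV Loader Service: it splits on commas and therefore does not handle quoted fields, which a real loader should parse with a proper CSV library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified CSV-to-batches sketch. Assumes no quoted fields containing commas.
final class CsvBatcher {
    static final int BATCH_SIZE = 100;

    /** Maps each data row to header -> value; rows with the wrong column count are skipped and reported. */
    static List<Map<String, String>> parse(List<String> lines, List<String> errors) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) return rows;
        String[] header = lines.get(0).split(",", -1);
        for (int i = 1; i < lines.size(); i++) {
            String[] cells = lines.get(i).split(",", -1);
            if (cells.length != header.length) {
                errors.add("line " + (i + 1) + ": expected " + header.length + " fields, got " + cells.length);
                continue; // keep processing the remaining rows
            }
            Map<String, String> row = new HashMap<>();
            for (int c = 0; c < header.length; c++) {
                row.put(header[c].trim(), cells[c].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            out.add(items.subList(start, Math.min(start + batchSize, items.size())));
        }
        return out;
    }
}
```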

📊 Features

✅ Implemented Features

  • Semantic Search: Natural language music queries

  • Batch Processing: Efficient CSV indexing

  • Caching: Embedding result caching

  • Error Handling: Robust error recovery

  • Health Checks: Service monitoring endpoints

  • CORS Support: Cross-origin requests

  • Logging: Comprehensive logging

  • Configuration: Environment-based config

🔄 Data Flow

  1. Load CSV → Parse tracks from CSV file

  2. Generate Embeddings → Create vector representations

  3. Store in Pinecone → Index vectors with metadata

  4. Search Query → Generate query embedding

  5. Find Similar → Cosine similarity search

  6. Return Results → Formatted track information
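Step 5 runs inside Pinecone, which uses approximate nearest-neighbour indexing rather than scoring every stored vector. The brute-force sketch below only illustrates what the cosine score means and how top-k ranking works; the class and its in-memory map are hypothetical and are not how the service queries Pinecone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// In-memory illustration of cosine-similarity ranking (not a Pinecone client).
final class CosineSearch {
    record Match(String id, double score) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0; // undefined for zero vectors; treat as no similarity
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Brute-force top-k: scores every stored vector, sorts by descending score, keeps k. */
    static List<Match> topK(float[] query, Map<String, float[]> index, int k) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, float[]> e : index.entrySet()) {
            matches.add(new Match(e.getKey(), cosine(query, e.getValue())));
        }
        matches.sort(Comparator.comparingDouble(Match::score).reversed());
        return matches.subList(0, Math.min(k, matches.size()));
    }
}
```

OpenAI embeddings are normalized to unit length, so for them the cosine score equals the plain dot product.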

🧪 Testing

Manual Testing

  1. Start the application

  2. Index the sample data: POST /api/index?csvFileName=tracks_sample.csv

  3. Search for music: POST /api/search with various queries

  4. Check status: GET /api/index/status

Sample Queries to Test

📝 Notes

  • The application uses OpenAI's text-embedding-3-small model, which costs less per token than text-embedding-3-large

  • Pinecone performs the vector similarity search, so the application does not need to hold the indexed track vectors in memory

  • Indexing runs in batches of 100 tracks, which is intended to let the system handle large music catalogs

  • All API endpoints include comprehensive error handling

  • The application supports both development and production configurations

🚀 Next Steps

  1. Set up your API keys in environment variables

  2. Create your Pinecone index with 1536 dimensions and the cosine metric

  3. Add your music CSV file to the resources directory

  4. Test the API endpoints with sample queries

  5. Deploy to production with proper monitoring

The implementation is complete and ready for use! 🎉
