[EXTRA] Music Search API

A complete Spring Boot application for semantic music search using OpenAI embeddings and the Pinecone vector database.

✅ Implementation Status

All components from the original specification have been successfully implemented:

✅ Maven Project Structure - Complete with all dependencies
✅ Track Model - Comprehensive data model with all music attributes
✅ Configuration Classes - OpenAI and Pinecone configurations
✅ CSV Loader Service - Robust CSV parsing with error handling
✅ Embedding Service - OpenAI integration with caching
✅ Pinecone Service - Vector operations and similarity search
✅ Index Controller - CSV indexing with progress tracking
✅ Search Controller - Semantic search with suggestions
✅ Application Properties - Complete configuration setup
✅ Sample Data - Sample CSV with popular tracks

🚀 Quick Start

1. Prerequisites

Java 21
Maven 3.6+
OpenAI API Key
Pinecone Account and Index

2. Environment Variables

export OPENAI_API_KEY=sk-your-openai-api-key
export PINECONE_API_KEY=your-pinecone-api-key
export PINECONE_ENVIRONMENT=gcp-starter
export PINECONE_INDEX_NAME=music-tracks
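
A startup check can catch a missing variable before the first API call fails. The following is a minimal sketch and not part of the project: the class `RequiredEnv` and its `missing` method are hypothetical, and only the four variable names come from the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the project): reports which of the
// four variables listed above are unset or blank.
class RequiredEnv {
    static final List<String> NAMES = List.of(
            "OPENAI_API_KEY",
            "PINECONE_API_KEY",
            "PINECONE_ENVIRONMENT",
            "PINECONE_INDEX_NAME");

    // The environment is passed in as a map so the check can be tested
    // without touching the real process environment.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : NAMES) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Calling `RequiredEnv.missing(System.getenv())` at startup returns an empty list when everything is set, or the names that still need exporting.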

3. Build and Run

# Build the application
mvn clean package

# Run the application
mvn spring-boot:run

# Or run the JAR
java -jar target/music-search-0.0.1-SNAPSHOT.jar

The API will be available at http://localhost:8080

📡 API Endpoints

Index Management

Index CSV Data

POST /api/index?csvFileName=tracks_sample.csv

Response:

{
  "status": "success",
  "tracksLoaded": 10,
  "embeddingsGenerated": 10,
  "tracksIndexed": 10,
  "processingTimeMs": 15420,
  "message": "Successfully indexed 10 tracks in 15.42 seconds"
}
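
For readers who want to model this payload in Java, the sketch below mirrors the fields of the sample response. The record name `IndexResponse` and the `success` factory are assumptions for illustration; the project's actual response class may be shaped differently.

```java
import java.util.Locale;

// Hypothetical response type mirroring the JSON fields shown above.
record IndexResponse(String status, int tracksLoaded, int embeddingsGenerated,
                     int tracksIndexed, long processingTimeMs, String message) {

    // Builds a success response; the message wording follows the sample.
    // Locale.ROOT keeps the decimal separator a dot on every machine.
    static IndexResponse success(int loaded, int embedded, int indexed, long elapsedMs) {
        String message = String.format(Locale.ROOT,
                "Successfully indexed %d tracks in %.2f seconds",
                indexed, elapsedMs / 1000.0);
        return new IndexResponse("success", loaded, embedded, indexed, elapsedMs, message);
    }
}
```

With the numbers from the sample, `IndexResponse.success(10, 10, 10, 15420)` yields the same message text as the response above.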

Get Index Status

GET /api/index/status

Clear Embedding Cache

DELETE /api/index/cache

Health Check

GET /api/index/health

Search Operations

Semantic Search

POST /api/search
Content-Type: application/json

{
  "query": "energetic pop music for dancing",
  "topK": 5
}

Response:

{
  "status": "success",
  "query": "energetic pop music for dancing",
  "matches": [
    {
      "id": "6f807x0ima9a1j3VPbc7VN",
      "score": 0.91,
      "metadata": {
        "track_name": "I Don't Care (with Justin Bieber) - Loud Luxury Remix",
        "track_artist": "Ed Sheeran",
        "playlist_genre": "pop",
        "playlist_subgenre": "dance pop",
        "energy": 0.916,
        "tempo": 122.036
      }
    }
  ],
  "totalResults": 1,
  "searchTimeMs": 245
}
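
The same request can be built from Java with the JDK's built-in HTTP client. This is a sketch under assumptions: the class `SearchRequests` and its methods are hypothetical helpers, the JSON body is assembled by hand to avoid extra dependencies, and only the path, header, and field names come from the example above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper for POST /api/search.
class SearchRequests {
    // Minimal JSON string escaping for the query text: quotes, backslashes,
    // and control characters (the latter written as unicode escapes).
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Request body with the two fields from the documented example.
    static String body(String query, int topK) {
        return "{\"query\": " + jsonString(query) + ", \"topK\": " + topK + "}";
    }

    // Builds (but does not send) the POST request.
    static HttpRequest build(String baseUrl, String query, int topK) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(query, topK)))
                .build();
    }
}
```

To send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the application is running on http://localhost:8080.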

Search Suggestions

GET /api/search/suggestions?q=pop

Search Health Check

GET /api/search/health

🎵 Sample Search Queries

Try these example searches:

"upbeat dance music" - Finds high-energy, danceable tracks
"relaxing acoustic songs" - Finds calm, acoustic tracks
"workout motivation" - Finds high-tempo, energetic tracks
"romantic ballads" - Finds emotional, slow-tempo tracks
"party anthems" - Finds upbeat, danceable party music

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CSV Loader    │───▶│ Embedding       │───▶│   Pinecone      │
│                 │    │ Service         │    │   Service       │
│ - Parse CSV     │    │                 │    │                 │
│ - Validate data │    │ - OpenAI API    │    │ - Vector ops    │
│ - Error handling│    │ - Caching       │    │ - Similarity    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   Controllers   │
                    │                 │
                    │ - Index API     │
                    │ - Search API    │
                    └─────────────────┘
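
One way to read the diagram is that the controllers depend on the services only through narrow interfaces. The sketch below illustrates that idea and is not the project's code: `EmbeddingClient`, `VectorStore`, and `MusicSearchFacade` are hypothetical names standing in for the boxes above.

```java
import java.util.List;

// Hypothetical stand-in for the Embedding Service box.
interface EmbeddingClient {
    float[] embed(String text);
}

// Hypothetical stand-in for the Pinecone Service box.
interface VectorStore {
    void upsert(String id, float[] vector);
    List<String> query(float[] vector, int topK);
}

// Plays the role of the controller layer: it depends only on the two
// interfaces, so each service can be replaced or mocked independently.
class MusicSearchFacade {
    private final EmbeddingClient embeddings;
    private final VectorStore store;

    MusicSearchFacade(EmbeddingClient embeddings, VectorStore store) {
        this.embeddings = embeddings;
        this.store = store;
    }

    // Index path: text -> embedding -> vector store.
    void index(String id, String description) {
        store.upsert(id, embeddings.embed(description));
    }

    // Search path: query text -> embedding -> ids of the nearest vectors.
    List<String> search(String query, int topK) {
        return store.query(embeddings.embed(query), topK);
    }
}
```

Keeping both paths behind the same two interfaces means tests can run against in-memory fakes without calling OpenAI or Pinecone.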

🔧 Configuration

OpenAI Settings

Model: text-embedding-3-small
Timeout: 60 seconds (configurable)
Caching: Enabled with a 1000-entry limit
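
A bounded cache like the one described can be sketched with `LinkedHashMap`. Note the assumptions: the configuration above only states the entry limit, so the least-recently-used eviction policy here is a guess, and the class name `EmbeddingCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache from input text to embedding vector.
// Assumption: least-recently-used eviction; the source only gives the limit.
class EmbeddingCache extends LinkedHashMap<String, float[]> {
    private final int maxEntries;

    EmbeddingCache(int maxEntries) {
        // accessOrder = true makes iteration order follow recency of access.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops
    // the least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
        return size() > maxEntries;
    }
}
```

`new EmbeddingCache(1000)` matches the documented limit. `LinkedHashMap` is not thread-safe, so a shared instance would need wrapping, for example with `Collections.synchronizedMap`.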

Pinecone Settings

Environment: Configurable (gcp-starter, aws, etc.)
Index Name: Configurable
Vector Dimensions: 1536 (the default output size of text-embedding-3-small)

CSV Processing

Batch Size: 100 tracks per batch
Error Handling: Continues processing on individual errors
Field Mapping: Automatic based on header names
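
The three behaviours above (header-based mapping, skipping bad rows, batching) can be sketched as follows. This is an illustration under stated assumptions, not the project's loader: `CsvSketch` is a hypothetical class, and it splits on plain commas with no support for quoted fields. Real track names can contain commas, so a proper CSV library is the safer choice in practice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of header mapping, error tolerance, and batching.
class CsvSketch {
    // Maps each data row to header-name -> value. Rows whose column count
    // does not match the header are skipped rather than aborting the load.
    // Assumption: naive comma split, no quoted fields.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) return rows;
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);
            if (cells.length != header.length) {
                continue; // malformed row: skip it and keep going
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i].trim(), cells[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }
}
```

With the documented batch size, `CsvSketch.batches(rows, 100)` turns 250 parsed rows into batches of 100, 100, and 50.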

📊 Features

✅ Implemented Features

Semantic Search: Natural language music queries
Batch Processing: Efficient CSV indexing
Caching: Embedding result caching
Error Handling: Robust error recovery
Health Checks: Service monitoring endpoints
CORS Support: Cross-origin requests
Logging: Comprehensive logging
Configuration: Environment-based config

🔄 Data Flow

Load CSV → Parse tracks from CSV file
Generate Embeddings → Create vector representations
Store in Pinecone → Index vectors with metadata
Search Query → Generate query embedding
Find Similar → Cosine similarity search
Return Results → Formatted track information
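
To make the "Find Similar" step concrete, the sketch below computes cosine similarity and ranks vectors by brute force. In the application this work is done by Pinecone, not by local code; `CosineSketch` and its methods are hypothetical and only illustrate the metric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of cosine similarity search.
class CosineSketch {
    // cos(a, b) = (a . b) / (|a| * |b|); defined as 0 for a zero vector.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the topK vectors most similar to the query,
    // best match first. Scans every vector, so it only suits small sets.
    static List<String> topK(float[] query, Map<String, float[]> vectors, int topK) {
        List<Map.Entry<String, float[]>> entries = new ArrayList<>(vectors.entrySet());
        entries.sort(Comparator.comparingDouble(
                (Map.Entry<String, float[]> e) -> cosine(query, e.getValue())).reversed());
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, float[]> e : entries.subList(0, Math.min(topK, entries.size()))) {
            ids.add(e.getKey());
        }
        return ids;
    }
}
```

A score near 1 means the query and track embeddings point in nearly the same direction, which is how a value such as the 0.91 in the sample search response can be read if the index uses the cosine metric.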

🧪 Testing

Manual Testing

Start the application
Index the sample data: POST /api/index?csvFileName=tracks_sample.csv
Search for music: POST /api/search with various queries
Check status: GET /api/index/status

Sample Queries to Test

# High energy dance music
curl -X POST http://localhost:8080/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "high energy dance music", "topK": 3}'

# Relaxing acoustic songs
curl -X POST http://localhost:8080/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "relaxing acoustic songs", "topK": 3}'

# Workout motivation
curl -X POST http://localhost:8080/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "workout motivation music", "topK": 3}'

📝 Notes

The application uses OpenAI's text-embedding-3-small model, which returns 1536-dimensional vectors by default
Pinecone stores the vectors and performs the similarity search
The system is designed for large music catalogs: CSV indexing runs in batches of 100 tracks, and generated embeddings are cached
All API endpoints include comprehensive error handling
The application supports both development and production configurations

🚀 Next Steps

Set up your API keys in environment variables
Create your Pinecone index with 1536 dimensions and the cosine similarity metric
Add your music CSV file to the resources directory
Test the API endpoints with sample queries
Deploy to production with proper monitoring

The implementation is complete and ready for use! 🎉

