Embeddings and LLMs in action: a hands-on lab with Java and OpenAI
This is an MVP of a context.me-style game built with Java + Spring Boot and OpenAI Embeddings, which computes the semantic closeness between the hidden word and the player's guesses.
🚀 Endpoints
POST /api/games → creates a new game. Optional body:
{ "topic": "animales" }
For now the topic only selects which basic word set to draw from (unknown topics fall back to "general"). Response:
{ id, startedAt, topic }
POST /api/games/{id}/guess → submits a guess. Body:
{ "word": "gato" }
Response:
{ similarity, percentage, hint, solved }
GET /api/games/{id} → game state (a simple debug view that does not reveal the word).
Victory rule (MVP): a guess counts as correct when similarity ≥ 0.95 or when the word matches the hidden word exactly (case-insensitive, accents normalized).
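The victory rule can be read as a pure function: an exact match after normalization wins outright, and otherwise the cosine similarity must reach the threshold. The following is a minimal sketch. It assumes the 0.95 threshold above and a simplified normalizer (trim, lowercase, combining accents stripped) that mirrors TextNormalizer. The class name `VictoryRule` is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper that isolates the MVP victory rule from the service.
final class VictoryRule {

    static final double THRESHOLD = 0.95; // same threshold as GameService

    private VictoryRule() {}

    // Simplified normalization: trim, lowercase, strip combining accents
    static String normalize(String s) {
        if (s == null) return "";
        String lower = s.trim().toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFKD).replaceAll("\\p{M}", "");
    }

    // An exact (normalized) match wins outright; otherwise the similarity must reach the threshold
    static boolean isSolved(String guess, String target, double cosine) {
        return normalize(guess).equals(normalize(target)) || cosine >= THRESHOLD;
    }
}
```

For example, `isSolved("Delfín", "delfin", 0.10)` is true because of the exact match, while `isSolved("lobo", "perro", 0.60)` is false.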
🧩 Package structure
com.example.contextia
├─ ContextIaApplication.java
├─ config
│ ├─ OpenAiProperties.java
│ └─ WebConfig.java
├─ core
│ ├─ EmbeddingClient.java
│ ├─ SimilarityUtil.java
│ └─ TextNormalizer.java
├─ game
│ ├─ Game.java
│ ├─ GameService.java
│ ├─ WordPicker.java
│ └─ dto
│   ├─ CreateGameRequest.java
│   ├─ CreateGameResponse.java
│   ├─ GuessRequest.java
│   └─ GuessResponse.java
└─ web
  └─ GameController.java
🧱 pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>context-ia</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>context-ia</name>
    <description>context.me-style game with Spring Boot + OpenAI Embeddings</description>
    <properties>
        <java.version>21</java.version>
        <spring-boot.version>3.3.3</spring-boot.version>
        <maven.compiler.source>21</maven.compiler.source>
        <maven.compiler.target>21</maven.compiler.target>
    </properties>
    <!-- No spring-boot-starter-parent: the Spring Boot BOM below manages dependency versions -->
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-dependencies</artifactId>
                <version>${spring-boot.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- HTTP client -->
        <dependency>
            <groupId>com.squareup.okhttp3</groupId>
            <artifactId>okhttp</artifactId>
            <version>4.12.0</version>
        </dependency>
        <!-- JSON -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
        </dependency>
        <!-- StringEscapeUtils (HTML unescaping in TextNormalizer) -->
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-text</artifactId>
            <version>1.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <!-- Swagger/OpenAPI -->
        <dependency>
            <groupId>org.springdoc</groupId>
            <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
            <version>2.6.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- Compiler: Java 21 plus -parameters, which Spring needs to resolve @PathVariable names via reflection -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.11.0</version>
                <configuration>
                    <source>21</source>
                    <target>21</target>
                    <compilerArgs>
                        <arg>-parameters</arg>
                    </compilerArgs>
                </configuration>
            </plugin>
            <!-- Spring Boot: enables mvn spring-boot:run and builds the executable jar -->
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <version>${spring-boot.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>repackage</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
⚙️ application.yml
server:
  port: 8080
openai:
  apiKey: ${OPENAI_API_KEY:}                # empty if the variable is not set
  embeddingsModel: text-embedding-3-small   # or text-embedding-3-large
  baseUrl: https://api.openai.com/v1
spring:
  jackson:
    serialization:
      INDENT_OUTPUT: true
Export your API key as the OPENAI_API_KEY environment variable.
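Because `${OPENAI_API_KEY:}` falls back to an empty string, the app starts without a key and fails only on the first call to OpenAI, with an authentication error. The following is a minimal fail-fast sketch. It also includes a masking helper so the full key never appears in logs. `ApiKeyCheck` and its methods are hypothetical names. Wiring the check into startup (for example, from the `OpenAiProperties` bean) is left as an assumption.

```java
// Hypothetical helper: fail fast when the key is missing and mask it for logs.
final class ApiKeyCheck {

    private ApiKeyCheck() {}

    // Returns the trimmed key, or fails with a message that names the variable
    static String require(String apiKey) {
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalStateException("OPENAI_API_KEY is not set; export it before starting the app");
        }
        return apiKey.trim();
    }

    // Keeps only the last 4 characters visible, e.g. "****abcd"
    static String mask(String apiKey) {
        if (apiKey == null || apiKey.length() <= 4) return "****";
        return "****" + apiKey.substring(apiKey.length() - 4);
    }
}
```

Typical use: `String key = ApiKeyCheck.require(System.getenv("OPENAI_API_KEY"));`, logging only `ApiKeyCheck.mask(key)`.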
🏁 ContextIaApplication.java
package com.example.contextia;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class ContextIaApplication {
    public static void main(String[] args) {
        SpringApplication.run(ContextIaApplication.class, args);
    }
}
🧰 config/OpenAiProperties.java
package com.example.contextia.config;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Configuration;
// Binds the "openai" block of application.yml
@Data
@Configuration
@ConfigurationProperties(prefix = "openai")
public class OpenAiProperties {
    private String apiKey;
    private String embeddingsModel;
    private String baseUrl;
}
🌐 config/WebConfig.java (CORS for your frontend)
package com.example.contextia.config;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
@Configuration
public class WebConfig {
    @Bean
    public WebMvcConfigurer corsConfigurer() {
        return new WebMvcConfigurer() {
            @Override
            public void addCorsMappings(CorsRegistry registry) {
                // Default dev-server ports of Vite (5173) and the Angular CLI (4200)
                registry.addMapping("/**")
                        .allowedOrigins("http://localhost:5173", "http://localhost:4200")
                        .allowedMethods("GET", "POST", "PUT", "DELETE", "OPTIONS")
                        .allowCredentials(true);
            }
        };
    }
}
🧠 core/EmbeddingClient.java
package com.example.contextia.core;
import com.example.contextia.config.OpenAiProperties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
@Component
@RequiredArgsConstructor
public class EmbeddingClient {
    private static final MediaType JSON = MediaType.parse("application/json");
    // 🔹 Fixed context prepended to every word (the prompt itself stays in Spanish)
    private static final String EMBEDDING_CONTEXT = "Esta es una sola palabra en el idioma español: ";
    private final OpenAiProperties props;
    private final ObjectMapper mapper = new ObjectMapper();
    private final OkHttpClient http = new OkHttpClient();
    public List<Double> embed(String word) {
        // Wrap the word in a clear, consistent context
        String input = EMBEDDING_CONTEXT + word;
        var root = mapper.createObjectNode();
        root.put("model", props.getEmbeddingsModel());
        root.put("input", input);
        Request request = new Request.Builder()
                .url(props.getBaseUrl() + "/embeddings")
                .header("Authorization", "Bearer " + props.getApiKey())
                .post(RequestBody.create(root.toString(), JSON))
                .build();
        try (Response response = http.newCall(request).execute()) {
            String payload = response.body() != null ? response.body().string() : "";
            if (!response.isSuccessful()) {
                // The body explains the cause (invalid key, rate limit, unknown model, ...)
                throw new IllegalStateException("OpenAI embeddings error " + response.code() + ": " + payload);
            }
            JsonNode arr = mapper.readTree(payload).path("data").path(0).path("embedding");
            if (arr.size() == 0) {
                throw new IllegalStateException("OpenAI response contains no embedding");
            }
            List<Double> vec = new ArrayList<>(arr.size());
            for (JsonNode n : arr) {
                vec.add(n.asDouble());
            }
            return vec;
        } catch (IOException e) {
            // Network failure or unparseable JSON
            throw new UncheckedIOException("Embedding request failed", e);
        }
    }
}
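EmbeddingClient makes a single attempt per word, so one 429 (rate limit) or transient 5xx answer from OpenAI makes the whole guess fail. The following is a minimal sketch of retrying with exponential backoff. It assumes the caller can tell which failures are retryable, signaled here by a hypothetical `RetryableException`. The names `Retry` and `RetryableException` and the delay values are illustrative and are not part of the original code. The sleep function is injected so the logic can be tested without waiting.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical marker for failures worth retrying (e.g. HTTP 429 or 5xx).
class RetryableException extends RuntimeException {
    RetryableException(String message) { super(message); }
}

final class Retry {

    private Retry() {}

    // Delay before retry n (0-based): base * 2^n, capped at maxMs
    static long delayMillis(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20); // cap the shift to avoid overflow
        return Math.min(delay, maxMs);
    }

    // Runs the call up to maxAttempts times; other exceptions propagate immediately
    static <T> T withBackoff(Supplier<T> call, int maxAttempts, long baseMs, long maxMs, LongConsumer sleeper) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RetryableException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RetryableException e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    sleeper.accept(delayMillis(attempt, baseMs, maxMs));
                }
            }
        }
        throw last; // all attempts failed
    }

    // Default sleeper backed by Thread.sleep
    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    }
}
```

One way to apply it: the non-2xx branch of EmbeddingClient would throw `RetryableException` for 429 and 500–599, and callers would use `Retry.withBackoff(() -> embedOnce(word), 4, 500, 8000, Retry::sleep)`, where `embedOnce` is a hypothetical rename of the current `embed` method.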
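🔤 core/TextNormalizer.java (accents, lowercase)
package com.example.contextia.core;
import org.apache.commons.text.StringEscapeUtils;
import java.text.Normalizer;
import java.util.Locale;
public final class TextNormalizer {
    private TextNormalizer() {}
    public static String normalize(String s) {
        if (s == null) return "";
        // Decode HTML entities first (&aacute; → á) so their accents are stripped too
        String unescaped = StringEscapeUtils.unescapeHtml4(s.trim());
        // Locale.ROOT avoids locale-dependent lowercasing (e.g. the Turkish dotless i)
        String lower = unescaped.toLowerCase(Locale.ROOT);
        // NFKD splits "á" into "a" + combining accent, and \p{M} removes the accent.
        // Note: this also turns "ñ" into "n"
        return Normalizer.normalize(lower, Normalizer.Form.NFKD)
                .replaceAll("\\p{M}", "");
    }
}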
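Stripping every combining mark also turns "ñ" into "n", so "año" and "ano" compare as equal. If that matters for your word list, one possible variant keeps a mark only when it sits on an "n". The following is a sketch under that assumption. The class name `SpanishNormalizer` is hypothetical, and the HTML-unescaping step is omitted for brevity.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical variant of TextNormalizer that strips accents but keeps "ñ".
final class SpanishNormalizer {

    private SpanishNormalizer() {}

    static String normalize(String s) {
        if (s == null) return "";
        String lower = s.trim().toLowerCase(Locale.ROOT);
        // NFD: "ñ" → "n" + U+0303, "á" → "a" + U+0301
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Remove every combining mark except one that directly follows an "n"
        String stripped = decomposed.replaceAll("(?<!n)\\p{M}", "");
        // NFC recombines "n" + U+0303 into "ñ"
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }
}
```

Both the hidden word and the guesses would have to go through the same function. Otherwise, exact matches stop working.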
📐 core/SimilarityUtil.java
package com.example.contextia.core;
import java.util.List;
public final class SimilarityUtil {
    private SimilarityUtil() {}
    public enum ScaleMode {
        RAW,     // Raw cosine, clipped to [0..1]
        LINEAR,  // Linear rescaling with empirical min/max
        SIGMOID  // S-curve that emphasizes the extremes
    }
    // Classic cosine similarity; returns -1.0 if a vector is missing, the sizes differ, or a norm is zero
    public static double cosine(List<Double> a, List<Double> b) {
        if (a == null || b == null || a.size() != b.size()) return -1.0;
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i);
            double y = b.get(i);
            dot += x * y;
            na += x * x;
            nb += y * y;
        }
        if (na == 0 || nb == 0) return -1.0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb)); // [-1..1]
    }
    // Converts a cosine into a percentage according to the chosen mode
    public static int toPercentage(double cos, ScaleMode mode) {
        double c = Math.max(cos, 0.0); // negatives are clipped (no relation)
        double val = switch (mode) {
            case RAW -> c; // [0..1] → [0..100] directly
            case LINEAR -> {
                double min = 0.2;  // ≈ unrelated
                double max = 0.85; // ≈ very strong
                yield Math.min(1.0, Math.max(0.0, (c - min) / (max - min)));
            }
            case SIGMOID -> 1 / (1 + Math.exp(-12 * (c - 0.5))); // S-curve centered at 0.5
        };
        return (int) Math.round(val * 100);
    }
    // Helper with the default mode
    public static int toPercentage(double cos) {
        return toPercentage(cos, ScaleMode.LINEAR); // LINEAR by default
    }
}
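The LINEAR mode relies on hard-coded values (min = 0.2, max = 0.85) described as empirical. The following sketches one way to obtain such values. It is an assumption, not the author's procedure: embed a few clearly unrelated word pairs and a few clearly related pairs with the same model and prefix, then use the average cosine of each group as min and max. The shared Spanish prefix in EmbeddingClient may raise every similarity, which is another reason to measure rather than guess. `LinearScale` is a hypothetical name.

```java
import java.util.List;

// Hypothetical calibrated version of the LINEAR mode in SimilarityUtil.
record LinearScale(double min, double max) {

    LinearScale {
        if (!(max > min)) throw new IllegalArgumentException("max must be greater than min");
    }

    // min = mean cosine of unrelated pairs, max = mean cosine of strongly related pairs
    static LinearScale calibrate(List<Double> unrelated, List<Double> related) {
        return new LinearScale(mean(unrelated), mean(related));
    }

    private static double mean(List<Double> xs) {
        if (xs.isEmpty()) throw new IllegalArgumentException("empty sample");
        return xs.stream().mapToDouble(Double::doubleValue).average().orElseThrow();
    }

    // Same formula as LINEAR: clamp((cos - min) / (max - min)) to [0..1], then scale to 0..100
    int toPercentage(double cos) {
        double val = (Math.max(cos, 0.0) - min) / (max - min);
        return (int) Math.round(Math.min(1.0, Math.max(0.0, val)) * 100);
    }
}
```

With unrelated samples averaging 0.2 and related samples averaging 0.85, this reproduces the current constants, and a cosine of 0.525 maps to 50%.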
🎲 game/Game.java
package com.example.contextia.game;
import lombok.Builder;
import lombok.Data;
import java.time.Instant;
import java.util.List;
import java.util.UUID;
@Data
@Builder
public class Game {
    private UUID id;
    private String topic;
    private String targetWord;            // hidden word (normalized internally)
    private List<Double> targetEmbedding;
    private Instant startedAt;
    private boolean solved;
}
🧮 game/WordPicker.java (minimal word set)
package com.example.contextia.game;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
public final class WordPicker {
    // Keys must match the topic the client sends ("animales" in the examples)
    private static final Map<String, List<String>> TOPIC_WORDS = Map.of(
            "general", List.of("perro", "gato", "cielo", "mar", "montaña", "libro", "escuela", "comida", "música"),
            "animales", List.of("perro", "gato", "lobo", "zorro", "vaca", "caballo", "oveja", "tigre", "mono", "delfín")
    );
    private WordPicker() {}
    public static String pick(String topic) {
        // Null or unknown topic → "general" (Map.of rejects null keys, so check first)
        String key = (topic != null && TOPIC_WORDS.containsKey(topic)) ? topic : "general";
        List<String> list = TOPIC_WORDS.get(key);
        return list.get(ThreadLocalRandom.current().nextInt(list.size()));
    }
}
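The design notes suggest loading WordPicker's words from a database or a file instead of hard-coding them. The following is a minimal file-based sketch. It assumes a UTF-8 text file with one `topic,word` pair per line and `#` for comments. Both that format and the name `FileWordPicker` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical file-backed alternative to WordPicker ("topic,word" per line).
final class FileWordPicker {

    private final Map<String, List<String>> topicWords;

    private FileWordPicker(Map<String, List<String>> topicWords) {
        this.topicWords = topicWords;
    }

    static FileWordPicker load(Path file) throws IOException {
        Map<String, List<String>> map = new HashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue; // blank lines and comments
            String[] parts = trimmed.split(",", 2);
            if (parts.length != 2 || parts[1].isBlank()) continue;      // skip malformed lines
            map.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>()).add(parts[1].trim());
        }
        if (!map.containsKey("general")) {
            throw new IllegalStateException("The file must define a 'general' topic as the fallback");
        }
        return new FileWordPicker(map);
    }

    // Same contract as WordPicker.pick: a null or unknown topic falls back to "general"
    String pick(String topic) {
        List<String> list = topicWords.getOrDefault(topic == null ? "general" : topic, topicWords.get("general"));
        return list.get(ThreadLocalRandom.current().nextInt(list.size()));
    }
}
```

Because GameService currently calls `WordPicker.pick` statically, it would need an injected `FileWordPicker` instance to use this variant.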
🧠 game/GameService.java
package com.example.contextia.game;
import com.example.contextia.core.EmbeddingClient;
import com.example.contextia.core.SimilarityUtil;
import com.example.contextia.core.TextNormalizer;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import java.time.Instant;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
@Service
@RequiredArgsConstructor
public class GameService {
    private final EmbeddingClient embeddings;
    private final Map<UUID, Game> store = new ConcurrentHashMap<>();
    // Embedding cache keyed by the word as typed (trimmed, lowercase, accents kept)
    private final Map<String, List<Double>> cache = new ConcurrentHashMap<>();
    public Game createGame(String topic) {
        String raw = WordPicker.pick(topic);           // e.g. "montaña"
        String target = TextNormalizer.normalize(raw); // "montana", used only for the exact comparison
        Game g = Game.builder()
                .id(UUID.randomUUID())
                .topic(topic == null ? "general" : topic)
                .targetWord(target)
                .targetEmbedding(embed(raw))
                .startedAt(Instant.now())
                .solved(false)
                .build();
        store.put(g.getId(), g);
        return g;
    }
    public Optional<Game> get(UUID id) {
        return Optional.ofNullable(store.get(id));
    }
    public GuessOutcome guess(UUID id, String word) {
        Game g = store.get(id);
        if (g == null) throw new NoSuchElementException("Game not found");
        String normalized = TextNormalizer.normalize(word);
        if (normalized.isBlank()) throw new IllegalArgumentException("La palabra no puede estar vacía");
        // Exact match (case-insensitive, accents normalized) → immediate win
        if (normalized.equals(g.getTargetWord())) {
            g.setSolved(true);
            return new GuessOutcome(1.0, 100, "¡Exacto!", true);
        }
        double cos = SimilarityUtil.cosine(embed(word), g.getTargetEmbedding());
        int pct = SimilarityUtil.toPercentage(cos);
        boolean solved = cos >= 0.95; // closeness-based victory threshold
        if (solved) g.setSolved(true);
        return new GuessOutcome(cos, pct, hintFor(pct), solved);
    }
    // Embeds the word as typed: stripping accents can change its meaning (año → ano, montaña → montana)
    private List<Double> embed(String word) {
        String key = word.trim().toLowerCase(Locale.ROOT);
        return cache.computeIfAbsent(key, embeddings::embed);
    }
    private String hintFor(int pct) {
        if (pct >= 95) return "¡A nada de acertar!";
        if (pct >= 85) return "Muy, muy cerca";
        if (pct >= 70) return "Cerca";
        if (pct >= 55) return "Caliente";
        if (pct >= 40) return "Tibio";
        if (pct >= 25) return "Frío";
        return "Muy frío";
    }
    public record GuessOutcome(double similarity, int percentage, String hint, boolean solved) {}
}
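The cache in GameService is an unbounded ConcurrentHashMap, so every distinct word ever guessed stays in memory. With text-embedding-3-small, each entry holds 1,536 boxed doubles. The following is a minimal bounded alternative, sketched with a LinkedHashMap in access order and synchronized access. The class name `LruCache` and the capacity are assumptions. A caching library such as Caffeine would be a more complete option.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache: evicts the least recently used entry when full.
final class LruCache<K, V> {

    private final Map<K, V> map;

    LruCache(int capacity) {
        // accessOrder = true: get() moves an entry to the most-recent end
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // The loader runs outside the lock, so a slow embedding call does not block other
    // lookups; two concurrent misses on the same key may both compute (fine for an MVP)
    V getOrCompute(K key, Function<K, V> loader) {
        synchronized (map) {
            V cached = map.get(key);
            if (cached != null) return cached;
        }
        V value = loader.apply(key);
        synchronized (map) {
            map.put(key, value);
        }
        return value;
    }

    int size() {
        synchronized (map) {
            return map.size();
        }
    }

    boolean contains(K key) {
        synchronized (map) {
            return map.containsKey(key); // does not change the access order
        }
    }
}
```

In GameService, the field could become `new LruCache<String, List<Double>>(10_000)` (an arbitrary capacity), and `cache.computeIfAbsent(key, embeddings::embed)` would become `cache.getOrCompute(key, embeddings::embed)`.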
📦 DTOs game/dto/*.java
CreateGameRequest.java
package com.example.contextia.game.dto;
import jakarta.validation.constraints.Size;
import lombok.Data;
@Data
public class CreateGameRequest {
    @Size(max = 40)
    private String topic; // optional: "animales", "general"
}
CreateGameResponse.java
package com.example.contextia.game.dto;
import lombok.AllArgsConstructor;
import lombok.Data;
import java.time.Instant;
import java.util.UUID;
@Data
@AllArgsConstructor
public class CreateGameResponse {
    private UUID id;
    private Instant startedAt;
    private String topic;
}
GuessRequest.java
package com.example.contextia.game.dto;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
import lombok.Data;
@Data
public class GuessRequest {
    @NotBlank
    @Size(max = 60)
    private String word;
}
GuessResponse.java
package com.example.contextia.game.dto;
import lombok.AllArgsConstructor;
import lombok.Data;
@Data
@AllArgsConstructor
public class GuessResponse {
    private double similarity; // cosine [-1..1]
    private int percentage;    // 0..100
    private String hint;
    private boolean solved;
}
🌍 web/GameController.java
package com.example.contextia.web;
import com.example.contextia.game.Game;
import com.example.contextia.game.GameService;
import com.example.contextia.game.dto.CreateGameRequest;
import com.example.contextia.game.dto.CreateGameResponse;
import com.example.contextia.game.dto.GuessRequest;
import com.example.contextia.game.dto.GuessResponse;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;
@RestController
@RequestMapping("/api/games")
@RequiredArgsConstructor
public class GameController {
    private final GameService service;
    @PostMapping
    public ResponseEntity<CreateGameResponse> create(@Valid @RequestBody(required = false) CreateGameRequest req) {
        String topic = (req != null && req.getTopic() != null) ? req.getTopic() : "general";
        Game g = service.createGame(topic);
        return ResponseEntity.ok(new CreateGameResponse(g.getId(), g.getStartedAt(), g.getTopic()));
    }
    @GetMapping("/{id}")
    public ResponseEntity<?> get(@PathVariable UUID id) {
        // Debug view: never exposes targetWord
        return service.get(id)
                .<ResponseEntity<?>>map(g -> ResponseEntity.ok(Map.of(
                        "id", g.getId(),
                        "startedAt", g.getStartedAt(),
                        "topic", g.getTopic(),
                        "solved", g.isSolved()
                )))
                .orElse(ResponseEntity.notFound().build());
    }
    @PostMapping("/{id}/guess")
    public ResponseEntity<GuessResponse> guess(@PathVariable UUID id, @Valid @RequestBody GuessRequest body) {
        var out = service.guess(id, body.getWord());
        return ResponseEntity.ok(new GuessResponse(out.similarity(), out.percentage(), out.hint(), out.solved()));
    }
    // Unknown game → 404
    @ExceptionHandler(NoSuchElementException.class)
    public ResponseEntity<Map<String, String>> notFound(NoSuchElementException ex) {
        return ResponseEntity.status(HttpStatus.NOT_FOUND).body(Map.of("error", String.valueOf(ex.getMessage())));
    }
    // Invalid input (e.g. a word that is blank after normalization) → 400
    @ExceptionHandler(IllegalArgumentException.class)
    public ResponseEntity<Map<String, String>> badRequest(IllegalArgumentException ex) {
        return ResponseEntity.badRequest().body(Map.of("error", String.valueOf(ex.getMessage())));
    }
}
▶️ How to run
Set your API key:
setx OPENAI_API_KEY "sk-..."       # Windows (applies to newly opened terminals)
export OPENAI_API_KEY="sk-..."     # macOS/Linux
Start the app:
mvn spring-boot:run
Test with cURL/Postman:
# Create a game
curl -X POST http://localhost:8080/api/games -H "Content-Type: application/json" -d '{"topic":"animales"}'
# Send a guess ({ID} is the id returned by the previous call)
curl -X POST http://localhost:8080/api/games/{ID}/guess -H "Content-Type: application/json" -d '{"word":"gato"}'
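The same two calls can be made from Java with the JDK's built-in java.net.http client, for example from an integration test. The following is a minimal sketch that only builds the requests. The class name `GameApiRequests` and the base URL are assumptions. The JSON bodies are written by hand, so the helper escapes only quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

// Hypothetical helper that builds the requests shown in the cURL examples.
final class GameApiRequests {

    private final URI base; // e.g. http://localhost:8080

    GameApiRequests(URI base) {
        this.base = base;
    }

    // POST /api/games with an optional topic
    HttpRequest createGame(String topic) {
        String body = topic == null ? "{}" : "{\"topic\":\"" + escape(topic) + "\"}";
        return json(base.resolve("/api/games"), body);
    }

    // POST /api/games/{id}/guess
    HttpRequest guess(UUID id, String word) {
        String body = "{\"word\":\"" + escape(word) + "\"}";
        return json(base.resolve("/api/games/" + id + "/guess"), body);
    }

    private static HttpRequest json(URI uri, String body) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Minimal escaping for simple words; a JSON library is safer for arbitrary input
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

To send a request: `HttpClient.newHttpClient().send(api.guess(id, "gato"), HttpResponse.BodyHandlers.ofString()).body()` returns the JSON with `similarity`, `percentage`, `hint` and `solved`.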
💡 Design notes
In-memory embedding cache to avoid repeated calls.
Text normalization: lowercase with accents removed, for exact comparisons.
Adjustable victory threshold (0.93–0.97 tends to work well with text-embedding-3-small).
Topics/domains: extend WordPicker to read a dictionary from a database or a file.
Anti-cheating: filter out consecutive identical guesses and limit each user to 3–5 requests per second.
Internationalization: use the same model; the embeddings work well in Spanish.
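The anti-cheating note (no consecutive duplicates, 3–5 requests per second per user) can be sketched as a small guard that the controller calls before `GameService.guess`. This sketch rests on several assumptions. A "user" is approximated by a string key (an IP address or a game id), and the limit is enforced over a sliding one-second window. The class name `GuessGuard` is hypothetical. The current time is passed in as milliseconds so the logic can be tested.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-user guard: rejects consecutive duplicates and more than N guesses per second.
final class GuessGuard {

    private final int maxPerSecond;
    private final Map<String, Deque<Long>> recent = new HashMap<>();
    private final Map<String, String> lastWord = new HashMap<>();

    GuessGuard(int maxPerSecond) {
        this.maxPerSecond = maxPerSecond;
    }

    // Returns true if the guess may proceed; nowMillis is injected for testability
    synchronized boolean allow(String userKey, String normalizedWord, long nowMillis) {
        if (normalizedWord.equals(lastWord.get(userKey))) {
            return false; // same word as the previous accepted guess
        }
        Deque<Long> window = recent.computeIfAbsent(userKey, k -> new ArrayDeque<>());
        while (!window.isEmpty() && nowMillis - window.peekFirst() >= 1000) {
            window.pollFirst(); // drop timestamps older than one second
        }
        if (window.size() >= maxPerSecond) {
            return false; // rate limit reached
        }
        window.addLast(nowMillis);
        lastWord.put(userKey, normalizedWord);
        return true;
    }
}
```

Entries are never evicted here. A real deployment would need periodic cleanup or an existing rate-limiting library such as Bucket4j.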
🧭 Next extensions
Global ranking (persistence with Postgres).
Per-game history (list of guesses with their scores, sorted by closeness).
"Smart hints" using an LLM (optional) to generate topical hints based on proximity.
Frontend (Angular/React) with a progress bar and the 10 closest guesses.
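For the per-game history extension, a minimal in-memory sketch can record each guess once (keyed by its normalized form) and return the closest guesses first. The same list can feed the frontend's top-10 view. `GuessHistory` and `Attempt` are hypothetical names, and persistence with Postgres is out of scope here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-game history: one entry per distinct normalized word.
final class GuessHistory {

    record Attempt(String word, int percentage, int order) {}

    private final Map<String, Attempt> byWord = new LinkedHashMap<>();

    // Records a guess; a repeated word keeps its first attempt and returns false
    synchronized boolean add(String normalizedWord, int percentage) {
        if (byWord.containsKey(normalizedWord)) return false;
        byWord.put(normalizedWord, new Attempt(normalizedWord, percentage, byWord.size() + 1));
        return true;
    }

    // Closest first; ties go to the earlier attempt
    synchronized List<Attempt> top(int n) {
        List<Attempt> sorted = new ArrayList<>(byWord.values());
        sorted.sort(Comparator.comparingInt(Attempt::percentage).reversed()
                .thenComparingInt(Attempt::order));
        return List.copyOf(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

One `GuessHistory` per `Game` (for example, as an extra field) would be enough for the MVP. The controller could then expose `top(10)` on a new endpoint.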