Build a Semantic Cache Service with Jina AI Embedding and TiDB (2024)

In the rapidly evolving landscape of machine learning and database technologies, combining the strengths of different tools can lead to innovative solutions. One such powerful combination is using Jina AI’s embedding capabilities with TiDB’s vector search functionality. This blog will guide you through building a semantic cache service using Jina AI Embeddings and TiDB Vector.

What is a Semantic Cache?

A semantic cache stores the results of expensive queries and reuses them when the same or similar queries are made. This type of cache uses semantic understanding rather than exact key matching, making it particularly useful in applications requiring natural language processing or similar complex data retrieval tasks.
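To make the idea concrete, here is a minimal in-memory sketch of a semantic lookup. It is only an illustration: the class name `InMemorySemanticCache`, the two-dimensional toy vectors, and the linear scan are assumptions made for this example, whereas the service built in this post uses real embeddings and a database-side vector search.

```python
import math
from typing import List, Optional, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class InMemorySemanticCache:
    """Hypothetical toy cache: stores (vector, value) pairs and scans them linearly."""

    def __init__(self, max_distance: float = 0.1) -> None:
        self.max_distance = max_distance
        self.entries: List[Tuple[List[float], str]] = []

    def set(self, key_vec: List[float], value: str) -> None:
        self.entries.append((key_vec, value))

    def get(self, query_vec: List[float]) -> Optional[str]:
        if not self.entries:
            return None
        # Find the stored key closest in meaning to the query
        best_vec, best_value = min(
            self.entries, key=lambda entry: cosine_distance(entry[0], query_vec)
        )
        # Only count it as a hit if it is close enough
        if cosine_distance(best_vec, query_vec) <= self.max_distance:
            return best_value
        return None


cache = InMemorySemanticCache(max_distance=0.1)
cache.set([1.0, 0.0], "answer A")
near_hit = cache.get([0.99, 0.05])  # nearly the same direction as the stored key
far_miss = cache.get([0.0, 1.0])    # orthogonal to the stored key
```

An exact-match cache would miss on the first query because the vectors differ; the semantic cache treats it as a hit because the distance falls under the threshold.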

Why Jina AI and TiDB?

Jina AI: Provides robust embedding capabilities, converting text into high-dimensional vectors that capture semantic meaning.
TiDB Vector: Extends the TiDB database to support efficient vector operations, enabling fast similarity searches on high-dimensional data.

Setting Up the Environment

Prerequisites

Ensure you have the following:

Python 3.8 or higher
A TiDB Serverless cluster set up and running
An API key from Jina AI

Step-by-Step Implementation

1. Configuration

First, set up your environment configuration. Create a .env file to store your database URI and TTL (Time to Live) settings.

DATABASE_URI=mysql+pymysql://<username>:<password>@<host>:<port>/<database>?ssl_mode=VERIFY_IDENTITY&ssl_ca=/etc/ssl/cert.pem
TIME_TO_LIVE=604800 # Default is 1 week
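One way to read these two settings in Python is sketched below. The `Settings` dataclass and the `load_settings` helper are hypothetical names introduced for this example; the sketch assumes the values are already present in the process environment (for instance after python-dotenv has loaded the .env file) and falls back to one week when TIME_TO_LIVE is missing.

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60  # one week, the default mentioned above


@dataclass(frozen=True)
class Settings:
    database_uri: str
    time_to_live: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read DATABASE_URI and TIME_TO_LIVE from a mapping such as os.environ."""
    uri = env.get("DATABASE_URI")
    if not uri:
        raise ValueError("DATABASE_URI is not set")
    ttl = int(env.get("TIME_TO_LIVE", DEFAULT_TTL_SECONDS))
    if ttl <= 0:
        raise ValueError("TIME_TO_LIVE must be a positive number of seconds")
    return Settings(database_uri=uri, time_to_live=ttl)


# Example with an explicit mapping instead of the real environment
example = load_settings({"DATABASE_URI": "mysql+pymysql://user:pass@host:4000/test"})
```

Failing early on a missing URI or a non-positive TTL keeps configuration mistakes from surfacing later as confusing database errors.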

2. Install Required Libraries

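Install the necessary Python packages with pip. The code in this guide imports sqlmodel, tidb-vector, fastapi, requests, and python-dotenv, and the database URI uses the pymysql driver.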

3. Define the Cache Model

Use SQLModel to define your cache model, incorporating vector fields and automatic timestamping.

import os
from datetime import datetime
from typing import List, Optional

from dotenv import load_dotenv
from sqlalchemy import func
from sqlmodel import SQLModel, Field, Column, DateTime, String, Text
from tidb_vector.sqlalchemy import VectorType

# Load TIME_TO_LIVE from the .env file; fall back to one week (in seconds)
load_dotenv()
TIME_TO_LIVE = int(os.getenv("TIME_TO_LIVE", "604800"))


class Cache(SQLModel, table=True):
    __table_args__ = {
        # Setting the TTL (Time to Live) for the cache entries
        'mysql_TTL': f'created_at + INTERVAL {TIME_TO_LIVE} SECOND',
    }

    id: Optional[int] = Field(default=None, primary_key=True)
    key: str = Field(sa_column=Column(String(255), unique=True, nullable=False))
    # typing.List keeps the annotation valid on Python 3.8
    key_vec: Optional[List[float]] = Field(
        sa_column=Column(
            VectorType(768),  # Define the vector type with 768 dimensions
            default=None,
            # HNSW (Hierarchical Navigable Small World) index for approximate
            # nearest-neighbor search; the metric is set to cosine so that it
            # matches the cosine_distance query used by the lookup endpoint
            comment="hnsw(distance=cosine)",
            nullable=False,
        )
    )
    value: Optional[str] = Field(sa_column=Column(Text))
    created_at: datetime = Field(
        sa_column=Column(DateTime, server_default=func.now(), nullable=False)
    )
    updated_at: datetime = Field(
        sa_column=Column(DateTime, server_default=func.now(), onupdate=func.now(), nullable=False)
    )
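The TTL expression `created_at + INTERVAL {TIME_TO_LIVE} SECOND` marks the moment after which a row counts as expired. The sketch below mirrors that arithmetic in plain Python; `expires_at` and `is_expired` are hypothetical helpers for illustration only. Note that TiDB deletes expired rows with background jobs, so an expired row may remain visible for a while before it is removed.

```python
from datetime import datetime, timedelta


def expires_at(created_at: datetime, ttl_seconds: int) -> datetime:
    """Mirror of the SQL expression: created_at + INTERVAL ttl_seconds SECOND."""
    return created_at + timedelta(seconds=ttl_seconds)


def is_expired(created_at: datetime, ttl_seconds: int, now: datetime) -> bool:
    """A row is past its TTL once `now` reaches the expiry time."""
    return now >= expires_at(created_at, ttl_seconds)


created = datetime(2024, 1, 1, 0, 0, 0)
one_week = 604800  # seconds, the default TIME_TO_LIVE
expiry = expires_at(created, one_week)
```

If stale answers matter for your application, an application-side check like `is_expired` can filter out rows that are past their TTL but not yet deleted.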

4. Create the Database Engine

Create the engine and the database schema.

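import os

from dotenv import load_dotenv
from sqlmodel import SQLModel, create_engine

# Read the database URI from the .env file
load_dotenv()
DATABASE_URI = os.environ["DATABASE_URI"]

# Create the engine using the database URI
engine = create_engine(DATABASE_URI)

# Create all tables in the database
SQLModel.metadata.create_all(engine)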

5. FastAPI Setup

Set up the FastAPI application and endpoints for setting and getting cache entries.

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from sqlmodel import Session, select

# Initialize FastAPI app
app = FastAPI()
security = HTTPBearer()


@app.post("/set")
def set_cache(
    cache: Cache,
    # Parameters with defaults must come after those without
    credentials: HTTPAuthorizationCredentials = Depends(security),
):
    # Generate embeddings for the given key using Jina AI
    cache.key_vec = generate_embeddings(credentials.credentials, cache.key)

    with Session(engine) as session:
        session.add(cache)
        session.commit()

    return {'message': 'Cache has been set'}


@app.get("/get/{key}")
def get_cache(
    key: str,
    max_distance: float = 0.1,
    credentials: HTTPAuthorizationCredentials = Depends(security),
):
    # Generate embeddings for the given key using Jina AI
    key_vec = generate_embeddings(credentials.credentials, key)

    # The max value of distance is 0.3
    max_distance = min(max_distance, 0.3)

    with Session(engine) as session:
        # Fetch the single nearest cached key by cosine distance
        result = session.exec(
            select(
                Cache,
                Cache.key_vec.cosine_distance(key_vec).label('distance')
            ).order_by(
                'distance'
            ).limit(1)
        ).first()

        # Raise an HTTPException so the client receives a real 404 status;
        # returning a (dict, 404) tuple would be serialized as a 200 response
        if result is None:
            raise HTTPException(status_code=404, detail="Cache not found")

        cache, distance = result
        if distance > max_distance:
            raise HTTPException(status_code=404, detail="Cache not found")

        return {
            "key": cache.key,
            "value": cache.value,
            "distance": distance
        }
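The hit-or-miss decision in the lookup endpoint comes down to two rules: the caller's `max_distance` is capped at 0.3, and the nearest entry only counts as a hit if its distance is within that capped threshold. The sketch below isolates that logic so it can be reasoned about without a database; `is_cache_hit` is a hypothetical helper, not part of the service code.

```python
from typing import Optional

MAX_ALLOWED_DISTANCE = 0.3  # the cap applied in the endpoint above


def is_cache_hit(distance: Optional[float], requested_max: float = 0.1) -> bool:
    """Return True if the nearest entry is close enough to be served from cache.

    `distance` is None when the cache table is empty.
    """
    if distance is None:
        return False
    effective_max = min(requested_max, MAX_ALLOWED_DISTANCE)
    return distance <= effective_max


default_hit = is_cache_hit(0.05)                       # within the default 0.1
default_miss = is_cache_hit(0.2)                       # too far for the default
relaxed_hit = is_cache_hit(0.2, requested_max=0.25)    # caller relaxes the threshold
capped_miss = is_cache_hit(0.35, requested_max=0.9)    # 0.9 is capped to 0.3
```

The cap prevents a client from requesting such a loose threshold that unrelated cached answers are returned.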

6. Generate Embeddings

Implement a function to get embeddings from Jina AI.

import requests
import os
from dotenv import load_dotenv

load_dotenv()


def generate_embeddings(jinaai_api_key: str, text: str):
    JINAAI_API_URL = 'https://api.jina.ai/v1/embeddings'
    JINAAI_HEADERS = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {jinaai_api_key}'
    }
    JINAAI_REQUEST_DATA = {
        'input': [text],
        'model': 'jina-embeddings-v2-base-en'  # Use the Jina Embeddings model with 768 dimensions
    }
    response = requests.post(JINAAI_API_URL, headers=JINAAI_HEADERS, json=JINAAI_REQUEST_DATA)
    # Fail loudly on HTTP errors (for example, an invalid API key)
    response.raise_for_status()
    # Extract and return the embedding from the response
    return response.json()['data'][0]['embedding']
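Because the vector column is declared with 768 dimensions, it can help to validate the embedding before writing it to the database. The following sketch assumes the response shape used above (`data[0].embedding`); `extract_embedding` is a hypothetical helper and the payload in the example is fabricated test data, not real API output.

```python
from typing import Any, Dict, List

EXPECTED_DIM = 768  # must match VectorType(768) in the cache model


def extract_embedding(payload: Dict[str, Any], expected_dim: int = EXPECTED_DIM) -> List[float]:
    """Pull the first embedding out of a parsed response and check its length."""
    try:
        embedding = payload["data"][0]["embedding"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("unexpected embeddings response shape") from exc
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return [float(x) for x in embedding]


# Fabricated payload that only imitates the shape of a response
fake_payload = {"data": [{"embedding": [0.1] * 768}]}
vector = extract_embedding(fake_payload)
```

A dimension mismatch usually means the embedding model was changed without updating the column definition, so raising early gives a clearer error than a failed insert.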

How to Use This App

Prerequisites

A running TiDB Serverless cluster with vector search enabled
Python 3.8 or later
A Jina AI API key

Run the example
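Once the service is running, a client can talk to the two endpoints over HTTP. The sketch below only builds the requests so that their shape is easy to inspect. It assumes the app is served locally at http://localhost:8000 (for example by an ASGI server such as uvicorn), and `build_set_request` and `build_get_request` are hypothetical helper names. The Jina AI API key is passed as the bearer token, as the endpoints expect.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: where the FastAPI app is served


def build_set_request(api_key: str, key: str, value: str, base_url: str = BASE_URL) -> Request:
    """Build a POST /set request that stores `value` under `key`."""
    body = json.dumps({"key": key, "value": value}).encode("utf-8")
    return Request(
        f"{base_url}/set",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def build_get_request(
    api_key: str, key: str, max_distance: float = 0.1, base_url: str = BASE_URL
) -> Request:
    """Build a GET /get/{key} request; the key is URL-encoded for the path."""
    path = quote(key, safe="")
    query = urlencode({"max_distance": max_distance})
    return Request(
        f"{base_url}/get/{path}?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


set_req = build_set_request("my-jina-key", "what is TiDB?", "A distributed SQL database.")
get_req = build_get_request("my-jina-key", "what is TiDB?")
```

To send a request, pass it to `urllib.request.urlopen`. Natural-language keys often contain spaces or question marks, which is why the key is percent-encoded before it is placed in the URL path.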

Conclusion

By combining Jina AI’s powerful embedding capabilities with TiDB’s efficient vector operations, you can build a robust semantic cache service. This service is ideal for applications requiring fast, intelligent caching and retrieval of semantically similar data. Start experimenting with this setup to explore its full potential in your projects.

More Demos

The tidb-vector-python project includes more examples that show how to interact with TiDB Vector in different scenarios.

OpenAI Embedding: use the OpenAI embedding model to generate vectors for text data, store them in TiDB Vector, and search for similar text.
Image Search: use the OpenAI CLIP model to generate vectors for images and text, store them in TiDB Vector, and search for similar images.
LlamaIndex RAG with UI: use LlamaIndex to build a RAG (Retrieval-Augmented Generation) application.
Chat with URL: use LlamaIndex to build a RAG (Retrieval-Augmented Generation) application that can chat with a URL.
GraphRAG: build a Knowledge Graph based RAG application on TiDB Serverless in 20 lines of code.
GraphRAG Step by Step Tutorial: a step-by-step tutorial, with a Colab notebook, for building a Knowledge Graph based RAG application. In this tutorial, you will learn how to extract knowledge from a text corpus, build a Knowledge Graph, store the Knowledge Graph in TiDB Serverless, and search the Knowledge Graph.
Vector Search Notebook with SQLAlchemy: use SQLAlchemy to interact with TiDB Serverless: connect to the database, index and store data, and then search vectors.
Build RAG with Jina AI Embeddings: use Jina AI to generate embeddings for text data, store the embeddings in TiDB Vector Storage, and search for similar embeddings.

Happy coding!