SQLite as a Vector Database
Implementing Similarity Search and ML Embeddings
SQLite is not a vector database. However, it can store embeddings and perform similarity search reliably for small to medium datasets.
This article shows how to use SQLite as a simple, real vector store using:
Proper ML embeddings
Cosine similarity
Plain SQL
No external services
The goal is practical execution, not theory.
When SQLite Makes Sense for Vector Search
SQLite works well when:
Your data fits on a single machine
You want zero infrastructure
You already use SQLite
You need predictable, inspectable behavior
SQLite does not make sense when:
You need ANN indexes (HNSW, IVF)
You are working at million+ vector scale
You require very low latency at high concurrency
This approach targets prototypes, internal tools, embedded apps, and small semantic search systems.
What Embeddings Are
An embedding converts text into a numeric vector such that similar text produces similar vectors.
Typical text embedding sizes:
384 dimensions
768 dimensions
1536 dimensions
We will use cosine similarity, which is standard for text embeddings.
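As a small illustration of what cosine similarity measures, the sketch below computes it with NumPy. The 3-dimensional vectors and the helper name cosine are made up for this example; real text embeddings have hundreds of dimensions.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors, invented for illustration only.
a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([2.0, 4.0, 6.0], dtype=np.float32)   # same direction as a, twice the length
c = np.array([3.0, 0.0, -1.0], dtype=np.float32)  # orthogonal to a (dot product is 0)

same_direction = cosine(a, b)   # close to 1.0: vector length does not matter
orthogonal = cosine(a, c)       # 0.0: no shared direction
opposite = cosine(a, -a)        # close to -1.0
```

Because only direction matters, two texts of very different length can still score as highly similar.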
Step 1: Install Requirements
You need:
Python 3.9+
numpy
sentence-transformers
pip install numpy sentence-transformers
This article uses a real embedding model: all-MiniLM-L6-v2, which produces 384-dimensional vectors.
Step 2: Database Schema
SQLite has no vector type.
Vectors are stored as BLOBs.
CREATE TABLE embeddings (
id INTEGER PRIMARY KEY,
content TEXT NOT NULL,
embedding BLOB NOT NULL
);
Why BLOBs
Compact
Fast
No JSON parsing
Direct float32 deserialization
This is the correct choice for SQLite.
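To make the BLOB argument concrete, here is a minimal round-trip sketch. It uses a random 384-dimensional vector as a stand-in for a real embedding (an assumption for illustration) and compares the byte size against a JSON encoding of the same values.

```python
import json
import numpy as np

# Stand-in for a 384-dimensional embedding (random values, illustration only).
rng = np.random.default_rng(0)
vec = rng.random(384, dtype=np.float32)

blob = vec.tobytes()                               # raw float32 bytes, 4 bytes per dimension
restored = np.frombuffer(blob, dtype=np.float32)   # read-only view over the bytes, no parsing

as_json = json.dumps(vec.tolist())                 # text encoding of the same vector, for comparison

blob_size = len(blob)        # 384 * 4 = 1536 bytes
json_size = len(as_json)     # larger, and it must be parsed on every read
```

Note that tobytes() writes the array in the machine's native byte order, and np.frombuffer returns a read-only array; call .copy() on the result if you need to modify it.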
Step 3: Create the Database and Table
import sqlite3
conn = sqlite3.connect("vectors.db")
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("""
CREATE TABLE IF NOT EXISTS embeddings (
id INTEGER PRIMARY KEY,
content TEXT NOT NULL,
embedding BLOB NOT NULL
);
""")
conn.commit()
Step 4: Generate Real Embeddings
This uses a real ML model.
The output vectors are meaningful and usable.
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer("all-MiniLM-L6-v2")
def embed_text(text: str) -> np.ndarray:
    # encode() takes a list of texts and returns one vector per text
    vector = model.encode([text], normalize_embeddings=True)
    return np.asarray(vector[0], dtype=np.float32)
Important:
normalize_embeddings=True returns unit-length vectors, so cosine similarity reduces to a plain dot product
Output is always float32
Step 5: Store Embeddings in SQLite
documents = [
"SQLite WAL mode improves concurrency",
"Embeddings represent text as vectors",
"Cosine similarity is used for semantic search",
"SQLite can store vectors as BLOBs",
"Approximate nearest neighbor search improves scale"
]
for doc in documents:
    vec = embed_text(doc)
    conn.execute(
        "INSERT INTO embeddings (content, embedding) VALUES (?, ?)",
        (doc, vec.tobytes())  # store the raw float32 bytes
    )
conn.commit()
All vectors must have the same dimension.
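One way to check the same-dimension rule is to compare byte lengths in SQL, since SQLite's length() returns the number of bytes for a BLOB. The sketch below runs against a throwaway in-memory database; the names demo, EXPECTED_DIM, and the deliberately malformed row are assumptions made for illustration.

```python
import sqlite3
import numpy as np

EXPECTED_DIM = 384                  # dimension of all-MiniLM-L6-v2 embeddings
EXPECTED_BYTES = EXPECTED_DIM * 4   # float32 uses 4 bytes per value

demo = sqlite3.connect(":memory:")  # throwaway database for the demonstration
demo.execute(
    "CREATE TABLE embeddings ("
    "id INTEGER PRIMARY KEY, content TEXT NOT NULL, embedding BLOB NOT NULL)"
)

good = np.zeros(EXPECTED_DIM, dtype=np.float32)
bad = np.zeros(EXPECTED_DIM - 1, dtype=np.float32)   # deliberately one value short
demo.executemany(
    "INSERT INTO embeddings (content, embedding) VALUES (?, ?)",
    [("ok row", good.tobytes()), ("malformed row", bad.tobytes())],
)

# length() on a BLOB counts bytes, so any other length means a wrong dimension or dtype.
mismatched = demo.execute(
    "SELECT id, content, length(embedding) FROM embeddings "
    "WHERE length(embedding) != ?",
    (EXPECTED_BYTES,),
).fetchall()
```

Here mismatched contains only the malformed row, with a length of 1532 bytes instead of 1536. Running a query like this after a bulk load is a cheap way to catch a mixed-model or mixed-dtype table before it produces misleading scores.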
Step 6: Register Cosine Similarity in SQLite
SQLite does not include vector math.
We add it from Python.
import math
import numpy as np
def cosine_similarity(blob1, blob2):
    # Decode the raw float32 bytes back into vectors
    v1 = np.frombuffer(blob1, dtype=np.float32)
    v2 = np.frombuffer(blob2, dtype=np.float32)
    dot = float(np.dot(v1, v2))
    # Squared norms of each vector
    n1 = float(np.dot(v1, v1))
    n2 = float(np.dot(v2, v2))
    # Guard against division by zero for all-zero vectors
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (math.sqrt(n1) * math.sqrt(n2))
# Register as a two-argument SQL function on this connection
conn.create_function("cosine_similarity", 2, cosine_similarity)
This enables similarity search directly in SQL.
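Before running real queries, it can help to sanity-check the registered function on vectors whose similarity is known in advance. The sketch below repeats the function above so that it runs on its own, and uses a separate in-memory connection named check plus hand-written 2-dimensional vectors (both assumptions for illustration).

```python
import math
import sqlite3
import numpy as np

def cosine_similarity(blob1, blob2):
    # Same logic as the function above, repeated so this snippet runs on its own.
    v1 = np.frombuffer(blob1, dtype=np.float32)
    v2 = np.frombuffer(blob2, dtype=np.float32)
    dot = float(np.dot(v1, v2))
    n1 = float(np.dot(v1, v1))
    n2 = float(np.dot(v2, v2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (math.sqrt(n1) * math.sqrt(n2))

check = sqlite3.connect(":memory:")
check.create_function("cosine_similarity", 2, cosine_similarity)

x = np.array([1.0, 0.0], dtype=np.float32).tobytes()
y = np.array([0.0, 1.0], dtype=np.float32).tobytes()
zero = np.zeros(2, dtype=np.float32).tobytes()

# Identical vectors, orthogonal vectors, and the zero-vector guard in one query.
same, orthogonal, with_zero = check.execute(
    "SELECT cosine_similarity(?, ?), cosine_similarity(?, ?), cosine_similarity(?, ?)",
    (x, x, x, y, x, zero),
).fetchone()
```

The three values come back as 1.0, 0.0, and 0.0. Keep in mind that a function registered with create_function exists only on the connection it was registered on, so every new connection has to register it again.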
Step 7: Perform Similarity Search
Embed the query, then rank results.
query = "How do I store embeddings in SQLite?"
query_vec = embed_text(query).tobytes()
rows = conn.execute("""
SELECT content,
cosine_similarity(embedding, ?) AS score
FROM embeddings
ORDER BY score DESC
LIMIT 5;
""", (query_vec,)).fetchall()
Print results:
for rank, (content, score) in enumerate(rows, start=1):
    print(f"{rank}. score={score:.4f} | {content}")
This performs a full table scan.
That is expected.
Performance Characteristics
This approach is acceptable when:
Dataset is up to tens of thousands of vectors
Reads dominate writes
You enable WAL mode
Key points:
SQLite evaluates similarity row by row
There is no vector index
Latency grows linearly with row count
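The row-by-row behavior can be observed directly by counting how often SQLite invokes the similarity function. This is a sketch under stated assumptions: the table holds 1000 random 8-dimensional vectors, and the names counter, counting_similarity, and db are invented for the example. It uses a plain dot product rather than full cosine similarity to keep the snippet short.

```python
import sqlite3
import numpy as np

counter = {"calls": 0}

def counting_similarity(blob1, blob2):
    """Dot product of two float32 BLOBs; also counts how often SQLite calls it."""
    counter["calls"] += 1
    v1 = np.frombuffer(blob1, dtype=np.float32)
    v2 = np.frombuffer(blob2, dtype=np.float32)
    return float(np.dot(v1, v2))

db = sqlite3.connect(":memory:")
db.create_function("similarity", 2, counting_similarity)
db.execute("CREATE TABLE embeddings (id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)")

# Random stand-in vectors, illustration only.
rng = np.random.default_rng(0)
rows = [(rng.random(8, dtype=np.float32).tobytes(),) for _ in range(1000)]
db.executemany("INSERT INTO embeddings (embedding) VALUES (?)", rows)

query = rng.random(8, dtype=np.float32).tobytes()
top = db.execute(
    "SELECT id, similarity(embedding, ?) AS score "
    "FROM embeddings ORDER BY score DESC LIMIT 5",
    (query,),
).fetchall()

# The function runs for every row (at least 1000 calls here),
# even though LIMIT 5 returns only five rows.
calls_for_1000_rows = counter["calls"]
```

LIMIT only trims the output; it does not reduce the number of similarity computations, which is why latency tracks the row count.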
Approximate Nearest Neighbor in SQLite
SQLite does not support ANN natively.
Practical workarounds:
Filter rows by metadata before similarity search
Partition data into smaller tables
Reduce vector dimensionality
Cache frequent queries
These reduce the work done per query, not the complexity: the search is still a linear scan over the remaining candidate rows.
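The first workaround, filtering by metadata, might look like the sketch below. The docs table, its category column, the index name, and the helper functions are all hypothetical and not part of the schema used earlier; the vectors are hand-written 2-dimensional stand-ins.

```python
import sqlite3
import numpy as np

def dot_similarity(blob1, blob2):
    # Assumes both vectors are unit length, so the dot product equals cosine similarity.
    return float(np.dot(np.frombuffer(blob1, dtype=np.float32),
                        np.frombuffer(blob2, dtype=np.float32)))

def unit(values):
    """Normalize to unit length and serialize as float32 bytes."""
    v = np.asarray(values, dtype=np.float32)
    return (v / np.linalg.norm(v)).tobytes()

db = sqlite3.connect(":memory:")
db.create_function("similarity", 2, dot_similarity)
# "category" is a hypothetical metadata column added for this example.
db.execute("""
    CREATE TABLE docs (
        id INTEGER PRIMARY KEY,
        category TEXT NOT NULL,
        content TEXT NOT NULL,
        embedding BLOB NOT NULL
    )
""")
db.execute("CREATE INDEX idx_docs_category ON docs(category)")

db.executemany(
    "INSERT INTO docs (category, content, embedding) VALUES (?, ?, ?)",
    [
        ("sqlite", "WAL mode notes", unit([1.0, 0.1])),
        ("sqlite", "BLOB storage notes", unit([0.9, 0.4])),
        ("ml", "Embedding basics", unit([0.1, 1.0])),
    ],
)

query = unit([1.0, 0.0])
# The WHERE clause narrows the candidate set; similarity is computed only for matching rows.
results = db.execute(
    """
    SELECT content, similarity(embedding, ?) AS score
    FROM docs
    WHERE category = ?
    ORDER BY score DESC
    """,
    (query, "sqlite"),
).fetchall()
```

Only the two rows in the "sqlite" category are scored and returned. The ordinary B-tree index on category lets SQLite skip the other rows without touching their embeddings.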
When to Move Away from SQLite
You should migrate when:
Query latency becomes unacceptable
Dataset grows beyond memory limits
ANN indexing becomes a requirement
SQLite is a starting point, not a dead end.
Final Notes
SQLite can be used as a vector database when used intentionally.
This setup:
Uses real embeddings
Produces meaningful similarity results
Requires no infrastructure
Is easy to debug and inspect
Within its limits, this approach is practical, reliable, and effective.