SQLite as a Vector Database

Implementing Similarity Search and ML Embeddings

Jan 27, 2026

SQLite is not a vector database. However, it can store embeddings and perform similarity search reliably for small to medium datasets.


This article shows how to use SQLite as a simple, working vector store with:

Proper ML embeddings
Cosine similarity
Plain SQL
No external services

The goal is practical execution, not theory.

When SQLite Makes Sense for Vector Search

SQLite works well when:

Your data fits on a single machine
You want zero infrastructure
You already use SQLite
You need predictable, inspectable behavior

SQLite does not make sense when:

You need ANN indexes (HNSW, IVF)
You are working at million+ vector scale
You require very low latency at high concurrency

This approach targets prototypes, internal tools, embedded apps, and small semantic search systems.

What Embeddings Are

An embedding converts text into a numeric vector such that similar text produces similar vectors.

Typical text embedding sizes:

384 dimensions
768 dimensions
1536 dimensions

We will use cosine similarity, which is standard for text embeddings.
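For reference, here is the standard definition. Cosine similarity between two n-dimensional vectors a and b is their dot product divided by the product of their lengths. Scores range from -1 to 1, and 1 means the vectors point in the same direction:

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
```

When both vectors are normalized to unit length, the denominator is 1 and cosine similarity reduces to the plain dot product.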

Step 1: Install Requirements

You need:

Python 3.9+
numpy
sentence-transformers

pip install numpy sentence-transformers

This article uses a real embedding model, all-MiniLM-L6-v2, which outputs 384-dimensional vectors.

Step 2: Database Schema

SQLite has no vector type.
Vectors are stored as BLOBs.

CREATE TABLE embeddings (
  id INTEGER PRIMARY KEY,
  content TEXT NOT NULL,
  embedding BLOB NOT NULL
);

Why BLOBs

Compact
Fast
No JSON parsing
Direct float32 deserialization

For SQLite, this is the most practical storage format.
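The sketch below shows the BLOB round trip using a toy vector in place of a real embedding. It serializes a 384-dimensional float32 array with `tobytes()`, stores it in an in-memory table, and reads it back with `np.frombuffer()`. The table name `t` is only for illustration.

```python
import sqlite3

import numpy as np

# Toy 384-dimensional float32 vector standing in for a real embedding.
vec = np.linspace(-1.0, 1.0, 384, dtype=np.float32)
blob = vec.tobytes()  # raw float32 bytes: 384 dims * 4 bytes each

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (embedding BLOB NOT NULL)")
conn.execute("INSERT INTO t (embedding) VALUES (?)", (blob,))

# length() on a BLOB returns its size in bytes.
stored, nbytes = conn.execute("SELECT embedding, length(embedding) FROM t").fetchone()

# Reinterpret the bytes as float32 without any text parsing.
restored = np.frombuffer(stored, dtype=np.float32)
print(type(stored).__name__, nbytes, np.array_equal(vec, restored))
```

Note that `tobytes()` uses the machine's native byte order. If the database file may move between machines with different endianness, an explicit little-endian dtype such as `"<f4"` on both the write and read side is the safer choice.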

Step 3: Create the Database and Table

import sqlite3

conn = sqlite3.connect("vectors.db")
conn.execute("PRAGMA journal_mode=WAL;")

conn.execute("""
CREATE TABLE IF NOT EXISTS embeddings (
  id INTEGER PRIMARY KEY,
  content TEXT NOT NULL,
  embedding BLOB NOT NULL
);
""")

conn.commit()
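`PRAGMA journal_mode=WAL` returns the journal mode that is actually in effect, and that mode is not always WAL. For example, an in-memory database reports `memory`. The following is a minimal sketch for checking the result. `enable_wal` is a hypothetical helper, and the temporary directory is only for illustration.

```python
import os
import sqlite3
import tempfile


def enable_wal(conn: sqlite3.Connection) -> str:
    """Request WAL mode and return the journal mode SQLite actually applied."""
    return conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]


# File-backed database: WAL is supported.
with tempfile.TemporaryDirectory() as tmp:
    file_conn = sqlite3.connect(os.path.join(tmp, "vectors.db"))
    file_mode = enable_wal(file_conn)
    file_conn.close()

# In-memory database: WAL is not available, so the request has no effect.
mem_conn = sqlite3.connect(":memory:")
mem_mode = enable_wal(mem_conn)

print(file_mode, mem_mode)
```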

Step 4: Generate Real Embeddings

This step uses a pretrained sentence-embedding model.
Semantically similar sentences produce vectors that point in similar directions.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_text(text: str) -> np.ndarray:
    vector = model.encode([text], normalize_embeddings=True)
    return np.asarray(vector[0], dtype=np.float32)

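Important:

normalize_embeddings=True scales each vector to unit length, so the dot product of two embeddings equals their cosine similarity
np.asarray(..., dtype=np.float32) guarantees float32 output, matching the BLOB format used below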
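Some models or APIs have no normalization option. In that case, you can do the same L2 normalization yourself. The sketch below uses a hypothetical `normalize` helper, which is not part of sentence-transformers. It also checks that the dot product of unit vectors equals their cosine similarity.

```python
import numpy as np


def normalize(vec) -> np.ndarray:
    """Scale a vector to unit length (L2 norm of 1), returned as float32."""
    vec = np.asarray(vec, dtype=np.float32)
    norm = np.linalg.norm(vec)
    if norm == 0.0:
        # An all-zero vector has no direction; return it unchanged.
        return vec
    return vec / norm


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# Cosine of the original vectors is 24 / (5 * 5) = 0.96;
# after normalization, the plain dot product gives the same value.
print(round(float(np.dot(a, b)), 4))  # 0.96
```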

Step 5: Store Embeddings in SQLite

documents = [
    "SQLite WAL mode improves concurrency",
    "Embeddings represent text as vectors",
    "Cosine similarity is used for semantic search",
    "SQLite can store vectors as BLOBs",
    "Approximate nearest neighbor search improves scale"
]

for doc in documents:
    vec = embed_text(doc)
    conn.execute(
        "INSERT INTO embeddings (content, embedding) VALUES (?, ?)",
        (doc, vec.tobytes())
    )

conn.commit()

All vectors must have the same dimension (384 for all-MiniLM-L6-v2). Vectors of different lengths cannot be compared.
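To enforce that rule at write time, you can validate every vector before inserting anything and then write the batch in a single transaction. The sketch below does this. `insert_documents` and `fake_embed` are hypothetical names. `fake_embed` only stands in for `embed_text()` so the example runs without downloading a model, and its vectors carry no meaning.

```python
import sqlite3
from typing import Callable, Iterable

import numpy as np

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2


def insert_documents(
    conn: sqlite3.Connection,
    docs: Iterable[str],
    embed_fn: Callable[[str], np.ndarray],
    dim: int = EMBEDDING_DIM,
) -> int:
    """Embed documents, reject wrong-sized vectors, then insert them in one transaction."""
    rows = []
    for doc in docs:
        vec = np.asarray(embed_fn(doc), dtype=np.float32)
        if vec.shape != (dim,):
            raise ValueError(f"expected shape ({dim},), got {vec.shape} for {doc!r}")
        rows.append((doc, vec.tobytes()))
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        conn.executemany(
            "INSERT INTO embeddings (content, embedding) VALUES (?, ?)", rows
        )
    return len(rows)


def fake_embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for embed_text(); random, NOT a real embedding.
    rng = np.random.default_rng(len(text))
    return rng.standard_normal(EMBEDDING_DIM).astype(np.float32)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings (id INTEGER PRIMARY KEY, "
    "content TEXT NOT NULL, embedding BLOB NOT NULL)"
)
inserted = insert_documents(conn, ["first doc", "second doc"], fake_embed)
print(inserted)
```

Validation happens before the transaction starts, so a batch containing one bad vector inserts nothing.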

Step 6: Register Cosine Similarity in SQLite

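Core SQLite has no vector math functions.
We register one from Python as a user-defined SQL function.

import math
import numpy as np

def cosine_similarity(blob1, blob2):
    # Reinterpret the stored bytes as float32 vectors (no text parsing).
    v1 = np.frombuffer(blob1, dtype=np.float32)
    v2 = np.frombuffer(blob2, dtype=np.float32)

    # Vectors of different dimensions cannot be compared.
    # Returning NULL avoids an exception; NULL sorts last with ORDER BY ... DESC.
    if v1.shape != v2.shape:
        return None

    dot = float(np.dot(v1, v2))
    n1 = float(np.dot(v1, v1))
    n2 = float(np.dot(v2, v2))

    # Avoid division by zero for all-zero vectors.
    if n1 == 0.0 or n2 == 0.0:
        return 0.0

    return dot / (math.sqrt(n1) * math.sqrt(n2))

# Register as a 2-argument SQL function. Registration is per connection,
# so repeat this call on every connection that runs similarity queries.
conn.create_function("cosine_similarity", 2, cosine_similarity)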

This enables similarity search directly in SQL.
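Step 4 normalizes every embedding to unit length, so the norms in the denominator are always 1 and the dot product alone gives the same score. The sketch below registers a cheaper function under a hypothetical name, `dot_similarity`. It assumes that every stored vector and every query vector is unit-normalized. Without that assumption, the scores it returns are not cosine similarities.

```python
import sqlite3

import numpy as np


def dot_similarity(blob1: bytes, blob2: bytes) -> float:
    # Equals cosine similarity ONLY when both vectors have unit length.
    v1 = np.frombuffer(blob1, dtype=np.float32)
    v2 = np.frombuffer(blob2, dtype=np.float32)
    return float(np.dot(v1, v2))


conn = sqlite3.connect(":memory:")
conn.create_function("dot_similarity", 2, dot_similarity)

# Two unit vectors 60 degrees apart: cos(60 degrees) = 0.5.
a = np.array([1.0, 0.0], dtype=np.float32).tobytes()
b = np.array([0.5, np.sqrt(3) / 2], dtype=np.float32).tobytes()

score = conn.execute("SELECT dot_similarity(?, ?)", (a, b)).fetchone()[0]
print(round(score, 4))  # 0.5
```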

Step 7: Perform Similarity Search

Embed the query, then rank results.

query = "How do I store embeddings in SQLite?"
query_vec = embed_text(query).tobytes()

rows = conn.execute("""
SELECT content,
       cosine_similarity(embedding, ?) AS score
FROM embeddings
ORDER BY score DESC
LIMIT 5;
""", (query_vec,)).fetchall()

Print results:

for rank, (content, score) in enumerate(rows, start=1):
    print(f"{rank}. score={score:.4f} | {content}")

This performs a full table scan: SQLite calls cosine_similarity for every row, then sorts the scores.
For this approach, that is expected.
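The same query can be wrapped in a reusable function that adds a minimum-score cutoff. In the sketch below, `search` and `min_score` are hypothetical names, and the 2-dimensional toy vectors stand in for real embeddings. The subquery computes the score once so that the outer query can filter on it with a plain WHERE clause.

```python
import math
import sqlite3

import numpy as np


def cosine_similarity(blob1, blob2):
    v1 = np.frombuffer(blob1, dtype=np.float32)
    v2 = np.frombuffer(blob2, dtype=np.float32)
    n1, n2 = float(np.dot(v1, v1)), float(np.dot(v2, v2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return float(np.dot(v1, v2)) / (math.sqrt(n1) * math.sqrt(n2))


def search(conn, query_vec, k=5, min_score=0.0):
    """Return up to k (content, score) pairs with score >= min_score, best first."""
    sql = """
    SELECT content, score FROM (
        SELECT content, cosine_similarity(embedding, ?) AS score
        FROM embeddings
    )
    WHERE score >= ?
    ORDER BY score DESC
    LIMIT ?
    """
    blob = np.asarray(query_vec, dtype=np.float32).tobytes()
    return conn.execute(sql, (blob, min_score, k)).fetchall()


conn = sqlite3.connect(":memory:")
conn.create_function("cosine_similarity", 2, cosine_similarity)
conn.execute(
    "CREATE TABLE embeddings (id INTEGER PRIMARY KEY, "
    "content TEXT NOT NULL, embedding BLOB NOT NULL)"
)
for content, vec in [("east", [1.0, 0.0]), ("north", [0.0, 1.0]), ("west", [-1.0, 0.0])]:
    conn.execute(
        "INSERT INTO embeddings (content, embedding) VALUES (?, ?)",
        (content, np.array(vec, dtype=np.float32).tobytes()),
    )

results = search(conn, [1.0, 0.5], k=5, min_score=0.1)
for rank, (content, score) in enumerate(results, start=1):
    print(f"{rank}. score={score:.4f} | {content}")
```

Here "west" has a negative score, so the cutoff drops it even though `k` would allow it.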

Performance Characteristics

This approach is acceptable when:

Dataset is up to tens of thousands of vectors
Reads dominate writes
You enable WAL mode

Key points:

SQLite evaluates similarity row by row
There is no vector index
Latency grows linearly with row count
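Each query also pays for one Python callback per row. One common way to cut that overhead, which is not part of the setup above, is to load all embeddings into a single NumPy matrix once and score them with one matrix-vector product. The sketch below does this with hypothetical helpers `load_matrix` and `top_k`. It assumes all vectors are unit-normalized, share one dimension, and fit in memory. The scan is still linear; it just runs in NumPy instead of per-row Python calls.

```python
import sqlite3

import numpy as np


def load_matrix(conn):
    """Read all rows into (ids, contents, matrix), one float32 row per embedding."""
    rows = conn.execute("SELECT id, content, embedding FROM embeddings ORDER BY id").fetchall()
    ids = [r[0] for r in rows]
    contents = [r[1] for r in rows]
    if not rows:
        return ids, contents, np.empty((0, 0), dtype=np.float32)
    matrix = np.vstack([np.frombuffer(r[2], dtype=np.float32) for r in rows])
    return ids, contents, matrix


def top_k(matrix, query_vec, k=5):
    """Return (row_index, score) pairs for the k highest dot-product scores."""
    if k <= 0 or matrix.shape[0] == 0:
        return []
    scores = matrix @ np.asarray(query_vec, dtype=np.float32)
    k = min(k, len(scores))
    # argpartition selects the top k without a full sort; then sort just those k.
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]
    return [(int(i), float(scores[i])) for i in idx]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings (id INTEGER PRIMARY KEY, "
    "content TEXT NOT NULL, embedding BLOB NOT NULL)"
)
for content, vec in [("east", [1.0, 0.0]), ("north", [0.0, 1.0]), ("northeast", [0.6, 0.8])]:
    conn.execute(
        "INSERT INTO embeddings (content, embedding) VALUES (?, ?)",
        (content, np.array(vec, dtype=np.float32).tobytes()),
    )

ids, contents, matrix = load_matrix(conn)
result = top_k(matrix, [0.0, 1.0], k=2)
for i, score in result:
    print(contents[i], round(score, 2))
```

The matrix must be reloaded (or updated) whenever rows are inserted or deleted.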

Approximate Nearest Neighbor in SQLite

SQLite does not support ANN natively.

Practical workarounds:

Filter rows by metadata before similarity search
Partition data into smaller tables
Reduce vector dimensionality
Cache frequent queries

These reduce the number of comparisons per query, but search remains a linear scan over the rows that are left.
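The sketch below illustrates the first workaround, metadata filtering. It adds a hypothetical `category` column to the schema. The WHERE clause restricts which rows are scored, so cosine_similarity only runs on rows in the requested category. An index on `category` lets SQLite locate those rows without reading the whole table.

```python
import math
import sqlite3

import numpy as np


def cosine_similarity(blob1, blob2):
    v1 = np.frombuffer(blob1, dtype=np.float32)
    v2 = np.frombuffer(blob2, dtype=np.float32)
    n1, n2 = float(np.dot(v1, v1)), float(np.dot(v2, v2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return float(np.dot(v1, v2)) / (math.sqrt(n1) * math.sqrt(n2))


conn = sqlite3.connect(":memory:")
conn.create_function("cosine_similarity", 2, cosine_similarity)
conn.executescript("""
CREATE TABLE embeddings (
  id INTEGER PRIMARY KEY,
  category TEXT NOT NULL,  -- hypothetical metadata column
  content TEXT NOT NULL,
  embedding BLOB NOT NULL
);
CREATE INDEX idx_embeddings_category ON embeddings (category);
""")


def add(category, content, vec):
    conn.execute(
        "INSERT INTO embeddings (category, content, embedding) VALUES (?, ?, ?)",
        (category, content, np.array(vec, dtype=np.float32).tobytes()),
    )


add("docs", "WAL mode", [1.0, 0.0])
add("docs", "BLOB storage", [0.8, 0.6])
add("blog", "Release notes", [1.0, 0.0])

query = np.array([1.0, 0.0], dtype=np.float32).tobytes()
rows = conn.execute(
    """
    SELECT content, cosine_similarity(embedding, ?) AS score
    FROM embeddings
    WHERE category = ?  -- only rows in this category are scored
    ORDER BY score DESC
    LIMIT 5
    """,
    (query, "docs"),
).fetchall()
print(rows)
```

"Release notes" matches the query perfectly but is never returned, because it belongs to a different category.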

When to Move Away from SQLite

You should migrate when:

Query latency becomes unacceptable
Dataset grows beyond memory limits
ANN indexing becomes a requirement

SQLite is a starting point, not a dead end.

Final Notes

SQLite can serve as a practical vector store when you use it deliberately.

This setup:

Uses real embeddings
Produces meaningful similarity results
Requires no infrastructure
Is easy to debug and inspect

Within its limits, this approach is practical, reliable, and effective.
