How SQLite Uses Statistics Tables for Query Planning

Understanding sqlite_stat1, sqlite_stat4, and ANALYZE for Better Query Performance

Jun 09, 2026

In our previous guide on VACUUM, Fragmentation, and Database File Maintenance, we explored how SQLite maintains efficient storage structures and reclaims unused space.

SQLite query planner analyzing statistics to choose the fastest path through a digital library archive.

Now we’ll examine another important optimization system:

How does SQLite decide which query execution plan to use?

When a query is submitted, SQLite does not blindly execute it.

Instead, SQLite’s query planner evaluates multiple possible execution strategies and attempts to choose the most efficient one.

To make intelligent decisions, SQLite relies on statistical information about the data stored inside the database.

This information is stored in special tables such as:

sqlite_stat1
sqlite_stat4

These tables are populated by the ANALYZE command and help SQLite estimate:

Table sizes
Index selectivity
Row distribution
Query costs

In this guide, we’ll explore:

Why statistics matter
How ANALYZE works
The purpose of sqlite_stat1
The role of sqlite_stat4
How statistics influence query planning
Best practices for maintaining accurate statistics

Why Query Planning Matters

Many SQL statements can be executed in multiple ways.

Consider:

SELECT *
FROM orders
WHERE customer_id = 100;

SQLite might choose:

A full table scan
An index lookup
A covering index scan

Each approach has different performance characteristics.

The query planner’s job is to select the lowest-cost option.

To do that effectively, SQLite needs information about the data.

The Problem Without Statistics

Imagine a table containing:

10 Million Rows

Suppose an index exists on:

customer_id

If SQLite does not know how data is distributed, it may incorrectly estimate:

How many rows match
Whether the index is useful
The overall query cost

Poor estimates can result in:

Inefficient index usage
Unnecessary table scans
Slower queries

Statistics help avoid these mistakes.

What is ANALYZE?

The ANALYZE command collects information about:

Tables
Indexes
Data distribution

Example:

ANALYZE;

SQLite scans database structures and stores statistical information in special internal tables.

Afterward, the query planner can make more informed decisions.

Where Statistics Are Stored

The primary statistics tables are:

sqlite_stat1
sqlite_stat4

These are system tables maintained by SQLite.

They are not typically modified manually.

Instead:

ANALYZE populates them
SQLite reads them during query planning

Understanding sqlite_stat1

The most common statistics table is:

sqlite_stat1

This table contains summary information about:

Tables
Indexes
Row counts

Viewing sqlite_stat1

After running ANALYZE:

SELECT *
FROM sqlite_stat1;

You may see results similar to:

orders idx_customer 1000000 100
products idx_category 50000 50

The values help SQLite estimate:

Table size
Index selectivity
Expected row counts

What Does Selectivity Mean?

Selectivity describes how effectively an index narrows results.

Consider two columns:

High Selectivity

email

Each value is usually unique.

Example:

[email protected]
[email protected]

An index on email is highly selective.

Low Selectivity

status

Possible values:

active
inactive

Many rows share the same value.

An index on status is less selective.

Why Selectivity Matters

SQLite uses selectivity estimates to determine:

Whether an index should be used
Which index is most efficient
Join order selection

Without accurate selectivity information:

SQLite may choose inefficient plans.

Understanding sqlite_stat4

While sqlite_stat1 provides summary information, SQLite can gather more detailed statistics using:

sqlite_stat4

This table stores sampled index values.

Why sqlite_stat4 Exists

Imagine an index:

CREATE INDEX idx_city
ON customers(city);

Suppose the data distribution is:

New York      500,000
Los Angeles   200,000
Chicago        50,000
Smalltown          10

A simple average does not accurately represent reality.

Some values are extremely common.

Others are rare.

sqlite_stat4 helps SQLite understand these differences.

How sqlite_stat4 Improves Estimates

Instead of relying only on averages:

SQLite can examine sampled values and estimate:

Range sizes
Distribution patterns
Value frequencies

This produces better query plans for skewed datasets.

A Practical Example

Consider:

SELECT *
FROM customers
WHERE city = 'Smalltown';

Without detailed statistics:

SQLite might assume:

Thousands of rows match

In reality:

Only 10 rows match

With sqlite_stat4:

SQLite can make a far better estimate.

Statistics and Index Selection

Suppose a table contains:

customer_id
status
created_at

And indexes exist on all three columns.

The planner must decide:

Which index is most efficient?

Statistics help estimate:

Rows returned
Index traversal cost
Disk access requirements

The result is often a significantly faster plan.

Statistics and Join Planning

Statistics become even more important during joins.

Example:

SELECT *
FROM customers c
JOIN orders o
ON c.id = o.customer_id;

SQLite must determine:

Which table to access first
Which indexes to use
How many rows will participate

Poor estimates can dramatically increase execution time.

How SQLite Uses Statistics Internally

The query planner performs cost calculations.

For each possible plan:

SQLite estimates:

Rows examined
Index lookups
Page reads
CPU work

The lowest estimated cost usually wins.

Statistics provide the foundation for those estimates.

When Statistics Become Outdated

Statistics are not automatically refreshed after every change.

Over time:

New rows are inserted
Old rows are deleted
Data distribution changes

Eventually:

Stored statistics may no longer reflect reality.

When to Run ANALYZE

ANALYZE is most beneficial after:

Large data imports
Significant deletions
Bulk updates
Major application growth

These events can change query behavior substantially.

Example Workflow

Imagine:

Database Size: 100,000 rows

ANALYZE is executed.

Later:

Database Size: 10 million rows

Statistics may no longer represent current data.

Running:

ANALYZE;

refreshes the planner’s information.

Targeting Specific Tables

You can analyze a single table:

ANALYZE orders;

Or a specific index:

ANALYZE idx_customer;

This can reduce maintenance overhead in large databases.

Viewing Query Planner Decisions

SQLite provides:

EXPLAIN QUERY PLAN

Example:

EXPLAIN QUERY PLAN
SELECT *
FROM orders
WHERE customer_id = 100;

This reveals:

Index usage
Table scans
Planner choices

It’s one of the best ways to observe the impact of ANALYZE.

Potential Downsides of ANALYZE

Although ANALYZE is beneficial, it has costs.

Data Collection Time

Large databases require:

Table scanning
Index scanning

Analysis can take time.

Storage Overhead

Statistics tables consume additional space.

Usually this overhead is very small.

Maintenance Requirements

Statistics become stale over time.

Periodic updates may be necessary.

Best Practices

Run ANALYZE After Large Data Changes

Major imports and deletions often change data distribution.

Monitor Query Plans

Use:

EXPLAIN QUERY PLAN

to verify planner behavior.

Focus on Frequently Queried Tables

Not every table requires constant analysis.

Prioritize important workloads.

Understand Data Distribution

Highly skewed datasets often benefit most from detailed statistics.

This is where sqlite_stat4 can provide significant value.

Closing Thoughts

SQLite’s query planner depends heavily on statistical information to make intelligent decisions.

Key takeaways:

ANALYZE collects database statistics
sqlite_stat1 stores table and index summaries
sqlite_stat4 stores detailed sample information
Statistics improve row-count estimation
Better estimates lead to better query plans
Periodically refreshing statistics helps maintain performance

As databases grow, understanding how SQLite uses statistics becomes increasingly important. Query optimization is not only about creating indexes, it’s also about giving the query planner the information it needs to use those indexes effectively.

In the next guide, we’ll explore SQLite’s cost-based query optimizer and how execution plans are selected internally.

Subscribe Now

If you want practical, real-world SQLite architecture tutorials, subscribe to SQLite Forum. Subscribe to receive new articles directly.

SQLite Forum

Discussion about this post

Ready for more?