How SQLite Uses Statistics Tables for Query Planning
Understanding sqlite_stat1, sqlite_stat4, and ANALYZE for Better Query Performance
In our previous guide on VACUUM, Fragmentation, and Database File Maintenance, we explored how SQLite maintains efficient storage structures and reclaims unused space.
Now we’ll examine another important optimization system:
How does SQLite decide which query execution plan to use?
When a query is submitted, SQLite does not blindly execute it.
Instead, SQLite’s query planner evaluates multiple possible execution strategies and attempts to choose the most efficient one.
To make intelligent decisions, SQLite relies on statistical information about the data stored inside the database.
This information is stored in special tables such as:
sqlite_stat1sqlite_stat4
These tables are populated by the ANALYZE command and help SQLite estimate:
Table sizes
Index selectivity
Row distribution
Query costs
In this guide, we’ll explore:
Why statistics matter
How ANALYZE works
The purpose of sqlite_stat1
The role of sqlite_stat4
How statistics influence query planning
Best practices for maintaining accurate statistics
Why Query Planning Matters
Many SQL statements can be executed in multiple ways.
Consider:
SELECT *
FROM orders
WHERE customer_id = 100;SQLite might choose:
A full table scan
An index lookup
A covering index scan
Each approach has different performance characteristics.
The query planner’s job is to select the lowest-cost option.
To do that effectively, SQLite needs information about the data.
The Problem Without Statistics
Imagine a table containing:
10 Million RowsSuppose an index exists on:
customer_idIf SQLite does not know how data is distributed, it may incorrectly estimate:
How many rows match
Whether the index is useful
The overall query cost
Poor estimates can result in:
Inefficient index usage
Unnecessary table scans
Slower queries
Statistics help avoid these mistakes.
What is ANALYZE?
The ANALYZE command collects information about:
Tables
Indexes
Data distribution
Example:
ANALYZE;SQLite scans database structures and stores statistical information in special internal tables.
Afterward, the query planner can make more informed decisions.
Where Statistics Are Stored
The primary statistics tables are:
sqlite_stat1
sqlite_stat4These are system tables maintained by SQLite.
They are not typically modified manually.
Instead:
ANALYZE populates them
SQLite reads them during query planning
Understanding sqlite_stat1
The most common statistics table is:
sqlite_stat1This table contains summary information about:
Tables
Indexes
Row counts
Viewing sqlite_stat1
After running ANALYZE:
SELECT *
FROM sqlite_stat1;You may see results similar to:
orders idx_customer 1000000 100
products idx_category 50000 50The values help SQLite estimate:
Table size
Index selectivity
Expected row counts
What Does Selectivity Mean?
Selectivity describes how effectively an index narrows results.
Consider two columns:
High Selectivity
emailEach value is usually unique.
Example:
[email protected]
[email protected]An index on email is highly selective.
Low Selectivity
statusPossible values:
active
inactiveMany rows share the same value.
An index on status is less selective.
Why Selectivity Matters
SQLite uses selectivity estimates to determine:
Whether an index should be used
Which index is most efficient
Join order selection
Without accurate selectivity information:
SQLite may choose inefficient plans.
Understanding sqlite_stat4
While sqlite_stat1 provides summary information, SQLite can gather more detailed statistics using:
sqlite_stat4This table stores sampled index values.
Why sqlite_stat4 Exists
Imagine an index:
CREATE INDEX idx_city
ON customers(city);Suppose the data distribution is:
New York 500,000
Los Angeles 200,000
Chicago 50,000
Smalltown 10A simple average does not accurately represent reality.
Some values are extremely common.
Others are rare.
sqlite_stat4 helps SQLite understand these differences.
How sqlite_stat4 Improves Estimates
Instead of relying only on averages:
SQLite can examine sampled values and estimate:
Range sizes
Distribution patterns
Value frequencies
This produces better query plans for skewed datasets.
A Practical Example
Consider:
SELECT *
FROM customers
WHERE city = 'Smalltown';Without detailed statistics:
SQLite might assume:
Thousands of rows matchIn reality:
Only 10 rows matchWith sqlite_stat4:
SQLite can make a far better estimate.
Statistics and Index Selection
Suppose a table contains:
customer_id
status
created_atAnd indexes exist on all three columns.
The planner must decide:
Which index is most efficient?
Statistics help estimate:
Rows returned
Index traversal cost
Disk access requirements
The result is often a significantly faster plan.
Statistics and Join Planning
Statistics become even more important during joins.
Example:
SELECT *
FROM customers c
JOIN orders o
ON c.id = o.customer_id;SQLite must determine:
Which table to access first
Which indexes to use
How many rows will participate
Poor estimates can dramatically increase execution time.
How SQLite Uses Statistics Internally
The query planner performs cost calculations.
For each possible plan:
SQLite estimates:
Rows examined
Index lookups
Page reads
CPU work
The lowest estimated cost usually wins.
Statistics provide the foundation for those estimates.
When Statistics Become Outdated
Statistics are not automatically refreshed after every change.
Over time:
New rows are inserted
Old rows are deleted
Data distribution changes
Eventually:
Stored statistics may no longer reflect reality.
When to Run ANALYZE
ANALYZE is most beneficial after:
Large data imports
Significant deletions
Bulk updates
Major application growth
These events can change query behavior substantially.
Example Workflow
Imagine:
Database Size: 100,000 rowsANALYZE is executed.
Later:
Database Size: 10 million rowsStatistics may no longer represent current data.
Running:
ANALYZE;refreshes the planner’s information.
Targeting Specific Tables
You can analyze a single table:
ANALYZE orders;Or a specific index:
ANALYZE idx_customer;This can reduce maintenance overhead in large databases.
Viewing Query Planner Decisions
SQLite provides:
EXPLAIN QUERY PLANExample:
EXPLAIN QUERY PLAN
SELECT *
FROM orders
WHERE customer_id = 100;This reveals:
Index usage
Table scans
Planner choices
It’s one of the best ways to observe the impact of ANALYZE.
Potential Downsides of ANALYZE
Although ANALYZE is beneficial, it has costs.
Data Collection Time
Large databases require:
Table scanning
Index scanning
Analysis can take time.
Storage Overhead
Statistics tables consume additional space.
Usually this overhead is very small.
Maintenance Requirements
Statistics become stale over time.
Periodic updates may be necessary.
Best Practices
Run ANALYZE After Large Data Changes
Major imports and deletions often change data distribution.
Monitor Query Plans
Use:
EXPLAIN QUERY PLANto verify planner behavior.
Focus on Frequently Queried Tables
Not every table requires constant analysis.
Prioritize important workloads.
Understand Data Distribution
Highly skewed datasets often benefit most from detailed statistics.
This is where sqlite_stat4 can provide significant value.
Closing Thoughts
SQLite’s query planner depends heavily on statistical information to make intelligent decisions.
Key takeaways:
ANALYZE collects database statistics
sqlite_stat1 stores table and index summaries
sqlite_stat4 stores detailed sample information
Statistics improve row-count estimation
Better estimates lead to better query plans
Periodically refreshing statistics helps maintain performance
As databases grow, understanding how SQLite uses statistics becomes increasingly important. Query optimization is not only about creating indexes, it’s also about giving the query planner the information it needs to use those indexes effectively.
In the next guide, we’ll explore SQLite’s cost-based query optimizer and how execution plans are selected internally.
Subscribe Now
If you want practical, real-world SQLite architecture tutorials, subscribe to SQLite Forum. Subscribe to receive new articles directly.


