Shard Sizing, Node Count, and Query Fan-Out in Elasticsearch
Most Elasticsearch performance problems aren't about hardware. They're about how data is distributed across shards and what happens when a query has to touch all of them.
The Fan-Out Problem
Every Elasticsearch query fans out to every shard in the target index. A coordinating node sends the request to all relevant shards, waits for responses, merges results, and returns the final answer.
This works well when:
- You have a reasonable number of shards (dozens, not thousands)
- Your result set is small (top 10-100 hits)
- Each shard can respond quickly
It breaks down when any of these assumptions fail.
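The scatter-gather step above can be sketched in a few lines. This is a toy simulation, not the Elasticsearch implementation: each "shard" returns only its local top-k hits by score, and the coordinating node merges all the partial results into a global top-k.

```python
import heapq
import random

def shard_top_k(shard_docs, k):
    """Each shard returns only its local top-k (doc_id, score) pairs."""
    return heapq.nlargest(k, shard_docs, key=lambda d: d[1])

def coordinate(shards, k):
    """The coordinating node queries every shard and merges the partials."""
    partials = [shard_top_k(docs, k) for docs in shards]  # fan-out: one request per shard
    merged = [doc for partial in partials for doc in partial]
    return heapq.nlargest(k, merged, key=lambda d: d[1])  # global merge

random.seed(42)
# 30 shards, 1,000 docs each, with random relevance scores
shards = [[(f"doc-{s}-{i}", random.random()) for i in range(1000)]
          for s in range(30)]
top10 = coordinate(shards, k=10)
print(len(top10))  # 10 hits returned to the client...
print(30 * 10)     # ...but 300 candidates were shipped to the coordinator
```

Note that the client only ever sees 10 hits, yet 30 × 10 candidates crossed the network to the coordinating node. That multiplier is the fan-out cost, and it grows linearly with shard count.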
Why Shard Count Matters
Each shard is a self-contained Lucene index with its own memory overhead, file handles, and thread requirements. More shards means:
- Higher coordination overhead: The coordinating node must manage more concurrent requests and merge more partial results.
- More heap pressure: Each shard consumes memory for segment metadata, caches, and query execution state.
- Longer tail latencies: Your query is only as fast as the slowest shard. More shards means more chances for one to be slow.
The common advice—aim for shards between 10GB and 50GB—exists precisely because smaller shards multiply these costs without proportional benefit.
Node Count and Shard Distribution
Shards are distributed across nodes. If you have 30 primary shards and 3 nodes, each node holds 10 of them. Every query hits every node.
Adding nodes doesn't reduce fan-out; it spreads the load. With 6 nodes, each holds 5 shards, and queries still touch all 6 nodes. Per-node work is halved, but the coordinating node still issues the same 30 shard-level requests and merges the same number of partial results.
The relationship matters:
- Fewer shards than nodes: Some nodes sit idle for that index. Wasted capacity.
- Many more shards than nodes: Each node handles multiple shards per query. High per-node load.
- Balanced ratio: 1-3 shards per node per index gives reasonable distribution without excessive overhead.
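A back-of-the-envelope planner for these ratios. The 10-50GB shard target and the shards-per-node ceiling come from the guidance above; the function name, defaults, and example index size are illustrative, not an official formula.

```python
import math

def plan_shards(index_size_gb, node_count,
                target_shard_gb=30, max_shards_per_node=3):
    """Pick a primary shard count that keeps shards near the target size
    while staying within a small shards-per-node ratio."""
    by_size = math.ceil(index_size_gb / target_shard_gb)
    # Round up to a multiple of node_count so every node gets equal work
    shards = max(node_count, math.ceil(by_size / node_count) * node_count)
    if shards // node_count > max_shards_per_node:
        # Oversharded for this node count: let shards grow larger instead
        shards = node_count * max_shards_per_node
    return shards, index_size_gb / shards

# A 600GB index on 6 nodes:
shards, size_gb = plan_shards(600, 6)
print(shards, round(size_gb, 1))  # 18 primaries of ~33.3GB, 3 per node
```

The interesting case is the tension the text describes: pure size-based math wants 20 shards here, but balancing across 6 nodes at no more than 3 shards each pushes the answer to 18 slightly larger shards.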
Large Result Sets: Where It Gets Expensive
Standard pagination with from and size has a hard limit: you can't paginate beyond 10,000 results by default (controlled by index.max_result_window). This limit exists because deep pagination is expensive.
To fetch result 10,001, Elasticsearch must:
- Ask each shard for its top 10,001 results
- Merge all those results on the coordinating node
- Discard the first 10,000
- Return result 10,001
With 30 shards, that's 300,030 documents sorted and merged just to return one page. Heap usage and latency explode.
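The arithmetic behind that 300,030 figure is worth making explicit, because it shows why the cost scales with both depth and shard count:

```python
def deep_page_cost(shard_count, from_offset, size):
    """Total documents the coordinating node must collect and merge
    to serve one from/size page."""
    per_shard = from_offset + size        # every shard returns its top (from + size)
    return shard_count * per_shard        # all of it lands on the coordinator

# Fetching a single result at offset 10,000 across 30 shards:
print(deep_page_cost(30, 10_000, 1))  # 300030
# The same depth against a single shard merges far less:
print(deep_page_cost(1, 10_000, 1))   # 10001
```

Each shard can't know which of its documents survive the global merge, so it must send its full top (from + size) list every time. Depth multiplies shard count, not adds to it.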
search_after and Point-in-Time (PIT)
For large result sets, search_after with a Point-in-Time (PIT) is the correct pattern. Instead of tracking absolute position, you provide the sort values of the last document you received, and Elasticsearch efficiently seeks to that position.
Key benefits:
- Bounded memory cost: Each page fetches only one page's worth of documents, no matter how deep you go. No accumulating heap pressure.
- Consistent results: The PIT freezes index state, so concurrent writes don't cause documents to shift between pages.
- No depth limit: You can paginate through millions of results without hitting the 10,000 result window.
The tradeoff: you must include a tiebreaker sort field (usually _shard_doc or a unique ID) to ensure deterministic ordering. And PIT contexts consume resources—they should be closed when you're done.
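The request flow can be sketched as the JSON bodies involved, shown here as Python dicts. The index name, keep-alive value, and sort field are placeholders, and the PIT id stands in for the opaque string Elasticsearch returns from POST /my-index/_pit?keep_alive=1m.

```python
# 1. Open a PIT:  POST /my-index/_pit?keep_alive=1m  → {"id": "<opaque>"}
pit_id = "<opaque-pit-id>"  # placeholder for the real returned id

# 2. First page: sort with a _shard_doc tiebreaker, no search_after yet.
#    PIT searches go to POST /_search (no index in the path; the PIT pins it).
first_page = {
    "size": 1000,
    "pit": {"id": pit_id, "keep_alive": "1m"},
    "sort": [{"@timestamp": "asc"}, {"_shard_doc": "asc"}],
}

# 3. Each subsequent page passes the sort values of the previous page's
#    last hit, so shards can seek directly past it.
last_hit_sort = ["2024-01-01T00:00:00Z", 517]  # example sort values
next_page = dict(first_page, search_after=last_hit_sort)

# 4. When finished:  DELETE /_pit  with body {"id": pit_id}
print(sorted(next_page))
```

Note there is no from parameter anywhere: position is carried by the sort values themselves, which is what keeps per-page cost flat at any depth.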
Practical Guidance
When planning index architecture:
- Start with fewer shards. You can always split later. Merging oversharded indices is painful.
- Size shards for your query patterns, not just storage. If you're doing heavy aggregations, larger shards (30-50GB) often perform better than many small ones.
- Match shard count to node count. For hot indices, aim for 1-2 primary shards per node.
- Use search_after for exports and large scans. If your use case involves iterating beyond a few hundred results, don't fight the system—use the pattern designed for it.
- Monitor fan-out in slow logs. If queries consistently hit many shards with high merge times, your shard strategy needs review.
The Bottom Line
Shard sizing isn't a one-time decision—it's an ongoing balance between storage efficiency, query performance, and operational complexity. Get the fundamentals right, use the appropriate patterns for your access needs, and you'll avoid the most common performance traps.