Statistics the planner doesn't have that it really needs
The planner/optimizer is a voracious consumer of information: no matter how much we feed it, it always wants more and better data. Some things we really need in order to make better decisions:
- A "clusteredness" metric to replace the use of "correlation" when estimating how much random access an index scan will incur and what cache hit rate it will see.
- Cross-column dependencies, to improve selectivity estimates for clauses on multiple columns. Currently we assume the columns are independent, which can lead to overly optimistic estimates (for example, far too few rows predicted when the columns are in fact correlated).
- Estimating n_distinct is a hard problem and probably requires scanning much larger samples to get good data. A good algorithm was [? published].
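
To make the cross-column item concrete, here is a minimal Python sketch of how the independence assumption can go wrong. It is not planner code: the table contents, the helper names (`independent_selectivity`, `actual_selectivity`) and the city/zip columns are all invented for illustration. The only assumption taken from the text above is that per-clause selectivities are multiplied as if the columns were independent.

```python
def independent_selectivity(rows, predicates):
    """Combine per-clause selectivities by multiplying them,
    i.e. treat the columns as statistically independent."""
    sel = 1.0
    for pred in predicates:
        matching = sum(1 for r in rows if pred(r))
        sel *= matching / len(rows)
    return sel


def actual_selectivity(rows, predicates):
    """Fraction of rows that really satisfy every clause."""
    matching = sum(1 for r in rows if all(p(r) for p in predicates))
    return matching / len(rows)


# Hypothetical table of (city, zip) where city fully determines zip.
rows = [("Berlin", "10115")] * 50 + [("Paris", "75001")] * 50

# WHERE city = 'Berlin' AND zip = '10115'
predicates = [
    lambda r: r[0] == "Berlin",
    lambda r: r[1] == "10115",
]

estimated = independent_selectivity(rows, predicates)
actual = actual_selectivity(rows, predicates)

print(f"estimated rows: {estimated * len(rows):.0f}")
print(f"actual rows:    {actual * len(rows):.0f}")
```

In this toy table the second clause is redundant, so the multiplied estimate is half the true row count. With more clauses, or with more selective ones, the gap grows multiplicatively, which is the kind of error that cross-column statistics would be meant to correct.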