Database Management: Optimizer, Statistics and Performance
Ask yourself: what is the purpose of optimizer statistics?
They describe the data in the RDBMS so the optimizer can create a
query plan that finds a fast path to the data; maintaining them is
part of database management.
Statistics are created by running 'ANALYZE' in Oracle, 'UPDATE STATISTICS'
in Informix, or 'ANALYZE' (often as 'VACUUM ANALYZE') in PostgreSQL, and
by similar utilities in other RDBMS. But a performance problem can arise
if gathering the statistics takes a significant amount of time in a
large application.
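For example, the commands look roughly like this; the table name
'orders' is a placeholder, and the exact syntax varies by product and
version:

    -- Oracle (the older ANALYZE command; DBMS_STATS is the newer interface)
    ANALYZE TABLE orders COMPUTE STATISTICS;

    -- Informix
    UPDATE STATISTICS MEDIUM FOR TABLE orders;

    -- PostgreSQL: ANALYZE gathers planner statistics; VACUUM ANALYZE also
    -- reclaims dead row space
    VACUUM ANALYZE orders;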
Sometimes people see large performance gains from fresh statistics and
conclude that they must run the statistics jobs to reflect every change
in the data. This is usually not needed; the optimizer will probably
pick the same access method for a table with a million rows as it does
when the table grows to a million and a half, or even ten million.
So, if the time spent processing statistics, work meant to improve
performance, is itself crippling application performance, what can be done?
If table sizes do not change drastically and the distribution of values
does not change drastically, the easiest way to cut the runtime cost of
the statistics job is to run it less often. Instead of five times a week,
run it once a week or once a month, or split the job so it covers a few
tables per day, as in the sketch below. If the size and distribution of
the data are more or less static and no indexes change, running it once
may be enough.
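One way to split the job, sketched here with Oracle-style commands and
hypothetical table names, is to keep a small SQL script per weekday and
hand each to whatever job scheduler the site already uses:

    -- stats_monday.sql: the largest, most volatile tables
    ANALYZE TABLE orders     COMPUTE STATISTICS;
    ANALYZE TABLE order_line COMPUTE STATISTICS;

    -- stats_thursday.sql: reference tables that change slowly, sampled
    ANALYZE TABLE customer ESTIMATE STATISTICS SAMPLE 20 PERCENT;
    ANALYZE TABLE product  ESTIMATE STATISTICS SAMPLE 20 PERCENT;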
If a table fluctuates in size by orders of magnitude, or the
distribution of attribute values shifts significantly over time, run
the statistics job on that particular table more often, provided it
makes a measurable difference in application performance. If an index
is added to a table, also gather statistics on that table.
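For instance, a single volatile table can be re-analyzed on its own
schedule, and statistics can be refreshed after an index is created;
again the names are hypothetical and the syntax is Oracle-style:

    -- Re-analyze only the volatile table, not the whole schema
    ANALYZE TABLE order_queue COMPUTE STATISTICS;

    -- After adding an index, gather statistics on that table as well
    CREATE INDEX ix_order_queue_status ON order_queue (status);
    ANALYZE TABLE order_queue COMPUTE STATISTICS;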
Some RDBMS utilities can gather statistics from a sample of the data,
or gather only a subset of the statistics, so the statistics job runs
faster. The idea is that some statistics, even if incomplete or not
perfectly accurate, will help the optimizer enough to find a fast path
to the data.
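A rough sketch of sampling, assuming Oracle's DBMS_STATS package and
PostgreSQL; the schema and table names are placeholders:

    -- Oracle: sample about 10 percent of the rows instead of reading them all
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'APP',
        tabname          => 'ORDERS',
        estimate_percent => 10);
    END;
    /

    -- PostgreSQL: lower the per-column sample/histogram target, then analyze
    SET default_statistics_target = 10;
    ANALYZE orders;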