Tuesday, February 1, 2011

MySQL Updating Index Statistics

The MySQL query optimizer uses two API calls to ask the storage engines how index values are distributed when deciding how to use indexes. The first is the records_in_ range( ) call, which accepts range end points and returns the (possibly estimated) number of records in that range. The second is info( ), which can return various types of data, including index cardinality (how many records there are for each key value).

When the storage engine doesn’t provide the optimizer with accurate information about the number of rows a query will examine, the optimizer uses the index statistics,which you can regenerate by running ANALYZE TABLE, to estimate the number of rows. MySQL’s optimizer is cost-based, and the main cost metric is how much data the query will access. If the statistics were never generated, or if they are out of date,the optimizer can make bad decisions. The solution is to run ANALYZE TABLE.

Each storage engine implements index statistics differently, so the frequency with which you’ll need to run ANALYZE TABLE differs, as does the cost of running the statement:

 - The Memory storage engine does not store index statistics at all.
 - MyISAM stores statistics on disk, and ANALYZE TABLE performs a full index scan to compute cardinality. The entire table is locked during this process.
 - InnoDB does not store statistics on disk, but rather estimates them with random index dives the first time a table is opened. ANALYZE TABLE uses random dives for InnoDB, so InnoDB statistics are less accurate, but they may not need manual updates unless you keep your server running for a very long time. Also, ANALYZE TABLE is nonblocking and relatively inexpensive in InnoDB, so you can update the statistics online without affecting the server much.

You can examine the cardinality of your indexes with the SHOW INDEX FROM command. For example:

mysql> SHOW INDEX FROM albums

*************************** 1. row ***************************
Table: albums
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 250
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:

*************************** 2. row ***************************
Table: albums
Non_unique: 1
Key_name: reg_id
Seq_in_index: 1
Column_name: last_name
Collation: A
Cardinality: 250
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:

This command gives quite a lot of index information, which the MySQL manual explains in detail. We do want to call your attention to the Cardinality column,though. This shows how many distinct values the storage engine estimates are in the index. You can also get this data from the INFORMATION_SCHEMA.STATISTICS table in MySQL 5.0 and newer, which can be quite handy. For example, you can write queries against the INFORMATION_SCHEMA tables to find indexes with very low electivity.

No comments:

Post a Comment