On partial indexes for string columns

After reading Fernando Ipar’s interesting post on partial indexes for string columns, there were two things I wanted to note:

First, this trick works quite well, but only if your like clauses only ever use the wildcard on the right hand side (or not at all). MySQL will not be able to use the index if the like contains a wildcard on the left.

Consider the following table definition:

mysql> show create table people\G
*************************** 1. row ***************************
Table: people
Create Table: CREATE TABLE `people` (
`person_id` int(15) NOT NULL default '0',
`username` varchar(255) default NULL,
`email` varchar(255) default NULL,
PRIMARY KEY (`person_id`),
KEY `people_username` (`username`(5))
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Now, see the following queries:


mysql> explain select username from people where username like 'jo%';
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+-------------+
| 1 | SIMPLE | people | range | people_username | people_username | 8 | NULL | 394 | Using where |
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+-------------+
1 row in set (0.00 sec)


mysql> explain select username from people where username like '%jo';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | people | ALL | NULL | NULL | NULL | NULL | 128928 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)


mysql> explain select username from people where username like '%jo%';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | people | ALL | NULL | NULL | NULL | NULL | 128928 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)

As you can see, the first one nicely uses the index. The second and third though, cannot use it at all.
Now, we add an index on the full username field:


mysql> create index people_username_full on people(username);
Query OK, 128928 rows affected (6.72 sec)
Records: 128928 Duplicates: 0 Warnings: 0

Then, rerunning the queries we see the result changing:


mysql> explain select username from people where username like 'jo%';
+----+-------------+--------+-------+--------------------------------------+----------------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+--------------------------------------+----------------------+---------+------+------+--------------------------+
| 1 | SIMPLE | people | range | people_username,people_username_full | people_username_full | 258 | NULL | 143 | Using where; Using index |
+----+-------------+--------+-------+--------------------------------------+----------------------+---------+------+------+--------------------------+
1 row in set (0.01 sec)


mysql> explain select username from people where username like '%jo';
+----+-------------+--------+-------+---------------+----------------------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+----------------------+---------+------+--------+--------------------------+
| 1 | SIMPLE | people | index | NULL | people_username_full | 258 | NULL | 128928 | Using where; Using index |
+----+-------------+--------+-------+---------------+----------------------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)


mysql> explain select username from people where username like '%jo%';
+----+-------------+--------+-------+---------------+----------------------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+----------------------+---------+------+--------+--------------------------+
| 1 | SIMPLE | people | index | NULL | people_username_full | 258 | NULL | 128928 | Using where; Using index |
+----+-------------+--------+-------+---------------+----------------------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

Notice how all queries are now using the index. Even the first query now uses the longer but more detailed index.

The second suggestion I wanted to make is one that is exploiting this index option for right-oriented ‘like’ searches. A common use is for email addresses. Often, they are being searched by e.g. domain name. If you add an extra field to your table that stores the reverse of the email field, you can then use a partial index to index the email field from the right. You will need the MySQL function REVERSE() for this:


mysql> alter table people add email_reverse varchar(255) null;
Query OK, 128928 rows affected (4.63 sec)
Records: 128928 Duplicates: 0 Warnings: 0

mysql> desc people;
+---------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| person_id | int(15) | NO | PRI | 0 | |
| username | varchar(255) | YES | MUL | NULL | |
| email | varchar(255) | YES | | NULL | |
| email_reverse | varchar(255) | YES | | NULL | |
+---------------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)


mysql> update people set email_reverse = REVERSE(email);
Query OK, 128928 rows affected (1.36 sec)
Rows matched: 128928 Changed: 128928 Warnings: 0


mysql> create index people_email_rev on people(email_rev(10));
Query OK, 128928 rows affected (6.72 sec)
Records: 128928 Duplicates: 0 Warnings: 0

Now, you can execute queries with right handed ‘like’ clauses using the partial index:


mysql> explain select username from people where email_reverse like 'moc.oohay%';
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | people | range | people_email_rev | people_email_rev | 13 | NULL | 1 | Using where |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select username from people where email_reverse like REVERSE('%yahoo.com');
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | people | range | people_email_rev | people_email_rev | 13 | NULL | 1 | Using where |
+----+-------------+--------+-------+------------------+------------------+---------+------+------+-------------+
1 row in set (0.00 sec)

One thought on “On partial indexes for string columns”

  1. That’s a great use of the reverse function, I never thought of it :)

    As for the full index, you had me testing that query for a while until I realized it only uses the index if you only project that very field, i.e., your very specific query is the one that uses the index (select from table where LIKE ‘%string%’).

    Of course, it still needs to process all of the tuples (if I don’t misunderstand explains output, and I hope I don’t since I don’t think there’s a way for any database to optimize those kind of queries unless we go for something like a FULLTEXT index, or move it out of the database altogether with Sphinx or Lucene), but doing it with an index is certainly more efficient.

    Now, in order to take advantage of this index while projecting other columns from the result, this is all I came up with (I had to change the example because I already had data on this table):

    mysql> explain select * from tasks where city in (select city from tasks where city like ‘%fi%’)\G
    *************************** 1. row ***************************
    id: 1
    select_type: PRIMARY
    table: tasks
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 150000
    Extra: Using where
    *************************** 2. row ***************************
    id: 2
    select_type: DEPENDENT SUBQUERY
    table: tasks
    type: index_subquery
    possible_keys: city,idx_full_city
    key: city
    key_len: 253
    ref: func
    rows: 1
    Extra: Using index; Using where
    2 rows in set (0.00 sec)

    What do you think? Is it any use? Is there a better way? :)

    Regards

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>