Assume you have two varchar (or Text) columns named L and U which are
identical except that the charset for L is latin1 and the charset for
U is utf8. All the records in L and U are identical in terms of
content, consisting of only 7 bit ASCII characters. Both columns have
indexes of the same type (e.g. assume Unique indexes if you want).
Here's my question: Will the fact that column U has a utf8 charset
make select queries run slower on that column? For example, will the
query
Select * from table where U='blahblah'
run slower than the query
Select * from table where L='blahblah'
??
Significantly slower?
I'm thinking that a query on the latin1 column would go faster since
the program knows upfront that one byte equals one character, and
vice-versa; whereas in the same query on a utf8 column the program has
a lot more "overhead" because it has to constantly be determining how
many bytes represent a character. Since queries on string columns are
case insensitive, the program can't just do a byte-for-byte
comparision; rather, it has to compare *characters*, and sometimes
convert a character from upper to lower case, or vice versa, in order
to do the case-insensitive comparison.
The actual column in question is going to store URLs, so it should
only need to hold 7 bit ascii characters (in theory at least). So, in
terms of content, it shouldn't matter whether I make the column latin1
or utf8. But in terms of query speed....on, say, a few million
records...??
I would like to do everything in utf8 (web pages, forms, mysql
database columns, etc.). But since that one column might be heavily
queried, maybe I should make an exception and do it in latin1?? I wish
the mysql docs would speak to these issues.... Thanks for any help.
Paul
(ps, if you know of any good websites or books that deal with this
issue, let me know....thanks).