Thanks for responding Atli,
But:
Having experimented in this area for the past week or so, I am under the impression that utf8 (if that is what you mean by Unicode) is _not_ the solution.
To demonstrate:
First create a database:
CREATE DATABASE `test2swe`
DEFAULT CHARACTER SET utf8 COLLATE utf8_swedish_ci;

USE test2swe;

CREATE TABLE `test2swe`.`t1` (
`id` INT NOT NULL AUTO_INCREMENT,
`c` VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_swedish_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE = MYISAM;

-- '\\' in a string literal stores a single backslash
INSERT INTO `test2swe`.`t1` (`id`, `c`)
VALUES (NULL, 'foo'),
(NULL, 'backs\\'),
(NULL, 'jälkiruoka'),
(NULL, 'backs\\ash'),
(NULL, 'bar');

Then try the following two queries:

SELECT c FROM `t1` WHERE `c` LIKE '%ä%';
SELECT c FROM `t1` WHERE `c` LIKE '%Ä%';

Both return ==> jälkiruoka
Ok, fine so far.
When we try

SELECT c FROM `t1` WHERE `c` LIKE '%\\\\%';

We get an empty result set because (I think)
http://www.collation-charts.org/mysq...wedish_ci.html
doesn't include the backslash character and it is thus left out.
When trying out

SELECT c FROM `t1` WHERE `c` LIKE '%\\\\%' COLLATE utf8_general_ci;

we get both 'backs\' and 'backs\ash' in the result set.
But if we try the same with ä:

SELECT c FROM `t1` WHERE `c` LIKE '%ä%' COLLATE utf8_general_ci;

We get backs\, backs\ash, bar and jälkiruoka,
i.e. everything with an a in it, because in
http://www.collation-charts.org/mysq....european.html
ä = a
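
The difference between the two charts can also be checked directly, without the table. This is only a minimal sketch: it assumes the connection character set is utf8 (hence the SET NAMES), and the column aliases are names made up for illustration. Going by the charts, the first comparison should come out true and the second false.

```sql
-- Assumption: the client sends utf8, so the literals are read as utf8.
SET NAMES utf8;

-- COLLATE binds to the right-hand literal and decides how '=' compares.
SELECT 'ä' = 'a' COLLATE utf8_general_ci AS equal_in_general_ci,
       'ä' = 'a' COLLATE utf8_swedish_ci AS equal_in_swedish_ci;
```
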
To summarize: utf8 does not seem to be of much help to me.
The problem is that I have a db with potentially 100 000+ entries,
whose product names may contain just about anything and must
be searchable via user-entered text (i.e. '%usertexthere%').
What would you, Atli, and the other experts out there suggest
as an approach to this problem?
Sincerely,
Hesekiel