Connecting Tech Pros Worldwide Help | Site Map

In latin1_swedish_ci the backslash character is equal to a-umlaut, how to get around

Newbie
 
Join Date: Sep 2009
Posts: 6
#1: Sep 30 '09
Has anyone come across a way to get around the
limitation that in the MySQL collation
latin1_swedish_ci the backslash character is
considered equal to a-umlaut?
http://www.collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html

5C=C4=E4

I.e. the following queries result in the same set of rows
when latin1_swedish_ci is in use, those containing either character:

SELECT * FROM products WHERE name LIKE '%\\\\%'

SELECT * FROM products WHERE name LIKE '%Ä%'

I have tried in vain to locate an answer other than transition
to utf8 in the net, since I do not necessarily have the option
of modifying the structure of the database tables or their columns
to utf8.

Thanks in advance for any help,

Hesekiel
Newbie
 
Join Date: Sep 2009
Posts: 6
#2: Sep 30 '09

re: In latin1_swedish_ci the backslash character is equal to a-umlaut, how to get around


Regarding the original phrasing of my question:

I must stress the fact that other than the treatment of the backslash character the latin1_swedish_ci collation matches my needs exactly regarding sort order and filtering. My only problem is with the treatment of the backslash character therein.

Hesekiel
Atli's Avatar
Moderator
 
Join Date: Nov 2006
Location: Iceland
Posts: 3,745
#3: Oct 1 '09

re: In latin1_swedish_ci the backslash character is equal to a-umlaut, how to get around


Hey.

I'm not seeing a solution to this if you are unwilling to change the charset.
This is a limitation, which you are subject to if you choose to use this particular charset.

You should really consider changing to Unicode.
Other then using a bit more disk-space (which is cheap, by the way), there is really no downside to using Unicode.
At least none that I can see.

Why not upgrade?
Newbie
 
Join Date: Sep 2009
Posts: 6
#4: Oct 1 '09

re: In latin1_swedish_ci the backslash character is equal to a-umlaut, how to get around


Thanks for responding Atli,

But:

Having experimented in this area for the past week or so, I have come to be under the impression that utf8 (if that is what you refer to by unicode) is _not_ the solution.

To demonstrate:

First create a database:

Expand|Select|Wrap|Line Numbers
  1. CREATE DATABASE `test2swe`
  2. DEFAULT CHARACTER SET utf8 COLLATE utf8_swedish_ci;
  3.  
  4. use test2swe;
  5.  
  6. CREATE TABLE `test2swe`.`t1` (
  7.   `id` INT NOT NULL AUTO_INCREMENT ,
  8.   `c` VARCHAR( 128 ) CHARACTER SET utf8 COLLATE utf8_swedish_ci NOT NULL ,
  9. PRIMARY KEY ( `id` ) 
  10. ) ENGINE = MYISAM
  11.  
  12. INSERT INTO `test2swe`.`t1` (`id` , `c`)
  13. VALUES (NULL , 'foo'),
  14.        (NULL , 'backs\\'),
  15.        (NULL , 'jälkiruoka'),
  16.        (NULL , 'backs\\ash'),
  17.        (NULL , 'bar');
  18.  
Then try the following two queries:

Expand|Select|Wrap|Line Numbers
  1. SELECT c FROM `t1` WHERE `c` LIKE '%ä%';
  2. SELECT c FROM `t1` WHERE `c` LIKE '%Ä%';
  3.  
Both return ==> jälkiruoka

Ok, fine so far.

When we try

Expand|Select|Wrap|Line Numbers
  1. SELECT c FROM `t1` WHERE `c` LIKE '%\\\\%';
  2.  
We get an empty resultset because (I think)
http://www.collation-charts.org/mysq...wedish_ci.html
doesn't include the backslash character and it is thus left out.

When trying out

Expand|Select|Wrap|Line Numbers
  1. SELECT c FROM `t1` WHERE `c` LIKE '%\\\\%' COLLATE utf8_general_ci;
  2.  
we get both 'backs\' and 'backs\ash' in the resultset.

But if we try the same:

Expand|Select|Wrap|Line Numbers
  1. SELECT c FROM `t1` WHERE `c` LIKE '%ä%' COLLATE utf8_general_ci;
  2.  
We get the results backs\,backs\ash,bar and ,jälkiruoka

Everythin with an a in it because in

http://www.collation-charts.org/mysq....european.html

ä = a

To summarize: utf8 does not seem to be of much help to me.

The problem is that I have a db with potentially 100 000+ entries,
the product names of which may contain just about anything and
be searchable via user-inputed text (i.e. '%usertexthere%').

What would you Atli and the other experts out there suggest I adopt
as my approach to this problem?

Sincerely,

Hesekiel
Newbie
 
Join Date: Sep 2009
Posts: 6
#5: Oct 1 '09

re: In latin1_swedish_ci the backslash character is equal to a-umlaut, how to get around


Sorry about partially directing my question to the moderator who had replied to me in the final portion of my previous message.

Inadvertedly ignored posting guidelines,

Hesekiel
Newbie
 
Join Date: Sep 2009
Posts: 6
#6: Oct 1 '09

re: In latin1_swedish_ci the backslash character is equal to a-umlaut, how to get around


corr: Inadvertently

/Hesekiel
Newbie
 
Join Date: Sep 2009
Posts: 6
#7: Oct 2 '09

re: In latin1_swedish_ci the backslash character is equal to a-umlaut, how to get around


To summarize: I am looking for a solution which would act in all respects (sort order etc.) like the latin1_swedish_ci but for the treatment of the backslash character. MySQL being one of the most used databases one would expect such a solution to exist or have been invented by someone.

After all, for example: Windows pathnames may be stored in databases, contain backslashes, nowadays may contain strange characters and anything stored in a database may and will be searched for. So I think this is an issue of wider interrest.

Hesekiel, still looking into collations and everything possible
Reply