Bytes IT Community

UTF-8 problems with 4.1.1

i just thought i'd shoot out a quick email on problems i've been having with
utf-8 in moving from 4.1.0 to 4.1.1. (please note that because i am using
UTF-8 as my default character set, i compiled from source rather than using
a premade binary)

i know that i'm working with alpha software - this is more of a
warning/sharing of knowledge than a complaint.

BASIC ISSUE: i am unable to use UTF-8 with mysql 4.1.1 and connector/j
3.0.9.

PROBLEM #1: bad default character set in 4.1.0.
when i created the database originally, i used the following command:
mysql> create database foo default character set utf8 collate 'binary';
unfortunately, when i use "show create database foo;" immediately
afterwards, i get:
CREATE DATABASE 'foo' /*!40100 DEFAULT CHARACTER SET binary */
(i had to use a binary collation so that a unique index would be able to
tell words apart even if they were identical except for accents)
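
To illustrate why a binary collation matters for the unique index, here is a minimal Python sketch. It does not use MySQL at all: `accent_insensitive_key` is a hypothetical stand-in for an accent-insensitive collation, and the byte comparison stands in for a binary one. The sample words are assumptions chosen for illustration.

```python
import unicodedata

def accent_insensitive_key(s):
    """Hypothetical stand-in for an accent-insensitive collation key:
    decompose, drop combining marks, and fold case."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if unicodedata.combining(c) == 0)
    return stripped.casefold()

def binary_key(s):
    """Stand-in for a binary collation: compare the raw UTF-8 bytes."""
    return s.encode("utf-8")

plain = "resume"
accented = "r\u00e9sum\u00e9"  # same word with two accented letters

# Under the accent-insensitive key the two words collide, so a unique
# index built on that key would reject the second one.
print(accent_insensitive_key(plain) == accent_insensitive_key(accented))

# Under the binary key they stay distinct.
print(binary_key(plain) == binary_key(accented))
```

The first comparison treats the two words as equal and the second does not, which is the distinction the unique index needs.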

as it turns out, this worked just fine. i was able to create code that did
what i wanted given this setup.

PROBLEM #2: my old tables don't work with 4.1.1.
whether i just copy the old tables over to the new directory, or use
mysqldump on the old tables to bring them over, the UTF-8 data comes out as
garbage. more specifically, i am accessing the database using connector/j,
and the query "select name from my_user_table" will return data in the
following way:
- single-byte characters are returned correctly
- multi-byte characters are returned as individual bytes OR'd with
0xFF00. for instance, a single three-byte character stored as the bytes
0xAB 0xCD 0xEF is returned as three characters: 0xFFAB 0xFFCD 0xFFEF.
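
The 0xFFxx pattern is what you get when a signed 8-bit byte is widened to a 16-bit value with sign extension. Whether that is what actually happens inside connector/j is an assumption; the post does not confirm it. The Python sketch below only emulates the arithmetic, and the helper names (`sign_extend_to_char`, `mangle`, `unmangle`) are hypothetical.

```python
def sign_extend_to_char(b):
    """Emulate widening a signed 8-bit byte to an unsigned 16-bit code unit."""
    signed = b - 0x100 if b >= 0x80 else b  # reinterpret 0..255 as -128..127
    return signed & 0xFFFF                  # wrap the result into 16 bits

def mangle(raw):
    """Turn each byte into its own sign-extended 16-bit unit."""
    return [sign_extend_to_char(b) for b in raw]

def unmangle(units):
    """Recover the original bytes by masking off the high byte, then decode."""
    return bytes(u & 0xFF for u in units).decode("utf-8")

# The byte values from the post (illustrative only, not valid UTF-8).
print([hex(u) for u in mangle(b"\xab\xcd\xef")])  # ['0xffab', '0xffcd', '0xffef']

# Single-byte (ASCII) data passes through unchanged.
print(mangle(b"a") == [0x61])

# A real two-byte character survives a mangle/unmangle round trip.
print(unmangle(mangle("\u00e9".encode("utf-8"))) == "\u00e9")
```

If the data really is mangled this way, masking each returned character with 0xFF and decoding the result as UTF-8 would recover the original text, which might be enough for a one-off conversion program.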

PROBLEM #3: i can't enter new UTF-8 data into a 4.1.1 table using
connector/j
using the statement "insert into my_table values ('a','b','c')" works only
if the values are all completely single-byte. otherwise i get a
SQLException: errorCode=1064, message=Syntax error or access violation,
message from server: "You have an error in your SQL syntax. Check the manual
that corresponds to your MySQL server version for the right syntax to use
near 'c' at line 1". This only occurs if one of the values contains
multi-byte characters.

NOTE 1: i have checked the contents of the database using "select hex(name)
from my_user_table" and determined that the values appear to be correct.
so, i wonder if the problem might be with connector/j?
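
The hex check in NOTE 1 can be automated along these lines. This is a minimal Python sketch that takes a hex string as printed by "select hex(name)" and tests whether it decodes as well-formed UTF-8. The function name and the sample values are hypothetical; no database connection is involved.

```python
def is_valid_utf8_hex(hex_str):
    """Return True if the hex dump decodes as well-formed UTF-8."""
    try:
        bytes.fromhex(hex_str).decode("utf-8")
        return True
    except (ValueError, UnicodeDecodeError):
        # ValueError covers malformed hex; UnicodeDecodeError is a subclass
        # of it and covers byte sequences that are not valid UTF-8.
        return False

# Hypothetical sample values, as HEX() might print them.
samples = ["61", "C3A9", "E4B8AD", "ABCDEF"]
for s in samples:
    print(s, is_valid_utf8_hex(s))
```

A value that passes this check on the server side but comes back garbled through the driver points at the client or the connection character set, not at the stored data.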

QUESTION: is anyone else out there using 4.1.1 with UTF-8 data? if so, i'd
be curious to find out how you're doing it. obviously i'd be overjoyed if
things just worked, but if i could fix problem #3 then i'd be able to create
a conversion program and migrate.

any thoughts?

daniel

p.s. here is how i compiled both versions:
$ CFLAGS="-O3 -mcpu=pentiumpro" CXX=gcc \
  CXXFLAGS="-O3 -mcpu=pentiumpro -felide-constructors -fno-exceptions -fno-rtti" \
  ../configure --prefix=/usr/local/mysql --with-charset=utf8 \
  --enable-thread-safe-client --enable-local-infile --enable-assembler \
  --disable-shared
$ make
Jul 19 '05 #1
