473,326 Members | 2,192 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,326 software developers and data experts.

MySQL 5.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

Hello All,

This post is essentially a reply a previous post/thread
here on this mailing.database.myodbc group titled:

MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

[This version has a couple subtle edits from the orginial I posted
on mailing.database.myodbc - I'm cross posting here on this
topic/subject related newsgroup]

I was wondering if anybody has experienced the same issues
challenges I'm experiencing I'll describe shortly. Once
resolved some fascinating and powerful multi-lingual
apps incorporating non-English/latin character sets can be
realized by many developers.

I have a Unicode utf8 English - Arabic - Hebrew - Greek (and
several other languages) database in Microsoft Excel. I KNOW
that it is Unicode utf8 data because MySQL tells me it
recognizes the encoding as such but not in the context I want.

Allow me to explain ...

I can search the Unicode utf8 encoding with no problem in
Excel. While in Excel I highlight a complete word or a
partial string of an Arabic word copy it to the clipboard
(i.e. memory). I then do a find and the process is the
same successful result as if it was an English string.

MySQL 5.0 is supposed to handle Unicode utf8

I created a MySQL database I named: languages

CREATE DATABASE languages ;

and I implemented the following command on a MySQL
command prompt:

ALTER DATABASE languages DEFAULT CHARACTER SET utf8;

No problem (so far) MySQL seemingly recognized utf8 and
accepted it. My understanding is with the ALTER command
the tables I create against languages will be utf8.

I now created a table I named mainlang which denotes it
will be the main table for my languages.

mysql>CREATE TABLE mainlang
->(
->langNumID varchar(30),
->colB varchar(30),
->colC varchar(30),
->primary key (langNumID, colB)
->);

Again so far no problem: Table successfully created.
My third column 'colC' is where the Unicode data
will be stored.

I now attempt to import the database from my
Excel file into my MySQL database as follows:

mysql>load data infile 'c:\\arabicdictionary.csv'
->into table mainlang
->fields terminated by ','
->lines terminated by '\n'
->(langNumID, colB, colC);
ERROR 1406 (22001): Data too long for 'colC' at row 1

So what to do? I did a search and found other
people seemingly had the same problem and someone
suggested:

ALTER DATABASE languages DEFAULT CHARACTER SET cp1250;

I dropped mainlang, recreated it, redid the load and
Lo and behold ... it seemed to work. No Data too long
error occurred and when I did the following query:

mysql>select langNumID, colB, colC
->from mainlang
->where colB = '4994';

I see colA have a correct numeric value, colB a
correct numeric value (4994) and for colC a string of
unintelligible characters with diacritical marks,
oomlats etc. which I know is the cp1250 encoding
interpretation of the Unicode utf8 data which is
similarly unintelligible in its own regard.

Now what I try is: do a copy of the obscure colC
cp1250 character string into the clipboard/memory
and then do the following tweak on the original
select statement to see if I can search on the
(now) cp1250 character string:

mysql>select langNumID, colB, colC
->from mainlang
->where colc = 'paste of the cp1250 character string';

The computer would not allow a paste unless I pressed
the escape key. On initiating this select command
I got an empty set (no match)

My questions are:

Has anyone been successful creating a Unicode utf8
MySQL database that accepts Arabic?

If yes, how did you get around or not encounter the
Data too long issue?

Have you tried the cp1250 (or cp1251 - same mechanics
same results) work around as I have? Are you
able to search the cp1250 character string (my colC)?
If yes, how did you successfully manage to do it?

Lastly, if I take the cp1250 encoded string and paste
it into Excel ... I can string search the cp1250
encoding with no problem.

Also, here's how I know my Unicode utf-8 data is
correct apart from my own manual cross-referencing
and being recognized by MySQL in some respect:

When I copy the Unicode utf8 encoding and try to
paste it into the select command to see what would
happen I get the following error:

ERROR 1257 (HY000): Illegal mix of collations
(cp1250_general_ci, IMPLICIT) and
(utf8_general_ci, COERCIBLE) for operation '='

So what I have here is a situation where MySQL
is recognizing Unicode utf8 encoding but not
from the respect of packing a table!

Go Figure ...

Anyone wishing to share any insight or solution would
be GREATLY appeciated. I promise if I find a solution
I will share it.

Thank you Very Much, Shukran Jiddan, Todah Rabah,
Muchos Gracias ...

Joel S
(585) 255-0997
jrs_14618 at yahoo.com

Jun 13 '06 #1
1 4812
jr*******@yahoo.com wrote:
Hello All,

This post is essentially a reply a previous post/thread
here on this mailing.database.myodbc group titled:

MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

[This version has a couple subtle edits from the orginial I posted
on mailing.database.myodbc - I'm cross posting here on this
topic/subject related newsgroup]

I was wondering if anybody has experienced the same issues
challenges I'm experiencing I'll describe shortly. Once
resolved some fascinating and powerful multi-lingual
apps incorporating non-English/latin character sets can be
realized by many developers.

I have a Unicode utf8 English - Arabic - Hebrew - Greek (and
several other languages) database in Microsoft Excel. I KNOW
that it is Unicode utf8 data because MySQL tells me it
recognizes the encoding as such but not in the context I want.

Allow me to explain ...

I can search the Unicode utf8 encoding with no problem in
Excel. While in Excel I highlight a complete word or a
partial string of an Arabic word copy it to the clipboard
(i.e. memory). I then do a find and the process is the
same successful result as if it was an English string.

MySQL 5.0 is supposed to handle Unicode utf8

I created a MySQL database I named: languages

CREATE DATABASE languages ;

and I implemented the following command on a MySQL
command prompt:

ALTER DATABASE languages DEFAULT CHARACTER SET utf8;

No problem (so far) MySQL seemingly recognized utf8 and
accepted it. My understanding is with the ALTER command
the tables I create against languages will be utf8.

I now created a table I named mainlang which denotes it
will be the main table for my languages.

mysql>CREATE TABLE mainlang
->(
->langNumID varchar(30),
->colB varchar(30),
->colC varchar(30),
->primary key (langNumID, colB)
->);

Again so far no problem: Table successfully created.
My third column 'colC' is where the Unicode data
will be stored.

I now attempt to import the database from my
Excel file into my MySQL database as follows:

mysql>load data infile 'c:\\arabicdictionary.csv'
->into table mainlang
->fields terminated by ','
->lines terminated by '\n'
->(langNumID, colB, colC);
ERROR 1406 (22001): Data too long for 'colC' at row 1

So what to do? I did a search and found other
people seemingly had the same problem and someone
suggested:

ALTER DATABASE languages DEFAULT CHARACTER SET cp1250;

I dropped mainlang, recreated it, redid the load and
Lo and behold ... it seemed to work. No Data too long
error occurred and when I did the following query:

mysql>select langNumID, colB, colC
->from mainlang
->where colB = '4994';

I see colA have a correct numeric value, colB a
correct numeric value (4994) and for colC a string of
unintelligible characters with diacritical marks,
oomlats etc. which I know is the cp1250 encoding
interpretation of the Unicode utf8 data which is
similarly unintelligible in its own regard.

Now what I try is: do a copy of the obscure colC
cp1250 character string into the clipboard/memory
and then do the following tweak on the original
select statement to see if I can search on the
(now) cp1250 character string:

mysql>select langNumID, colB, colC
->from mainlang
->where colc = 'paste of the cp1250 character string';

The computer would not allow a paste unless I pressed
the escape key. On initiating this select command
I got an empty set (no match)

My questions are:

Has anyone been successful creating a Unicode utf8
MySQL database that accepts Arabic?

If yes, how did you get around or not encounter the
Data too long issue?

Have you tried the cp1250 (or cp1251 - same mechanics
same results) work around as I have? Are you
able to search the cp1250 character string (my colC)?
If yes, how did you successfully manage to do it?

Lastly, if I take the cp1250 encoded string and paste
it into Excel ... I can string search the cp1250
encoding with no problem.

Also, here's how I know my Unicode utf-8 data is
correct apart from my own manual cross-referencing
and being recognized by MySQL in some respect:

When I copy the Unicode utf8 encoding and try to
paste it into the select command to see what would
happen I get the following error:

ERROR 1257 (HY000): Illegal mix of collations
(cp1250_general_ci, IMPLICIT) and
(utf8_general_ci, COERCIBLE) for operation '='

So what I have here is a situation where MySQL
is recognizing Unicode utf8 encoding but not
from the respect of packing a table!

Go Figure ...

Anyone wishing to share any insight or solution would
be GREATLY appeciated. I promise if I find a solution
I will share it.

Thank you Very Much, Shukran Jiddan, Todah Rabah,
Muchos Gracias ...

Joel S
(585) 255-0997
jrs_14618 at yahoo.com


No idea, Joel. Why don't you try asking in a mysql database newsgroup - such as
comp.databases.mysql. This newsgroup is for PHP programming.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jun 14 '06 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Tarbuza Smith | last post by:
I was told that it is easy to create a table in MYSQL but it is hard to create a table with relationship like in Oracle or SQL Server by using Relationship or Query by Design or whatever wysiwyg...
74
by: John Wells | last post by:
Yes, I know you've seen the above subject before, so please be gentle with the flamethrowers. I'm preparing to enter a discussion with management at my company regarding going forward as either...
39
by: Mairhtin O'Feannag | last post by:
Hello, I have a client (customer) who asked the question : "Why would I buy and use UDB, when MySql is free?" I had to say I was stunned. I have no experience with MySql, so I was left sort...
4
by: rass.elma | last post by:
Hi all I fetch the following value from a string (VCAHR(250))colmun in a MySql table: "300000000000000000000000000000000000000000000000000" When I write it out using echo() I get : 3E+50 ...
1
by: marcfischman | last post by:
Please help. I have a website running on a linux/apache/mysql/php server. I receive about 8,000-10,000 visitors a day with about 200,000 to 300,000 page views. The server is a RedHat Linux...
0
by: Vineeta | last post by:
Volt is the leader in providing workforce design and is currently the 5th largest staffing firm nationwide. Volt is an E.O.E. Our client is a top entertainment/community network and a large-scale...
0
by: yeahuh | last post by:
Quick and dirty version. Godaddy server using MySQL 4.0.24 I’m trying a left join to obtain id’s in table A(cars) that are NOT in car_id in table B(newspaper): *This is a cut down version...
0
by: cwho.work | last post by:
Hi! We are using apache ibatis with our MySQL 5.0 database (using innodb tables), in our web application running on Tomcat 5. Recently we started getting a number of errors relating to...
4
omerbutt
by: omerbutt | last post by:
hi every one I am A new Bee to php mysql and i was surfing through the net to learn about how to secure the mysql when you are working in a web environment while working with php html and javascript...
2
by: paulysa | last post by:
Hi there. I am experimenting with Apache/MySQL/PHP to set up a contact list. Background Am a bit of a dabbler. Downloaded MySQL 5.0.67 for windows and Apache 2.2.9 with OpenSSL-0.9.8h-r2 and PHP...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.