Help | Site Map
Connecting Tech Pros Worldwide
 
 
LinkBack Thread Tools
  #1  
Old January 24th, 2006, 04:45 PM
ralphie
Guest
 
Posts: n/a
Default php5 mssql utf-8 problem

hi all

since nearly 2 days i fight with mssql and utf-8 as i need to store and
retrieve arabic characters.

i tried the com approach

http://groups.google.com/group/maili...1aa0bd48de23f0

i tried as well the odbtp library, odbc and obviously the mssql native
extension.. but i never succeded so far to store utf-8 data in the
mssql server (column type nvarchar). if anybody has an idea, an
approach or thinks he knows what i might have done wrong.. it is all
highly appreciated.

ou yes the content type is specified as follows in the document

<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

cheers ralphie

  #2  
Old January 24th, 2006, 07:05 PM
Benzari.Alex@gmail.com
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

I was fighting with this problem today(i tested romanian deutch and
russian characters).
The main requiremnts are:
1. Your document must be UTF-8
2. Your database table or column must be UTF-8
3. Before calling any queries you should execute the following query

SET NAMES 'utf8'

this will set conection to mysql in utf-8 mode if it wasn't set by the
default MySql config.
4. Just Insert your data in the table and then you will be able to
select it.

Hope this will help.

P.S. I tested all this stuff on PHP 5 and Mysql 4.1 so if your config
is different sorry.

  #3  
Old January 24th, 2006, 09:15 PM
ralphie
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

thanks alex i will try to do something similar tomorrow.. but am not
really hopeful, because i use microsoft sql server.. and the connection
between php and mssql seems to be realy unfriendly ot utf-8 :-(

  #4  
Old January 25th, 2006, 06:25 AM
Chung Leong
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

ralphie wrote:[color=blue]
> hi all
>
> since nearly 2 days i fight with mssql and utf-8 as i need to store and
> retrieve arabic characters.
>
> i tried the com approach
>
> http://groups.google.com/group/maili...1aa0bd48de23f0
>
> i tried as well the odbtp library, odbc and obviously the mssql native
> extension.. but i never succeded so far to store utf-8 data in the
> mssql server (column type nvarchar). if anybody has an idea, an
> approach or thinks he knows what i might have done wrong.. it is all
> highly appreciated.
>
> ou yes the content type is specified as follows in the document
>
> <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
>
> cheers ralphie[/color]

At work, we also have to deal with the Arabic script on a daily basis
and we also happen to use MSSQL. I have found is that there's no good
way to make use of the nvarchar, ntext column type. Maybe there's a
secret knob somewhere. I didn't find it in any event. So what we do is
just store the text as UTF-8 in regular varchar and text columns.

Varchar columns are a bit problematic, since you risk chopping a
character in half when truncation happens. Retrieving varchar columns
wider than 255 is also somewhat tricky using the extension (COM on the
other hand, leaks memory). So we usually use a text column when there
is no natural limit on text length.

MSSQL will sometimes complain when you put a UTF-8 string into a query
because it doesn't think that what you have is valid text. What you
have to do is just CONVERT() or CAST() to force it to accept the
string.

  #5  
Old January 25th, 2006, 10:15 AM
ralphie
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

thanks chung, i'm really happy to find a mate fighting the same
problems.

you suggest using normal varchar or text fields. how to store utf-8 in
them? well i know if i insert it through php it works and it turns out
in some wired characters (at least displayed like that in the
enterprise manager). the nice thing is if i fetch it again via php it
comes back as arabic text. that so far is cool. the problem arises if i
want to import data via the enterprise manager, this one would convert
the arabic signs to ?? only. if i want to copy past the data into it,
it does not accept it at all in normal fields but displays it correctly
in the n-types.. the ohter thing i wonder about is the sorting and
stuff.. if the data are stored in this wired (sorry don't know the name
for what that is, just some symbols) format i wonder about the
collation (sorting, etc) stuff? would it still work fine?

any further advise is greatly appreciated!

  #6  
Old January 25th, 2006, 03:55 PM
Chung Leong
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

ralphie wrote:[color=blue]
> thanks chung, i'm really happy to find a mate fighting the same
> problems.
>
> you suggest using normal varchar or text fields. how to store utf-8 in
> them? well i know if i insert it through php it works and it turns out
> in some wired characters (at least displayed like that in the
> enterprise manager). the nice thing is if i fetch it again via php it
> comes back as arabic text. that so far is cool. the problem arises if i
> want to import data via the enterprise manager, this one would convert
> the arabic signs to ?? only. if i want to copy past the data into it,
> it does not accept it at all in normal fields but displays it correctly
> in the n-types.. the ohter thing i wonder about is the sorting and
> stuff.. if the data are stored in this wired (sorry don't know the name
> for what that is, just some symbols) format i wonder about the
> collation (sorting, etc) stuff? would it still work fine?[/color]

Yeah, that's the problem with storing it as UTF-8: the text will show
up as garbage in Enterprise Manager and Query Analyser. Collation will
also be weird. If you set the column to do binary sort, the result
would be somewhat reasonable, as the alef will still come before the
alef maksura encoded in UTF-8. The digits come after the letters
though, which might not be what you'd want. The order would be
incorrect too if you need to use pe or gaf.

The n- columns offer the best solution. I would give ADODB another
shot. COM is problematic in PHP4. Support might have improved in PHP5.
In theory, if you pass CP_UTF8 as codepage to COM(), you'd get UTF-8
text out of the resultsets. I remember that when I tried it though,
nothing happened. This was a couple years ago. Maybe newer versions of
ADODB would work better.

  #7  
Old January 25th, 2006, 07:35 PM
ralphie
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

after another day of attempts without success (i think i succeded to
write utf-8 into mssql, showed only ??? though but this is probably
because of enterprise manager not beeing able to properly display) i
give it up (never got proper utf-8 out of the db, only straight after
import but that was mostlikely not even utf-8.. whatever i'm tired and
will join your way to use the "strange sign" method and write manually
an import script as it works only if i stay whithin php.. whatever
thanks for walking along with me and your inputs chung.

  #8  
Old January 25th, 2006, 08:55 PM
Chung Leong
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem


ralphie wrote:[color=blue]
> after another day of attempts without success (i think i succeded to
> write utf-8 into mssql, showed only ??? though but this is probably
> because of enterprise manager not beeing able to properly display) i
> give it up (never got proper utf-8 out of the db, only straight after
> import but that was mostlikely not even utf-8.. whatever i'm tired and
> will join your way to use the "strange sign" method and write manually
> an import script as it works only if i stay whithin php.. whatever
> thanks for walking along with me and your inputs chung.[/color]

Yeah, the lack of support is rather disappointing. One of these days
I'm going to sit down and right an improved version of the mssql
extension.

  #9  
Old January 26th, 2006, 10:45 AM
ralphie
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

if you get to this point let me know me at ralphie.ch yeah? maybe its
easier to improve the odbtp though as it seems to be a pretty straight
forward thing, but still didnt help me.. well the neatest is definetly
if the native extension works anyway.. or we just need to leave
mssql.. but probably its bound to customers in your case as well..

  #10  
Old January 26th, 2006, 07:35 PM
Chung Leong
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem


ralphie wrote:[color=blue]
> if you get to this point let me know me at ralphie.ch yeah? maybe its
> easier to improve the odbtp though as it seems to be a pretty straight
> forward thing, but still didnt help me.. well the neatest is definetly
> if the native extension works anyway.. or we just need to leave
> mssql.. but probably its bound to customers in your case as well..[/color]

Well, since things are working the way they're now, I won't start
working on it anytime soon. My thinking is to develop something with
identical API to the current mssql extension, but takes to the server
using OLE-DB instead of the old DB library.

  #11  
Old January 26th, 2006, 07:45 PM
ralphie
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

i see.. eventhough i'd rather have proper utf8 in the database what?
whatever somethimes you have to cut down your expectation to reality :-)

  #12  
Old January 27th, 2006, 12:35 AM
Chung Leong
Guest
 
Posts: n/a
Default Re: php5 mssql utf-8 problem

ralphie wrote:[color=blue]
> i see.. eventhough i'd rather have proper utf8 in the database what?
> whatever somethimes you have to cut down your expectation to reality :-)[/color]

Ok, after playing around with this a bit more I have somewhat of a
solution. It's a ugly, nasty hack and quite limited. Basically, since
the mssql extension won't return the data as UTF-8, we will ask MSSQL
to return the text as binary, and then we convert it to UTF-8 with
iconv.

To insert some text, we first convert from UTF-8 to UCS-2, little
endian:

$ucs2 = iconv("UTF-8", "UCS-2LE", $utf8);

Then we put it into binary representation:

$a = unpack('H*hex', $s);
$hex = '0x' . $a['hex'];

Now stick that into a INSERT statement. MSSQL will do an implicit
conversion to nvarchar if that's the type of the column:

mssql_query("INSERT INTO table (nvarchar_col) VALUES($hex)");


Retrieval is a bit tricky, as the mssql extension is too dumb to get
varbinary bigger than 255. It handles image columns with no problem
however. So what we do is cast from nvarchar to varbinary, then from
there to image:

mssql_query("SELECT CAST(CAST(nvarchar_col AS varbinary(8000)) AS
image) AS col FROM table");

The data comes out in UCS-2, so we have to convert it to UTF-8:

$utf8 = iconv("UCS-2LE", "UTF-8", $ucs2);

It's not a great solution but seems to work in my tests. Conversion is
only possible between nvarchar and varbinary. You can't cast a ntext
column to image. So you're limited to nvarchar's 4000 maximum size.

 

Bookmarks

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over network members.
Post your question now . . .
It's fast and it's free

Popular Articles