Bytes | Software Development & Data Engineering Community

convert string function and built-in conversions

It seems to me that these values should be the same:

select 'lydia eugenia treviño', convert('lydia eugenia treviño' using ascii_to_utf_8);

but they seem to be different. What am I missing?

culley

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #1
On Sun, 19 Oct 2003, culley harrelson wrote:
> It seems to me that these values should be the same:
>
> select 'lydia eugenia treviño', convert('lydia eugenia treviño' using ascii_to_utf_8);
>
> but they seem to be different. What am I missing?


I don't think ñ (the n with the tilde) is a valid ASCII character. It might
be "extended ASCII", but that's something different and not really a
standard, afaik. You're probably getting the character associated with its
lower 7 bits.
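A quick way to check this point is to look at code points: ASCII only covers 0 to 127, and ñ is code point 241. The sketch below uses Python purely as an illustration (it is not part of the thread), and the helper name non_ascii_chars is hypothetical.

```python
from __future__ import annotations


def non_ascii_chars(text: str) -> list[tuple[str, int]]:
    """Return (character, code point) pairs for every character above 0x7F (127)."""
    return [(ch, ord(ch)) for ch in text if ord(ch) > 0x7F]


name = "lydia eugenia treviño"

# ASCII defines only code points 0-127; str.isascii() (Python 3.7+) checks exactly that.
print(name.isascii())          # False
print(non_ascii_chars(name))   # [('ñ', 241)]
```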

Nov 12 '05 #2
It is one of the extended characters in iso-8859-1. This data was taken
from a text field in a SQL_ASCII database. Basically what I am trying to
do is migrate data from a SQL_ASCII database to a UNICODE database by
running all the data through an external script that does something like:

select convert(my_field using ascii_to_utf_8) from my_table;

then inserts the selected text into an identical table in the unicode
database. All the data goes across, but extended characters such as ñ
are getting munged. The docs indicate that ascii_to_utf_8 is for
SQL_ASCII -> UNICODE... Are you saying that ñ isn't really an ASCII
character even though it is valid in a SQL_ASCII database? I have found
that all extended characters of the various LATIN encodings will work
just fine in my SQL_ASCII database.
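That LATIN characters "work" in SQL_ASCII is consistent with the database storing the client's raw bytes without interpreting them. The Python sketch below assumes exactly that (it does not model PostgreSQL itself): the same stored byte, 0xF1, reads as a different character depending on which ISO-8859 encoding the reader assumes when decoding it.

```python
# Assumption (consistent with the thread): SQL_ASCII stores the client's bytes
# without checking or converting them, so "ñ" sent by a LATIN1 client is just byte 0xF1.
stored = "treviño".encode("iso-8859-1")
print(stored)  # b'trevi\xf1o'

# The byte carries no character identity by itself; the character depends on
# the encoding the reader assumes when decoding it.
for codec in ("iso-8859-1", "iso-8859-2", "iso-8859-5"):
    print(f"{codec}: {stored.decode(codec)}")
```

Which decoding is right cannot be recovered from the bytes alone; only knowing what the clients originally sent can settle it.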

This project is a big can of worms... Every 6 months I open the can,
stir the worms around a bit, wrinkle my nose then promptly close the can
again and stuff it away for another 6 months. :) Wish I could figure it
out.

On Sun, 19 Oct 2003 00:31:43 -0700 (PDT), "Stephan Szabo"
<ss****@megazone.bigpanda.com> said:
> I don't think the marked n is a valid ascii character (it might be
> extended ascii, but that's different and not really standard afaik).
> You're probably getting the character associated with the lower 7 bits.


Nov 12 '05 #3
On Sun, 19 Oct 2003, culley harrelson wrote:
> It is one of the extended characters in iso-8859-1. This data was taken
> from a text field in a SQL_ASCII database. Basically what I am trying to
> do is migrate data from a SQL_ASCII database to a UNICODE database by
> running all the data through an external script that does something like:
>
> select convert(my_field using ascii_to_utf_8) from my_table;
>
> then inserts the selected text into an identical table in the unicode
> database. All the data goes across, but extended characters such as ñ
> are getting munged. The docs indicate that ascii_to_utf_8 is for
> SQL_ASCII -> UNICODE... Are you saying that ñ isn't really an ASCII
> character even though it is valid in a SQL_ASCII database? I have found
> that all extended characters of the various LATIN encodings will work
> just fine in my SQL_ASCII database.


I would guess that SQL_ASCII isn't actually forcing or checking that the
characters are 7-bit, but that the conversions treat them as if you had
only put in valid 7-bit values (at least the routines I looked at appear
to apply an & 0x7F).
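If a conversion really does apply & 0x7F, the effect on ñ is easy to work out: in ISO-8859-1, ñ is the byte 0xF1, and 0xF1 & 0x7F is 0x71, the ASCII letter q. The Python sketch below models that guess with a hypothetical mask_to_7_bits helper; it illustrates the behaviour described above and is not PostgreSQL's actual conversion code.

```python
def mask_to_7_bits(data: bytes) -> bytes:
    """Clear the high bit of every byte, as an ASCII-only routine using & 0x7F would."""
    return bytes(b & 0x7F for b in data)


latin1 = "lydia eugenia treviño".encode("iso-8859-1")
masked = mask_to_7_bits(latin1)

print(hex(0xF1 & 0x7F))        # 0x71
print(masked.decode("ascii"))  # lydia eugenia treviqo
```

Every masked byte is below 0x80, so the result is valid ASCII (and therefore valid UTF-8), which would explain why the munged data still goes across without errors.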

If you're actually putting iso-8859-1 (latin1) in there, try the
conversion from iso-8859-1 to utf8. It doesn't appear to display properly
in my iso-8859-1 terminal, but taking that string and inserting it into a
unicode database and then setting my client_encoding to iso-8859-1 gives
me the original string back when I select it.
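Assuming the SQL_ASCII data really is ISO-8859-1, the suggested conversion amounts to decoding the bytes as LATIN1 and re-encoding them as UTF-8. The Python sketch below (the helper latin1_to_utf8 is hypothetical) also shows why the result looks wrong on an ISO-8859-1 terminal, and why reading it back as ISO-8859-1 restores the original.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Decode bytes as ISO-8859-1 (LATIN1) and re-encode the text as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


original = "lydia eugenia treviño"
latin1 = original.encode("iso-8859-1")
utf8 = latin1_to_utf8(latin1)

# In UTF-8, ñ (one byte in LATIN1) becomes the two-byte sequence C3 B1.
print(utf8[-3:])  # b'\xc3\xb1o'

# An ISO-8859-1 terminal shows those two bytes as two separate characters,
# which would explain the converted text not displaying properly there.
print(utf8.decode("iso-8859-1"))  # lydia eugenia treviÃ±o

# Converting back to ISO-8859-1 (roughly what a LATIN1 client_encoding asks
# the server to do on output) gives back the original bytes.
round_trip = utf8.decode("utf-8").encode("iso-8859-1")
print(round_trip == latin1)  # True
```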

Nov 12 '05 #4

Similar topics

Thread by Mark S Pryor
Thread by Red Devil (6 posts)
Thread by Jason Huang (4 posts)
Thread by simon (4 posts)
Thread by simon (19 posts)
Thread by priyanka (3 posts)
Thread by mamul (3 posts)
Thread by rosydwin
