How to decode a string

Lad
To be able to decode a string successfully, I need to know what coding
it is in.
The string can be coded in utf8 or in windows-1250 or in another
coding.
Is there a way to find out which encoding a string is in?
Thank you for your help.
L.

Aug 21 '06 #1
Lad wrote:
> To be able to decode a string successfully, I need to know what coding
> it is in.

ask whoever provided the string.

> The string can be coded in utf8 or in windows-1250 or in another
> coding. Is there a method how to find out the string coding.

in general, no. if you have enough text, you may guess, but the right
approach for that depends on the application.

</F>

Aug 21 '06 #2
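
As a rough illustration of the "you may guess" remark above, here is a minimal sketch in Python 3 (the thread itself uses Python 2, where the byte string type was called str). The helper name guess_decode and the candidate list are assumptions made for this example; the two candidates are the encodings named in the question. A clean decode is only a hint, not proof, because a single-byte codec such as windows-1250 accepts almost any byte sequence, which is why the stricter UTF-8 is tried first.

```python
# Candidate encodings, strictest first. The list is an assumption taken
# from the encodings mentioned in the question.
CANDIDATES = ("utf-8", "windows-1250")


def guess_decode(raw, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: a clean decode does not prove the guess is right.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")


# Sample data built for the example, not taken from a real database.
utf8_bytes = "Přibylová".encode("utf-8")
cp1250_bytes = "Přibylová".encode("windows-1250")

print(guess_decode(utf8_bytes))
print(guess_decode(cp1250_bytes))
```

The order matters: if windows-1250 came first, UTF-8 input would usually decode "successfully" into the wrong characters.
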
Lad

Fredrik Lundh wrote:
> Lad wrote:
>> To be able to decode a string successfully, I need to know what coding
>> it is in.
>
> ask whoever provided the string.
>
>> The string can be coded in utf8 or in windows-1250 or in another
>> coding. Is there a method how to find out the string coding.
>
> in general, no. if you have enough text, you may guess, but the right
> approach for that depends on the application.
>
> </F>

Fredrik,
Thank you for your reply
The text is from a MySQL table field that uses the utf8_czech_ci collation,
but when I try
`RealName`.decode('utf8'), where RealName is that MySQL field,

I will get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

Can you please suggest the solution?
Thank you
L.

Aug 21 '06 #3
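
One detail in the traceback above is worth a small sketch: the message names the codec that actually failed. It says 'ascii', so the failing step was an ASCII decode rather than the UTF-8 decode that was requested. In Python 2, such ASCII decodes happened implicitly when byte strings and unicode strings were mixed; whether that is what happened here cannot be told from the post alone. The snippet below is Python 3 and uses a made-up sample name chosen so that byte 0xc3 sits at position 3.

```python
# Hypothetical sample value: UTF-8 bytes with 0xc3 at index 3.
raw = "Tomáš".encode("utf-8")  # b'Tom\xc3\xa1\xc5\xa1'

try:
    raw.decode("ascii")  # explicit here; Python 2 could do this implicitly
except UnicodeDecodeError as exc:
    failing_codec = exc.encoding  # the codec named in the message
    failing_pos = exc.start       # index of the first offending byte
    print(exc)

# Decoding the same bytes with the encoding they were written in works.
text = raw.decode("utf-8")
print(text)
```

The attributes encoding and start on the exception carry the same information as the message text, which makes them handy for logging.
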
In <11**********************@m73g2000cwd.googlegroups .com>, Lad wrote:
> The text is from Mysql table field that uses utf8_czech_ci collation,
> but when I try
> `RealName`.decode('utf8'),where RealName is that field of MySQL
>
> I will get:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
>
> Can you please suggest the solution?

Do you get this from converting the value from the database or from trying
to print the unicode string? Can you give us the output of

print repr(RealName)

Ciao,
Marc 'BlackJack' Rintsch
Aug 21 '06 #4
Lad

Marc 'BlackJack' Rintsch wrote:
> In <11**********************@m73g2000cwd.googlegroups .com>, Lad wrote:
>> The text is from Mysql table field that uses utf8_czech_ci collation,
>> but when I try
>> `RealName`.decode('utf8'),where RealName is that field of MySQL
>>
>> I will get:
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
>>
>> Can you please suggest the solution?
>
> Do you get this from converting the value from the database or from trying
> to print the unicode string? Can you give us the output of
>
> print repr(RealName)
>
> Ciao,
> Marc 'BlackJack' Rintsch

For
print repr(RealName)
I get

P?ibylov\xe1 Ludmila
where the ? should also be a character.
Thank you for help
L.

Aug 22 '06 #5
Lad wrote:
> for
> print repr(RealName) command
> I will get
>
> P?ibylov\xe1 Ludmila
> where instead of ? should be also a character

that's not very likely; repr() always includes quotes, always escapes
non-ASCII characters, and optionally includes a Unicode prefix.

please try this

print "*", repr(RealName), type(RealName), "*"

and post the entire output; that is, *everything* between the asterisks.

</F>

Aug 22 '06 #6
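
For readers on Python 3, a sketch of the same diagnostic as the print statement above. The thread is Python 2; in Python 3 the 8-bit string type is bytes, repr() of a text string no longer escapes non-ASCII characters, and ascii() gives the escaped form instead. The sample value is hypothetical, and decoding it as ISO-8859-1 is an assumption made only to have a text string to show.

```python
# Hypothetical sample: an 8-bit string with one non-ASCII byte.
raw = b"Pribylov\xe1"

# Assumption for the example only: treat the bytes as ISO-8859-1.
text = raw.decode("iso-8859-1")

# repr() of bytes always shows the quotes, a b prefix and \x escapes.
print("*", repr(raw), type(raw), "*")

# ascii() is the Python 3 call that escapes non-ASCII characters in text.
print("*", ascii(text), type(text), "*")
```

Seeing the quotes, the prefix and the type together tells you whether you hold undecoded bytes or already decoded text, which is the point of posting everything between the asterisks.
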
Lad
Fredrik Lundh wrote:
> Lad wrote:
>> for
>> print repr(RealName) command
>> I will get
>>
>> P?ibylov\xe1 Ludmila
>> where instead of ? should be also a character
>
> that's not very likely; repr() always includes quotes, always escapes
> non-ASCII characters, and optionally includes a Unicode prefix.
>
> please try this
>
> print "*", repr(RealName), type(RealName), "*"
>
> and post the entire output; that is, *everything* between the asterisks.

The result of print "*", repr(RealName), type(RealName), "*" is

* 'Fritschov\xe1 Laura' <type 'str'> *
Best regards,
L

Aug 22 '06 #7
"Lad" wrote:
> The result of print "*", repr(RealName), type(RealName), "*" is
>
> * 'Fritschov\xe1 Laura' <type 'str'> *

looks like the MySQL interface is returning 8-bit strings using ISO-8859-1
encoding (or some variation of that; \xE1 is "LATIN SMALL LETTER A
WITH ACUTE" in 8859-1).

have you tried passing "use_unicode=True" to the connect() call ?

</F>

Aug 22 '06 #8
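
The reasoning above can be checked with a short Python 3 sketch. It looks up what byte \xE1 means in ISO-8859-1 and, as an added comparison that is not from the reply itself, in windows-1250, the encoding named in the original question. It also confirms that the posted bytes are not valid UTF-8, which fits the conclusion that the driver is not handing back UTF-8.

```python
import unicodedata

# The bytes shown in the posted repr() output.
raw = b"Fritschov\xe1 Laura"

# Character name of byte 0xE1 under each single-byte encoding.
names = {
    enc: unicodedata.name(b"\xe1".decode(enc))
    for enc in ("iso-8859-1", "windows-1250")
}
for enc, name in names.items():
    print(enc, "->", name)

# A lone 0xE1 followed by a space is not a legal UTF-8 sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print("valid UTF-8:", valid_utf8)
```

Both single-byte encodings map this byte to the same letter, so this one value cannot tell them apart; that matches the "or some variation of that" caveat.
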
Lad

Fredrik Lundh wrote:
> "Lad" wrote:
>> The result of print "*", repr(RealName), type(RealName), "*" is
>>
>> * 'Fritschov\xe1 Laura' <type 'str'> *
>
> looks like the MySQL interface is returning 8-bit strings using ISO-8859-1
> encoding (or some variation of that; \xE1 is "LATIN SMALL LETTER A
> WITH ACUTE" in 8859-1).
>
> have you tried passing "use_unicode=True" to the connect() call ?
>
> </F>

Fredrik,
Thank you for your reply.
I found out that if I do not decode the string at all, it looks
correct, but I do not know why it is OK without decoding.
I use Django, and I do not pass use_unicode=True to the connect() call.

Aug 22 '06 #9
