
ConfigParser and Unicode

thehaas@binary.net
#1: Jul 18 '05
I'm trying to read a config file with Unicode characters via
ConfigParser with Python 2.3 and am having some problems. The file
looks like:

[DEFAULT]
goodProcRef=PYTHON,RSS,Grüß,LDAP

This is how I'm trying to read it:
config = ConfigParser()
config.read("work.cfg")
goodProcRef = config.get(section,"goodprocref").split(",")

goodProcRef now looks like this:

>>> goodProcRef
['PYTHON', 'RSS', 'Gr\xfc\xdf', 'LDAP']

Obviously, 'Grüß'!='Gr\xfc\xdf' . If I change the split to use u","
instead, I get the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "work.py", line 165, in parseConfig
    goodProcRef = config.get(section,"goodprocref").split(u",")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 13: ordinal not in range(128)

Any ideas on how I can get the correct value?


--
Mike Hostetler
thehaas@binary.net
http://www.binary.net/thehaas

Martin v. Löwis
#2: Jul 18 '05

thehaas@binary.net wrote:
> Obviously, 'Grüß'!='Gr\xfc\xdf' .

It is not at all obvious that they are different. In fact, they
are the same, assuming the second string is encoded in Latin-1.
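
A minimal sketch of this point, written for Python 3 (an assumption: a `bytes` literal stands in for the Python 2.3 byte string that ConfigParser returned). Decoding the four bytes as Latin-1 yields exactly the four characters G, r, ü, ß:

```python
# Python 3 sketch: the b'' literal stands in for a Python 2.3 byte string
# (assumption; Python 2.3 itself has no separate bytes type).
raw = b'Gr\xfc\xdf'            # four bytes, as read from work.cfg
text = raw.decode('latin-1')   # interpret each byte as one Latin-1 character

print(len(raw), len(text))       # 4 4: Latin-1 uses one byte per character
print(text == 'Gr\u00fc\u00df')  # True: U+00FC is 'ü', U+00DF is 'ß'
print(ascii(text))               # 'Gr\xfc\xdf'
```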

> Any ideas on how I can get the correct value?

Pray tell: what is the correct value?

Regards,
Martin

Richard Brodie
#3: Jul 18 '05

<thehaas@binary.net> wrote in message news:kbl6c.29656$wg.22030@okepread01...
>
> Obviously, 'Grüß'!='Gr\xfc\xdf' .

>>> 'Grüß' != 'Gr\xfc\xdf'
False


thehaas@binary.net
#4: Jul 18 '05

"Martin v. Löwis" <martin@v.loewis.de> wrote:
> thehaas@binary.net wrote:
> > Obviously, 'Grüß'!='Gr\xfc\xdf' .
>
> It is not at all obvious that they are different. In fact, they
> are the same, assuming the second string is encoding in Latin-1.
>
> > Any ideas on how I can get the correct value?
>
> Pray tell: what is the correct value?

The correct value is 'Grüß', or at least something that compares equal to it.

Maybe I should back up -- I'm interfacing with a Windows API. In that API, I see 'Grüß' as:

>>> plist[-1].Reference
u'Gr\xfc\xdf'

My value in goodProcList is:

>>> goodProcRef[18]
'Gr\xfc\xdf'

(yeah, goodProcList isn't in Unicode -- that's probably the cause of all this)

When I test their equality:

>>> goodProcRef[18] == plist[-1].Reference
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)

If I try to manually encode goodProcRef[18], I get the same thing:

>>> goodProcRef[18].encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)
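
A Python 3 sketch of what goes wrong in both tracebacks, assuming (as they suggest) that Python 2.3 silently decodes the byte string with the default ASCII codec before comparing it with, or encoding it as, Unicode. Spelling that hidden step out reproduces the same message; Python 3 never performs it implicitly, so there the comparison is simply False:

```python
raw = b'Gr\xfc\xdf'           # byte string from the config file
api_value = 'Gr\u00fc\u00df'  # Unicode text, standing in for plist[-1].Reference

# The implicit step Python 2.3 took: decode the bytes as ASCII.
# 0xfc (the 'ü') is the first byte above 127, at index 2.
try:
    raw.decode('ascii')
except UnicodeDecodeError as err:
    message = str(err)
    bad_position = err.start
print(message)

# Python 3 does not coerce: bytes and str simply compare unequal,
# while decoding with the right codec makes them match.
print(raw == api_value)                    # False
print(raw.decode('latin-1') == api_value)  # True
```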

--
Mike Hostetler
thehaas@binary.net
http://www.binary.net/thehaas

Riccardo Galli
#5: Jul 18 '05

On Thu, 18 Mar 2004 19:10:08 +0000, thehaas wrote:

> "Martin v. Löwis" <martin@v.loewis.de> wrote:
>> thehaas@binary.net wrote:
>> > Obviously, 'Grüß'!='Gr\xfc\xdf' .
>>
>> It is not at all obvious that they are different. In fact, they
>> are the same, assuming the second string is encoding in Latin-1.
>>
>> > Any ideas on how I can get the correct value?
>>
>> Pray tell: what is the correct value?
>
> The correct value is 'Grüß', or at least have it equal to that.
>
> Maybe I should back up -- I'm interfacing into a Windows API. In that API, I see 'Grüß' as:
> >>> plist[-1].Reference
> u'Gr\xfc\xdf'
>
> My value in goodProcList is:
> >>> goodProcRef[18]
> 'Gr\xfc\xdf'
>
> (yeah, goodProcList isn't in Unicode -- that's probably the cause of all this)
>
> When I test their equality:
>
> >>> goodProcRef[18] == plist[-1].Reference
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)
>
> If I try to manually encode goodProcRef[18], I get the same thing:
>
> >>> goodProcRef[18].encode('utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)

From experience: you must first decode your string before you can encode it.

So:

>>> goodProcRef='Gr\xfc\xdf'.decode('latin-1')
>>> goodProcRef
u'Gr\xfc\xdf'

Now you can compare goodProcRef with plist[-1].Reference and get True.
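
The same fix applied to the whole config value, as a Python 3 sketch (`raw_value` and `api_reference` are hypothetical stand-ins for the string returned by `config.get()` and for `plist[-1].Reference`). Decoding once, before splitting, means every element is already Unicode; under Python 2.3 the equivalent should be `config.get(section, "goodprocref").decode('latin-1').split(u",")`:

```python
# Python 3 sketch; raw_value stands in for the byte string that
# config.get(section, "goodprocref") returned under Python 2.3.
raw_value = b'PYTHON,RSS,Gr\xfc\xdf,LDAP'

# Decode once with the file's real encoding, then split: every element
# is already text, so no implicit ASCII decoding can happen later.
good_proc_ref = raw_value.decode('latin-1').split(',')

api_reference = 'Gr\u00fc\u00df'       # stands in for plist[-1].Reference
print(ascii(good_proc_ref))            # ['PYTHON', 'RSS', 'Gr\xfc\xdf', 'LDAP']
print(api_reference in good_proc_ref)  # True
```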

Once the strings are Unicode strings, you can encode them easily:

>>> goodProcRef.encode('UTF8')
'Gr\xc3\xbc\xc3\x9f'
>>> plist[-1].Reference.encode('UTF8')
'Gr\xc3\xbc\xc3\x9f'
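
The decode-then-encode round trip as a Python 3 sketch (again assuming a `bytes` literal in place of a Python 2 byte string): once the text is Unicode, encoding it to UTF-8 turns ü and ß into two bytes each, and decoding those bytes gives the original text back:

```python
text = b'Gr\xfc\xdf'.decode('latin-1')  # Unicode text, as in the example above
utf8 = text.encode('utf-8')             # 'ü' -> C3 BC, 'ß' -> C3 9F

print(utf8)                          # b'Gr\xc3\xbc\xc3\x9f'
print(len(text), len(utf8))          # 4 6
print(utf8.decode('utf-8') == text)  # True: the round trip is lossless
```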

Hope it can help,
Riccardo

--
-=Riccardo Galli=-

Sideralis Programs
http://www.sideralis.net

thehaas@binary.net
#6: Jul 18 '05

Riccardo Galli <riccardo_cut-me@cut.me.sideralis.net> wrote:
[snip]
> by experience, you must first decode your string to encode it
>
> so
> >>> goodProcRef='Gr\xfc\xdf'.decode('latin-1')
> >>> goodProcRef
> u'Gr\xfc\xdf'
>
> now you could compare goodProcRef and plist[-1].Reference and get "True"

Why yes, that works! Thanks Riccardo . . .

This was my first real experience with Unicode in CPython. I am
learning that I have much to learn. . . .

--
Mike Hostetler
thehaas@binary.net
http://www.binary.net/thehaas

Thomas Heller
#7: Jul 18 '05

thehaas@binary.net writes:

> "Martin v. Löwis" <martin@v.loewis.de> wrote:
>> thehaas@binary.net wrote:
>> > Obviously, 'Grüß'!='Gr\xfc\xdf' .
>>
>> It is not at all obvious that they are different. In fact, they
>> are the same, assuming the second string is encoding in Latin-1.
>>
>> > Any ideas on how I can get the correct value?
>>
>> Pray tell: what is the correct value?
>
> The correct value is 'Grüß', or at least have it equal to that.
>
> Maybe I should back up -- I'm interfacing into a Windows API. In that
> API, I see 'Grüß' as:
> >>> plist[-1].Reference
> u'Gr\xfc\xdf'
>
> My value in goodProcList is:
> >>> goodProcRef[18]
> 'Gr\xfc\xdf'

Try this:

>>> "Gr\xfc\xdf".decode("latin-1")
u'Gr\xfc\xdf'
>>>
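
Decoding by hand works, but the value can also be decoded as the file is read. A minimal Python 3 sketch, assuming `work.cfg` is saved in Latin-1 (the temporary file below is a hypothetical stand-in for it): Python 3's `configparser` accepts an `encoding` argument to `read()`, so `get()` already returns Unicode text. Python 2.3's `ConfigParser.read()` has no such argument, which is why decoding the returned value, as shown above, is the fix there.

```python
import configparser
import os
import tempfile

# Hypothetical stand-in for work.cfg, saved in Latin-1 (assumed encoding).
fd, path = tempfile.mkstemp(suffix='.cfg')
with os.fdopen(fd, 'w', encoding='latin-1') as cfg_file:
    cfg_file.write('[DEFAULT]\ngoodProcRef=PYTHON,RSS,Gr\u00fc\u00df,LDAP\n')

try:
    config = configparser.ConfigParser()
    # Tell the parser how the file is encoded so values come back as text.
    config.read(path, encoding='latin-1')
    # Option names are lower-cased by default, hence 'goodprocref'.
    good_proc_ref = config.get('DEFAULT', 'goodprocref').split(',')
finally:
    os.remove(path)

print(ascii(good_proc_ref))  # ['PYTHON', 'RSS', 'Gr\xfc\xdf', 'LDAP']
```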

Thomas

