Locale case change not working

When using unicode the case change works:
>>> print u'É'.lower()
é

But when using the pt_BR.utf-8 locale it doesn't:
>>> locale.setlocale(locale.LC_ALL, 'pt_BR.utf-8')
'pt_BR.utf-8'
>>> locale.getlocale()
('pt_BR', 'utf')
>>> print 'É'.lower()
É

What am I missing? I'm in Fedora Core 5 and Python 2.4.3.

# cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

Regards, Clodoaldo Pinto Neto

May 24 '07 #1


Clodoaldo wrote:
When using unicode the case change works:
>>>print u'É'.lower()
é

But when using the pt_BR.utf-8 locale it doesn't:
>>>locale.setlocale(locale.LC_ALL, 'pt_BR.utf-8')
'pt_BR.utf-8'
>>>locale.getlocale()
('pt_BR', 'utf')
>>>print 'É'.lower()
É

What am I missing? I'm in Fedora Core 5 and Python 2.4.3.

# cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

Regards, Clodoaldo Pinto Neto

str.lower() operates on individual bytes and therefore doesn't handle
encodings with multibyte characters (like UTF-8) properly:
>>> u"É".encode("utf8")
'\xc3\x89'
>>> u"É".encode("latin1")
'\xc9'
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "de_DE.utf8")
'de_DE.utf8'
>>> print unicode("\xc3\x89".lower(), "utf8")
É
>>> locale.setlocale(locale.LC_ALL, "de_DE.latin1")
'de_DE.latin1'
>>> print unicode("\xc9".lower(), "latin1")
é

I recommend that you forget about byte strings and use unicode throughout.
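
A minimal sketch of that advice, assuming the data arrives as UTF-8 encoded bytes: decode once at the boundary, change case on the unicode object, and encode again only when bytes are needed. The helper name lower_utf8 is hypothetical and not from this thread, and no locale setting is involved.

```python
def lower_utf8(raw):
    """Lower-case UTF-8 encoded bytes by going through unicode.

    Hypothetical helper for illustration only.
    """
    text = raw.decode("utf-8")        # bytes -> unicode
    return text.lower().encode("utf-8")  # case change on characters, then back to bytes

# u"\u00c9" is the character 'É'; encoded as UTF-8 it is the two bytes C3 89.
raw = u"\u00c9".encode("utf-8")
lowered = lower_utf8(raw)

# The result should be the UTF-8 encoding of 'é' (u"\u00e9").
print(lowered == u"\u00e9".encode("utf-8"))
```

Keeping text as unicode inside the program and converting only on input and output avoids depending on whatever locale happens to be set.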

Peter
May 24 '07 #2

On May 24, 6:40 am, Peter Otten <__pete...@web.de> wrote:
I recommend that you forget about byte strings and use unicode throughout.

Now I understand it. Thanks.

Regards, Clodoaldo Pinto Neto

May 24 '07 #3
