Hi!
I have a set of UTF-8 encoded strings in Russian (all letters are
capitalized). I need to lowercase them, but
my_string.lower() doesn't work.
See the sample script:
# -*- coding: utf-8 -*-
[skip]
s1 = self.title
s2 = self.title.lower()
print s1 == s2
prints True.
I have no problems with lower() for English letters, or with
something like this:
u'russian_letters_here'.lower(), but I don't need constants, I need to
modify variables, and nothing changes when I apply the lower()
method to my strings.
Alexey Moskvin wrote:
I have no problems with lower() for English letters, or with
something like this:
u'russian_letters_here'.lower(), but I don't need constants, I need to
modify variables, and nothing changes when I apply the lower()
method to my strings.
Can you give a concrete example? I doubt that there is any
difference between lowercasing a unicode object given as a literal and one
acquired somewhere else. And because my Russian skills equal my Chinese
skills (a total of zero), I can't create a test myself :)
Alexey Moskvin wrote:
I have a set of strings (all letters are capitalized) at utf-8,
That's the problem. If these are really utf-8 encoded byte strings,
then .lower likely won't work. It uses the C library's tolower API,
which works on a byte level, i.e. can't work for multi-byte encodings.
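The effect described above can be reproduced in a few lines. This is only a minimal sketch: the sample word is made up, and the code is written so that it should behave the same on Python 2 (in its default C locale) and on Python 3, where the byte string type is called bytes.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: a capitalized Russian word stored as UTF-8 bytes.
title = u"МОСКВА".encode("utf-8")

# Each Cyrillic letter takes two bytes in UTF-8, none in the ASCII range.
assert len(title) == 12

# Byte-level lowercasing only knows about single-byte (ASCII) letters,
# so the multi-byte Cyrillic sequences come back unchanged.
lowered = title.lower()
assert lowered == title

# ASCII letters in the same byte string are still lowercased.
mixed = u"ABC МОСКВА".encode("utf-8")
assert mixed.lower() == u"abc МОСКВА".encode("utf-8")
```

This matches the symptom in the original script, where the string compares equal to its own lower() result.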
What you need to do is to operate on Unicode strings. I.e. instead
of
s.lower()
do
s.decode("utf-8").lower()
or (if you need byte strings back)
s.decode("utf-8").lower().encode("utf-8")
If you find that you write the latter, I recommend that you redesign
your application. Don't use byte strings to represent text, but use
Unicode strings all the time, except at the system boundary (where
you decode/encode as appropriate).
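One way to picture the "decode at the boundary" advice is sketched below. The function names normalize_title and handle_raw_title are hypothetical, invented for this illustration; the only assumption about the data is that the incoming bytes are valid UTF-8.

```python
# -*- coding: utf-8 -*-

def normalize_title(text):
    """Core logic: takes and returns a Unicode string, never bytes."""
    return text.strip().lower()


def handle_raw_title(raw_bytes, encoding="utf-8"):
    """Boundary wrapper: decode on the way in, encode on the way out."""
    text = raw_bytes.decode(encoding)
    return normalize_title(text).encode(encoding)


# Hypothetical input as it might arrive from a file or socket.
raw = u"  ПРИВЕТ, МИР  ".encode("utf-8")
result = handle_raw_title(raw)
assert result.decode("utf-8") == u"привет, мир"
```

Keeping the encode/decode calls in one wrapper means the rest of the application never has to know which encoding the outside world uses.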
There are some limitations with Unicode .lower also, but I don't
think they apply to Russian (specifically, SpecialCasing.txt is
not considered).
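For Russian specifically, a quick check suggests that the simple one-to-one case mapping is enough. The sketch below only covers the 33 letters of the modern Russian alphabet and says nothing about other scripts or about the special cases mentioned above.

```python
# -*- coding: utf-8 -*-
upper = u"АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
lower = u"абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# Every uppercase letter maps to exactly one lowercase letter and back.
assert len(upper) == 33
assert upper.lower() == lower
assert lower.upper() == upper
```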
HTH,
Martin
Martin, thanks for the fast reply, now everything is OK!
On Oct 6, 1:30 am, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
What you need to do is to operate on Unicode strings.
On Oct 6, 8:39 am, Alexey Moskvin <d...@inbox.ru> wrote:
Martin, thanks for the fast reply, now everything is OK!
Alexey,
if your strings are stored in a text file, you can use the "codecs" module:
import codecs
handler = codecs.open('somefile', 'r', 'utf-8')
# ... do the job
handler.close()
I prefer this way of dealing with Russian text in UTF-8.
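A fuller version of the snippet above might look like the following. It is a sketch under assumptions: the file name titles.txt and its contents are invented, and the file is created in a temporary directory only so that the example can run on its own. On Python 3 the built-in open(path, encoding="utf-8") does the same job.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Hypothetical input file, created here only so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), "titles.txt")
with codecs.open(path, "w", "utf-8") as out:
    out.write(u"ПЕРВЫЙ\nВТОРОЙ\n")

# codecs.open decodes while reading, so each line is already Unicode text
# and lower() works on it directly.
with codecs.open(path, "r", "utf-8") as handler:
    titles = [line.strip().lower() for line in handler]

os.remove(path)
assert titles == [u"первый", u"второй"]
```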
Konstantin.