Unicode drives me crazy...

Hi!

I want to retrieve WMI information from Windows machines.
I use Python with the Hungarian (iso-8859-2) charset.

I wrote a small utility for this, because I want to write the values to an XML file.

def ToHU(s,NoneStr='-'):
    if s==None: s=NoneStr
    if not (type(s) in [type(''),type(u'')]):
        s=str(s)
    if type(s)<>type(u''):
        s=unicode(s)
    s=s.replace(chr(0),' ');
    s=s.encode('iso-8859-2')
    return s

This function works in general, but I got an error with this value:
'Kommunik\xe1ci\xf3s port (COM1)'

This routine demonstrates the problem

s='Kommunik\xe1ci\xf3s port (COM1)'
print s
print type(s)
print type(u'aaa')
s=unicode(s) # error !

This is driving me mad.
How do I convert any object to a string and encode it as
iso-8859-2 (if needed)?

Please help me!

Thanks for your help,
ft


Jul 21 '05 #1
fo***********@citromail.hu enlightened us with:
I want to get the WMI infos from Windows machines.
I use Py from HU (iso-8859-2) charset.


Why not use Unicode for everything?

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
Jul 21 '05 #2


s is a 'byte string' - a series of characters encoded as bytes. (As is
every string on some level.) In order to convert it to a unicode
object, Python needs to know which encoding was used. In other words, it
needs to know which character each byte represents.

See this:

>>> t = s.decode('iso-8859-1')
>>> t
u'Kommunik\xe1ci\xf3s port (COM1)'
>>> print t
Kommunikációs port (COM1)
>>> print type(s)
<type 'str'>
>>> print type(t)
<type 'unicode'>

The decode call converts s into a unicode string - where Python
knows what every character is. If you call unicode() with no encoding
specified, Python falls back to the system default - which is *probably*
'ascii'. Your string contains bytes which have *no meaning* in the
ascii codec - so it reports an error...
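
For readers on Python 3, the same point can be sketched as follows. This is only an illustration, assuming Python 3, where `bytes` plays the role of the `str` type discussed above and `str` plays the role of `unicode`. The variable names are made up for the example.

```python
# Python 3 sketch: bytes corresponds to the Python 2 'str' in this thread,
# and str corresponds to 'unicode'.
raw = b'Kommunik\xe1ci\xf3s port (COM1)'

# Decoding with the codec the data was written in succeeds.
text = raw.decode('iso-8859-2')

# Decoding as ASCII fails, because 0xE1 and 0xF3 are outside the ASCII range.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print('decode failed:', exc)
```

The bytes 0xE1 and 0xF3 happen to map to the same characters in iso-8859-1 and iso-8859-2, which is why either codec gives the expected result for this particular value.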

Does this help ?

Once you 'get unicode', Python support for it is pretty easy. It's a
slightly complicated subject though. Basically you need to *know* what
encoding is being used, and whenever you convert between unicode and
byte-strings you need to specify it.
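
One way the helper from the question could be written with the encoding made explicit is sketched below. It assumes Python 3, and it assumes that any incoming byte strings are already iso-8859-2. The name `to_hu` and the `source_encoding` parameter are hypothetical and not from the thread.

```python
def to_hu(value, none_str='-', source_encoding='iso-8859-2'):
    """Return value as iso-8859-2 encoded bytes.

    to_hu and source_encoding are hypothetical names for this sketch.
    """
    if value is None:
        value = none_str
    if isinstance(value, bytes):
        # Byte strings carry no encoding information, so it must be supplied.
        text = value.decode(source_encoding)
    else:
        # str() leaves text unchanged and converts numbers and other objects.
        text = str(value)
    # Replace NUL characters with spaces, as the original helper does.
    text = text.replace('\x00', ' ')
    return text.encode('iso-8859-2')
```

The only real change from the original is that the decode step names its codec instead of relying on the default.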

What can complicate matters is that there are lots of times an
*implicit* conversion can take place. Adding strings to unicode
objects, printing strings, or writing them to a file are the usual
times implicit conversion can happen. If you haven't specified an
encoding, then Python has to use the system default or the file object
default (sys.stdout often has a different default encoding than the one
returned by sys.getdefaultencoding()). It is these implicit conversions
that often cause UnicodeDecodeError and UnicodeEncodeError exceptions.
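
As a rough illustration of avoiding implicit conversions, here is a Python 3 sketch. In Python 3, mixing text and bytes raises an error instead of decoding silently, but file objects still fall back to a platform default unless an encoding is passed. The file name is made up for the example.

```python
import os
import tempfile

text = 'Kommunik\u00e1ci\u00f3s port (COM1)'

# Mixing text and bytes is an error in Python 3 rather than an implicit decode.
try:
    text + b' suffix'
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Pass the encoding explicitly instead of relying on the platform default.
path = os.path.join(tempfile.mkdtemp(), 'wmi.txt')  # hypothetical file name
with open(path, 'w', encoding='iso-8859-2') as handle:
    handle.write(text)

# Read the raw bytes back to see what was actually stored.
with open(path, 'rb') as handle:
    stored = handle.read()
```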

HTH

Best Regards,

Fuzzy
http://www.voidspace.org.uk/python

Jul 21 '05 #3
At some point you have to convert - esp. when writing data out to file.
If you receive data as a byte string and have to store it as a byte
string, it is sometimes convenient to *not* convert in the middle.

Best Regards,

Fuzzy
http://www.voidspace.org.uk/python

Jul 21 '05 #4
As Tim Golden already explained, you're getting a unicode
object from the WMI interface. The best design help I can
give is to either convert it to iso-8859-2 at the point you
get the object and do your entire program with iso-8859-2
encoded strings, or do your entire program with unicode
objects and encode them as iso-8859-2 strings whenever
you want to write them out. Trying to do your conversion
in the middle will lead to excessive complexity, with the
resulting debugging problems.
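
The second approach (Unicode everywhere, encode only on output) might look roughly like this for XML. It is a Python 3 sketch using the standard library's `xml.etree.ElementTree`. The element names `device` and `name` are invented for the example and say nothing about what the WMI data actually looks like.

```python
import xml.etree.ElementTree as ET

# Text is kept as Unicode inside the program.
root = ET.Element('device')          # hypothetical element names
name = ET.SubElement(root, 'name')
name.text = 'Kommunik\u00e1ci\u00f3s port (COM1)'

# Encoding happens once, at the point of output.
data = ET.tostring(root, encoding='iso-8859-2')
```

Here `data` is a byte string that can be written to a file opened in binary mode, so no further conversion is needed.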

If you do go the unicode route, you must remember that
any method or function that's defined to return a byte string will
most likely throw an exception when it is handed a unicode object
containing non-ASCII characters. This includes str()! Whether
or not the print statement will work depends on a number
of factors in how your Python installation was set up.
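
A related failure shows up when encoding: a character that the target codec cannot represent raises an exception unless an error handler is given. The sketch below assumes Python 3 and uses the euro sign as an example of a character that iso-8859-2 does not contain.

```python
text = 'Kommunik\u00e1ci\u00f3s \u20ac'  # the euro sign is not in iso-8859-2

# Strict encoding (the default) raises on the unencodable character.
try:
    text.encode('iso-8859-2')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# For XML output, unencodable characters can be written as character references.
safe = text.encode('iso-8859-2', errors='xmlcharrefreplace')
```

Whether character references are acceptable depends on what reads the file afterwards, so treat the error handler as one option rather than a general fix.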

HTH

John Roth

Jul 21 '05 #5
