Bytes | Software Development & Data Engineering Community

usage of <string>.encode('utf-8', 'xmlcharrefreplace')?

Well, as usual I am confused by unicode encoding errors.

I have a string with problematic characters in it which I'd like to
put into a postgresql table.
That results in a postgresql error so I am trying to fix things with
<string>.encode:
>>> s = 'he Company\xef\xbf\xbds ticker'
>>> print s
he Company�s ticker
>>>
Trying for an encode:
>>> print s.encode('utf-8')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10: ordinal not in range(128)

OK, that's pretty much as expected, I know this is not valid utf-8.
But I should be able to fix this with the errors parameter of the
encode method.
>>> error_replace = 'xmlcharrefreplace'
>>> print s.encode('utf-8', error_replace)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10: ordinal not in range(128)

Same exact error I got without the errors parameter.

Did I mistype the error handler name? Nope.
>>> codecs.lookup_error(error_replace)
<built-in function xmlcharrefreplace_errors>
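The errors argument is just a name that gets resolved through a registry, which is what `codecs.lookup_error` queries. A minimal sketch in Python 3 (the thread uses Python 2.5, but this part of the `codecs` module works the same way): the name maps to the built-in handler, an unregistered name raises `LookupError`, and `codecs.register_error` can add your own. The handler name `'question_mark'` and the function `question_mark` are hypothetical, chosen only for illustration.

```python
import codecs

# errors= names are resolved through a registry; lookup_error() queries it directly.
handler = codecs.lookup_error('xmlcharrefreplace')
print(handler is codecs.xmlcharrefreplace_errors)  # True

# Looking up a name that was never registered raises LookupError.
try:
    codecs.lookup_error('bogus')
except LookupError as exc:
    print('LookupError:', exc)

# Hypothetical custom handler: swap each unencodable character for '?'.
def question_mark(exc):
    # Return (replacement text, position at which encoding should resume).
    return ('?' * (exc.end - exc.start), exc.end)

codecs.register_error('question_mark', question_mark)
print('he Company\ufffds ticker'.encode('ascii', 'question_mark'))  # b'he Company?s ticker'
```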

Same results with 'ignore' as an error handler.
>>> print s.encode('utf-8','ignore')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10: ordinal not in range(128)

And with a bogus error handler:

>>> print s.encode('utf-8','bogus')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 10: ordinal not in range(128)

This all looks unusually complicated for Python.
Am I missing something incredibly obvious?
How does one use the errors parameter on strings' encode method?

Also, why are the exceptions above complaining about the 'ascii' codec
if I am asking for 'utf-8' conversion?

Version and environment below. Should I try to update my python from
somewhere?

./$ python
Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2

Cheers
Feb 19 '08 #1
On Mon, 18 Feb 2008 21:36:17 -0800 (PST), J Peyret wrote:
> OK, that's pretty much as expected, I know this is not valid utf-8.
Actually, the string *is* valid UTF-8, but you're confused about encoding and
decoding. Encoding is the process of turning a Unicode object into a byte
string. Decoding is the process of turning a byte string into a Unicode object.
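To make the two directions concrete, here is a small sketch in Python 3 syntax, where byte strings are `bytes` and Unicode objects are `str` (in the thread's Python 2.5 they are `str` and `unicode`). It decodes the poster's bytes and shows they are indeed valid UTF-8: the three bytes `\xef\xbf\xbd` become the single character U+FFFD.

```python
# Python 3 spelling: bytes is the "regular string", str is Unicode text.
raw = b'he Company\xef\xbf\xbds ticker'

# Decoding turns bytes into text; these three bytes are valid UTF-8 for U+FFFD.
text = raw.decode('utf-8')
print(ascii(text))          # 'he Company\ufffds ticker'
print(len(raw), len(text))  # 21 19

# Encoding turns text back into bytes; the same codec round-trips exactly.
print(text.encode('utf-8') == raw)  # True
```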

You need to decode your byte string into a Unicode object, and then encode the
result to a byte string in a different encoding. For example:
>>> s = 'he Company\xef\xbf\xbds ticker'
>>> s.decode("utf-8").encode("ascii", "xmlcharrefreplace")
'he Company&#65533;s ticker'
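The same two-step recipe in Python 3, as a sketch for comparison with the 2.5 session above. It also shows that the errors argument only matters on the encode step, and what a few of the standard handlers produce for the U+FFFD character (65533 in decimal).

```python
raw = b'he Company\xef\xbf\xbds ticker'

# Step 1: decode from the encoding the bytes are really in.
text = raw.decode('utf-8')

# Step 2: encode to the target; the errors argument applies to this step only.
print(text.encode('ascii', 'xmlcharrefreplace'))  # b'he Company&#65533;s ticker'
print(text.encode('ascii', 'replace'))            # b'he Company?s ticker'
print(text.encode('ascii', 'ignore'))             # b'he Companys ticker'
```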

By the way, whether this is the correct fix for your PostgreSQL error is not
clear, since you kept that error message a secret for some reason. There could
be a better solution than transcoding the string in this way, but we won't
know until you show us the actual error you're trying to fix. At the moment,
it's like showing you the best way to inflate a tire with a hammer.

Hope this helps,

--
Carsten Haese
http://informixdb.sourceforge.net

Feb 19 '08 #2
To clarify a couple of points:

On Feb 18, 11:38 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> A unicode string looks like this:
>
> s = u'\u0041'
>
> but your string looks like this:
>
> s = 'he Company\xef\xbf\xbds ticker'
>
> Note that there is no 'u' in front of your string.
> That means your string is a regular string.

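If a Python function requires a unicode string and you give it a regular string instead, Python first converts the regular string to unicode implicitly, using the default codec (usually ascii). encode() is one such function, which is why your errors argument never came into play: the implicit ascii decode fails before the encode step runs.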
One last point: you can't display a unicode string directly. The very act of printing a unicode string causes it to be converted to a regular string. If you print a unicode string without explicitly encode()'ing it first, i.e. converting it to a regular string with a specified encoding (a so-called 'codec'), Python will implicitly convert it using the default codec, which is usually ascii. (When output goes to a terminal, Python 2 uses the terminal's encoding, sys.stdout.encoding, instead.)
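To see why every call in the original post failed with the identical 'ascii' error no matter which handler was passed, here is a Python 3 emulation of what Python 2's str.encode() does with a byte string. The helper `py2_str_encode` is hypothetical and written only for illustration: it performs the implicit decode with the ASCII default codec in strict mode, and only then applies the requested encoding and error handler.

```python
def py2_str_encode(data: bytes, encoding: str, errors: str = 'strict') -> bytes:
    """Hypothetical emulation of Python 2's str.encode() on a byte string."""
    # Implicit step: decode with the default codec (ASCII), always strict.
    text = data.decode('ascii')
    # Explicit step: the errors argument only reaches this call.
    return text.encode(encoding, errors)


raw = b'he Company\xef\xbf\xbds ticker'
for handler in ('strict', 'xmlcharrefreplace', 'ignore', 'bogus'):
    try:
        py2_str_encode(raw, 'utf-8', handler)
    except UnicodeDecodeError as exc:
        # Every handler fails the same way: the decode step never sees it.
        print(handler, '->', exc)
```

The printed message matches the tracebacks in the thread: 'ascii' codec can't decode byte 0xef in position 10.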

Feb 19 '08 #3
On Feb 18, 10:54 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
Yes, the string above was obtained by printing, which got it into
ASCII format, as you picked up.
Something else to watch out for when posting unicode issues.

The solution I ended up with was

1) Find out the encoding of the data file.

In Ubuntu's gedit editor, menu 'Save As...' displays the encoding at
the bottom of the save prompt dialog.

ISO-8859-15 in my case.

2) Look up the Python codec name corresponding to ISO-8859-15 at

http://docs.python.org/lib/standard-encodings.html
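The table lists several aliases per codec, and `codecs.lookup` accepts any of them. A small Python 3 sketch, assuming only the aliases that the standard-encodings table gives for iso8859_15: every spelling resolves to the same codec.

```python
import codecs

# Spellings that the standard-encodings table lists for the iso8859_15 codec.
aliases = ('ISO-8859-15', 'iso8859_15', 'latin9', 'L9')
for alias in aliases:
    # lookup() normalizes the alias and returns the codec's CodecInfo.
    print(alias, '->', codecs.lookup(alias).name)
```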

3) Apply the decode/encode recipe suggested previously (I now understand why it is needed).

#converting rawdescr
#from ISO-8859-15 (from the file)
#to UTF-8 (what postgresql wants)
#no error handler required.
decodeddescr = rawdescr.decode('iso8859_15').encode('utf-8')

The PostgreSQL insert is done using the decodeddescr variable.

Postgresql is happy, I'm happy.
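The same ISO-8859-15 to UTF-8 step as a Python 3 sketch. The helper name `transcode` and the sample bytes are hypothetical; the byte 0xA4 is used because it is one of the places where ISO-8859-15 differs from ISO-8859-1 (euro sign instead of the generic currency sign).

```python
def transcode(raw: bytes, source: str = 'iso8859_15', target: str = 'utf-8') -> bytes:
    """Hypothetical helper: decode from the file's encoding, re-encode for the database."""
    return raw.decode(source).encode(target)


# 0xA4 is the euro sign in ISO-8859-15 (ISO-8859-1 has the generic currency sign there).
raw = b'Price: 10 \xa4'
print(transcode(raw))  # b'Price: 10 \xe2\x82\xac'
```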
Feb 19 '08 #4
On Feb 19, 12:15 am, J Peyret <jpey...@gmail.com> wrote:
Or, you can cheat. If you are reading from a file, you can set it up so that any string you read from the file is automatically converted from its encoding to another encoding. You don't even have to be aware that a regular string has to be converted into a unicode string before it can be converted to a regular string with a different encoding. Check out the codecs module and the EncodedFile() function:

import codecs

s = 'he Company\xef\xbf\xbds ticker'

f = open('data2.txt', 'w')
f.write(s)
f.close()

f = open('data2.txt')
#arguments: file, new encoding, file's encoding
f_special = codecs.EncodedFile(f, 'utf-8', 'iso8859_15')
#If your display device understands utf-8, you will see
#the troublesome character displayed.
#Are you sure that character is legitimate?
print f_special.read()

f.close()
f_special.close()
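Python 3 still provides `codecs.EncodedFile(file, data_encoding, file_encoding)`. The sketch below uses an in-memory `io.BytesIO` as a stand-in for a file opened in binary mode, with made-up ISO-8859-15 content. It also shows the more usual Python 3 route: wrap the bytes in a text layer (which is what `open(path, encoding=...)` does), get a `str`, and encode it once.

```python
import codecs
import io

# Stand-in for a file opened in binary mode ('rb'); its bytes are ISO-8859-15.
raw_file = io.BytesIO(b'10 \xa4')

# Reads are decoded with file_encoding, then re-encoded with data_encoding,
# so the caller receives UTF-8 bytes.
recoder = codecs.EncodedFile(raw_file, 'utf-8', 'iso8859_15')
print(recoder.read())  # b'10 \xe2\x82\xac'

# Usual Python 3 route: let a text wrapper decode to str, then encode once.
text = io.TextIOWrapper(io.BytesIO(b'10 \xa4'), encoding='iso8859_15').read()
print(text.encode('utf-8'))  # b'10 \xe2\x82\xac'
```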


Feb 19 '08 #5


Similar topics

by: Mark McKay | last post by:
I have a thread which is used for updating a display window. The normal paint message queue is being bypassed in favor of drawing on demand by this thread. This thread is passed a Graphics context, draws it's image, and then sleeps for a small number of milliseconds. I'm on windows 2000 and watching the CPU usage graph when I run my program. For the most part, CPU usage corelates to the length of the sleep delay in my thread. The...
by: rbt | last post by:
Would a Python process consume more memory on a PC with lots of memory? For example, say I have the same Python script running on two WinXP computers that both have Python 2.4.0. One computer has 256 MB of Ram while the other has 2 GB of Ram. On the machine with less Ram, the process takes about 1 MB of Ram. On the machine with more Ram, it uses 9 MB of Ram. Is this normal and expected behavior?
by: tomvr | last post by:
Hello I have noticed some 'weird' memory usage in a vb.net windows app The situation is as follows I have an app (heavy on images) with 2 forms (actually there are more forms and on starting the app I load some things into memory for global use of the app but I'll use only 2 starting forms to explain the situation) situation 1 start app with form 1 (72mb memory usage), show form 2 and hide form 1 (89 mb memory usage
by: Jarvis | last post by:
I've made a testing program to test the memory usage of some Data Forms. I create a MDI parent form with one single MDI child form, which is a Data Form generated by .NET Data Form Wizard. To test the stuff, I keep to open that child data form for about 10 times. the memory usage shown in GC and task manager both increase. Then I close all those forms. and perform GC collect. The memory usage shown in GC falls, however, the memory...
by: Ian Taite | last post by:
Hello, I'm exploring why one of my C# .NET apps has "high" memory usage, and whether I can reduce the memory usage. I have an app that wakes up and processes text files into a database periodically. What happens, is that the app reads the contents of a text file line by line into an ArrayList. Each element of the ArrayList is a string representing a record from the file. The ArrayList is then processed, and the arraylist goes out of...
by: Philip Carnstam | last post by:
How come .Net applications use so much memory? Every application I compile uses at least 10 MB of memory, even the ones consisting of only a form and nothing else. If I minimize them though the memory usage drops to a couple hundred KB. Why? Is there anything I should to to prevent this? I have compiled in release and deactivated all forms of debugging, I think! Thanks, Philip
by: rdemyan via AccessMonster.com | last post by:
My app contains utility meter usage. One of the things we have to deal with is when a usage is clearly incorrect. Perhaps someone wrote the meter reading down incorrectly or made a factor of 10 error when entering the reading, etc. At other times the usage is zero or somehow was entered as a negative number. So I'm thinking about adding functionality to search for such anomalies. For instance, show months where the meter reading is...
by: Sirisha | last post by:
I am using the following code to get the CPU usage PerformanceCounter myCounter; myCounter = new PerformanceCounter(); myCounter.CategoryName = "Processor"; myCounter.CounterName = "% Processor Time"; myCounter.InstanceName = "_Total"; for(int i=0; i < 20; i++)
by: jld | last post by:
Hi, I developed an asp.net based eCommerce Website for a client and it is hosted at discount asp. The site is quite interactive, queries a database a lot and uses ajax.asp.net to spice up interactivity. The service suffers from a lot of restarts since discountasp enforces a 100mb per worker thread limit and when you top it, the service gets restarted. When there is a lot of traffic on the site, this happens
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
