Bytes | Software Development & Data Engineering Community
[2.5.1] "UnicodeDecodeError: 'ascii' codec can't decode byte"?

Hello

I'm getting this error while downloading and parsing web pages:

=====
title = m.group(1)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position
48: ordinal not in range(128)
=====

From what I understand, it's because some strings are Unicode, and
hence contain characters that are illegal in ASCII.

Does someone know how to solve this error?

Thank you.
Oct 29 '08 #1
Gilles Ganault wrote:
> I'm getting this error while downloading and parsing web pages:
>
> =====
> title = m.group(1)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position
> 48: ordinal not in range(128)
> =====
>
> From what I understand, it's because some strings are Unicode, and
> hence contain characters that are illegal in ASCII.

You just need to use a codec according to the encoding of the webpage. Take
a look at
http://wiki.python.org/moin/Python3UnicodeDecodeError
It is about Python 3, but the principles apply nonetheless. In any case,
throwing the error at a websearch will turn up lots of solutions.
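
As a minimal sketch of that advice, the snippet below decodes the downloaded bytes with the page's encoding before running the regular expression. The sample bytes, the ISO-8859-1 encoding, and the helper name `extract_title` are assumptions for illustration only; in practice the encoding would come from the HTTP Content-Type header or the page's meta tag. It is written for Python 3, where bytes and text are separate types; in Python 2.5 the equivalent step is calling `.decode(encoding)` on the byte `str`.

```python
import re

# Hypothetical sample: raw bytes as they might arrive from the network.
# 0xe9 is "é" in ISO-8859-1 (Latin-1); the encoding is an assumption here.
RAW_PAGE = b"<html><head><title>Caf\xe9 du coin</title></head></html>"

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(raw, encoding):
    """Decode the page with the given encoding, then match on text."""
    text = raw.decode(encoding)
    m = TITLE_RE.search(text)
    return m.group(1) if m else None


title = extract_title(RAW_PAGE, "iso-8859-1")
print(ascii(title))  # 'Caf\xe9 du coin'
```

Decoding once, right after the download, keeps everything downstream working on text rather than on a mix of bytes and text.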

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

Oct 29 '08 #2
Ulrich Eckhardt wrote:
> Gilles Ganault wrote:
>> I'm getting this error while downloading and parsing web pages:
>>
>> =====
>> title = m.group(1)
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position
>> 48: ordinal not in range(128)
>> =====
>>
>> From what I understand, it's because some strings are Unicode, and
>> hence contain characters that are illegal in ASCII.
>
> You just need to use a codec according to the encoding of the webpage. Take
> a look at
> http://wiki.python.org/moin/Python3UnicodeDecodeError
> It is about Python 3, but the principles apply nonetheless. In any case,
> throwing the error at a websearch will turn up lots of solutions.

I won't believe that statement is producing the error until I see a
traceback. As far as I'm aware, the re module can handle Unicode. Getting
a UnicodeDecodeError in an assignment would be unusual, to say the least.
I suppose it's not impossible that calling the .group() method of a
match object might raise one, but it seems unlikely.
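
One way to check which statement actually raises the exception is to capture the full traceback. The sketch below is only an illustration in Python 3, with hypothetical sample data and a hypothetical helper `parse_title`: the regular expression runs on bytes, `.group(1)` returns bytes without decoding anything, and the error only appears at the explicit ASCII decode. In Python 2 the same decode typically happens implicitly when a byte string and a unicode string are mixed.

```python
import re
import traceback

# Hypothetical sample data; 0xe9 is not valid ASCII.
RAW_PAGE = b"<title>Caf\xe9</title>"


def parse_title(raw):
    m = re.search(rb"<title>(.*?)</title>", raw)
    title = m.group(1)            # plain bytes, no decoding happens here
    return title.decode("ascii")  # the decode is the step that can fail


try:
    parse_title(RAW_PAGE)
    report = None
except UnicodeDecodeError:
    # format_exc() returns the traceback as a string, with file,
    # line number and function name for each frame.
    report = traceback.format_exc()

print(report)
```

Posting the whole traceback, not just the last two lines, lets readers see which call triggered the decode.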

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Oct 29 '08 #3
