playing with pyGoogle - strange codec error

Hello,

I am playing around with pyGoogle and encountered an error that I have
never seen, and I am unsure how to correct for it. Here is a code
snippet:

for r in data.results:
    print 'Title: ',r.title
    print 'URL: ',r.URL
    print 'Summary: ',r.snippet
    print

Everything works fine until I get to r.snippet. Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\ua9' in
position 119: ordinal not in range(128)

Any help is appreciated.

Thanks,
Brian
--
Nail a post to the Spalted Board. Free WW'ing software and forums.
Regular freebies! http://www.spaltedboard.com

Jul 18 '05 #1
5 Replies


"Brian Blazer" <br***@brianandkate.com> wrote in message
news:2005040410062316807%brian@brianandkatecom...
Everything works fine until I get to r.snippet. Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\ua9' in
position 119: ordinal not in range(128)


You have a character there (the copyright sign) that isn't in the ASCII
set. If you have anything other than plain ASCII, you need to consider
encoding. This reference might help:

http://diveintopython.org/xml_processing/unicode.html
Jul 18 '05 #2

On 2005-04-05 07:32:12 -0500, "Richard Brodie" <R.******@rl.ac.uk> said:

"Brian Blazer" <br***@brianandkate.com> wrote in message
news:2005040410062316807%brian@brianandkatecom...
Everything works fine until I get to r.snippet. Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\ua9' in
position 119: ordinal not in range(128)


You have a character there (the copyright sign) that isn't in the ASCII
set. If you have anything other than plain ASCII, you need to consider
encoding. This reference might help:

http://diveintopython.org/xml_processing/unicode.html


Thank you. That is a great reference.

Brian

Jul 18 '05 #3

On 2005-04-04 10:06:23 -0500, Brian Blazer <br***@brianandkate.com> said:

<snip>

You know, I am beginning to think that I MAY have stumbled on a bug
here. At first I was thinking that this issue was related to the
offending character being out of range for the Mac. Then I tried it on
an MS machine and a Linux box; all with the same error.

This did not happen when I wrote the same script in Java. This is
making me wonder if there is an issue with the wrapper for the Google
API that was originally done in Java.

For the sake of it, here is the full code (minus my Google key). It is
going to look weird, but those print statements are there so that I
don't have to open the file it is writing to every time I want to see
stuff. It has my name hard-coded into the search query. The commented
r.snippet.encode('mac_roman') was there to see if by changing the
encoding, I could make it work (no luck). I also tried putting

#-*- coding: utf-8 -*-

right after the shebang (as listed here:
http://www.python.org/peps/pep-0263.html). Again, no help.

Anyway, here is the code ------------------------>

import google

google.LICENSE_KEY = 'insertKeyHere'
#print google.doSpellingSuggestion('helllo')
data = google.doGoogleSearch('Brian Blazer')
print 'Found %d results' % len(data.results)

searchData = open('searchData.txt','w')

for r in data.results:
    # r.snippet.encode('mac_roman')
    searchData.write('Title: ' + r.title + '\n' + '\n')
    searchData.write('URL: ' + r.URL + '\n' + '\n')
    searchData.write('Snippet: ' + r.snippet + '\n' + '\n' + '\n')
    print r.URL
    print r.title
    print r.snippet
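
The same failure can occur on the searchData.write calls above, because a file opened with plain open() in Python 2 expects byte strings and falls back to ASCII for Unicode ones. A minimal sketch of one way around that, assuming Python 2.6 or later where io.open is available (codecs.open is the older equivalent). The results list is a hypothetical stand-in for pyGoogle's data.results, the temporary directory is only there to keep the sketch self-contained, and UTF-8 is an assumed choice of file encoding:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-ins for the r.title / r.URL / r.snippet values.
results = [
    (u'Example title', u'http://www.example.com/', u'Copyright \u00a9 2005'),
]

# Written to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'searchData.txt')

# io.open with an explicit encoding accepts Unicode strings and encodes
# them on the way out, so no implicit ASCII conversion takes place.
with io.open(path, 'w', encoding='utf-8') as search_data:
    for title, url, snippet in results:
        search_data.write(u'Title: ' + title + u'\n\n')
        search_data.write(u'URL: ' + url + u'\n\n')
        search_data.write(u'Snippet: ' + snippet + u'\n\n\n')

# Read the file back with the same encoding to check the round trip.
with io.open(path, 'r', encoding='utf-8') as search_data:
    written = search_data.read()
```

Anything that later reads searchData.txt has to use the same encoding, which is why the encoding is named explicitly on both open calls.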

Jul 18 '05 #4

Brian Blazer wrote:
You know, I am beginning to think that I MAY have stumbled on a bug
here. At first I was thinking that this issue was related to the
offending character being out of range for the Mac. Then I tried it on
an MS machine and a Linux box; all with the same error.

The problem, common to all three, is that you're using a terminal whose
default encoding cannot represent the copyright character (in the first
case, the default encoding is 'ascii'; it is likely the case for the
others, as well).

When you print a Unicode string, it is encoded to your default encoding.
This cannot be done faithfully for a string containing a non-ASCII
symbol (like the copyright character, which is what triggers it for
you), so the encoding fails with an error.
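
The failure can be reproduced without pyGoogle or a terminal. This is a minimal sketch, assuming a made-up string in place of r.snippet; calling encode('ascii') by hand is only an approximation of what print does when the output encoding is ASCII:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical stand-in for r.snippet; u'\u00a9' is the copyright sign.
snippet = u'Copyright \u00a9 2005 Example Corp'

bad_position = None
try:
    # Roughly what print attempts when the output encoding is ASCII.
    snippet.encode('ascii')
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    bad_position = exc.start
    print('cannot encode character at position', bad_position)
```

The position in the exception corresponds to the "position 119" in the original traceback: it is the index of the offending character within the string.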

What you probably want here is either to use another encoding, or to
specify what to do in the case that the encoding is not possible.
Either encode to a different encoding (one which you know your terminal
supports even though it is not detected, e.g., 'latin-1'), or specify
what to do with errors in the encoding (e.g., 'ignore', which removes
the offending characters, or 'replace', which replaces them with
question marks):

    aUnicodeString.encode('latin-1')
    aUnicodeString.encode('ascii', 'replace')

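
The options described above can be compared side by side. A small sketch, again assuming a made-up string in place of r.snippet; encode and its 'replace' and 'ignore' error handlers are standard, while the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for r.snippet; u'\u00a9' is the copyright sign.
snippet = u'Copyright \u00a9 2005'

# Latin-1 has a byte for the copyright sign (0xA9), so this succeeds.
as_latin1 = snippet.encode('latin-1')

# ASCII has no such byte; the second argument says what to do instead.
replaced = snippet.encode('ascii', 'replace')  # unencodable characters become '?'
ignored = snippet.encode('ascii', 'ignore')    # unencodable characters are dropped

for encoded in (as_latin1, replaced, ignored):
    print(repr(encoded))
```

Which one is appropriate depends on the terminal: 'latin-1' keeps the character but only displays correctly if the terminal really uses that encoding, while 'replace' and 'ignore' are safe everywhere at the cost of losing the character.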
This does not happen when I wrote the same script in java. This is
making me wonder if there is an issue with the wrapper for the google
api that was originally done in java.


Java does not handle Unicode the same way.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
Drifting from woman-who-tries misconstrued / Shifting to woman-wise
-- Lamya
Jul 18 '05 #5

On 2005-04-05 13:55:48 -0500, Erik Max Francis <ma*@alcyone.com> said:
<snip>

Thank you, that worked.

Brian

Jul 18 '05 #6