By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,171 Members | 776 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,171 IT Pros & Developers. It's quick & easy.

A question about unicode() function

P: n/a
Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
File "test.py", line 16, in ?
content = unicode(content,"gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!

Dec 31 '06 #1
Share this Question
Share on Google+
7 Replies


P: n/a
On 31 Dec 2006 05:20:10 -0800, JTree <ea*****@gmail.comwrote:
def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()
This function only creates a lambda function (that is not used or
assigned anywhere), nothing more, nothing less. Thus, it returns None
(sort of "void") no matter what is its argument. Probably you meant
something like

def funUrlFetch(url):
return urllib.urlopen(url).read()

or

funUrlFetch = lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content gets assigned None. Try putting "print content" before the unicode line.
content = unicode(content,"gbk")
This, equivalent to unicode(None, "gbk"), leads to
TypeError: coercing to Unicode: need string or buffer, NoneType found
None's are not strings nor buffers, so unicode() complains.

See ya,

--
Felipe.
Dec 31 '06 #2

P: n/a
Hi,

I changed my codes to:

#!/usr/bin/python
#Filename: test.py
#Modified: 2007-01-01

import cPickle as p
import urllib
import htmllib
import re
import sys

funUrlFetch = lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = content.encode('gb2312','ignore')
print content
content.close()

I used "ignore" to deal with the data lose, but it still caused a
error:

C:\WINDOWS\system32\cmd.exe /c python tianya.py
Enter the Url:http://www.tianya.cn
Traceback (most recent call last):
File "tianya.py", line 17, in ?
content = content.encode('gb2312','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)
shell returned 1
Hit any key to close this window...

My python version is 2.4, Does it have some problems with asian
encoding support?

Thanks!
On Dec 31 2006, 9:30 pm, "Felipe Almeida Lessa"
<felipe.le...@gmail.comwrote:
On 31 Dec 2006 05:20:10 -0800, JTree <east...@gmail.comwrote:
def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()This function only creates a lambda function (that is not used or
assigned anywhere), nothing more, nothing less. Thus, it returns None
(sort of "void") no matter what is its argument. Probably you meant
something like

def funUrlFetch(url):
return urllib.urlopen(url).read()

or

funUrlFetch = lambda url:urllib.urlopen(url).read()
objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)content gets assigned None. Try putting "print content" before the unicode line.
content = unicode(content,"gbk")This, equivalent to unicode(None, "gbk"), leads to
TypeError: coercing to Unicode: need string or buffer, NoneType foundNone's are not strings nor buffers, so unicode() complains.

See ya,

--
Felipe.
Jan 1 '07 #3

P: n/a
"JTree" <ea*****@gmail.comwrote:
>
Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
Once you fix the lambda, as Felipe described, there's another issue here.
You are telling the unicode function that the string you're passing it is
an 8-bit string encoded as gbk. How do you know that? In your specific
example, www.msn.com, I can guarantee it will produce the wrong results:
www.msn.com is encoded in UTF-8.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jan 1 '07 #4

P: n/a
JTree wrote:
Hi,

I changed my codes to:

#!/usr/bin/python
#Filename: test.py
#Modified: 2007-01-01

import cPickle as p
import urllib
import htmllib
import re
import sys

funUrlFetch = lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = content.encode('gb2312','ignore')
Why did you change what you had before? "content" is a str, encoded in
gb2312 (according to the internal evidence). You are now pretending
that it is unicode, and trying to encode it as gb2312. However because
it is *not* unicode, Python tries to convert it to unicode first. What
you have coded above is equivalent to:
content = content.decode('ascii').encode('gb2312', 'ignore')

and of course the *decode* fails, as the error message says:
Unicode*Decode*Error: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)

It never got any where near the encode()

So:
If you want a str encoded in gb2312, leave it alone.
If you want it in unicode, do this:
ucontent = unicode(content, 'gb2312')
print content
Try print repr(content)
It's much better for diagnostic purposes.

content.close()
This will be your next problem; "content" refers to a str object or a
unicode object -- they don't have a close() method !!
>
I used "ignore" to deal with the data lose, but it still caused a
error:
What data loss???
>
C:\WINDOWS\system32\cmd.exe /c python tianya.py
Enter the Url:http://www.tianya.cn
Traceback (most recent call last):
File "tianya.py", line 17, in ?
content = content.encode('gb2312','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)
shell returned 1
Hit any key to close this window...

My python version is 2.4, Does it have some problems with asian
encoding support?
"asian" is irrelevant. You would have got the same problem with just
about any non-ascii encoding, including cp1252 and similar encodings
commonly used in English-speaking countries and in western Europe. The
only encoding support problem with 2.4 is that it can't read your mind.
By the way, you should upgrade to 2.5, it can't read your mind either,
but it has more functionality etc :-)

HTH,
John

Jan 1 '07 #5

P: n/a
Thanks everyone!

Sorry for my ambiguous question.
I changed the codes and now it works fine.

JTree wrote:
Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
File "test.py", line 16, in ?
content = unicode(content,"gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!
Jan 1 '07 #6

P: n/a
JTree wrote:
Thanks everyone!

Sorry for my ambiguous question.
I changed the codes and now it works fine.

JTree wrote:
>Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
File "test.py", line 16, in ?
content = unicode(content,"gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!
So... How about posting the brief working code?
Jan 2 '07 #7

P: n/a
hi,
I just removed the unicode() method from my codes.
As John Machin said, I had an wrong understanding of unicode and ascii.

Paul Watson wrote:
JTree wrote:
Thanks everyone!

Sorry for my ambiguous question.
I changed the codes and now it works fine.

JTree wrote:
Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
File "test.py", line 16, in ?
content = unicode(content,"gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!

So... How about posting the brief working code?
Jan 3 '07 #8

This discussion thread is closed

Replies have been disabled for this discussion.