472,333 Members | 2,590 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,333 software developers and data experts.

A question about unicode() function

Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
File "test.py", line 16, in ?
content = unicode(content,"gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!

Dec 31 '06 #1
7 2385
On 31 Dec 2006 05:20:10 -0800, JTree <ea*****@gmail.comwrote:
def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()
This function only creates a lambda function (that is not used or
assigned anywhere), nothing more, nothing less. Thus, it returns None
(sort of "void") no matter what is its argument. Probably you meant
something like

def funUrlFetch(url):
return urllib.urlopen(url).read()

or

funUrlFetch = lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content gets assigned None. Try putting "print content" before the unicode line.
content = unicode(content,"gbk")
This, equivalent to unicode(None, "gbk"), leads to
TypeError: coercing to Unicode: need string or buffer, NoneType found
None's are not strings nor buffers, so unicode() complains.

See ya,

--
Felipe.
Dec 31 '06 #2
Hi,

I changed my codes to:

#!/usr/bin/python
#Filename: test.py
#Modified: 2007-01-01

import cPickle as p
import urllib
import htmllib
import re
import sys

funUrlFetch = lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = content.encode('gb2312','ignore')
print content
content.close()

I used "ignore" to deal with the data lose, but it still caused a
error:

C:\WINDOWS\system32\cmd.exe /c python tianya.py
Enter the Url:http://www.tianya.cn
Traceback (most recent call last):
File "tianya.py", line 17, in ?
content = content.encode('gb2312','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)
shell returned 1
Hit any key to close this window...

My python version is 2.4, Does it have some problems with asian
encoding support?

Thanks!
On Dec 31 2006, 9:30 pm, "Felipe Almeida Lessa"
<felipe.le...@gmail.comwrote:
On 31 Dec 2006 05:20:10 -0800, JTree <east...@gmail.comwrote:
def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()This function only creates a lambda function (that is not used or
assigned anywhere), nothing more, nothing less. Thus, it returns None
(sort of "void") no matter what is its argument. Probably you meant
something like

def funUrlFetch(url):
return urllib.urlopen(url).read()

or

funUrlFetch = lambda url:urllib.urlopen(url).read()
objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)content gets assigned None. Try putting "print content" before the unicode line.
content = unicode(content,"gbk")This, equivalent to unicode(None, "gbk"), leads to
TypeError: coercing to Unicode: need string or buffer, NoneType foundNone's are not strings nor buffers, so unicode() complains.

See ya,

--
Felipe.
Jan 1 '07 #3
"JTree" <ea*****@gmail.comwrote:
>
Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
Once you fix the lambda, as Felipe described, there's another issue here.
You are telling the unicode function that the string you're passing it is
an 8-bit string encoded as gbk. How do you know that? In your specific
example, www.msn.com, I can guarantee it will produce the wrong results:
www.msn.com is encoded in UTF-8.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jan 1 '07 #4
JTree wrote:
Hi,

I changed my codes to:

#!/usr/bin/python
#Filename: test.py
#Modified: 2007-01-01

import cPickle as p
import urllib
import htmllib
import re
import sys

funUrlFetch = lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = content.encode('gb2312','ignore')
Why did you change what you had before? "content" is a str, encoded in
gb2312 (according to the internal evidence). You are now pretending
that it is unicode, and trying to encode it as gb2312. However because
it is *not* unicode, Python tries to convert it to unicode first. What
you have coded above is equivalent to:
content = content.decode('ascii').encode('gb2312', 'ignore')

and of course the *decode* fails, as the error message says:
Unicode*Decode*Error: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)

It never got any where near the encode()

So:
If you want a str encoded in gb2312, leave it alone.
If you want it in unicode, do this:
ucontent = unicode(content, 'gb2312')
print content
Try print repr(content)
It's much better for diagnostic purposes.

content.close()
This will be your next problem; "content" refers to a str object or a
unicode object -- they don't have a close() method !!
>
I used "ignore" to deal with the data lose, but it still caused a
error:
What data loss???
>
C:\WINDOWS\system32\cmd.exe /c python tianya.py
Enter the Url:http://www.tianya.cn
Traceback (most recent call last):
File "tianya.py", line 17, in ?
content = content.encode('gb2312','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)
shell returned 1
Hit any key to close this window...

My python version is 2.4, Does it have some problems with asian
encoding support?
"asian" is irrelevant. You would have got the same problem with just
about any non-ascii encoding, including cp1252 and similar encodings
commonly used in English-speaking countries and in western Europe. The
only encoding support problem with 2.4 is that it can't read your mind.
By the way, you should upgrade to 2.5, it can't read your mind either,
but it has more functionality etc :-)

HTH,
John

Jan 1 '07 #5
Thanks everyone!

Sorry for my ambiguous question.
I changed the codes and now it works fine.

JTree wrote:
Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
File "test.py", line 16, in ?
content = unicode(content,"gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!
Jan 1 '07 #6
JTree wrote:
Thanks everyone!

Sorry for my ambiguous question.
I changed the codes and now it works fine.

JTree wrote:
>Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
File "test.py", line 16, in ?
content = unicode(content,"gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!
So... How about posting the brief working code?
Jan 2 '07 #7
hi,
I just removed the unicode() method from my codes.
As John Machin said, I had an wrong understanding of unicode and ascii.

Paul Watson wrote:
JTree wrote:
Thanks everyone!

Sorry for my ambiguous question.
I changed the codes and now it works fine.

JTree wrote:
Hi,all
I encountered a problem when using unicode() function to fetch a
webpage, I don't know why this happenned.
My codes and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content,"gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
File "test.py", line 16, in ?
content = unicode(content,"gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!

So... How about posting the brief working code?
Jan 3 '07 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
by: Ivan Voras | last post by:
When concatenating strings (actually, a constant and a string...) i get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte...
2
by: Neil Schemenauer | last post by:
python-dev@python.org.] The PEP has been rewritten based on a suggestion by Guido to change str() rather than adding a new built-in function. ...
1
by: anantvrana | last post by:
Hello All, I am trying to read Unicode (Kanji character) data from a text file. When I store unicode data into variable my Kanji character gets...
24
by: ChaosKCW | last post by:
Hi I am reading from an oracle database using cx_Oracle. I am writing to a SQLite database using apsw. The oracle database is returning utf-8...
3
by: DurumDara | last post by:
Hi ! I need to speedup my MD5/SHA1 calculator app that working on filesystem's files. I use the Python standard modules, but I think that it can...
0
by: santhescript01 | last post by:
Unicode to non unicode conversion problem -------------------------------------------------------------------------------- Hi All, I am...
2
by: tristanlbailey | last post by:
I been scouring the Internet for an answer to my problem, and a couple of times thought I had almost found the answer, but still to no avail. I'm...
13
by: Liang Chen | last post by:
Hope you all had a nice weekend. I have a question that I hope someone can help me out. I want to run a Python program that uses Tkinter for the...
0
by: amollokhande1 | last post by:
Hi All, Currently we are facing an issue while decoding the Base64Encoded unicode data. Here is the scenario We have one custom javascript...
0
by: concettolabs | last post by:
In today's business world, businesses are increasingly turning to PowerApps to develop custom business applications. PowerApps is a powerful tool...
0
better678
by: better678 | last post by:
Question: Discuss your understanding of the Java platform. Is the statement "Java is interpreted" correct? Answer: Java is an object-oriented...
0
by: CD Tom | last post by:
This happens in runtime 2013 and 2016. When a report is run and then closed a toolbar shows up and the only way to get it to go away is to right...
0
by: Naresh1 | last post by:
What is WebLogic Admin Training? WebLogic Admin Training is a specialized program designed to equip individuals with the skills and knowledge...
0
jalbright99669
by: jalbright99669 | last post by:
Am having a bit of a time with URL Rewrite. I need to incorporate http to https redirect with a reverse proxy. I have the URL Rewrite rules made...
0
by: antdb | last post by:
Ⅰ. Advantage of AntDB: hyper-convergence + streaming processing engine In the overall architecture, a new "hyper-convergence" concept was...
0
by: Matthew3360 | last post by:
Hi there. I have been struggling to find out how to use a variable as my location in my header redirect function. Here is my code. ...
2
by: Matthew3360 | last post by:
Hi, I have a python app that i want to be able to get variables from a php page on my webserver. My python app is on my computer. How would I make it...
0
by: AndyPSV | last post by:
HOW CAN I CREATE AN AI with an .executable file that would suck all files in the folder and on my computerHOW CAN I CREATE AN AI with an .executable...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.