A question about unicode() function

Hi all,
I encountered a problem when using the unicode() function while fetching a
webpage, and I don't know why this happened.
My code and error messages are:
Code:
#!/usr/bin/python
#Filename: test.py
#Modified: 2006-12-31

import cPickle as p
import urllib
import htmllib
import re
import sys

def funUrlFetch(url):
    lambda url: urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = unicode(content, "gbk")
print content
content.close()
error message:

C:\WINDOWS\system32\cmd.exe /c python test.py
Enter the Url:http://www.msn.com
Traceback (most recent call last):
  File "test.py", line 16, in ?
    content = unicode(content, "gbk")
TypeError: coercing to Unicode: need string or buffer, NoneType found
shell returned 1
Hit any key to close this window...

Any suggestions would be appreciated!

Thanks!

Dec 31 '06 #1
On 31 Dec 2006 05:20:10 -0800, JTree <ea*****@gmail.com> wrote:
> def funUrlFetch(url):
>     lambda url: urllib.urlopen(url).read()

This function only creates a lambda (which is never called or
assigned anywhere), nothing more, nothing less. It therefore returns None
(sort of like "void") no matter what its argument is. You probably meant
something like

def funUrlFetch(url):
    return urllib.urlopen(url).read()

or

funUrlFetch = lambda url: urllib.urlopen(url).read()

> objUrl = raw_input('Enter the Url:')
> content = funUrlFetch(objUrl)

content gets assigned None. Try putting "print content" before the unicode line.

> content = unicode(content, "gbk")

This, equivalent to unicode(None, "gbk"), leads to

> TypeError: coercing to Unicode: need string or buffer, NoneType found

None is neither a string nor a buffer, so unicode() complains.

See ya,

--
Felipe.
Dec 31 '06 #2
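
The point made in the reply above can be checked without touching the network. The following is only an illustrative sketch: `fake_read` is a hypothetical stand-in for `urllib.urlopen(url).read()`, and the three function names are made up for the comparison.

```python
def fake_read(url):
    # Hypothetical stand-in for urllib.urlopen(url).read(); no network access.
    return "<html>" + url + "</html>"

def fetch_broken(url):
    # The lambda is created and thrown away; with no return statement
    # the function falls off the end and returns None.
    lambda url: fake_read(url)

def fetch_with_return(url):
    # An explicit return hands the fetched data back to the caller.
    return fake_read(url)

# Binding the lambda to a name also works.
fetch_lambda = lambda url: fake_read(url)

print(repr(fetch_broken("http://example.invalid")))
print(repr(fetch_with_return("http://example.invalid")))
print(repr(fetch_lambda("http://example.invalid")))
```

The first call prints None, which is exactly the value that unicode() later rejected.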
Hi,

I changed my code to:

#!/usr/bin/python
#Filename: test.py
#Modified: 2007-01-01

import cPickle as p
import urllib
import htmllib
import re
import sys

funUrlFetch = lambda url: urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = content.encode('gb2312', 'ignore')
print content
content.close()

I used "ignore" to deal with the data loss, but it still caused an
error:

C:\WINDOWS\system32\cmd.exe /c python tianya.py
Enter the Url:http://www.tianya.cn
Traceback (most recent call last):
  File "tianya.py", line 17, in ?
    content = content.encode('gb2312', 'ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)
shell returned 1
Hit any key to close this window...

My Python version is 2.4. Does it have some problems with Asian
encoding support?

Thanks!
Jan 1 '07 #3
"JTree" <ea*****@gmail.com> wrote:
>
> objUrl = raw_input('Enter the Url:')
> content = funUrlFetch(objUrl)
> content = unicode(content, "gbk")
> print content
> content.close()

Once you fix the lambda, as Felipe described, there's another issue here.
You are telling the unicode function that the string you're passing it is
an 8-bit string encoded as gbk. How do you know that? In your specific
example, www.msn.com, I can guarantee it will produce the wrong results:
www.msn.com is encoded in UTF-8.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jan 1 '07 #4
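
One way to avoid guessing the encoding is to read the charset the server declares in its Content-Type header, and fall back to a default only when none is given. This is a minimal sketch written for Python 3, where `unicode(data, enc)` is spelled `data.decode(enc)`. The helper name `charset_from_content_type`, the default value, and the sample header strings are assumptions for illustration. A page may also declare its charset in a `<meta>` tag, which this sketch does not inspect.

```python
from email.message import Message

def charset_from_content_type(header_value, default="utf-8"):
    # Let the standard library parse "text/html; charset=..." for us.
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default

# Sample bytes: the UTF-8 encoding of two Chinese characters.
raw = "\u4f60\u597d".encode("utf-8")

declared = charset_from_content_type("text/html; charset=UTF-8")
right = raw.decode(declared)
# Decoding the same bytes with a guessed codec silently yields other text.
wrong = raw.decode("gbk", errors="replace")

print(declared)
print(right == "\u4f60\u597d")
print(wrong == "\u4f60\u597d")
```

In Python 3 the response returned by `urllib.request.urlopen()` exposes the same parser as `response.headers.get_content_charset()`, so the header string does not have to be handled by hand.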
JTree wrote:
> funUrlFetch = lambda url: urllib.urlopen(url).read()
>
> objUrl = raw_input('Enter the Url:')
> content = funUrlFetch(objUrl)
> content = content.encode('gb2312', 'ignore')

Why did you change what you had before? "content" is a str, encoded in
gb2312 (according to the internal evidence). You are now pretending
that it is unicode, and trying to encode it as gb2312. However, because
it is *not* unicode, Python tries to convert it to unicode first. What
you have coded above is equivalent to:

content = content.decode('ascii').encode('gb2312', 'ignore')

and of course the *decode* fails, as the error message says:

Unicode*Decode*Error: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)

It never got anywhere near the encode().

So:
If you want a str encoded in gb2312, leave it alone.
If you want it in unicode, do this:

ucontent = unicode(content, 'gb2312')

> print content

Try print repr(content)
It's much better for diagnostic purposes.

> content.close()

This will be your next problem; "content" refers to a str object or a
unicode object -- they don't have a close() method!
>
> I used "ignore" to deal with the data loss, but it still caused an
> error:

What data loss???
>
> My Python version is 2.4. Does it have some problems with Asian
> encoding support?

"Asian" is irrelevant. You would have got the same problem with just
about any non-ASCII encoding, including cp1252 and similar encodings
commonly used in English-speaking countries and in western Europe. The
only encoding support problem with 2.4 is that it can't read your mind.
By the way, you should upgrade to 2.5; it can't read your mind either,
but it has more functionality etc. :-)

HTH,
John

Jan 1 '07 #5
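
The two-step behaviour described above can be reproduced explicitly. This sketch is written for Python 3, where bytes no longer have an encode() method, so the hidden ASCII decode that Python 2 performs is spelled out by hand. The sample bytes are an assumed stand-in for what the fetch would return.

```python
# Assumed sample: two Chinese characters encoded as gb2312, as they
# might arrive from a fetched page.
raw = b"\xd6\xd0\xce\xc4"

# Python 2's str.encode() first decodes with the default 'ascii' codec.
# Doing that step by hand shows where the UnicodeDecodeError comes from.
try:
    raw.decode("ascii")
    hidden_step_error = None
except UnicodeDecodeError as exc:
    hidden_step_error = exc

# Decoding with the codec the bytes were actually written in works.
text = raw.decode("gb2312")

# repr() shows the exact bytes or code points, which is why it is the
# better diagnostic.
print(repr(raw))
print(repr(text))
print(hidden_step_error)
```

The error is raised on the first non-ASCII byte, before any encoding is attempted, which matches the traceback in the question.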
Thanks everyone!

Sorry for my ambiguous question.
I changed the code and now it works fine.

Jan 1 '07 #6
JTree wrote:
> Thanks everyone!
>
> Sorry for my ambiguous question.
> I changed the code and now it works fine.

So... How about posting the brief working code?
Jan 2 '07 #7
Hi,
I just removed the unicode() call from my code.
As John Machin said, I had a wrong understanding of Unicode and ASCII.

Paul Watson wrote:
> So... How about posting the brief working code?
Jan 3 '07 #8
