Simple Question : files and URLLIB

Richard Shea

Hi - I'm new to Python. I've been trying to use URLLIB and the 'tidy'
function (part of the mx.tidy package). There's one thing I'm having
real difficulties understanding. When I did this ...

finA = urllib.urlopen('http://www.python.org/')
foutA = open('C:\\testout.html', 'w')
tidy(finA, foutA, None)

I get ...

Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "mx\Tidy\Tidy.py", line 38, in tidy
    return mxTidy.tidy(input, output, errors, kws)
TypeError: inputstream must be a file object or string

... what I don't understand is: surely the result of urllib.urlopen is a file
object? Isn't it? To quote the manual at:

http://www.python.org/doc/current/li...le-urllib.html

"If all went well, a file-like object is returned". I can make the
tidy function happy by changing the code to read ...

finA = urllib.urlopen('http://www.python.org/').read()

... I haven't had time to look into this properly yet but I suspect
finA is now a string, not a file handle?

Anyway if anyone can throw light on this I would be grateful.

thanks

richard.shea.

Jul 18 '05 #1

bromden

> "If all went well, a file-like object is returned".

"File-like" means having an interface similar to a file object (methods read,
readline, etc.), but it is not a real file.

mxTidy.tidy most probably requires a real file to be passed;
just look into Tidy.py (line 38) and you'll know for sure.

--
bromden[at]gazeta.pl

Jul 18 '05 #2
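
For readers on Python 3 (where the thread's urllib.urlopen lives at urllib.request.urlopen), here is a minimal sketch of the "file-like but not a real file" distinction described above. The names FakeResponse, strict_consumer and duck_consumer are made up for illustration, and the isinstance check is only an assumption about the kind of test a library might perform; the thread does not show what mxTidy actually does internally.

```python
import io


class FakeResponse:
    """File-like: it has read(), but it is not a real file object."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text


def strict_consumer(source):
    """Hypothetical consumer: accepts only a str or an io stream object."""
    if isinstance(source, str):
        return source
    if isinstance(source, io.IOBase):
        return source.read()
    raise TypeError("input must be a file object or string")


def duck_consumer(source):
    """Hypothetical consumer: accepts a str or anything with read()."""
    if isinstance(source, str):
        return source
    return source.read()


response = FakeResponse("<html></html>")

# Duck typing only needs the read() method, so this works.
duck_result = duck_consumer(response)

# The strict check rejects the object even though it can be read.
try:
    strict_consumer(response)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)

print(duck_result)
print(strict_error)
```

Calling strict_consumer(response.read()) succeeds, which mirrors the workaround in the original post: the string satisfies the "file or string" requirement even though the file-like object does not.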

Mark Carter

> finA = urllib.urlopen('http://www.python.org/').read()
>
> ... I haven't had time to look into this properly yet but I suspect
> finA is now a string not a file handle ?

Correct. If you do:
print type(finA)
you obtain the result:
<type 'str'>

If you do:
finA= urllib.urlopen( 'http://www.python.org/')
print type(finA)
then you obtain the result:
<type 'instance'>

Compare this with:
finA = open("blah", "w")
print type(finA)
which gives the result:
<type 'file'>

According to the docs on urlopen( url[, data[, proxies]]) :
"If all went well, a file-like object is returned."
So the answer would appear to be: "close, but no cigar".

Jul 18 '05 #3
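
The same comparison can be tried on Python 3, where the type names print differently from the Python 2 output shown above. This is only a sketch: io.StringIO is used as a stand-in for a file-like object so that no network access is needed, and it is not the type urlopen returns. Note also that on Python 3 the read() of a urlopen response yields bytes rather than str.

```python
import io
import os
import tempfile

page_text = "<html></html>"          # a plain string
file_like = io.StringIO(page_text)   # stand-in for a file-like object

# Create a scratch file so that open() gives us a real file object.
fd, path = tempfile.mkstemp(suffix=".html")
os.close(fd)

with open(path, "w") as real_file:
    kinds = {
        "string": type(page_text).__name__,
        "file-like": type(file_like).__name__,
        "real file": type(real_file).__name__,
    }

os.remove(path)

for label, name in kinds.items():
    print(label, "->", name)
# string -> str
# file-like -> StringIO
# real file -> TextIOWrapper
```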

Terry Reedy

"Richard Shea" <ri*********@fastmail.fm> wrote in message
news:28*************************@posting.google.com...

> Hi - I'm new to Python. I've been trying to use URLLIB and the 'tidy'
> function (part of the mx.tidy package). There's one thing I'm having
> real difficulties understanding. When I did this ...
>
> finA = urllib.urlopen('http://www.python.org/')
> foutA = open('C:\\testout.html', 'w')
> tidy(finA, foutA, None)
>
> I get ...
>
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
>   File "mx\Tidy\Tidy.py", line 38, in tidy
>     return mxTidy.tidy(input, output, errors, kws)
> TypeError: inputstream must be a file object or string
>
> ... what I don't understand is surely the result of a urllib is a file
> object ? Isn't it ? To quote the manual at :
>
> http://www.python.org/doc/current/li...le-urllib.html
>
> "If all went well, a file-like object is returned".

'file-like object' is different from 'file object'. From the urllib.py doc
string:
"The object returned by URLopener().open(file) will differ per
protocol. All you know is that is has methods read(), readline(),
readlines(), fileno(), close() and info()."

Why this is not good enough for mx.tidy is a question for its author.

> I can make the tidy function happy by changing the code to read ...
>
> finA = urllib.urlopen('http://www.python.org/').read()
>
> ... I haven't had time to look into this properly yet but I suspect
> finA is now a string not a file handle ?

Yes. So it meets the 'file or string' requirement.

Terry J. Reedy

Jul 18 '05 #4

Richard Shea

Thanks to everyone for the info/feedback. In particular I didn't know
you could do that ...

type(finA)

... business (which probably shows you how new to Python I am) but
it'll come in handy.

As I think you realised, I had misunderstood exactly what urllib was
offering; however, the blah.read() approach is quite good enough. Just
out of curiosity though, if 'tidy' demanded a file (rather than being
prepared to take a string as it is), would the only sure approach be to
...

f1 = open('C:\\workfile.html', 'w')
strHTML = urllib.urlopen('http://www.python.org/').read()
f1.write(strHTML)
tidy(f1, strOut, None)

... that is, to take the string that results from the read on the urllib
file-like object and write it back out to a file?

Just wondering ...

Thanks again for the information on my original question.

regards

richard.

Jul 18 '05 #5
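
On the follow-up question, one detail in the snippet is worth noting: a handle opened with 'w' cannot be read from, so it would need to be closed and reopened for reading before being handed to anything that expects an input file. Below is a minimal Python 3 sketch of that write-then-reopen step. The helper name spool_to_real_file is made up for illustration, io.BytesIO stands in for the urlopen result so no network is needed, and the sketch deliberately does not call tidy, since whether mx.Tidy accepts such a handle is not confirmed anywhere in this thread.

```python
import io
import os
import tempfile


def spool_to_real_file(file_like, path):
    """Copy a file-like object's contents to disk and reopen it for reading."""
    with open(path, "wb") as out:
        out.write(file_like.read())
    # The write handle is closed here, so the data has been flushed to disk.
    return open(path, "rb")


# Stand-in for the object urlopen() would return (an assumption for the demo).
fake_response = io.BytesIO(b"<html><body>hello</body></html>")

# Scratch location for the working copy.
fd, work_path = tempfile.mkstemp(suffix=".html")
os.close(fd)

real_file = spool_to_real_file(fake_response, work_path)
try:
    # A consumer that insists on a real file could be given real_file here.
    contents = real_file.read()
finally:
    real_file.close()
    os.remove(work_path)

print(contents)
```

Since the consumer discussed in the thread also accepts a string, the read() approach already in use avoids the temporary file altogether.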
