I have the following code:
>>> web_page = urllib.urlopen("http://www.python.org")
>>> file = open("temp.html", "w")
>>> web_page_contents = web_page.read()
>>> file.write(web_page_contents)
>>> file.close
<built-in method close of file object at 0xb7cc76e0>
>>>
The file "temp.html" is created, but it doesn't look like the page at www.python.org. I'm guessing there are multiple frames and my code did
not get everything. Can anyone point me to a tutorial or other
reference on how to "get" all of the html contents at a particular
page?
Why did Python print the line after "file.close"?
Thanks,
Pete
"Pete" <ha**************@post.com> wrote in message
news:11**********************@i3g2000cwc.googlegroups.com...
> I have the following code:
> >>> web_page = urllib.urlopen("http://www.python.org")
> >>> file = open("temp.html", "w")
> >>> web_page_contents = web_page.read()
> >>> file.write(web_page_contents)
> >>> file.close
> <built-in method close of file object at 0xb7cc76e0>
> >>>
> The file "temp.html" is created, but it doesn't look like the page at
> www.python.org. I'm guessing there are multiple frames and my code did
> not get everything. Can anyone point me to a tutorial or other
> reference on how to "get" all of the html contents at a particular
> page?
> Why did Python print the line after "file.close"?
> Thanks,
> Pete
A. You didn't actually invoke the close method, you simply referenced it,
which is why you got the output line after file.close. Python is not VB.
To call close, you have to follow it with ()'s, as in:
file.close()
This will have the added benefit of flushing the output to temp.html,
probably containing the missing content you were looking for.
B. Don't name variables "file", or "list", "str", "dict", "int", etc. Doing
so masks global names of builtin data types. Try "tempFile" instead.
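
Both points can be sketched in a few lines. This is only an illustration: the names temp_file and bound_method are made up for the example, and the file goes into a scratch directory instead of the current one.

```python
import os
import tempfile

# Write to a scratch directory so the example does not clobber a real temp.html.
path = os.path.join(tempfile.mkdtemp(), "temp.html")

# "temp_file" rather than "file", so the builtin type name (Python 2) is not masked.
temp_file = open(path, "w")
temp_file.write("<html></html>")

bound_method = temp_file.close      # a reference only: nothing is called here
open_after_reference = not temp_file.closed

temp_file.close()                   # the parentheses make the call
closed_after_call = temp_file.closed
```

At the interactive prompt, a bare expression such as file.close is evaluated and its repr is echoed, which is where the "<built-in method close ...>" line came from.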
-- Paul
> A. You didn't actually invoke the close method, you simply referenced it,
> which is why you got the output line after file.close. Python is not VB.
> To call close, you have to follow it with ()'s, as in:
> file.close()
Ahhhh. Thank you very much!
> This will have the added benefit of flushing the output to temp.html,
> probably containing the missing content you were looking for.
> B. Don't name variables "file", or "list", "str", "dict", "int", etc. Doing
> so masks global names of builtin data types. Try "tempFile" instead.
> -- Paul
Oh. Thanks again!
The file "temp.html" is definitely different than the first run, but
still not anything close to www.python.org. Any other suggestions?
Thanks,
Pete
Pete wrote:
> The file "temp.html" is definitely different than the first run, but
> still not anything close to www.python.org. Any other suggestions?
If you mean that the page looks different in a browser, for one thing
you have to download the css files too. Here's the relevant extract
from the main page:
<link media="screen" href="styles/screen-switcher-default.css"
type="text/css" id="screen-switcher-stylesheet" rel="stylesheet" />
<link media="scReen" href="styles/netscape4.css" type="text/css"
rel="stylesheet" />
<link media="print" href="styles/print.css" type="text/css"
rel="stylesheet" />
<link media="screen" href="styles/largestyles.css" type="text/css"
rel="alternate stylesheet" title="large text" />
<link media="screen" href="styles/defaultfonts.css" type="text/css"
rel="alternate stylesheet" title="default fonts" />
You may either hardcode the URLs of the css files, or parse the page,
extract the css links and normalize them to absolute URLs. The first is
simpler but the second is more robust, in case a new stylesheet is added
or an existing one is renamed or removed.
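
A rough sketch of the second approach, assuming Python 3's standard library (html.parser and urllib.parse). The class and function names are made up for illustration, and it only looks at <link> tags whose rel attribute mentions "stylesheet".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collect the href of every <link> tag whose rel mentions a stylesheet."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here as well by default.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "stylesheet" in rel_tokens and href:
            # urljoin resolves a relative href against the page's own URL.
            self.urls.append(urljoin(self.base_url, href.strip()))


def find_stylesheet_urls(html_text, base_url):
    """Return absolute stylesheet URLs found in html_text (hypothetical helper)."""
    parser = StylesheetLinkParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.urls
```

Calling find_stylesheet_urls(page_text, "http://www.python.org/") on the downloaded page should give a list of absolute URLs that can each be fetched the same way as the page itself.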
George
> Can anyone point me to a tutorial or other reference on how to "get" all
> of the html contents at a particular page?
Why not use httrack? http://www.satzbau-gmbh.de/staff/abel/httrack-py/
Sincerely,
Wolfgang Keller
--
My email-address is correct.
Do NOT remove ".nospam" to reply.
Thanks for the information on CSS. I'll look into that later, but now
my question is on the first two lines of HTML code. Here's my latest
python code:
>>> import urllib
>>> web_page = urllib.urlopen("http://www.python.org")
>>> fileTemp = open("temp.html", "w")
>>> web_page_contents = web_page.read()
>>> fileTemp.write(web_page_contents)
>>> fileTemp.close()
Here are the first two lines of temp.html:
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
2 <html lang="en" xml:lang="en"
xmlns="http://www.w3.org/1999/xhtml">
Here are the first two lines of www.python.org as saved from Firefox:
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
2 <html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"
lang="en"><head>
Lines one are identical. Lines two are different. Why would lines two
differ? Hmmmm...
Thanks,
Pete
>> Can anyone point me to a tutorial or other reference on how to "get" all
>> of the html contents at a particular page?
> Why not use httrack?
> http://www.satzbau-gmbh.de/staff/abel/httrack-py/
> Sincerely,
> Wolfgang Keller
Thanks for the tip. I'll check that out. Is that your code?
--
Pete
Pete wrote:
> Lines one are identical. Lines two are different. Why would lines two
> differ? Hmmmm...
Functionally they are the same; the Firefox copy just runs on into what
would be the third line (the <head> tag). Opera's View Source command
produces the same result as Python. It looks like Firefox makes some
cosmetic changes to the source, but nothing that would change the way
the code works. Notice that the attributes in the second line are the
same and only their order differs?
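
One way to check that the two <html> lines differ only in attribute order is to parse each and compare the attributes as dictionaries. This is just a sketch using Python 3's html.parser; the helper names are invented for the example, and the two strings are the second lines quoted above.

```python
from html.parser import HTMLParser


class FirstTagAttributes(HTMLParser):
    """Record the attributes of the first start tag with the given name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.attributes = None

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name and self.attributes is None:
            # A dict has no meaningful order for comparison purposes.
            self.attributes = dict(attrs)


def tag_attributes(markup, tag_name="html"):
    """Return the attribute dict of the first <tag_name> tag in markup."""
    parser = FirstTagAttributes(tag_name)
    parser.feed(markup)
    parser.close()
    return parser.attributes


python_line = '<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">'
firefox_line = '<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>'

# True when both tags carry the same attributes, whatever their order.
same_attributes = tag_attributes(python_line) == tag_attributes(firefox_line)
```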
24 Sep 2006 10:09:16 -0700, Rainy <ak@silmarill.org>:
> Functionally they are the same, but third line included in Firefox.
> Opera View Source command produces the same result as Python.
[snip]
It's better to compare with the output of a plain downloader (rather
than a browser, which parses the page), like wget on Unix. That way
you'll get exactly the same bytes (assuming the page is static).
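
A byte-for-byte check along those lines could look like the sketch below. It assumes Python 3's urllib.request (the thread's code uses Python 2's urllib.urlopen), and the function names and file names are only placeholders.

```python
import hashlib
import urllib.request


def save_page(url, path):
    """Download url and write the raw bytes to path, untouched."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # Binary mode, so no newline or encoding translation alters the bytes.
    with open(path, "wb") as out_file:
        out_file.write(data)
    return len(data)


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as in_file:
        return hashlib.sha256(in_file.read()).hexdigest()


def same_bytes(path_a, path_b):
    """True when both files hold exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, after running "wget -O wget.html http://www.python.org" and save_page("http://www.python.org", "temp.html"), same_bytes("temp.html", "wget.html") tells you whether the two downloads match, again assuming the page is static.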
--
Felipe.
Ahhhh. wget - most cool. My temp.html matches wget. Now to capture that
pesky css stuff...
Thanks,
Pete