LWP questions


I'm returning to Perl and Linux after many years away, and while I
knew Perl and Unix way back when, I'm new to this world
today.

I'm considering using LWP as the heart of a Web application and have a
number of questions.

It appears to me that LWP's get method returns ONLY the content of the
single object referenced by the URL. Is this correct? To what
degree, if any, does LWP's get deal with scripts on the page that may be
involved in building the page content?

In the end, I need to get a page in much the same way a browser does
and then examine it, looking at the text on the page (as it would be
rendered by IE or Mozilla) for a bunch of stuff. I also need to
examine the HTML as it exists in the abstract for the page as actually
displayed, again for a bunch of stuff. On XP (no flames please, surely Perl
programmers can forgive an attachment to the ugly real world) the IE
object model has two properties, innerText and innerHTML. innerText is a
linearized version of the text as displayed on the page AFTER all
scripts have executed. innerHTML seems to be the HTML that would
exist to create the page AFTER all scripts have executed. It is this
kind of structure that I need. Can LWP help me here? What is the
basic attack? Are there any examples in the Perl world?

Thanks for any help/clues.

R
Jul 19 '05 #1
On Tue, 16 Mar 2004 at 18:01 GMT, Richard Bell <rb********@earthlink.net> wrote:
I'm considering using LWP as the heart of a Web application and have a
number of questions.


LWP does not render the page, nor does it execute (client-side)
scripts, nor does it provide you with a DOM. However, you can
get the HTML using LWP and parse that with any of the available
HTML parsers (e.g., HTML::TreeBuilder).
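#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use LWP::Simple;

my $cachefile = 'mirrored.htm';

# Fetch the page into a local file; mirror() returns the HTTP status code
# (304 means the cached copy is still current).
my $status = mirror('http://cpan.org', $cachefile);
die "mirror failed with status $status\n"
    unless is_success($status) || $status == RC_NOT_MODIFIED;

# Parse the saved HTML into a tree of HTML::Element nodes.
my $tree = HTML::TreeBuilder->new_from_file($cachefile);

# Print the text of the first <table> element, if the page has one.
my $table = $tree->look_down('_tag', 'table');
print $table->as_text if $table;

$tree->delete;    # release the parse tree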
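
As a rough static analogue of the innerText/innerHTML pair asked about above, the sketch below parses an HTML string held in memory and returns both the text and the re-serialized markup of the body. The helper name static_text_and_html and the sample markup are made up for illustration. Nothing here executes scripts, so the result reflects the HTML as delivered, not as a browser would display it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the text and the serialized markup of the
# <body> of an HTML string, as parsed. No scripts are executed.
sub static_text_and_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $body = $tree->look_down(_tag => 'body')
        or do { $tree->delete; return };
    my $text   = $body->as_text;    # text of the subtree; skips <script>/<style>
    my $markup = $body->as_HTML;    # markup re-serialized from the parse tree
    $tree->delete;                  # release the parse tree
    return ($text, $markup);
}

# Illustrative sample page (an assumption, not taken from a real site).
my $sample = '<html><body><h1>Hello</h1><p>Static <b>text</b></p>'
           . '<script>document.write("dynamic")</script></body></html>';

my ($text, $markup) = static_text_and_html($sample);
print "TEXT: $text\n";
print "HTML: $markup\n";
```

The text never contains the word "dynamic", because the document.write call is only carried along as inert script source and is never run.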
Jul 19 '05 #2
Thanks Roel, that was very helpful.

For my application, I need something that will do all such things as
might happen in a real browser that would create user visible content
on the screen. For many of the pages I'll be working with that
includes various client side scripts and includes. While LWP gets
part of the way, it doesn't seem to go as far as this project needs.

As I mentioned, I'm newly returned to Unix/Linux and Perl. Is there
something that might be more appropriate? I have some previous
experience in IE COM automation under XP. Can I play the same sort of
game (or hopefully a simpler one) under Linux? What do I use for an
engine? Can I get by with wget (it seems to do a good job of
mirroring)? Will I need to work with Mozilla?

I'd appreciate any advice.

Thanks again.

R


Jul 19 '05 #3
(Top-posting reordered.)

On Wed, 17 Mar 2004 at 01:50 GMT, Richard Bell <rb********@earthlink.net> wrote:
On 17 Mar 2004 00:55:24 GMT, Roel van der Steen <ro*******@st2x.net>
wrote:
On Tue, 16 Mar 2004 at 18:01 GMT, Richard Bell <rb********@earthlink.net> wrote:
I'm considering using LWP as the heart of a Web application and have a
number of questions.


LWP does not render the page, nor does it execute (client-side)
scripts, nor does it provide you with a DOM.


For many of the pages I'll be working with that
includes various client side scripts and includes.

Maybe HTML::Display is more in the direction you want. Or WWW::Mechanize.
Did you already have a look at http://cpan.org ?
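
For a feel of what WWW::Mechanize adds on top of LWP, here is a minimal sketch that fetches a page and returns its title and link targets. The helper name page_summary is made up for illustration. Like LWP, WWW::Mechanize does not execute JavaScript, so this covers navigation and extraction only, not script-generated content.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch a URL and return the page title followed by
# the absolute URLs of its links. No JavaScript on the page is executed.
sub page_summary {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 1);    # die on HTTP errors
    $mech->get($url);
    my @links = map { $_->url_abs->as_string } $mech->links;
    return ($mech->title, @links);
}
```

A call such as my ($title, @links) = page_summary('http://cpan.org'); would then give the title and links of that page as served.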
Jul 19 '05 #4
On 17 Mar 2004 03:12:38 GMT, Roel van der Steen <ro*******@st2x.net>
wrote:
Maybe HTML::Display is more in the direction you want. Or WWW::Mechanize.


Thanks, I'll look into HTML::Display and WWW::Mechanize. I picked up
the O'Reilly books and am also checking the web on these packages, but
the learning curve right now is a bit stiff, particularly when I'm not
really sure where to look or what to look at. Thanks for your help.
Did you already have a look at http://cpan.org ?


I have checked cpan. Lots of apparently good stuff there, but again
I'm faced with not knowing what is really appropriate for my needs.

I've thought about trying to automate Mozilla and accessing its DOM
object to get at what I want. Do you have any reflections on that
attack?

Thanks again for the new clues.

R

Jul 19 '05 #5
Richard Bell wrote:
Thanks Roel, that was very helpful.

For my application, I need something that will do all such things as
might happen in a real browser that would create user visible content
on the screen. For many of the pages I'll be working with that
includes various client side scripts and includes. While LWP gets
part of the way, it doesn't seem to go as far as this project needs.


When LWP requests a page from a server, it is no different than any
other browser's request, in that the server will process server-side
includes.

If the HTML returned contains JavaScript, it is up to you to provide
a JavaScript interpreter. I've seen many JavaScript functions that
do things like ask the graphical browser they are running in for the
size (in pixels) of the currently active window so that they can
decide on the layout of the text they will be writing to the
document window. Other JavaScript uses include reading or
modifying the text being displayed in a field of a form. (Think of
<input type="text" name="clock" value="12:45:00 pm">.)

In other words, to handle a full range of client-side scripts,
you will have to re-invent a very large wheel: a complete browser
with graphical display and GUI widgets.

LWP is good at getting the raw HTML from the server. Postprocessing
the HTML on the client side before, during, and after rendering is
an entirely different kettle of fish.
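
To see how much of a given page depends on client-side script, one option is to list the script elements in the HTML that LWP returned. The sketch below works on an HTML string held in memory; the helper name list_scripts and the sample markup are made up for illustration, and the scripts are only reported, never run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: describe each <script> element in an HTML string.
# The scripts are only listed; nothing is executed.
sub list_scripts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @found;
    for my $script ($tree->look_down(_tag => 'script')) {
        my $src = $script->attr('src');
        if (defined $src) {
            push @found, "external: $src";
        }
        else {
            # Inline source is stored as plain-text children of the element.
            my $code = join '', grep { !ref } $script->content_list;
            push @found, 'inline: ' . length($code) . ' characters';
        }
    }
    $tree->delete;    # release the parse tree
    return @found;
}

# Illustrative sample page (an assumption, not taken from a real site).
my $sample = '<html><head><script src="layout.js"></script></head>'
           . '<body><script>var x = 1;</script><p>Hi</p></body></html>';
print "$_\n" for list_scripts($sample);
```

A page that reports no scripts can be handled with LWP and an HTML parser alone; a page that reports many is a candidate for driving a real browser.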

I certainly would not want to emulate the quirks (features, bugs) of
IE 6 vs IE 5 vs Netscape vs Mozilla vs Opera.
-Joe

Jul 19 '05 #6

No one ever said it would be easy.

I'm now looking into automating Mozilla (let it do the heavy lifting),
possibly from Perl, possibly using the Mozilla application
environment. Any ideas where I can get clues/examples/insight into
the issues from the Perl side? I've got the O'Reilly book for the app
environment, so I'm reasonably armed there.

Richard


Jul 19 '05 #7
