LWP questions


I'm returning to Perl and Linux after many years away. I knew Perl
and Unix way back when, but I'm new to this world today.

I'm considering using LWP as the heart of a Web application and have a
number of questions.

It appears to me that the get method returns ONLY the content of the
single object referenced by the URL. Is this correct? To what
degree, if any, does LWP's get deal with scripts on the page that may
be involved in building the page content?

In the end, I need to get a page in much the same way a browser does
and then examine it, looking at the text on the page (as it would be
rendered by IE or Mozilla) for a bunch of stuff. I also need to
examine the HTML as it exists in the abstract for the page as actually
displayed for a bunch of stuff. On XP (no flames please, surely Perl
programmers can forgive an attachment to the ugly real world) the IE
object model has two properties, innerText and innerHTML. innerText
is a linearized version of the text as displayed on the page AFTER all
scripts have executed. innerHTML seems to be the HTML that would
exist to create the page AFTER all scripts have executed. It is this
kind of structure that I need. Can LWP help me here? What is the
basic attack? Are there any examples in the Perl world?

Thanks for any help/clues.

R
Jul 19 '05 #1
On Tue, 16 Mar 2004 at 18:01 GMT, Richard Bell <rb********@earthlink.net> wrote:
I'm considering using LWP as the heart of a Web application and have a
number of questions.


LWP does not render the page, nor does it execute (client-side)
scripts, nor does it provide you with a DOM. However, you can
get the HTML using LWP and parse that with any of the available
HTML parsers (e.g., HTML-TreeBuilder).
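#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use LWP::Simple;

my $cachefile = 'mirrored.htm';

# Fetch the page into a local file; mirror() returns the HTTP status code.
my $status = mirror('http://cpan.org', $cachefile);
die "mirror failed with status $status\n" if is_error($status);

# Parse the saved HTML into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_file($cachefile);

# Print the text of the first <table> element, if the page has one.
my $table = $tree->look_down('_tag', 'table');
print $table->as_text, "\n" if $table;

$tree->delete;    # free the tree explicitly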
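
For the InnerText/InnerHTML part of the question, here is a rough sketch of the nearest static equivalents you can get from a parsed tree. It is only an approximation: the sample page in $html is made up for illustration, no script is executed, and as_HTML serializes the element together with its own tags, so it is closer to an "outer" HTML view than to IE's innerHTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample page; in practice the HTML would come from LWP.
my $html = <<'HTML';
<html><head><title>Demo</title></head>
<body><h1>Hello</h1><p>Static <b>text</b> only.</p>
<script>document.write("added by script");</script>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my $body = $tree->look_down('_tag', 'body');

# Text of the element as written in the source. as_text skips the
# contents of <script> and <style>, and nothing is ever executed.
my $text = $body->as_text;

# Serialized markup for the element, indented, with all end tags kept.
my $markup = $body->as_HTML('<>&', '  ', {});

print "TEXT:\n$text\n\nMARKUP:\n$markup\n";

$tree->delete;    # free the tree explicitly
```

Anything the script would have written into the page is missing from both results, which is exactly the gap between this approach and a real browser.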
Jul 19 '05 #2
Thanks Roel, that was very helpful.

For my application, I need something that will do all such things as
might happen in a real browser that would create user-visible content
on the screen. For many of the pages I'll be working with, that
includes various client-side scripts and includes. While LWP gets
part of the way, it doesn't seem to go as far as this project needs.

As I mentioned, I'm newly returned to Unix/Linux and Perl. Is there
something that might be more appropriate? I have some previous
experience in IE COM automation under XP. Can I play the same sort of
game (or hopefully a simpler one) under Linux? What do I use for an
engine? Can I get by with wget (it seems to do a good job of
mirroring)? Will I need to work with Mozilla?

I'd appreciate any advice.

Thanks again.

R

Jul 19 '05 #3
(Top-posting reordered.)

On Wed, 17 Mar 2004 at 01:50 GMT, Richard Bell <rb********@earthlink.net> wrote:
On 17 Mar 2004 00:55:24 GMT, Roel van der Steen <ro*******@st2x.net>
wrote:
On Tue, 16 Mar 2004 at 18:01 GMT, Richard Bell <rb********@earthlink.net> wrote:
I'm considering using LWP as the heart of a Web application and have a
number of questions.


LWP does not render the page, nor does it execute (client-side)
scripts, nor does it provide you with a DOM.


For many of the pages I'll be working with that
includes various client side scripts and includes.

Maybe HTML::Display is more in the direction you want. Or WWW::Mechanize.
Did you already have a look at http://cpan.org ?
Jul 19 '05 #4
On 17 Mar 2004 03:12:38 GMT, Roel van der Steen <ro*******@st2x.net>
wrote:
Maybe HTML::Display is more in the direction you want. Or WWW::Mechanize.


Thanks, I'll look into HTML::Display and WWW::Mechanize. I picked up
the O'Reilly books and am also checking the web on these packages, but
the learning curve right now is a bit stiff particularly when I'm not
really sure where to look or what to look at. Thanks for your help.
Did you already have a look at http://cpan.org ?


I have checked cpan. Lots of apparently good stuff there, but again
I'm faced with not knowing what is really appropriate for my needs.

I've thought about trying to automate Mozilla and accessing its DOM
object to get at what I want. Do you have any reflections on that
attack?

Thanks again for the new clues.

R

Jul 19 '05 #5
Richard Bell wrote:
Thanks Roel, that was very helpful.

For my application, I need something that will do all such things as
might happen in a real browser that would create user visible content
on the screen. For many of the pages I'll be working with that
includes various client side scripts and includes. While LWP gets
part of the way, it doesn't seem to go as far as this project needs.


When LWP requests a page from a server, it is no different from any
other browser's request, in that the server will process server-side
includes.

If the HTML returned contains JavaScript, it is up to you to provide
a JavaScript interpreter. I've seen many JavaScript functions that
do things like ask the graphical browser they are running in for the
size (in pixels) of the currently active window so that they can
decide on the layout of the text they will be writing to the
document window. Other JavaScript uses include reading or
modifying the text being displayed in a field of a form. (Think of
<input type="text" name="clock" value="12:45:00 pm">.)

In other words, to handle a full range of client-side scripts,
you will have to re-invent a very large wheel: a complete browser
with graphical display and GUI widgets.

LWP is good at getting the raw HTML from the server. Postprocessing
the HTML on the client side before, during, and after rendering is
an entirely different kettle of fish.
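
To make the "raw HTML" point concrete, here is a minimal sketch, not a definitive recipe. The helper names fetch_html and count_scripts are invented for this example. It fetches a page with LWP::UserAgent and counts the script elements that arrive as inert markup, which gives a quick feel for how much of a page depends on code that LWP will never run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;

# Hypothetical helper: return the page's HTML as the server sent it
# (decoded to characters). No scripts are run at any point.
sub fetch_html {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Hypothetical helper: count the <script> elements in a chunk of HTML.
sub count_scripts {
    my ($html)  = @_;
    my $tree    = HTML::TreeBuilder->new_from_content($html);
    my @scripts = $tree->look_down('_tag', 'script');
    my $count   = scalar @scripts;
    $tree->delete;    # free the tree explicitly
    return $count;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $url  = $ARGV[0];
    my $html = fetch_html($url);
    printf "%d script element(s) in %s\n", count_scripts($html), $url;
}
```

A count of zero suggests the LWP-plus-parser route may be enough for that page; a large count is a hint that you will need a real browser engine to see what the user sees.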

I certainly would not want to emulate the quirks (features, bugs) of
IE 6 vs IE 5 vs Netscape vs Mozilla vs Opera.
-Joe
Jul 19 '05 #6

No one ever said it would be easy.

I'm now looking into automating Mozilla (let it do the heavy lifting),
possibly from Perl, possibly using the Mozilla application
environment. Any ideas where I can get clues/examples/insight into
the issues from the Perl side? I've got the O'Reilly book for the app
environment so I'm reasonably armed there.

Richard

Jul 19 '05 #7
