Hi,
I am a beginner in Perl programming. My task is to extract the content within form tags from an HTML file. I tried doing it with regular expressions but could not get the desired result, because the HTML in the web page is not properly formatted.
Can this be done using HTML::Parser or HTML::TreeBuilder?
I found many tutorials, but I have not been able to do it by following them. Can someone help with this?
Thank you in advance,
Priscilla.
miller 1,089
Recognized Expert Top Contributor
I would focus your efforts on HTML::Parser. There is an example for extracting the contents of a title tag directly in the POD documentation: http://search.cpan.org/~gaas/HTML-Parser-3.55/Parser.pm#EXAMPLES
Even more importantly, there is actually an example that they provide for decoding the form contents of an HTML page. It can be found in the /eg/ directory of the CPAN dist for this module: http://search.cpan.org/src/GAAS/HTML-Parser-3.55/eg/
You need to be able to do this on your own from this point though. If you have any specific trouble, feel free to ask, but you have plenty of specific resources at your disposal now to be able to solve this problem.
Thank you for your suggestion.
I tried running this example from the link you gave:
use HTML::Parser ();
sub start_handler
{
    return if shift ne "title";
    my $self = shift;
    $self->handler(text => sub { print shift }, "dtext");
    $self->handler(end  => sub { shift->eof if shift eq "title"; },
                   "tagname,self");
}
my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_handler, "tagname,self");
$p->parse_file(shift || die) || die $!;
print "\n";
It gives me the message "died at line 15".
Can you please tell me what shift means here?
I am unable to understand where this program takes its input from to get the content of the title element.
miller 1,089
Recognized Expert Top Contributor
The line that is dying for you is this:
$p->parse_file(shift || die) || die $!;
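This is because this example is meant to be run as a script with an HTML file as a parameter, i.e.:
./yourScript.pl yourHtmlFile.html
Outside a subroutine, shift with no arguments removes the first element of the @ARGV array (the command-line arguments) and returns it, and that file is then parsed. If you run the script without a file name, shift returns undef and the die is executed, which is the message you are seeing.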
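To illustrate what shift does in the two places it shows up in that example, here is a minimal sketch. The first_arg helper and the simulated @ARGV value are made up purely for illustration and are not part of the HTML::Parser example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: inside a subroutine,
# a bare shift removes and returns the first element of @_.
sub first_arg {
    my $first = shift;    # same as shift(@_)
    return $first;
}

# Assumption: simulate "./yourScript.pl yourHtmlFile.html" when the
# script is started without any command-line argument.
@ARGV = ('yourHtmlFile.html') unless @ARGV;

# Outside any subroutine, a bare shift operates on @ARGV instead.
my $file = shift;         # same as shift(@ARGV)

print "file to parse: $file\n";
print "arguments left: ", scalar(@ARGV), "\n";
print "first_arg returned: ", first_arg('title', 'ignored'), "\n";
```

This is why start_handler can use shift to pick up "tagname" and then "self" in the order given in the argspec string, while the shift on the parse_file line picks up the file name from the command line.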
miller 1,089
Recognized Expert Top Contributor
Today is your lucky day. As a learning project I decided to try to get a working version of this code for you. The script below accepts an HTML file as a parameter and parses out the raw text of all forms found within that file. It saves them in the @forms array, which is then printed out at the end of the script.
You'll have to work out how this is done on your own, and of course adapt it to your own purposes, since you did not state more explicitly what your end goal was. If you have some quick questions I might answer them, but I will not waste my time trying to teach you what this does. I was able to figure it out by simply going through all of the examples that they provide, and of course by reading the documentation, although I admit the documentation could definitely use more verbose explanation.
http://search.cpan.org/src/GAAS/HTML-Parser-3.55/eg/
http://search.cpan.org/~gaas/HTML-Parser-3.55/Parser.pm
use HTML::Parser;

use strict;

my $file = shift || '20061101form.html';

my @forms = ();

sub start_form {
    my ($tagname, $self, $text) = @_;

    return if $tagname ne 'form';

    # Setup Handlers
    # - No longer look for start conditions, instead let the
    #   default handler pick those up.
    $self->handler(start => undef);
    $self->handler(default => \&save_form, "text");
    $self->handler(end => \&end_form, "tagname,self,text");

    # Start New Form
    push @forms, '';
    save_form($text);
}

sub save_form {
    # Save all raw text in the current form.
    $forms[-1] .= shift;
}

sub end_form {
    my ($tagname, $self, $text) = @_;

    save_form($text);

    # End Processing, Wait for new Start Form
    if ($tagname eq 'form') {
        $self->handler(start => \&start_form, "tagname,self,text");
        $self->handler(default => undef);
        $self->handler(end => undef);
    }
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_form, "tagname,self,text");
$p->parse_file($file) || die $!;

# Prints all found forms.
print @forms;

1;

__END__