Today is your lucky day. As a learning project, I decided to try to get a working version of this code for you. The script below accepts an HTML file as a parameter and parses out the raw text of all forms found within that file. It saves them in the @forms array, which is then printed out at the end of the script.
You'll have to decode how this is done on your own, and of course adapt it to your own purposes, since you did not state your end goal more explicitly. If you have some quick questions, I might answer them, but I will not waste my time trying to teach you what this does. I was able to figure it out simply by going through all of the examples that they provide, and of course by reading the documentation, although I admit the documentation could definitely use a little more verbose explanation.
http://search.cpan.org/src/GAAS/HTML-Parser-3.55/eg/
http://search.cpan.org/~gaas/HTML-Parser-3.55/Parser.pm
use HTML::Parser;

use strict;

my $file = shift || '20061101form.html';

my @forms = ();

sub start_form {
    my ($tagname, $self, $text) = @_;

    return if $tagname ne 'form';

    # Setup Handlers
    # - No longer look for start conditions, instead let the
    #   default handler pick those up.
    $self->handler(start => undef);
    $self->handler(default => \&save_form, "text");
    $self->handler(end => \&end_form, "tagname,self,text");

    # Start New Form
    push @forms, '';
    save_form($text);
}

sub save_form {
    # Save all raw text in the current form.
    $forms[-1] .= shift;
}

sub end_form {
    my ($tagname, $self, $text) = @_;

    save_form($text);

    # End Processing, Wait for new Start Form
    if ($tagname eq 'form') {
        $self->handler(start => \&start_form, "tagname,self,text");
        $self->handler(default => undef);
        $self->handler(end => undef);
    }
}


my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_form, "tagname,self,text");
$p->parse_file($file) || die $!;

# Prints all found forms.
print @forms;

1;

__END__