Urgent: need help parsing HTML tables

P: 8
I am trying to parse a simple table with two headings and get its rows, but I am having a big problem working out how to pass the HTML, or the path to the HTML file, to the parser.

The HTML file is on my own desktop. I have its path, but I have no clue how to use that with HTML::TableExtract.

    use HTML::TableExtract;
    $te = HTML::TableExtract->new( headers => [qw(Date Price Cost)] );
    $te->parse($html_string);

    # Examine all matching tables
    foreach $ts ($te->tables) {
      print "Table (", join(',', $ts->coords), "):\n";

      foreach $row ($ts->rows) {
        print join(',', @$row), "\n";
      }
    }
Let's say I put my own headings, heading1 and heading2, in place of Date, Price and Cost.
Where should I put the path to the HTML file,
which is something like /home/jack/desktop/sample.html?
I tried setting $html_string="/home/jack/desktop/sample.html", but that does not work at all.

What am I supposed to do? I would appreciate it if you could help me out with this.

Thanks a lot.
Jul 1 '08 #1
6 Replies


KevinADC
Expert 2.5K+
P: 4,059
If you use the better HTML::TableParser module it can open the file for you. See the parse_file method:

http://search.cpan.org/~djerius/HTML...TableParser.pm

basically:

    $p->parse_file('c:/windows/desktop/foo.html');
Here $p is the parser object, and the file path should be the correct one for your computer and file. Note: you can use forward slashes in Windows file and directory paths.
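For readers who would rather stay with HTML::TableExtract: it is a subclass of HTML::Parser, so the inherited parse_file method should be usable in the same way. The following is only a minimal sketch under stated assumptions. The sample table, the Date and Price headings, and the temporary file are made up for illustration; in practice $path would point at your own file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TableExtract;

# Write a small made-up page to a temporary file so the sketch is self-contained.
# In practice $path would be your own file, such as the sample.html on the desktop.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} <<'END_HTML';
<table>
  <tr><th>Date</th><th>Price</th></tr>
  <tr><td>2008-07-01</td><td>10</td></tr>
  <tr><td>2008-07-02</td><td>12</td></tr>
</table>
END_HTML
close $fh or die "Cannot close $path: $!";

# Select tables by their headings only; no table ids or sizes are involved.
my $te = HTML::TableExtract->new( headers => [qw(Date Price)] );
$te->parse_file($path);    # parse_file is inherited from HTML::Parser

# Collect one comma-separated line per row of every matching table.
my @lines;
for my $ts ($te->tables) {
    for my $row ($ts->rows) {
        push @lines, join(',', map { defined $_ ? $_ : '' } @$row);
    }
}
print "$_\n" for @lines;
```

The point is that parse expects the HTML text itself, while parse_file expects a path or filehandle, which is why assigning a path to $html_string and calling parse finds no tables.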
Jul 2 '08 #2

P: 8
Thanks for the post, but that looks more complicated than the previous one.
I just need to parse a table in an HTML file that is on my own desktop.
I do not want to use any kind of table id or sizes, just the heading names.

What would be the best way to use HTML::TableExtract?
- I need to put the file path for the HTML somewhere.
(The problem I am facing is that in all the examples on CPAN, $html_string is already there without being initialized; they are incomplete programs.)

- I need to put in the headers.

Result: I need the table data, that's all. I am sorry, but I do not want to have to find out what id my table has and all that.


Please help me; this seems like a simple problem. I could not debug it because whenever I run the program I don't get errors and I don't get anything printed. I am pretty irritated and more hopeless every day. I think I made a big mistake trying to use Perl for this project; the whole thing is so disorganized that I can't find a single example that does just that.

Please, I would really appreciate it if someone could help me.

Thanks in advance to all, and thanks for the reply.
Jul 2 '08 #3

KevinADC
Expert 2.5K+
P: 4,059
Here you go:

    open (HTML, 'c:/path/to/foo.html') or die "$!";
    my $html = do {local $/; <HTML>}; # puts the entire file in a scalar variable
    close HTML;
Now you can parse $html.
Jul 2 '08 #4

P: 8
This is the program I wrote:
    #!/usr/bin/perl
    use HTML::TableExtract;
    open (HTML, '/root/Desktop/test.html') or die "$!";
    my $html = do {local $/; <HTML>}; # puts the entire file in a scalar variable
    $te = HTML::TableExtract->new( headers => [qw(Heading Heading_2)] );
    $te->parse($HTML);
    # Examine all matching tables
    foreach $ts ($te->tables) {
      print "Table (", join(',', $ts->coords), "):\n";
      foreach $row ($ts->rows) {
        print join(',', @$row), "\n";
      }
    }

But when I run perl program.pl it does not do anything; it just gives me the prompt back.
Thanks for the reply. I would appreciate it if you could solve this problem.

I am literally not getting anything; after I run perl program.pl I just get another prompt.
Thanks, please help.
Jul 2 '08 #5

P: 8
OK, I think I got it; there was a minor problem. Thanks a lot for the help, I appreciate it.
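The thread does not record what the minor problem was. One likely candidate, judging only from the program posted in #5, is that the file is read into $html but $HTML is passed to parse. Perl variable names are case-sensitive, so $HTML is a different, undefined variable, the parser finds no tables, and the loops print nothing. The following is a sketch with that assumption fixed; extract_rows is a hypothetical helper name, and the Heading and Heading_2 headings are taken from the post.

```perl
#!/usr/bin/perl
use strict;      # an undeclared variable such as $HTML now stops compilation
use warnings;
use HTML::TableExtract;

# Hypothetical helper: return the rows (array refs) of every table whose
# header row matches the given headings.
sub extract_rows {
    my ($path, @headers) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $html = do { local $/; <$fh> };    # slurp the entire file into one scalar
    close $fh;

    my $te = HTML::TableExtract->new( headers => \@headers );
    $te->parse($html);                    # the same $html that was just read
    return map { $_->rows } $te->tables;
}

if (@ARGV) {
    # Print each row of each matching table as one comma-separated line.
    for my $row (extract_rows($ARGV[0], qw(Heading Heading_2))) {
        print join(',', map { defined $_ ? $_ : '' } @$row), "\n";
    }
}
else {
    warn "usage: $0 /path/to/file.html\n";
}
```

Run it as perl program.pl /root/Desktop/test.html. With use strict in place, the original $HTML typo would have been reported when the program was compiled instead of failing silently.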
Jul 2 '08 #6

P: 8
Hi,
I got the tables extracted, and I have a huge document full of tables. Using this module (HTML::TableExtract), I am trying to search the parsed tables for keywords (from user input) and print only the relevant data.
I tried looking on CPAN but could not really find out how to search the parsed tables for particular keywords.

One way to do it would be to output the parsed tables to a .text file and parse it from there. That is a rather wrong way for me, though: when I find a keyword in a particular table I need the corresponding columns or other relevant data from that table, and parsing the text file would lose that.

Aim and problem: I can't find any way to search through the resulting parsed tables and get the necessary data.


Thanks for the reply, I appreciate it.
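Since each row returned by HTML::TableExtract is an ordinary array reference, the keyword search can be done in plain Perl on the rows themselves, without writing them to a text file first. The following is a minimal sketch, not a definitive solution. rows_matching is a hypothetical helper name, and the sample rows are made up to stand in for what $ts->rows returns for one table.

```perl
use strict;
use warnings;

# Hypothetical helper: return every row (array ref) in which at least one
# cell contains $keyword. The match is case-insensitive, and quotemeta keeps
# user input from being interpreted as a regular expression.
sub rows_matching {
    my ($keyword, @rows) = @_;
    my $pattern = quotemeta $keyword;
    return grep {
        my $row = $_;
        grep { defined($_) && /$pattern/i } @$row;
    } @rows;
}

# Made-up sample rows standing in for the rows of one extracted table.
my @rows = (
    [ '2008-07-01', 'widget', '10' ],
    [ '2008-07-02', 'gadget', '12' ],
    [ '2008-07-03', 'Widget', '11' ],
);

# Print the whole row for each hit, so the corresponding columns come along.
for my $row (rows_matching('widget', @rows)) {
    print join(',', map { defined $_ ? $_ : '' } @$row), "\n";
}
# prints:
# 2008-07-01,widget,10
# 2008-07-03,Widget,11
```

Inside the loop over $te->tables, the rows of each table could be passed in as rows_matching($keyword, $ts->rows), which keeps each hit tied to the table it came from.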
Jul 2 '08 #7
