wo********@yahoo.com wrote:
: I need to parse the following HTML page and extract TV listing data
: using VC++
:
: http://tvlistings.zap2it.com/tvlistings/ZCGrid.do
: Is there a good way to extract the data?
: Is it easy for VC++ to call a Perl script and do some regular expressions?
: Since the HTML page is not well-formed XML, I cannot use an XML parser,
: right?
: Are there any other good ways to extract HTML page data?
Use Perl with HTML::Parser (that spelling and case are correct).
#!perl
use strict;
use warnings;
use HTML::Parser;
# ... perl code, etc...
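To make the skeleton above a bit more concrete, here is a minimal sketch of the event-handler style HTML::Parser uses. The sample markup, the class names, and the @rows/$in_cell variables are made up for illustration; the real listings page will be structured differently, so the tag tests would need adjusting.

```perl
#!perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample markup, NOT the real listings page.
my $html = <<'HTML';
<table>
<tr><td class="time">8:00 PM</td><td class="show">Evening News</td></tr>
<tr><td class="time">9:00 PM</td><td class="show">Late Movie</td></tr>
</table>
HTML

my @rows;         # one array ref of cell texts per table row
my $in_cell = 0;  # true while the parser is inside a <td>

my $p = HTML::Parser->new(
    api_version => 3,
    # Called for every start tag; tag names arrive lowercased.
    start_h => [ sub {
        my ($tag) = @_;
        push @rows, [] if $tag eq 'tr';
        if ($tag eq 'td' && @rows) {
            $in_cell = 1;
            push @{ $rows[-1] }, '';   # start a new, empty cell
        }
    }, 'tagname' ],
    # Called for every end tag.
    end_h => [ sub { $in_cell = 0 if $_[0] eq 'td' }, 'tagname' ],
    # Called for text; 'dtext' means entities are already decoded.
    text_h => [ sub { $rows[-1][-1] .= $_[0] if $in_cell }, 'dtext' ],
);

$p->parse($html);
$p->eof;

# Print one tab-separated line per row.
print join("\t", @$_), "\n" for @rows;
```

Because the parser just reports tags as it sees them, it does not care whether the page is well formed, which is why it copes with HTML that an XML parser would reject.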
As an aside, this is also an excellent tool for SAX-like parsing of XML.
It has an XML mode (xml_mode) that recognizes XML constructs such as
empty-element tags. It doesn't handle all XML features, but HTML::Parser
comes with almost all distros of Perl, so a script that uses it can work
with almost any installation of Perl, even if you can't install anything
additional (a real lifesaver in a controlled environment).
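A rough sketch of the XML mode mentioned above, in case it helps. The XML snippet and the @events list are invented for the example; the only thing taken from the module itself is the xml_mode switch and the handler setup.

```perl
#!perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical XML input, made up for this example.
my $xml = '<listing><show id="1">News</show><break/></listing>';

my @events;  # SAX-style event log, in document order

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { push @events, "start $_[0]" }, 'tagname' ],
    end_h   => [ sub { push @events, "end $_[0]" },   'tagname' ],
    # Skip whitespace-only text chunks.
    text_h  => [ sub { push @events, "text $_[0]" if $_[0] =~ /\S/ }, 'dtext' ],
);

# Switch on XML handling: case-sensitive names, <foo/> style empty tags, etc.
$p->xml_mode(1);

$p->parse($xml);
$p->eof;

print "$_\n" for @events;
```

With xml_mode on, an empty-element tag like <break/> is reported as a start event followed by an artificial end event, so the same start/end handlers cover it.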