Hi,
Sorry for the late response. I'm trying to extract the structure of
the page as a reader sees it, e.g. extract all headings and link them
to the corresponding paragraphs. In the end, the output graph should
be a hierarchy of sections, sub-sections and so on. I know this can be
quite complex, because the HTML in use today can be very messy,
especially with tables. It was suggested that I use a "feature-based
analysis" to extract such information, but I'm not sure what exactly
that means. What is a feature-based analysis, even in other contexts?
Is it really feasible to extract "features" of HTML documents?
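To make the desired output concrete, here is a rough sketch of the kind of hierarchy I mean. It is not a feature-based analysis, just a naive pass that trusts the h1-h6 and p tags and ignores tables and layout entirely. It uses only Python's standard html.parser module; the names OutlineParser and build_outline and the dictionary layout are my own invention for illustration.

```python
from html.parser import HTMLParser

# Heading tags mapped to their nesting depth.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


def new_node(level, title):
    # Hypothetical node layout: one dict per section.
    return {"level": level, "title": title, "paragraphs": [], "children": []}


class OutlineParser(HTMLParser):
    """Naive outline builder: headings open sections, <p> text attaches to them."""

    def __init__(self):
        super().__init__()
        self.root = new_node(0, None)   # level 0 so it is never popped
        self.stack = [self.root]        # path from root to the current section
        self.capture = None             # tag whose text is being collected
        self.buffer = []

    def flush(self):
        # Store the collected text as a paragraph or a new section.
        if self.capture is None:
            return
        tag, self.capture = self.capture, None
        text = " ".join("".join(self.buffer).split())
        if tag == "p":
            if text:
                self.stack[-1]["paragraphs"].append(text)
            return
        level = HEADINGS[tag]
        # Close every open section at the same or a deeper level.
        while self.stack[-1]["level"] >= level:
            self.stack.pop()
        node = new_node(level, text)
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag == "p":
            self.flush()                # copes with unclosed <p> tags
            self.capture = tag
            self.buffer = []

    def handle_data(self, data):
        if self.capture is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.capture:
            self.flush()


def build_outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    parser.flush()                      # pick up a trailing unclosed block
    return parser.root


if __name__ == "__main__":
    sample = "<h1>Intro</h1><p>First.</p><h2>Detail</h2><p>Second.</p>"
    print(build_outline(sample))
```

On clean markup this gives a tree of sections with their paragraphs attached, but it breaks down as soon as a page uses tables or styled text instead of real heading tags, which is exactly the case I need help with.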
Any help will be much appreciated.
Cheers,
Michael
Steve Pugh wrote:
da*****@hotmail.com wrote:
I've read somewhere that feature-based analysis can be used to
extract the semantic structure of HTML documents. By semantic structure,
they mean the model of the rendered view a reader sees. Now, my question
is, what should such feature-based analysis involve? What exactly is a
feature-based analysis?
You asked the same question five days ago and at least twice in
December. You didn't bother to respond to any of the replies you got
then. So why should anyone bother replying to you now? Please join
the discussion and explain in more detail what you are looking for.
Steve
--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor
Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>