Extracting Semantic Structure of HTML Document - Feature-based

dayzman

Hi,

I've read somewhere that feature-based analysis can be used to extract
the semantic structure of HTML documents. By semantic structure, they
mean the model of the rendered view a reader sees. Now, my question is,
what should such feature-based analysis involve? What exactly is a
feature-based analysis?

Please help.

Cheers,
Michael

Jul 23 '05 #1

Steve Pugh

da*****@hotmail.com wrote:

I've read somewhere that feature-based analysis can be used to extract
the semantic structure of HTML documents. By semantic structure, they
mean the model of the rendered view a reader sees. Now, my question is,
what should such feature-based analysis involve? What exactly is a
feature-based analysis?

You asked the same question five days ago and at least twice in
December. You didn't bother to respond to any of the replies you got
then. So why should anyone bother replying to you now? Please join the
discussion and explain in more detail what you are looking for.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>

Jul 23 '05 #2

dayzman

Hi,

Sorry for the late response. I'm trying to extract the structure of
what a reader sees, e.g. extract all headings and link them to their
corresponding paragraphs. In the end, the output graph should be a
hierarchy of sections, sub-sections and so on. I know this can be quite
complex, because the HTML in use today can be very messy, especially
with tables. It was suggested to me that I use a "feature-based
analysis" to extract such information, but I'm not sure what exactly
that means. What is a feature-based analysis, even in other contexts?
Is it really feasible to extract "features" from HTML documents?
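
To make the goal concrete, here is a minimal sketch of the heading-to-paragraph linking only. It assumes reasonably clean markup that uses h1-h6 and p tags, and it says nothing about what a feature-based analysis would add on top. It uses Python's standard html.parser module; the names OutlineBuilder and build_outline are made up for illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineBuilder(HTMLParser):
    """Builds a nested section tree from h1-h6 headings and p paragraphs."""

    def __init__(self):
        super().__init__()
        # The root node stands for the whole document (level 0).
        self.root = {"level": 0, "title": "", "paragraphs": [], "children": []}
        self.stack = [self.root]  # currently open sections, outermost first
        self.capture = None       # tag whose text is being collected, if any
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag == "p":
            self.capture = tag
            self.buffer = []

    def handle_data(self, data):
        if self.capture is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.capture:
            return  # e.g. the end of an inline tag inside a paragraph
        text = " ".join("".join(self.buffer).split())
        self.capture = None
        if tag == "p":
            # Attach the paragraph to the innermost open section.
            self.stack[-1]["paragraphs"].append(text)
            return
        level = int(tag[1])
        # A new heading closes every open section at the same or a deeper level.
        while self.stack[-1]["level"] >= level:
            self.stack.pop()
        node = {"level": level, "title": text, "paragraphs": [], "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)


def build_outline(html):
    """Parse an HTML string and return the root of the section tree."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Each node is a plain dict with "title", "paragraphs" and "children". Table-based layouts, unclosed p tags and headings faked with font tags are exactly the messy cases this does not handle.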

Any help will be much appreciated.

Cheers,
Michael

Steve Pugh wrote:

da*****@hotmail.com wrote:

I've read somewhere that feature-based analysis can be used to extract
the semantic structure of HTML documents. By semantic structure, they
mean the model of the rendered view a reader sees. Now, my question is,
what should such feature-based analysis involve? What exactly is a
feature-based analysis?

You asked the same question five days ago and at least twice in
December. You didn't bother to respond to any of the replies you got
then. So why should anyone bother replying to you now? Please join the
discussion and explain in more detail what you are looking for.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>

Jul 23 '05 #3

legalois

da*****@hotmail.com wrote:

Hi,

I've read somewhere that feature-based analysis can be used to extract
the semantic structure of HTML documents. By semantic structure, they
mean the model of the rendered view a reader sees. Now, my question is,
what should such feature-based analysis involve? What exactly is a
feature-based analysis?

Please help.

Cheers,
Michael

If the author of the article you read, or the person who suggested
you use this method for analysing marked-up text, knows what *he* means
by it, he may be the best source for a clearer explanation. Frankly,
it sounds idiosyncratic, maybe his own invention. Better to go to the
source and find out.
Or... have you considered looking at any of the several document object
models (DOM)? A DOM is not HTML, but who knows. Maybe someone was a
little confused...
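
As a rough illustration of what a DOM gives you, the sketch below parses a document into a tree and lists its elements with their nesting depth. It assumes Python's standard xml.dom.minidom, which only accepts well-formed XHTML and will raise an error on ordinary tag soup; the function names element_tree and dom_outline are made up for illustration.

```python
from xml.dom import minidom


def element_tree(node, depth=0):
    """Return (depth, tag name) pairs for every element under node, in document order."""
    rows = []
    for child in node.childNodes:
        # Skip text, comment and other non-element nodes.
        if child.nodeType == child.ELEMENT_NODE:
            rows.append((depth, child.tagName))
            rows.extend(element_tree(child, depth + 1))
    return rows


def dom_outline(xhtml):
    """Parse a well-formed XHTML string and list its elements with nesting depth."""
    document = minidom.parseString(xhtml)
    return element_tree(document)
```

The tree reflects how the mark-up is nested, not how the page looks when rendered, so at best it is a starting point for the kind of structure being asked about.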
- Jake Lloyd

Jul 23 '05 #4
