Extracting Semantic Structure of HTML Document- Feature based

Hi,

I've read somewhere that feature-based analysis can be used to extract
the semantic structure of HTML documents. By semantic structure, they
mean the model of the rendered view a reader sees. Now, my question is,
what should such feature-based analysis involve? What exactly is a
feature-based analysis?

Please help.

Cheers,
Michael

Jul 23 '05 #1
da*****@hotmail.com wrote:
I've read somewhere that feature-based analysis can be used to extract
the semantic structure of HTML documents. By semantic structure, they
mean the model of the rendered view a reader sees. Now, my question is,
what should such feature-based analysis involve? What exactly is a
feature-based analysis?


You asked the same question five days ago and at least twice in
December. You didn't bother to respond to any of the replies you got
then. So why should anyone bother replying to you now? Please join the
discussion and explain in more detail what you are looking for.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 23 '05 #2
Hi,

Sorry for the late response. Well, I'm trying to extract the structure
of the view a reader sees, e.g. extract all headings and link them to
the corresponding paragraphs etc. In the end, the output graph should
be a hierarchy of sections, sub-sections etc. I know this can be quite
complex, because the HTML used today can be very messy, esp. with
tables. It was suggested that I use a "feature-based analysis" to
extract such information, but I'm not sure what exactly that means.
What would a feature-based analysis be, even in other contexts? Is it
really feasible to extract "features" of HTML documents?

Any help will be much appreciated.

Cheers,
Michael

Jul 23 '05 #3
da*****@hotmail.com wrote:
Hi,

I've read somewhere that feature-based analysis can be used to extract
the semantic structure of HTML documents. By semantic structure, they
mean the model of the rendered view a reader sees. Now, my question is,
what should such feature-based analysis involve? What exactly is a
feature-based analysis?

Please help.

Cheers,
Michael

If the author of the article you read, or the person who suggested
you use this method for analysis of mark-up text, knows what *he* means
by it, maybe he is the best source for a clearer explanation. Frankly,
it sounds idiosyncratic--maybe his own invention. Better to go to the
source and find out.
Or...have you considered looking at any of the several document object
models (DOM)? A DOM is not HTML, but who knows. Maybe someone was a
little confused...
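
For what it's worth, here is a rough sketch of how a parsed view of the
markup could be turned into the hierarchy of sections and sub-sections
described earlier in the thread. It is only an illustration, not the
"feature-based analysis" itself, which nobody here has defined. It
assumes Python's standard html.parser module; the names OutlineBuilder
and build_outline are made up for this example. It also assumes the
page uses h1-h6 for its headings and closes its <p> tags, which messy
table-based layouts often will not.

```python
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Hypothetical helper: nests h1-h6 headings and attaches <p> text."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        # The root node stands for the whole document (level 0).
        self.root = {"level": 0, "title": "", "paragraphs": [], "children": []}
        self.stack = [self.root]  # path from the root to the open section
        self.current_tag = None   # heading or "p" whose text is being read
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self.current_tag = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current_tag is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current_tag:
            return  # inline tags such as </b> inside a paragraph
        text = " ".join("".join(self.buffer).split())
        self.current_tag = None
        if tag == "p":
            if text:
                # A paragraph belongs to the most recently opened section.
                self.stack[-1]["paragraphs"].append(text)
            return
        level = self.HEADINGS[tag]
        # Close every open section at the same or a deeper level.
        while self.stack[-1]["level"] >= level:
            self.stack.pop()
        node = {"level": level, "title": text, "paragraphs": [], "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)


def build_outline(html):
    """Return the root of the section tree for an HTML string."""
    parser = OutlineBuilder()
    parser.feed(html)
    parser.close()
    return parser.root
```

Calling build_outline(page_source) returns nested dicts: each section
has a "title", its own "paragraphs", and its sub-sections under
"children". Pages that fake headings with bold text or font sizes would
need extra clues on top of the tag names.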
- Jake Lloyd
Jul 23 '05 #4
