Bytes | Software Development & Data Engineering Community

Unstructured HTML extraction

Hi,

I'm looking for a program that extracts the structure of unstructured
HTML documents. It should make good guesses about the different font
styles used to represent headings. For example, some pages use
<font size = 24> for headings and others use <h1>; in the end, both
should produce the same structure. The output can be XML or another
format. Manual intervention should remain minimal. Does anyone know
of such a program (preferably open source)?

Cheers,
Michael

Jul 17 '05 #1
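
To make the request above concrete, here is a minimal sketch of the kind of normalization being asked for. It is not an existing tool, only an illustration using Python's standard library. The size thresholds in font_size_to_level, the class name HeadingNormalizer, and the output element names (document, heading, para) are all assumptions made for this example, and it expects reasonably well-nested markup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def font_size_to_level(size):
    """Map a <font size=...> value to a heading level, or None for body text.

    The thresholds are arbitrary assumptions; a real tool would have to
    estimate them per document.
    """
    try:
        n = int(size)
    except (TypeError, ValueError):
        return None
    if n >= 24:
        return 1
    if n >= 18:
        return 2
    if n >= 14:
        return 3
    return None


class HeadingNormalizer(HTMLParser):
    """Collect headings and plain text into a flat XML tree (hypothetical schema)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self._stack = []  # one entry per open <hN>/<font>: heading level or None
        self._buf = []    # text collected inside the current heading

    def _in_heading(self):
        return any(level is not None for level in self._stack)

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._stack.append(int(tag[1]))
        elif tag == "font":
            self._stack.append(font_size_to_level(dict(attrs).get("size")))

    def handle_endtag(self, tag):
        if (tag in HEADING_TAGS or tag == "font") and self._stack:
            level = self._stack.pop()
            # Emit the heading once the outermost heading-like tag closes.
            if level is not None and not self._in_heading():
                heading = ET.SubElement(self.root, "heading", level=str(level))
                heading.text = " ".join(self._buf)
                self._buf = []

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading():
            self._buf.append(text)
        else:
            ET.SubElement(self.root, "para").text = text


def extract_structure(html):
    """Parse an HTML string and return the normalized XML root element."""
    parser = HeadingNormalizer()
    parser.feed(html)
    parser.close()
    return parser.root


if __name__ == "__main__":
    a = extract_structure("<h1>Intro</h1><p>Body text</p>")
    b = extract_structure("<font size = 24>Intro</font><p>Body text</p>")
    print(ET.tostring(a, encoding="unicode"))
    print(ET.tostring(b, encoding="unicode"))
```

Both inputs should serialize to the same XML, which is the "same structure" requirement from the post. Fixed thresholds are the weak point: estimating heading levels from the font sizes actually used in each document would be needed to keep manual tuning low.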
