html xml extractor

Hi,

I am searching for a tool that extracts information from an HTML page
and formats it as XML.

For instance, for this page:
http://money.guardian.co.uk/pensions...993138,00.html

I would like to get an XML file
with a <title> containing the title of the article,
a <text> containing the text of the article,
and an <author> containing the author of the article.

Do you know such a tool?

Marco
Jul 20 '05 #1
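A minimal sketch of the extraction requested above, using only Python's standard library (`html.parser` and `xml.etree.ElementTree`). It assumes, hypothetically, that the page names its author in a `<meta name="author">` tag and keeps the article body in `<p>` elements; a real news page will most likely need its own page-specific rules. The class and function names are illustrative, not an existing tool.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ArticleExtractor(HTMLParser):
    """Collect the page <title>, an author <meta> tag and the text of <p> elements.

    Hypothetical conventions: the author is in <meta name="author" content="...">
    and the article body is made of <p> elements.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self.title = ""
        self.author = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def _flush_paragraph(self):
        # Collapse runs of whitespace and keep only non-empty paragraphs.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            if self._in_p:  # HTML allows <p> to be left unclosed
                self._flush_paragraph()
            self._in_p = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "author":
            self.author = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._flush_paragraph()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)

    def close(self):
        super().close()
        if self._in_p:  # a paragraph still open at the end of the input
            self._flush_paragraph()


def html_to_article_xml(html):
    """Return <article><title/><text/><author/></article> as a string."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    article = ET.Element("article")
    ET.SubElement(article, "title").text = parser.title.strip()
    ET.SubElement(article, "text").text = "\n".join(parser.paragraphs)
    ET.SubElement(article, "author").text = parser.author
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    # Tiny hand-written page standing in for a real article (illustrative only).
    sample = ("<html><head><title>Pension news</title>"
              '<meta name="author" content="Jane Doe"></head>'
              "<body><p>First para.</p><p>Second &amp; last.</p></body></html>")
    print(html_to_article_xml(sample))
```

Using the lenient `HTMLParser` means the input does not have to be well-formed, at the cost of hand-written state tracking; the output XML is built with `ElementTree`, so special characters such as `&` are escaped correctly.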
FC

There is a tool called HTML Tidy; if I am not mistaken, it converts HTML
into XHTML.
Once you have well-formed XHTML, extracting the title, text and author into
your own XML format is up to you.
Search for "HTML Tidy".

Bye,
Flavio
Jul 20 '05 #2
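Following the HTML Tidy suggestion, here is a rough sketch of the second half. Once Tidy has produced well-formed XHTML (for example with something like `tidy -asxhtml -numeric page.html`, where numeric entities keep a plain XML parser from rejecting names such as `&nbsp;`), the page can be read with an ordinary XML parser. The sketch assumes Tidy's output places elements in the XHTML namespace and, hypothetically, that the author is in a `<meta name="author">` tag and the body in `<p>` elements; none of this is confirmed by the thread.

```python
import xml.etree.ElementTree as ET

# Assumption: Tidy's XHTML output places every element in this namespace.
NS = {"h": "http://www.w3.org/1999/xhtml"}


def xhtml_to_article_xml(xhtml):
    """Turn well-formed XHTML (e.g. HTML Tidy output) into <article> XML."""
    page = ET.fromstring(xhtml)  # raises ParseError if the input is not well-formed

    title = page.findtext("h:head/h:title", default="", namespaces=NS).strip()

    # Hypothetical convention: the author is stored in <meta name="author">.
    author = ""
    for meta in page.iterfind("h:head/h:meta", NS):
        if meta.get("name", "").lower() == "author":
            author = meta.get("content", "").strip()

    # Hypothetical convention: the article body lives in <p> elements.
    # itertext() also collects text nested inside <b>, <a>, etc.
    paragraphs = [" ".join("".join(p.itertext()).split())
                  for p in page.iterfind(".//h:p", NS)]

    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    ET.SubElement(article, "text").text = "\n".join(p for p in paragraphs if p)
    ET.SubElement(article, "author").text = author
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    # Tiny hand-written stand-in for Tidy output (illustrative only).
    sample = ('<html xmlns="http://www.w3.org/1999/xhtml"><head>'
              "<title>Pension news</title>"
              '<meta name="author" content="Jane Doe" /></head>'
              "<body><p>First <b>para</b>.</p></body></html>")
    print(xhtml_to_article_xml(sample))
```

Compared with parsing raw HTML directly, letting Tidy do the cleanup means the extraction step can use plain path expressions instead of tracking parser state by hand.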
