extract from html

hi,
how can i extract the number between text1 and text2 in input.html,
only the first time it occurs, ignoring the rest?
preferably input.html would be a URL that stops downloading once a
match has occurred, that would save a lot of bandwidth..
i guess HTML::Parser would provide an option to work with a file while
it's downloading (?)

example
----

input.html:

bla..
text1 555 text2
bla
bla
text1 6000 text2
bla
EOF

output.txt:

555

thanks for your help,
peter
Jul 19 '05 #1

"Lydia Shawn" <ap******@hotmail.com> wrote in message
news:12**************************@posting.google.com...
> hi,
> how can i extract the number between text1 and text2 in input.html
> only the first time it occurs ignoring the rest?

I would solve this problem with a hash. You can put each term into it as a
unique key: if the same term is found again its entry is simply overwritten,
or you can ask the hash whether the term already exists.

# $term is taken from your text, in between text1 and text2
if (exists $myHash{$term}) {
    # already seen: ignore
}
else {
    $myHash{$term} = $value;
}
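
A fuller sketch of the hash idea, assuming the file is read line by line and each number sits on one line between the literal markers text1 and text2. The sub name first_seen_numbers, the regex, and the in-memory sample are illustrative assumptions, not part of the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: return the numbers found between "text1" and "text2",
# in order of first appearance, skipping any value that was already seen.
sub first_seen_numbers {
    my ($fh) = @_;
    my (%seen, @order);
    while (my $line = <$fh>) {
        # Assumes markers and number share one line, as in the example input.
        while ($line =~ /text1\s*(\d+)\s*text2/g) {
            my $term = $1;
            next if exists $seen{$term};    # ignore repeats
            $seen{$term} = 1;
            push @order, $term;
        }
    }
    return @order;
}

# Sample data copied from the question, read through an in-memory filehandle.
my $input = "bla..\ntext1 555 text2\nbla\nbla\ntext1 6000 text2\nbla\n";
open my $fh, '<', \$input or die "cannot open in-memory file: $!";
my @numbers = first_seen_numbers($fh);
close $fh;

print "$numbers[0]\n" if @numbers;    # only the first match is wanted
```

If only the very first occurrence matters, the hash is probably unnecessary: the loop could return $1 at the first match and stop reading there.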

The rest of your question: I don't know ... sorry

> thanks for your help,
> peter


no prob... but what is your real name?
"Lydia Shawn" or Peter :-)

HTH
greets Michael
Jul 19 '05 #2

"Lydia Shawn" <ap******@hotmail.com> wrote in message
news:12**************************@posting.google.com...
> hi,
> how can i extract the number between text1 and text2 in input.html
> only the first time it occurs ignoring the rest?
> preferably input.html would be a URL that stops downloading once a
> match has occurred, that would save a lot of bandwidth..
> i guess HTML::Parser would provide an option to work with a file while
> it's downloading (?)


Take a look at the lwp-download script (in your perl bin directory)
as an example of a program that incrementally downloads a URL.
You can then search the contents for your text1 and text2 and stop if found.

The script uses LWP::UserAgent to do the download.
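
A minimal sketch of that approach, not taken from lwp-download itself. It assumes libwww-perl is installed and uses the `:content_cb` and `:read_size_hint` options of LWP::UserAgent's get(); calling die inside the callback aborts the transfer. The sub names scan_buffer and first_number_from_url, the chunk size, and the regex are assumptions made for illustration:

```perl
use strict;
use warnings;

# Scan the accumulated buffer for the first number between the two markers.
# Returns the number, or undef if no complete match has arrived yet.
sub scan_buffer {
    my ($buffer_ref) = @_;
    return $$buffer_ref =~ /text1\s*(\d+)\s*text2/ ? $1 : undef;
}

# Download $url in chunks and abort as soon as a match is found.
sub first_number_from_url {
    my ($url) = @_;
    require LWP::UserAgent;             # loaded lazily; needs libwww-perl
    my $ua = LWP::UserAgent->new(timeout => 30);
    my $buffer = '';
    my $found;
    $ua->get(
        $url,
        ':read_size_hint' => 4096,      # ask for small chunks (only a hint)
        ':content_cb'     => sub {
            my ($chunk) = @_;
            $buffer .= $chunk;          # keep old data: a match may span chunks
            $found = scan_buffer(\$buffer);
            die "match found\n" if defined $found;    # dying stops the download
        },
    );
    return $found;
}

# Usage: perl extract.pl http://example.com/input.html > output.txt
if (@ARGV) {
    my $number = first_number_from_url($ARGV[0]);
    print "$number\n" if defined $number;
}
```

The whole buffer is kept and rescanned on every chunk, which is simple but wasteful for large pages; trimming it to the tail that could still begin a match would be a possible refinement. Some data beyond the match may already have been received by the time the callback runs.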

--
brian
Jul 19 '05 #3
