Downloading and parsing web-stuff

Very basic:

What is the easiest way in PHP to download the source code (HTML etc.)
of a given URL (say, http://www.google.com) and parse this code for
certain patterns?

I guess my question can be split in two:

1) How do I download a webpage (into a string or whatever)?

2) How can I do string manipulation, regexp matching, information
extraction etc. on the downloaded information?

/David

Jul 17 '05 #1

David Rasmussen wrote:
> I guess my question can be split in two:
>
> 1) How do I download a webpage (into a string or whatever)?

    $string = file_get_contents('http://some.url/blah');

> 2) How can I do string manipulation, regexp matching, information
> extraction etc. on the downloaded information?

Now look at the docs for preg_match or ereg. I prefer preg_match:

    if ( preg_match('|<title>(.*?)</title>|', $string, $matches) )
    {
        print_r($matches);
    }
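
The two steps above can be wrapped into small functions with a failure check. This is only a sketch: fetch_page and extract_title are made-up helper names, the 10-second timeout is an arbitrary choice, and fetching a URL this way assumes allow_url_fopen is enabled in php.ini.

```php
<?php
// Hypothetical helper: fetch a URL into a string, or return null on failure.
function fetch_page(string $url): ?string
{
    // Assumed setting: give up after 10 seconds instead of hanging.
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    // file_get_contents returns false on failure; "@" hides the warning.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

// Hypothetical helper: pull the <title> text out of an HTML string.
function extract_title(string $html): ?string
{
    // "i" ignores case; "s" lets "." match newlines inside the title.
    if (preg_match('|<title[^>]*>(.*?)</title>|is', $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Offline demonstration on a literal string (no network needed).
$sample = "<html><head><TITLE>\n Example page </TITLE></head></html>";
echo extract_title($sample), "\n"; // prints "Example page"
```

With a live URL the pieces would be combined as $html = fetch_page('http://www.google.com/'); followed by extract_title($html) once $html is known not to be null.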

Jul 17 '05 #2
Treat a full URL as a file.

$contents = implode('', file("http://www.google.com/"));

Then go to www.php.net/preg_match/ to read up on PCRE (Perl compatible
regular expressions). See also ereg_* functions.
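
When a page contains many occurrences of a pattern, preg_match_all collects all of them in one call. The following is a rough sketch, assuming the goal is to list link targets; extract_links is a hypothetical helper name, and a regular expression like this will miss unusual markup such as unquoted attributes.

```php
<?php
// Hypothetical helper: collect the href value of every <a> tag in $html.
function extract_links(string $html): array
{
    // Group 1 captures the opening quote character, group 2 the URL;
    // \1 requires the same quote character to close the attribute.
    preg_match_all('/<a\s[^>]*href=(["\'])(.*?)\1/is', $html, $matches);
    return $matches[2];
}

// Offline demonstration on a literal string (no network needed).
$sample = '<a href="http://www.php.net/">PHP</a> <A class="x" HREF=\'/docs\'>Docs</A>';
print_r(extract_links($sample));
```

The same function can be applied to $contents once the page has been downloaded.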

HTH.

-Mike

Jul 17 '05 #3
