Parsing Web Sites

Hi, I need to parse particular web sites to extract particular information on
a weekly basis. How is this done in PHP, and is PHP better at doing this than
JSP?
Jul 17 '05 #1
"Colum" <co********@hotmail.com> wrote in message
news:<Vi******************@news.indigo.ie>...

> I need to parse particular web sites to extract particular
> information on a weekly basis. How is this done in PHP

$remote = file_get_contents ('http://www.somesite.com/');

Now string $remote contains the entire index file for
http://www.somesite.com/. You can parse it, extract anything
you want from it, or do whatever you please with it.
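
As a rough sketch of the "parse it" step, the string can be handed to PHP's DOM extension instead of being picked apart by hand. This assumes PHP 5 or later with the DOM extension enabled, which is newer than the releases discussed in this thread. The helper name extract_links() and the sample markup are made up for illustration; a real page would come from file_get_contents().

```php
<?php
// Sketch: pull link targets and link text out of an HTML string.
// extract_links() is a hypothetical helper, not a PHP built-in.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = ['href' => $href, 'text' => trim($anchor->textContent)];
        }
    }
    return $links;
}

// Stand-in for the string returned by file_get_contents().
$remote = '<html><body><a href="/news">News</a> <a href="/about">About us</a></body></html>';

foreach (extract_links($remote) as $link) {
    echo $link['text'], ' -> ', $link['href'], "\n";
}
```

The same pattern works for other elements (table cells, headings) by changing the tag name passed to getElementsByTagName().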

As to the weekly basis, PHP itself has no scheduling tools.
You will have to use OS-level scheduling via cron on Unix
or Scheduler on Windows.
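
One way the weekly run might look, as a sketch only: a command-line PHP script that cron starts once a week and that saves each fetch under a name containing the ISO week. The crontab line, the paths, the "somesite" file prefix, and both function names are placeholders invented for this example.

```php
<?php
// Hypothetical crontab entry (paths are placeholders), every Monday at 06:00:
//   0 6 * * 1 /usr/bin/php /path/to/fetch_weekly.php

// Build an output path that is unique per ISO week.
// 'o' is the ISO year, '\W' a literal "W", 'W' the ISO week number.
function weekly_output_path(DateTimeImmutable $when, string $dir): string
{
    return rtrim($dir, '/') . '/somesite-' . $when->format('o-\WW') . '.html';
}

// Fetch the page and store this week's snapshot.
// Returns false if either the fetch or the write failed.
function save_weekly_snapshot(string $url, string $dir): bool
{
    $remote = @file_get_contents($url);
    if ($remote === false) {
        return false;
    }
    $path = weekly_output_path(new DateTimeImmutable(), $dir);
    return file_put_contents($path, $remote) !== false;
}

// The real script would end with a call along these lines:
// save_weekly_snapshot('http://www.somesite.com/', __DIR__);
```

Keeping one file per week makes it easy to compare snapshots later, and a second run in the same week simply overwrites that week's file.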
> and is PHP better at doing this than JSP?

This is very basic functionality, so it's highly unlikely
one scripting environment will be much better at it than
another...

Cheers,
NC
Jul 17 '05 #2
"Colum" <co********@hotmail.com> wrote in message news:<Vi******************@news.indigo.ie>...
> Hi, I need to parse particular web sites to extract particular information on
> a weekly basis. How is this done in PHP, and is PHP better at doing this than
> JSP?


Get SNOOPY!

Charlie
Jul 17 '05 #3
Fox


Nikolai Chuvakhin wrote:

> "Colum" <co********@hotmail.com> wrote in message
> news:<Vi******************@news.indigo.ie>...
>
>> I need to parse particular web sites to extract particular
>> information on a weekly basis. How is this done in PHP
>
> $remote = file_get_contents ('http://www.somesite.com/');


This is only available in PHP 4.3+; many hosts still support only
4.2.x or earlier (CIHost, for example).

On PHP 4.2 or earlier, use fsockopen() and fgets() instead.
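
A minimal sketch of the fsockopen()/fgets() route, written in current PHP syntax rather than PHP 4 style. It assumes plain HTTP on port 80 and ignores redirects, status codes, and HTTPS. The three function names are invented for this example.

```php
<?php
// Build a minimal HTTP/1.0 GET request. HTTP/1.0 is used so that the
// server should not answer with chunked transfer encoding.
function build_get_request(string $host, string $path): string
{
    return "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n";
}

// Split a raw HTTP response into [headers, body] at the first blank line.
function split_http_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

// Fetch a page over a plain socket.
// Returns the response body, or false if the connection failed.
function socket_get(string $host, string $path = '/', int $port = 80, float $timeout = 10.0)
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fwrite($fp, build_get_request($host, $path));

    // Read the response line by line until the server closes the connection.
    $raw = '';
    while (!feof($fp)) {
        $line = fgets($fp, 4096);
        if ($line === false) {
            break;
        }
        $raw .= $line;
    }
    fclose($fp);

    return split_http_response($raw)[1];
}

// Usage, assuming the host is reachable:
// $remote = socket_get('www.somesite.com', '/');
```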

Jul 17 '05 #4
Fox wrote on Friday 05 December 2003 18:19:
>> $remote = file_get_contents ('http://www.somesite.com/');
>
> This is only available on php 4.3+ -- many hosts still only support
> 4.2.x or less... (like CIHost)
>
> In case of php4.2-, use fsockopen and fgets


If the host in question has the fopen wrappers enabled, you only need to use
file(), or fopen() and fread(); socket functions would be overkill for
that simple task.
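
A sketch of the fopen()/fread() variant, in current PHP syntax. For http:// URLs it relies on the allow_url_fopen setting being on; for local paths it works regardless. The function name read_all() is made up for this example.

```php
<?php
// Read an entire stream with fopen()/fread().
// Returns the content as a string, or false if the stream cannot be opened.
function read_all(string $location)
{
    $fp = @fopen($location, 'rb');
    if ($fp === false) {
        return false;
    }

    // Read in chunks until end of stream.
    $data = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break;
        }
        $data .= $chunk;
    }
    fclose($fp);

    return $data;
}

// file() is the shorter alternative; it returns the content as an array of lines:
// $lines = file('http://www.somesite.com/');
// $remote = read_all('http://www.somesite.com/');
```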
Jul 17 '05 #5
Colum wrote:
> Hi, I need to parse particular web sites to extract particular information on
> a weekly basis. How is this done in PHP, and is PHP better at doing this than
> JSP?


Unless you're a search engine, you're not gonna make yourself too
popular by harvesting information from other people's sites.
--
Bob
London, UK
echo Mail fefsensmrrjyaheeoceoq\! | tr "jefroq\!" "@obe.uk"
Jul 17 '05 #6
