Grabbing data in parallel connections?

Joe
I need to connect to 10 web sites to grab content from them. I would
like to connect to each site simultaneously so that I can obtain the
data as fast as possible.

I am familiar with doing this in Perl, using parallel sockets or the
module LWP-Parallel. So what would be the best method to do this in
PHP? Sockets, forks?

Also, if possible, could someone provide me with a good reference site
that will help me accomplish this?
Thanks
Jul 17 '05 #1

I don’t know if it is possible in PHP itself. What you can do is shell out
to Unix and run lynx or wget, and have them write the data to different files.
Hope others have better solutions.
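
A minimal sketch of the shell-out idea, assuming a Unix-like host with wget on the PATH. The function names, the output file naming and the 30-second timeout are made up for illustration; they are not from any library.

```php
<?php
// Build one wget command per URL, each writing to its own file.
// Assumption: wget is installed; -q, -T and -O are standard wget options.
function buildWgetCommands(array $urls, string $dir): array
{
    $commands = [];
    foreach (array_values($urls) as $i => $url) {
        $target = rtrim($dir, '/') . "/site_$i.html";
        $commands[$target] = 'wget -q -T 30 -O ' . escapeshellarg($target)
            . ' ' . escapeshellarg($url);
    }
    return $commands;
}

// Start every download before waiting on any of them, so they overlap.
function runInParallel(array $commands): void
{
    $pipes = [];
    foreach ($commands as $cmd) {
        $pipes[] = popen($cmd, 'r');   // returns once the process has been started
    }
    foreach ($pipes as $p) {
        if (is_resource($p)) {
            pclose($p);                // blocks until that process has exited
        }
    }
}
```

Calling runInParallel(buildWgetCommands($urls, '/tmp/grab')) would leave one file per site in the target directory, which the script can then read with file_get_contents().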

Jul 17 '05 #2

Just use good ol' fopen().

See http://www.php.net/stream_set_blocking/ and
http://www.php.net/stream_select/.
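
A rough sketch of how the two functions linked above fit together, not a tested downloader. readAllStreams is a hypothetical helper name; it accepts any already-open streams (for example sockets on which a request has been sent) and reads them all to the end, servicing whichever one has data.

```php
<?php
/**
 * Read every stream to EOF, taking data from whichever stream is ready.
 * $streams: open stream resources, keyed however the caller likes.
 * Returns the data read, under the same keys.
 */
function readAllStreams(array $streams, int $timeoutSec = 30): array
{
    $data = [];
    foreach ($streams as $key => $s) {
        stream_set_blocking($s, false);   // fread() must not stall on one slow site
        $data[$key] = '';
    }

    while ($streams) {
        $read = array_values($streams);   // stream_select() modifies its arguments
        $write = null;
        $except = null;
        $ready = stream_select($read, $write, $except, $timeoutSec);
        if ($ready === false || $ready === 0) {
            break;                        // error or timeout: give up on what is left
        }
        foreach ($read as $s) {
            $key = array_search($s, $streams, true);
            $chunk = fread($s, 8192);
            if ($chunk !== false && $chunk !== '') {
                $data[$key] .= $chunk;
            }
            if ($chunk === false || feof($s)) {
                fclose($s);
                unset($streams[$key]);    // finished: stop watching this stream
            }
        }
    }
    return $data;
}
```

Because stream_select() sleeps until at least one stream has data, the loop does not burn CPU while all sites are slow.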
Jul 17 '05 #3
Hi Joe,

I did something similar a few years ago, using nonblocking sockets.
You could create your 10 connections and then read from them in a loop.
Have a look at the PHP socket documentation. With "raw" sockets, though, you
will have to implement (part of) HTTP yourself. It might be that
today there are newer functions in PHP, like fopen, that support HTTP and can
be used in nonblocking mode; I am not really up to date at the moment.
If the docs are of no help, let me know and I will look up the code and send it
to you.
Cheers
Frank

P.S.
I just remembered that some time ago someone else posted nearly the same
question and I sent nearly the same reply :)
Or is it just déjà vu?
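
To give an idea of how much HTTP has to be implemented by hand with raw sockets, here is a small sketch. buildGetRequest and openConnections are hypothetical names. Note the simplification: the connects still happen one after another, and only the reading afterwards is nonblocking.

```php
<?php
// Minimal HTTP/1.0 GET request. With HTTP/1.0 the server closes the
// connection after the response, so "read until EOF" is enough.
function buildGetRequest(string $host, string $path = '/'): string
{
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Open one TCP connection per URL, send the request, then switch the
// socket to nonblocking mode so it can be read from in a loop later.
function openConnections(array $urls, int $timeoutSec = 30): array
{
    $sockets = [];
    foreach ($urls as $url) {
        $parts = parse_url($url);
        if ($parts === false || empty($parts['host'])) {
            continue;                      // skip anything without a host
        }
        $host = $parts['host'];
        $port = $parts['port'] ?? 80;
        $path = ($parts['path'] ?? '/')
              . (isset($parts['query']) ? '?' . $parts['query'] : '');

        $s = @stream_socket_client("tcp://$host:$port", $errno, $errstr, $timeoutSec);
        if ($s === false) {
            continue;                      // connection failed, skip this site
        }
        fwrite($s, buildGetRequest($host, $path));
        stream_set_blocking($s, false);
        $sockets[$url] = $s;
    }
    return $sockets;
}
```

The returned sockets can then be read in a loop; each response still contains the status line and headers, which have to be split off from the body.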
Jul 17 '05 #4
If you don't mind sharing, you could post part of your code so that
it helps others.

--
| Just another PHP saint |
Email: rrjanbiah-at-Y!com
Jul 17 '05 #5
Hi,

sure, I found it between some old pizza boxes ;)
I have not tested it again, so I don't know if it will work with a more
recent PHP version. It might not work, and it might have bugs, but it
provides a starting point.

Have fun
Frank

If I recall correctly, you can use it as follows:

$this->Connection=new HTTPConnection(new URL("http://www.whatever.com"), "", 30, NONBLOCKING);
if (!$this->Connection->makeGETRequest())
die($this->Connection->getErrorMessage());

while($this->Connection->readBlock())
{}

print $this->Connection->getContents();
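
The usage above drives a single connection. For the original question (several sites at once), a loop along the following lines might work. It is an untested sketch: drainConnections is a hypothetical helper, and it only assumes objects that offer readBlock() and getContents() the way the HTTPConnection class in this post does.

```php
<?php
/**
 * Poll a set of connection objects until each one reports end of data.
 * Assumption: readBlock() returns true while more data may follow and
 * false once the connection is finished, as in the class posted here.
 */
function drainConnections(array $connections): array
{
    $pending = $connections;
    while ($pending) {
        foreach ($pending as $key => $conn) {
            if (!$conn->readBlock()) {   // false: this connection is done
                unset($pending[$key]);
            }
        }
        usleep(10000);                   // 10 ms pause so the loop does not spin the CPU
    }

    $results = [];
    foreach ($connections as $key => $conn) {
        $results[$key] = $conn->getContents();
    }
    return $results;
}
```

Each connection would be created with NONBLOCKING and have makeGETRequest() called on it before being passed in, so that all requests are already on the wire when the polling starts.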

file http_class.inc.php

<?php

include_once "class.url.inc.php";

//define constants for use with choice of blocking/nonblocking mode
/** @global int For use with HTTPConnection constructor, sets connection blocking */
define ("BLOCKING", 1);

/** @global int For use with HTTPConnection constructor, sets connection nonblocking */
define ("NONBLOCKING", 0);

/** Objects of this class provide a way to access http-resources.
* Currently only the GET-Method is supported
*/
class HTTPConnection
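{
//member variables, declared up front so that newer PHP versions do not warn about dynamic properties
var $Status, $URL, $IP, $Socket, $Timeout, $Buffer;
var $Header, $Contents, $SendHeader, $Blocking, $ErrorMessage="";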
//-----------------
//-- public methods

/** Constructor creates an unconnected object and initializes member variables
* @access public
* @param $url a URL object
* @param $sendhd string containing valid http header lines to send along, default value is ""
* @param $timeout numeric timeout in seconds for the connect call of fsockopen, default is 1s
* @param $blockmode BLOCKING (default) opens the socket in blocking mode, NONBLOCKING in nonblocking mode
*/
//named __construct because PHP 8 no longer treats a method named after the class as the constructor
function __construct($url, $sendhd="", $timeout=1, $blockmode=BLOCKING)
{
$this->Status=false;
$this->URL = $url;
$this->IP = false;
$this->Socket=false;
$this->Timeout=$timeout;
$this->Buffer=false;
$this->Header=false;
$this->Contents=false;
$this->SendHeader=$sendhd;
$this->Blocking=$blockmode;
}
/** This method opens a socket to a http-resource and sends a GET request.
* If the connection is opened in blocking mode, the response of the resource
* is fetched, parsed and stored in the internal buffers.
* When using a nonblocking connection, the client has to call readBlock to fetch
* the response.
*
* @access public
* @returns true if the request was successful, false otherwise
* @see readBlock()
*/
function makeGETRequest()
{

//make dns-lookup and connect socket
if ($this->findIP())
if($this->openSocket())
{
$path=$this->URL->getPath();
if (strlen($qs=$this->URL->getQueryString()))
$path.="?$qs";
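$str="GET $path HTTP/1.0\r\nHost: ".$this->URL->getHost()."\r\n";
//append the caller's extra header lines, if any, then the empty line that ends the header
if (strlen($this->SendHeader))
$str.=rtrim($this->SendHeader)."\r\n";
$str.="\r\n";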

//write http-request
if (fputs($this->Socket, $str) != strlen($str))
{
$this->ErrorMessage="Unknown error writing request to socket";
return false;
}
//clear buffer
$this->Buffer="";

//when using blocking mode fetch, parse and store the response
if ($this->Blocking)
{
while (!feof($this->Socket))
$this->Buffer.=fread($this->Socket, 1024);

return $this->parseBuffer();
}

return true;
}
return false;
}

/** Returns the header of the response to the last http-request as an associative array
*
* @access public
* @returns array
*/
function getHeader()
{
return $this->Header;
}

/** Returns the body of the response to the last http-request
*
* @access public
* @returns string
*/
function getContents()
{
return $this->Contents;
}

/** Returns the textual description of the last error; may be an empty string if no error has occurred so far
*
* @access public
* @returns string
*/
function getErrorMessage()
{
return $this->ErrorMessage;
}

/** Reads up to $blocksize bytes from the socket and appends them to the internal buffer
*
* @access public
* @returns true while more data may follow, false once the end of the stream is reached and the buffer has been parsed
*/
function readBlock($blocksize=1024)
{
$this->Buffer.=fread($this->Socket, $blocksize);
if (!feof($this->Socket))
return true;

$this->parseBuffer();
return false;
}

/** Closes the socket
*
* @access public
* @returns true if the socket could be closed, false otherwise
*/
function close()
{
return fclose($this->Socket);
}

//-----------------
//-- private methods

/** Look up IP of host and store it
*
* @access private
* @returns IP when dns-lookup was successful, false otherwise
*/
function findIP()
{
$h=$this->URL->getHost();
//ereg() was removed in PHP 7; preg_match() with escaped dots and anchors replaces it
if (preg_match('/^[0-9]{1,3}(\.[0-9]{1,3}){3}$/', $h))
return $this->IP=$h;

if ( ($ip=gethostbyname($h)) == $h )
{
$this->ErrorMessage="Could not find DNS-Entry for host: $h";
return false;
}

return $this->IP=$ip;
}

/** Opens a socket
*
* @access private
* @returns true if the socket could be opened, false otherwise
*/
function openSocket()
{
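//no "&" at the call site: fsockopen() already takes $errno and $errstr by reference
if ($this->Socket=fsockopen($this->IP, $this->URL->getPort(), $errno, $errstr, $this->Timeout))
{
//stream_set_blocking() is the current name of the old socket_set_blocking() alias
stream_set_blocking($this->Socket, $this->Blocking);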
return true;
}

if (!strlen($errstr))
$this->ErrorMessage="Unknown error opening Socket";
else
$this->ErrorMessage="Error $errno opening Socket: $errstr";

return false;
}

/** Parse the http-response as stored in the buffer into header and body
*
* @access private
* @returns true
*/
function parseBuffer()
{
$this->Header=array();
if ( ($p=strpos($this->Buffer, "\r\n\r\n"))!==false )
$this->parseHeader(substr($this->Buffer, 0, $p));
else
$p=-4;
$this->Contents=substr($this->Buffer, $p+4);
return true;
}

/** Parse http-header into associative array
* The array keys are the names of the headerfields in uppercase letters
*
* @access private
* @param $hd string holding the http-header
* @returns void
*/
function parseHeader($hd)
{
$lines=explode("\n", $hd);
$n=count($lines);

for ($i=0; $i<$n; ++$i)
if (strlen(trim($lines[$i])))
{
//split() was removed in PHP 7; match "Name: value" with preg_match() instead
if (preg_match('/^([a-zA-Z0-9_-]+):(.*)$/', $lines[$i], $m))
{
$this->Header[strtoupper($m[1])]=trim($m[2]);
}
}
}
};

?>

file class.url.inc.php

<?php
/** Simple wrapper around parse_url() that holds the parts of a URL */
class URL
{
function __construct($urlstr)
{
$u=trim($urlstr);
if (strlen($u))
$this->parseURL($u);
}

function parseURL($urlstr)
{
$this->IsValid=false;
//--- is a protocol specified?
$parts=explode(":/", $urlstr);
if (count($parts)<2)
$urlstr="http://$urlstr";

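$parts=@parse_url($urlstr);
if (!is_array($parts))
return false;

//parse_url() omits components that are absent, so default each one to ""
$parts+=array("scheme"=>"", "host"=>"", "path"=>"", "port"=>"",
"query"=>"", "fragment"=>"", "user"=>"", "pass"=>"");

$this->Protocol = strtolower($parts["scheme"]);
$this->Host = strtolower($parts["host"]);
$this->Path = $parts["path"];
$this->Port = $parts["port"];
$this->Query = $parts["query"];
$this->Fragment = $parts["fragment"];
$this->User = $parts["user"];
$this->Password = $parts["pass"];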
if (!strlen($this->Protocol))
$this->Protocol="http";

if (!strlen($this->Port))
$this->Port="80";

if (!strlen($this->Path))
$this->Path="/";

if ($this->Host)
$this->IsValid=true;
else return false;

$this->buildCanonical();
return true;
}
function getHost() { return $this->Host; }

function getPath(){ return $this->Path; }
function getPort() { return $this->Port; }
function getQueryString(){ return $this->Query;}
function getAsString()
{
return $this->Canonical;
}

function buildCanonical()
{
$this->Canonical= strtolower($this->Protocol).
"://".
strtolower($this->Host);

if ($this->Port!="80")
$this->Canonical.=":".$this->Port;

$this->Canonical.=$this->Path;

if (strlen($this->Query))
$this->Canonical.="?".$this->Query;
if (strlen($this->Fragment))
$this->Canonical.="#".$this->Fragment;
return true;
}

function isValid()
{
return $this->IsValid;
}

var $Canonical=false;
var $IsValid=false;
var $Host="";
var $Port=80;
var $Protocol="http";
var $Path="/";
var $Query="";
var $Fragment="";
var $User="";
var $Password="";
}
?>
Jul 17 '05 #6
