Grabbing data in parallel connections?

Joe
I need to connect to 10 web sites to grab content from them. I would
like to connect to each site simultaneously so that I can obtain the
data as fast as possible.

I am familiar with doing this in Perl, using parallel sockets or the
LWP-Parallel module. So what would be the best method to do this in
PHP? Sockets, forks?

Also, if possible, could someone provide me with a good reference site
that will help me accomplish this?
Thanks
Jul 17 '05 #1
I don’t know if it is possible in PHP. What you can do is shell out to Unix
and run lynx or wget, having each one write its data to a different file.
Hope others have better solutions.
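
As a rough illustration of the wget suggestion above, here is a minimal sketch that starts one wget process per URL and then waits for all of them. It assumes a Unix-like host with wget on the PATH; the function names `buildFetchCommands` and `runAll` and the `page_N.html` file naming are made up for this example.

```php
<?php
// Sketch: start one wget process per URL and let them run concurrently.
// Assumes a Unix-like host with wget on the PATH; all names are illustrative.

/** Build one shell command per URL, each writing to its own file. */
function buildFetchCommands(array $urls, string $dir): array
{
    $commands = [];
    foreach (array_values($urls) as $i => $url) {
        $target = rtrim($dir, '/') . "/page_$i.html";
        $commands[$target] = 'wget -q -O ' . escapeshellarg($target)
                           . ' ' . escapeshellarg($url);
    }
    return $commands;
}

/** Start every command without waiting, then collect the exit codes. */
function runAll(array $commands): array
{
    $procs = [];
    foreach ($commands as $key => $cmd) {
        // No pipes are requested because wget writes straight to its file.
        $procs[$key] = proc_open($cmd, [], $pipes);
    }
    $exitCodes = [];
    foreach ($procs as $key => $proc) {
        // proc_close() blocks until that particular process has exited.
        $exitCodes[$key] = is_resource($proc) ? proc_close($proc) : -1;
    }
    return $exitCodes;
}
```

Because all processes are started before any of them is waited on, the total time should be close to that of the slowest download rather than the sum of all of them.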

Jul 17 '05 #2
Just use good ol' fopen().

See http://www.php.net/stream_set_blocking/ and
http://www.php.net/stream_select/.
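
A minimal sketch of how the two functions mentioned above might be combined: the streams are switched to nonblocking mode and stream_select() reports which of them have data. The function name `readAllParallel` and its timeout parameter are assumptions made for this example, and the streams are expected to be open already.

```php
<?php
// Sketch: read several already-open streams concurrently with stream_select().

/**
 * Read every stream to EOF, servicing whichever one has data ready.
 * @param resource[] $streams open streams, keyed by any label
 * @return string[] received data, keyed by the same labels
 */
function readAllParallel(array $streams, int $timeoutSec = 10): array
{
    $data = [];
    foreach ($streams as $key => $s) {
        stream_set_blocking($s, false);   // fread() must never stall the loop
        $data[$key] = '';
    }
    while ($streams) {
        $read = array_values($streams);   // stream_select() modifies its arguments
        $write = $except = null;
        $ready = stream_select($read, $write, $except, $timeoutSec);
        if ($ready === false || $ready === 0) {
            break;                        // error or timeout: give up on the rest
        }
        foreach ($read as $s) {
            $key = array_search($s, $streams, true);
            $chunk = fread($s, 8192);
            if ($chunk !== false && $chunk !== '') {
                $data[$key] .= $chunk;
            }
            if ($chunk === false || feof($s)) {
                fclose($s);               // finished: stop watching this stream
                unset($streams[$key]);
            }
        }
    }
    return $data;
}
```

The streams could come from stream_socket_client() or fsockopen() after the request has been written to each of them. As far as I know, fopen() on an http:// URL returns only once the response headers have arrived, so with fopen() only the body reads would overlap.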
Jul 17 '05 #3
Hi Joe,

I did something similar a few years ago, using nonblocking sockets.
You could create your 10 connections and then read from them in a loop.
Have a look at the PHP socket documentation. With "raw" sockets, though, you
will have to implement (part of) HTTP yourself. It may be that PHP now has
newer functions, like fopen, that support HTTP and can be used in
nonblocking mode; I am not really up to date at the moment.
If the docs are of no help, let me know and I will look up the code and send
it to you.
Cheers
Frank

P.S.
I just remembered that some time ago someone else posted nearly the same
question and I sent nearly the same reply :)
Or is it just a deja vu?
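
To give an idea of what "implementing part of HTTP yourself" involves for a plain GET, here is a minimal sketch. The helper names `buildGetRequest` and `splitResponse` are invented for this example, and it only covers the simple HTTP/1.0 case without chunked or compressed responses.

```php
<?php
// Sketch: the part of HTTP/1.0 needed for a plain GET over a raw socket.

/** Build a minimal GET request; extra header lines are optional. */
function buildGetRequest(string $host, string $path = '/', array $extraHeaders = []): string
{
    $lines = array_merge(["GET $path HTTP/1.0", "Host: $host"], $extraHeaders);
    // Each line ends with CRLF and an empty line terminates the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

/** Split a raw response into status line, header map and body. */
function splitResponse(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $status = array_shift($headerLines);
    $headers = [];
    foreach ($headerLines as $line) {
        $pos = strpos($line, ':');
        if ($pos !== false) {
            // Header names are case-insensitive, so store them in uppercase.
            $name = strtoupper(trim(substr($line, 0, $pos)));
            $headers[$name] = trim(substr($line, $pos + 1));
        }
    }
    return ['status' => $status, 'headers' => $headers, 'body' => $parts[1] ?? ''];
}
```

The request string would be written to the socket with fwrite(), and everything read back until EOF would be passed to `splitResponse()`.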
Jul 17 '05 #4
If you don't mind sharing, please post a piece of your code so that it
can help others.

--
| Just another PHP saint |
Email: rrjanbiah-at-Y!com
Jul 17 '05 #5
R. Rajesh Jeba Anbiah wrote:

If you don't mind sharing, please post a piece of your code so that it
can help others.


Hi,

sure, I found it between some old pizza boxes ;)
I have not tested it again, so I don't know whether it will work with a more
recent PHP version. It might not work and it might have bugs, but it
provides a starting point.

Have fun
Frank

If I recall correctly, you can use it as follows:

$this->Connection = new HTTPConnection(new URL("http://www.whatever.com"),
    "", 30, NONBLOCKING);
if (!$this->Connection->makeGETRequest())
    die($this->Connection->getErrorMessage());

// in nonblocking mode, keep reading until readBlock() reports the end
while ($this->Connection->readBlock())
{}

print $this->Connection->getContents();
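
The usage above reads from a single connection. For the original question of ten sites at once, the same readBlock() loop could be run over several connections in turn. The following is only a sketch: `pollAll` is a made-up helper, and `FakeConnection` is a stand-in that mimics the readBlock()/getContents() methods of the class below so that the example runs without a network.

```php
<?php
// Sketch: poll several nonblocking connections round-robin until all finish.
// Works with any object offering readBlock() and getContents();
// FakeConnection is a hypothetical stand-in used only for demonstration.

class FakeConnection
{
    private $pending;
    private $contents = '';

    public function __construct(string $data) { $this->pending = $data; }

    /** Returns true while more data is expected, false once finished. */
    public function readBlock(int $blocksize = 4): bool
    {
        $this->contents .= substr($this->pending, 0, $blocksize);
        $this->pending = (string) substr($this->pending, $blocksize);
        return $this->pending !== '';
    }

    public function getContents(): string { return $this->contents; }
}

/** Call readBlock() on every open connection in turn until none remain. */
function pollAll(array $connections): array
{
    $open = $connections;
    while ($open) {
        foreach ($open as $key => $conn) {
            if (!$conn->readBlock()) {
                unset($open[$key]);   // finished: stop polling this one
            }
        }
        usleep(1000);                 // brief pause so an idle loop does not spin the CPU
    }
    $results = [];
    foreach ($connections as $key => $conn) {
        $results[$key] = $conn->getContents();
    }
    return $results;
}
```

With the real class, each connection would be created with NONBLOCKING and makeGETRequest() called on it before the array is handed to `pollAll()`.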

file http_class.inc.php

<?php

include_once "class.url.inc.php";

// constants for choosing blocking or nonblocking mode
/** @global int For use with HTTPConnection constructor, sets connection
blocking */
define("BLOCKING", 1);

/** @global int For use with HTTPConnection constructor, sets connection
nonblocking */
define("NONBLOCKING", 0);

/** Objects of this class provide a way to access HTTP resources.
 * Currently only the GET method is supported.
 */
class HTTPConnection
{
    // state of the connection and of the last response
    var $Status, $URL, $IP, $Socket, $Timeout, $Buffer, $Header, $Contents,
        $SendHeader, $Blocking, $ErrorMessage;

    //-----------------
    //-- public methods

    /** Constructor creates an unconnected object and initializes the
     * member variables
     * @access public
     * @param $url a URL object
     * @param $sendhd string containing valid HTTP header lines to send along,
     * default value is ""
     * @param $timeout numeric timeout in seconds for the connect call of
     * fsockopen, default is 1s
     * @param $blockmode BLOCKING (default) opens the socket in blocking mode,
     * NONBLOCKING opens it in nonblocking mode
     */
    function __construct($url, $sendhd="", $timeout=1, $blockmode=BLOCKING)
    {
        $this->Status = false;
        $this->URL = $url;
        $this->IP = false;
        $this->Socket = false;
        $this->Timeout = $timeout;
        $this->Buffer = false;
        $this->Header = false;
        $this->Contents = false;
        $this->SendHeader = $sendhd;
        $this->Blocking = $blockmode;
        $this->ErrorMessage = "";
    }

    /** This method opens a socket to an HTTP resource and sends a GET request.
     * If the connection is opened in blocking mode, the response of the
     * resource is fetched, parsed and stored in the internal buffers.
     * When using a nonblocking connection, the client has to call readBlock()
     * to fetch the response.
     *
     * @access public
     * @returns true if the request was successful, false otherwise
     * @see readBlock()
     */
    function makeGETRequest()
    {
        // do the DNS lookup and connect the socket
        if (!$this->findIP() || !$this->openSocket())
            return false;

        $path = $this->URL->getPath();
        if (strlen($qs = (string) $this->URL->getQueryString()))
            $path .= "?$qs";

        // request line and Host header, then any extra header lines;
        // an empty line ends the request
        $str = "GET $path HTTP/1.0\r\nHost: " . $this->URL->getHost() . "\r\n";
        if (strlen($this->SendHeader))
            $str .= rtrim($this->SendHeader) . "\r\n";
        $str .= "\r\n";

        // write HTTP request
        if (fputs($this->Socket, $str) != strlen($str))
        {
            $this->ErrorMessage = "Unknown error writing request to socket";
            return false;
        }
        // clear buffer
        $this->Buffer = "";

        // in blocking mode fetch, parse and store the response
        if ($this->Blocking)
        {
            while (!feof($this->Socket))
                $this->Buffer .= fread($this->Socket, 1024);

            return $this->parseBuffer();
        }

        return true;
    }

    /** Returns the header of the response to the last HTTP request as an
     * associative array
     *
     * @access public
     * @returns array
     */
    function getHeader()
    {
        return $this->Header;
    }

    /** Returns the body of the response to the last HTTP request
     *
     * @access public
     * @returns string
     */
    function getContents()
    {
        return $this->Contents;
    }

    /** Returns the textual description of the last error; may be an empty
     * string if no error has occurred so far
     *
     * @access public
     * @returns string
     */
    function getErrorMessage()
    {
        return $this->ErrorMessage;
    }

    /** Reads up to $blocksize bytes from the socket and appends them to the
     * internal buffer
     *
     * @access public
     * @returns true while more data may follow, false once the end of the
     * response has been reached and the buffer has been parsed
     */
    function readBlock($blocksize=1024)
    {
        $this->Buffer .= fread($this->Socket, $blocksize);
        if (!feof($this->Socket))
            return true;

        $this->parseBuffer();
        return false;
    }

    /** Closes the socket
     *
     * @access public
     * @returns true if the socket could be closed, false otherwise
     */
    function close()
    {
        return fclose($this->Socket);
    }

    //-----------------
    //-- private methods

    /** Look up IP of host and store it
     *
     * @access private
     * @returns IP when the DNS lookup was successful, false otherwise
     */
    function findIP()
    {
        $h = $this->URL->getHost();
        // the host is already a dotted-quad IP address
        if (preg_match('/^[0-9]{1,3}(\.[0-9]{1,3}){3}$/', $h))
            return $this->IP = $h;

        // gethostbyname() returns the unchanged host name on failure
        if (($ip = gethostbyname($h)) == $h)
        {
            $this->ErrorMessage = "Could not find DNS entry for host: $h";
            return false;
        }

        return $this->IP = $ip;
    }

    /** Opens a socket
     *
     * @access private
     * @returns true if the socket could be opened, false otherwise
     */
    function openSocket()
    {
        if ($this->Socket = fsockopen($this->IP, (int) $this->URL->getPort(),
            $errno, $errstr, $this->Timeout))
        {
            stream_set_blocking($this->Socket, (bool) $this->Blocking);
            return true;
        }

        if (!strlen($errstr))
            $this->ErrorMessage = "Unknown error opening socket";
        else
            $this->ErrorMessage = "Error $errno opening socket: $errstr";

        return false;
    }

    /** Parse the HTTP response stored in the buffer into header and body
     *
     * @access private
     * @returns true
     */
    function parseBuffer()
    {
        $this->Header = array();
        if (($p = strpos($this->Buffer, "\r\n\r\n")) !== false)
            $this->parseHeader(substr($this->Buffer, 0, $p));
        else
            $p = -4;   // no header found: treat the whole buffer as body
        $this->Contents = substr($this->Buffer, $p+4);
        return true;
    }

    /** Parse the HTTP header into an associative array
     * The array keys are the names of the header fields in uppercase letters
     *
     * @access private
     * @param $hd string holding the HTTP header
     * @returns void
     */
    function parseHeader($hd)
    {
        $lines = explode("\n", $hd);
        $n = count($lines);

        for ($i=0; $i<$n; ++$i)
            if (strlen(trim($lines[$i])))
            {
                // a header line looks like "Name: value"
                $parts = preg_split("/^[a-zA-Z0-9_-]+:/", $lines[$i]);
                if (count($parts) > 1)
                {
                    $key = substr($lines[$i], 0, strpos($lines[$i], ":"));
                    $this->Header[strtoupper(trim($key))] = trim($parts[1]);
                }
            }
    }
}

?>

file class.url.inc.php

<?php
/** Holds the components of a URL */
class URL
{
    function __construct($urlstr)
    {
        $u = trim($urlstr);
        if (strlen($u))
            $this->parseURL($u);
    }

    function parseURL($urlstr)
    {
        $this->IsValid = false;
        //--- is a protocol specified?
        $parts = explode(":/", $urlstr);
        if (count($parts) < 2)
            $urlstr = "http://$urlstr";

        $parts = @parse_url($urlstr);
        if ($parts === false)
            return false;

        // components missing from the URL become empty strings
        $this->Protocol = strtolower($parts["scheme"] ?? "");
        $this->Host = strtolower($parts["host"] ?? "");
        $this->Path = $parts["path"] ?? "";
        $this->Port = (string) ($parts["port"] ?? "");
        $this->Query = $parts["query"] ?? "";
        $this->Fragment = $parts["fragment"] ?? "";
        $this->User = $parts["user"] ?? "";
        $this->Password = $parts["pass"] ?? "";
        if (!strlen($this->Protocol))
            $this->Protocol = "http";

        if (!strlen($this->Port))
            $this->Port = "80";

        if (!strlen($this->Path))
            $this->Path = "/";

        if ($this->Host)
            $this->IsValid = true;
        else return false;

        $this->buildCanonical();
        return true;
    }
    function getHost() { return $this->Host; }

    function getPath() { return $this->Path; }
    function getPort() { return $this->Port; }
    function getQueryString() { return $this->Query; }
    function getAsString()
    {
        return $this->Canonical;
    }

    /** Rebuild the normalized URL string from its parsed components */
    function buildCanonical()
    {
        $this->Canonical = strtolower($this->Protocol).
            "://".
            strtolower($this->Host);

        // the default port is left out
        if ($this->Port != "80")
            $this->Canonical .= ":" . $this->Port;

        $this->Canonical .= $this->Path;

        if (strlen($this->Query))
            $this->Canonical .= "?" . $this->Query;
        if (strlen($this->Fragment))
            $this->Canonical .= "#" . $this->Fragment;
        return true;
    }

    function isValid()
    {
        return $this->IsValid;
    }

    var $Canonical=false;
    var $IsValid=false;
    var $Host="";
    var $Port=80;
    var $Protocol="http";
    var $Path="/";
    var $Query="";
    var $Fragment="";
    var $User="";
    var $Password="";
}
?>
Jul 17 '05 #6
