Bytes | Software Development & Data Engineering Community

HTTP-POST simultaneous requests

Hello,

I want to create a PHP scraper that will get some information from,
e.g., 5 sites simultaneously. I tried the following script:
http://www.phpied.com/simultaneuos-h...php-with-curl/
Everything works fine, but what I want is a simultaneous scraper
(something multithreaded, so that these 5 websites are loaded not one
after another, but over different sockets).

In addition, I would like to display each result as soon as it has been
scraped. So when the first HTTP POST gets an answer, the script should show
its result and then wait for the rest of the pages (rather than displaying
everything once all the scraping is done).
Any ideas how I can achieve this? Thanks!

regards, Mark
Oct 4 '08 #1
Sorry, PHP doesn't do multithreading very well. Probably the best you
can do is start multiple background processes to do the work, then
communicate via a database, shared memory, etc.

As for displaying the contents immediately: again, that is not guaranteed
to be possible. You can flush() the buffers in PHP, but that doesn't
guarantee the web server will send the data to the client immediately,
nor does it guarantee the client will display the data before the whole
page has been received.
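For illustration, a minimal sketch of the flush() approach, assuming PHP 7.1 or later; send_now() is a hypothetical helper name, not a PHP built-in. It empties PHP's own output buffers and then calls flush(). As noted above, whatever buffering the web server, a proxy, output compression or the browser applies is outside its control.

```php
<?php
// Hypothetical helper: hand one chunk of output to the SAPI right away.
// This only empties PHP's own buffers; the web server, proxies, output
// compression and the browser may still hold the data back.
function send_now(string $chunk): void
{
    echo $chunk;
    // Flush and close each userland output buffer (innermost first);
    // stop early if a buffer refuses to be removed.
    while (ob_get_level() > 0 && ob_end_flush()) {
    }
    // Ask PHP and its backend (CGI, server module, ...) to send the data.
    flush();
}

// Usage: each line is pushed out as soon as it is produced.
foreach (['first result', 'second result'] as $line) {
    send_now(htmlspecialchars($line) . "<br>\n");
}
```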

Sounds like Java might be a better fit.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 4 '08 #2
mark wrote:
>In addition I would like to display the results as soon as it will be
>scraped. So when first http-post get answer, it will show the result
>and wait for the rest of the pages (not display everything when all
>scraping is done).
>Any ideas how can I achieve it? Thanks!
Either:
- Run from the console (CLI) and use fork() (pcntl_fork() in PHP).
- Use raw HTTP and some socket_select() magic.
- curl_multi_exec().
- Rely on JavaScript and AJAX techniques: make the web browser launch 5
queries to your web server, each of which scrapes one site.
- Use ignore_user_abort() and a mix of raw HTTP with sockets to blindly
launch PHP "threads". This one's quite tricky to pull off.

There may be more ways to do this, but unless you know what a critical
section is, please stay away from concurrent (AKA multithreaded) programming.

Besides, you need IPC to get the results as they appear. To make your life
easier, you should stick with either curl_multi queries or JavaScript that
fetches each result individually as it becomes ready.
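For illustration, a minimal sketch of the curl_multi route, assuming PHP 7.1 or later with the curl extension loaded; fetch_concurrently() and its callback are hypothetical names. It starts every transfer at once and invokes the callback as each one finishes, so the first response can be handled without waiting for the slowest site. For the POSTs in the original question, each handle would also need CURLOPT_POSTFIELDS set, which makes curl send a POST.

```php
<?php
// Hypothetical helper: start every transfer at once with curl_multi_* and
// call $onDone($key, $body) as soon as each one finishes ($body is null if
// that transfer failed). Assumes the curl extension is loaded.
function fetch_concurrently(array $urls, callable $onDone): void
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body in memory
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // per-transfer limit, seconds
        // For a POST, set CURLOPT_POSTFIELDS here as well.
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $running = 0;
    do {
        // Advance every transfer as far as possible without blocking.
        if (curl_multi_exec($mh, $running) !== CURLM_OK) {
            break;
        }
        // Report each transfer that completed since the previous pass.
        while (($info = curl_multi_info_read($mh)) !== false) {
            $ch = $info['handle'];
            $key = array_search($ch, $handles, true);
            $body = $info['result'] === CURLE_OK ? curl_multi_getcontent($ch) : null;
            $onDone($key, $body);
            curl_multi_remove_handle($mh, $ch);
        }
        // Sleep until a socket is ready (at most 1 s) instead of busy-looping.
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);
        }
    } while ($running > 0);

    curl_multi_close($mh);
}
```

A caller could pass a closure that echoes each body and then calls flush(); the buffering caveats about the web server and browser still apply. The design keeps everything in one process and lets libcurl multiplex the sockets, which avoids forking or managing worker processes.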
--
----------------------------------
Iván Sánchez Ortega -ivan-algarroba-sanchezortega-punto-es-

Now listening to: Deep Forest - Music.Detected_ (2002) - [4] Computer
Machine (5:12) (99.061996%)
Oct 4 '08 #3
Hello,

This class can do exactly what you describe:

http://www.phpclasses.org/thread

This other class also uses separate HTTP requests to run multiple
parallel tasks but these are started from the browser side using AJAX
requests:

http://www.phpclasses.org/phpthreader
--

Regards,
Manuel Lemos

Find and post PHP jobs
http://www.phpclasses.org/jobs/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/
Oct 5 '08 #4
Manuel Lemos wrote:
>This class can do exactly what you describe:
>
>http://www.phpclasses.org/thread
>
>This other class also uses separate HTTP requests to run multiple
>parallel tasks but these are started from the browser side using AJAX
>requests:
>
>http://www.phpclasses.org/phpthreader

Why don't you tell him that's your own site you're spamming again, Manuel?

And those are your own classes (which, BTW, aren't worth a damn) you're
spamming?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 5 '08 #5
On 4 Oct, 21:09, mark <mkazmier...@gmail.com> wrote:
>Hello,
>
>I want to create a php scraper that will get some information from
>e.g. 5 sites simultaneously. I tried the following script: http://www.phpied.com/simultaneuos-h...php-with-curl/
>Everything works fine, but what I want is simultaneuos (something to
>multithread, when these 5 websites will be loaded not one after
>another, but by using different sockets) scraper.
That's exactly what curl_multi_* does.
>In addition I would like to display the results as soon as it will be
>scraped. So when first http-post get answer, it will show the result
>and wait for the rest of the pages (not display everything when all
>scraping is done).
>Any ideas how can I achieve it? Thanks!
This is not a trivial bit of coding. It's not impossible, but since you
seem to be relying on cut-and-paste coding, do you think you're
overstretching your abilities?

C.
Oct 6 '08 #6
On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>Manuel Lemos wrote:
><snip>
>>
>>This class can do exactly what you describe:
>>http://www.phpclasses.org/thread
>>This other class also uses separate HTTP requests to run multiple
>>parallel tasks but these are started from the browser side using AJAX
>>requests:
>>http://www.phpclasses.org/phpthreader
>
>Why don't you tell him that's your own site you're spamming again, Manuel?
>
>And those are your own classes (which, BTW, aren't worth a damn) you're
>spamming?
What's your solution? Do you have a better approach?

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/
Oct 9 '08 #7
R. Rajesh Jeba Anbiah wrote:
>What's your solution? Do you have better approach?

Yes, curl_multi_exec(), as Iván indicated.

Manuel is just a spammer - virtually every answer he posts refers to
something on his site. And he doesn't even indicate it's his own site
when he spams it.

Now I wouldn't mind if he were giving good technical advice. But I've
looked at some of his scripts. I've seen relatively new PHP programmers
do better.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 9 '08 #8
..oO(Jerry Stuckle)
>R. Rajesh Jeba Anbiah wrote:
>>On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>>>
>>>And those are your own classes (which, BTW, aren't worth a damn) you're
>>>spamming?
>>
>>What's your solution? Do you have better approach?
>
>Yes, curl_multi_exec(), as Iván indicated.
>
>Manuel is just a spammer
Wrong.
>virtually every answer he posts refers to
>something on his site.
Nothing wrong with that. I would also point to my own classes to solve a
given problem if they were freely available.
>And he doesn't even indicate it's his own site
>when he spams it.
Not necessary.

It would be spam if it were totally OT, but he posts ready-to-use
solutions to PHP problems. It doesn't matter if these solutions are his
own or not. Even if they were commercial, it wouldn't be spam in the
given context.
>Now I wouldn't mind if he were giving good technical advice. But I've
>looked at some of his scripts.
Some. But surely not all. They might not fit your coding standards, but
this doesn't give you the right to discredit them at every chance you
get. If you have a problem with them, come to the point and post exactly
what you don't like. And _prove_ it by posting code samples.
>I've seen relatively new PHP programmers
>do better.
If you don't like his solutions, post better ones or simply ignore him.
It's always good to have a choice between various ways to solve a
problem. He's contributing to the community by posting alternatives.

You OTOH are just trolling by attacking him personally on each and every
post. This sucks.

Enough is enough! >:-(

Micha
Oct 9 '08 #9
Jerry Stuckle wrote:
Manuel Lemos wrote:
Jerry Stuckle has a personality problem.
He seems to live on comp.lang.php like a rat addicted to the cocaine
lever in a laboratory cage. He seems to do nothing else. Does his
employer know how much time he spends insulting people, complaining,
posturing? He seems to be a competent hacker. But also a lonely,
friendless, nasty dispositioned jerk.

Manuel Lemos is a mature, considerate and helpful guy by comparison.
Oct 9 '08 #10
