Bytes IT Community

HTTP-POST simultaneous requests

Hello,

I want to create a PHP scraper that will get some information from,
e.g., 5 sites simultaneously. I tried the following script:
http://www.phpied.com/simultaneuos-h...php-with-curl/
Everything works fine, but what I want is a simultaneous scraper
(something multithreaded, where these 5 websites are not loaded one
after another but in parallel, using different sockets).

In addition, I would like to display each result as soon as it has been
scraped. So when the first HTTP POST gets an answer, the page should show
that result and then wait for the rest of the pages (rather than display
everything once all the scraping is done).
Any ideas how I can achieve this? Thanks!

regards, Mark
Oct 4 '08 #1
21 Replies


mark wrote:
<snip>

Sorry, PHP doesn't do multithreading very well. Probably the best you
can do is start multiple background processes to do the work, then
communicate via a database, shared memory, etc.

As for displaying the contents immediately - again, that is not
guaranteed to be possible. You can flush() the buffers in PHP, but that
doesn't guarantee the webserver will send the data to the client
immediately, nor does it guarantee the client will display the data
before the whole page has been received.
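A minimal sketch of the best-effort flushing described above, assuming PHP 7.1+. The helper name `pushToClient()` is hypothetical. It closes PHP's own output buffers and then calls flush(); as noted, the web server (for example gzip compression via mod_deflate) and the browser may still hold the data back, so this improves the odds rather than guaranteeing anything.

```php
<?php
/**
 * Hypothetical helper: echo a chunk, then push it past PHP's own buffers.
 * Best effort only: web server compression or proxy buffering, and the
 * browser's rendering rules, can still delay what the user sees.
 */
function pushToClient(string $chunk): void
{
    echo $chunk;

    // Close each PHP-level output buffer (output_buffering, ob_start(), ...),
    // passing its contents down a level. Stop if a buffer cannot be removed.
    while (ob_get_level() > 0) {
        if (!@ob_end_flush()) {
            break;
        }
    }

    // Ask PHP's backend (CLI, CGI/FastCGI, Apache module, ...) to send
    // what it has received so far.
    flush();
}

// Usage: emit one line per result as soon as that result exists.
foreach (['first result', 'second result'] as $line) {
    pushToClient(htmlspecialchars($line) . "<br>\n");
}
```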

Sounds like Java might be a better fit.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 4 '08 #2

mark wrote:
> In addition I would like to display the results as soon as it will be
> scraped. So when first http-post get answer, it will show the result
> and wait for the rest of the pages (not display everything when all
> scraping is done).
> Any ideas how can I achieve it? Thanks!

Either:
- Run in console and use fork().
- Use raw HTTP and some socket_select() magic.
- Use curl_multi_exec().
- Rely on JavaScript/AJAX techniques, and make the web browser launch 5
queries to your web server, each one scraping a site.
- Use ignore_user_abort() and a mix of raw HTTP with sockets to blindly
launch PHP "threads". This one's quite tricky to pull off.

There may be more ways to do this, but unless you know what a critical
section is, please stay away from concurrent (AKA multithreaded) programming.

Besides, you want IPC to get the results as they appear. To make your life
easier, you should stick with either curl_multi queries or rely on
JavaScript to fetch each result individually as it becomes ready.
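A minimal sketch of the curl_multi_exec() approach, assuming PHP 7.1+ with the curl extension. The function `fetchConcurrently()` and its `['url' => ..., 'post' => ...]` request format are hypothetical names chosen for illustration. All transfers run at the same time, and curl_multi_info_read() reports each one as soon as it completes, so the callback sees results in completion order rather than request order.

```php
<?php
/**
 * Hypothetical helper: run several requests in parallel with curl_multi
 * and call $onDone($key, $body, $error) as soon as each one finishes.
 * $requests maps a caller-chosen key to ['url' => string, 'post' => array|null].
 */
function fetchConcurrently(array $requests, callable $onDone, int $timeout = 30): void
{
    $mh = curl_multi_init();
    $handles = [];                        // request key => curl handle

    foreach ($requests as $key => $req) {
        $ch = curl_init($req['url']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        if (!empty($req['post'])) {
            // A url-encoded string makes curl send an ordinary form POST.
            curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($req['post']));
        }
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    do {
        $status = curl_multi_exec($mh, $running);

        // Report every transfer that has completed since the last pass.
        while (($info = curl_multi_info_read($mh)) !== false) {
            $ch    = $info['handle'];
            $key   = array_search($ch, $handles, true);
            $error = $info['result'] === CURLE_OK ? null : curl_strerror($info['result']);
            $onDone($key, curl_multi_getcontent($ch), $error);
            curl_multi_remove_handle($mh, $ch);
        }

        // Wait for socket activity instead of spinning; back off briefly
        // if select fails (it can return -1 on some systems).
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(10000);
        }
    } while ($running > 0 && $status === CURLM_OK);

    curl_multi_close($mh);
}
```

To cover the second half of the question, the callback could echo each scraped result and call flush(), with the same caveats about web server and browser buffering mentioned earlier in the thread. A failed transfer is reported through `$error` (via curl_strerror()) instead of aborting the remaining requests.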
--
----------------------------------
Iván Sánchez Ortega -ivan-algarroba-sanchezortega-punto-es-

Now listening to: Deep Forest - Music.Detected_ (2002) - [4] Computer
Machine (5:12) (99.061996%)
Oct 4 '08 #3

Hello,

on 10/04/2008 05:09 PM mark said the following:
<snip>

This class can do exactly what you describe:

http://www.phpclasses.org/thread

This other class also uses separate HTTP requests to run multiple
parallel tasks but these are started from the browser side using AJAX
requests:

http://www.phpclasses.org/phpthreader
--

Regards,
Manuel Lemos

Find and post PHP jobs
http://www.phpclasses.org/jobs/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/
Oct 5 '08 #4

Manuel Lemos wrote:
<snip>

Why don't you tell him that's your own site you're spamming again, Manuel?

And those are your own classes (which, BTW, aren't worth a damn) you're
spamming?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 5 '08 #5

On 4 Oct, 21:09, mark <mkazmier...@gmail.com> wrote:
> Hello,
>
> I want to create a php scraper that will get some information from
> e.g. 5 sites simultaneously. I tried the following script:
> http://www.phpied.com/simultaneuos-h...php-with-curl/
> Everything works fine, but what I want is simultaneuos (something to
> multithread, when these 5 websites will be loaded not one after
> another, but by using different sockets) scraper.

That's exactly what curl_multi_* does.

> In addition I would like to display the results as soon as it will be
> scraped. So when first http-post get answer, it will show the result
> and wait for the rest of the pages (not display everything when all
> scraping is done).
> Any ideas how can I achieve it? Thanks!

This is not a trivial bit of coding. It's not impossible, but since you
seem to be relying on cut-and-paste coding, do you think you may be
overstretching your abilities?

C.
Oct 6 '08 #6

On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> Manuel Lemos wrote:
> <snip>
>>
>> This class can do exactly what you describe:
>> http://www.phpclasses.org/thread
>> This other class also uses separate HTTP requests to run multiple
>> parallel tasks but these are started from the browser side using AJAX
>> requests:
>> http://www.phpclasses.org/phpthreader
>
> Why don't you tell him that's your own site you're spamming again, Manuel?
>
> And those are your own classes (which, BTW, aren't worth a damn) you're
> spamming?

What's your solution? Do you have a better approach?

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/
Oct 9 '08 #7

R. Rajesh Jeba Anbiah wrote:
<snip>
> What's your solution? Do you have better approach?

Yes, curl_multi_exec(), as Iván indicated.

Manuel is just a spammer - virtually every answer he posts refers to
something on his site. And he doesn't even indicate it's his own site
when he spams it.

Now I wouldn't mind if he were giving good technical advice. But I've
looked at some of his scripts. I've seen relatively new PHP programmers
do better.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 9 '08 #8

.oO(Jerry Stuckle)

>R. Rajesh Jeba Anbiah wrote:
>>On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>>>
>>> And those are your own classes (which, BTW, aren't worth a damn) you're
>>> spamming?
>>
>> What's your solution? Do you have better approach?
>
>Yes, curl_multi_exec(), as Iván indicated.
>
>Manuel is just a spammer

Wrong.

>virtually every answer he posts refers to
>something on his site.

Nothing wrong with that. I would also point to my own classes to solve a
given problem if they were freely available.

>And he doesn't even indicate it's his own site
>when he spams it.

Not necessary.

It would be spam if it were totally off-topic, but he posts ready-to-use
solutions to PHP problems. It doesn't matter whether these solutions are his
own or not. Even if they were commercial, it wouldn't be spam in the
given context.

>Now I wouldn't mind if he were giving good technical advice. But I've
>looked at some of his scripts.

Some. But surely not all. They might not fit your coding standards, but
this doesn't give you the right to discredit them at every chance you
get. If you have a problem with them, come to the point and post exactly
what you don't like. And _prove_ it by posting code samples.

>I've seen relatively new PHP programmers
>do better.

If you don't like his solutions, post better ones or simply ignore him.
It's always good to have a choice between various ways to solve a
problem. He's contributing to the community by posting alternatives.

You OTOH are just trolling by attacking him personally in each and every
post. This sucks.

Enough is enough! >:-(

Micha
Oct 9 '08 #9

Jerry Stuckle wrote:
> Manuel Lemos wrote:

Jerry Stuckle has a personality problem.
He seems to live on comp.lang.php like a rat addicted to the cocaine
lever in a laboratory cage. He seems to do nothing else. Does his
employer know how much time he spends insulting people, complaining,
posturing? He seems to be a competent hacker, but also a lonely,
friendless, nasty-dispositioned jerk.

Manuel Lemos is a mature, considerate and helpful guy by comparison.
Oct 9 '08 #10

salmobytes wrote:
<snip>

ROFLMAO!

FYI, I am my own employer - an independent consultant. And I suspect I
make a lot more than most of the people in this newsgroup.

No, I don't "live" here. But I check in a few times during the day,
usually when I need to take a break from coding.

As for Manuel - "mature" people don't need to spam their websites at
every opportunity. When was the last time you saw him give advice which
wasn't on his website? Not very often.

OTOH, I never refer to my website for solutions. Many here don't even
know what it is (which is fine with me).

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 9 '08 #11

Michael Fesser wrote:
<snip>

Sorry, Micha, as much as I respect you, I have to disagree. How many
posts has Manuel made which had solutions - other than saying "see this
website" - and not telling people it is his?

I don't spam my website - because its contents are not germane to this
newsgroup. I do sometimes refer people to other websites. But at NO
time have I ever referred anyone to a site where I have a pecuniary
interest. And if I did, I'd at least tell them it was my site.

And no, I haven't looked at every one of his scripts. But I know bad
coding when I see it. And there is no reason to inflict such garbage on
new PHP programmers who are trying to learn how to do things the write
way. It's at least worth warning them that the coding is lousy.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 9 '08 #12

.oO(salmobytes)

>If you Google "Jerry Stuckle" you get quite an impressive list of link
>titles. Here are just a few samples:
[...]

>...this list goes on for page after page. It's almost endless.
>What is it about you Jerry?

What does this have to do with PHP?

Micha
Oct 9 '08 #13

Jerry Stuckle wrote:
> I'm not afraid to call a troll a troll - or a spammer a spammer.
> I'm afraid the trolls and spammers don't like that.

If you go back and study the posts to this group, you see there
are basically two groups: newbie help seekers and a small core of
top-notch guys Jerry accepts because of their expertise, or perhaps
because he's afraid to attack them.

Every newcomer who isn't a supplicating, hat-in-hand beginner
immediately gets attacked by Jerry and then disappears, all too
often never to be heard from again. You do this group a great
disservice. You're drying it up, shrinking it down into your own
personal, wrinkled, fascist soap box forum.
Oct 10 '08 #14

salmobytes wrote:
<snip>

Wrong. I answer a lot of newbie questions. It's the spammers and
trolls I can't stand.

And Manuel is not a "newbie" in this newsgroup. He's spammed it many
times before.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 10 '08 #15


Yes, I know you answer a lot of newbie questions.
And my observations were not limited to Manuel Lemos, who does have a
lot to offer.
Your attacks are well distributed, often having nothing to do with
spam of any kind. You make this group a tense place just to visit.
You've been doing it for years. Good luck to the group.
It will need it.
Oct 10 '08 #16

"Jerry Stuckle" wrote...
: Sorry, Micha, as much as I respect you, I have to disagree. How many
: posts has Manuel made which had solutions - other than saying "see this
: website" - and not telling people it is his?

You're correct. He should identify it as his code, his website. I do
hesitate to call it trolling. Of course, I only read one of his posts so
far. His website looks ok.

: And no, I haven't looked at every one of his scripts. But I know bad
: coding when I see it. And there is no reason to inflict such garbage
: on new PHP programmers who are trying to learn how to do things the
: [write <mis>] right way. It's at least worth warning them that the
: coding is lousy.

The way I look at it, everyone wants to learn. He should encourage folks
to take a look at "his" code, and you should encourage him to produce
better code. I suggest he put copyright notices on his code, place it in
the public domain and encourage others to test it, use it and then
identify any problems. I'm only trying to get across the idea that we
(you, him, me, and everyone else here) should work together and support
one another. Encouragement produces wonderful benefits.

I like his website. Good job. I wonder if that code is in the public
domain?

If I see bad code, I think along the lines that either he knows something
I failed to see or perhaps I can provide something better.

--
Jim Carlock
More Than Five Senses
http://www.associatedcontent.com/art...ve_senses.html
George Bush And Condoleeza Rice MP3s
http://www.microcosmotalk.com/thefacts/

Oct 11 '08 #17

.oO(Jim Carlock)

>"Jerry Stuckle" wrote...
>: Sorry, Micha, as much as I respect you, I have to disagree. How many
>: posts has Manuel made which had solutions - other than saying "see this
>: website" - and not telling people it is his?
>
>You're correct. He should identify it as his code, his website. I do
>hesitate to call it trolling. Of course, I only read one of his posts so
>far. His website looks ok.

It's his site, but not always his code. He maintains the platform, but
there are many other code contributors as well, like the two who wrote
the threading classes mentioned here. These were not written by Manuel.

There's absolutely no need to name the author of each and every script
you link to, especially if the scripts are free or released under the GPL.
All the information is available on the site or in the code.

Micha
Oct 11 '08 #18

Michael Fesser wrote:
<snip>

Micha,

My complaint is that Manuel doesn't try to help people. Every answer is
a pointer to his site, without telling people it is his site.

That's just spamming - plain and simple.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 11 '08 #19

I looked at the initial link posted and was commenting on that
link (domain phpied.com). That belongs to Stoyan Stefanov. Oops.

However, going back through to the link posted by Manuel, I did
end up at the following link:

http://phpclasses.betablue.net/browse/package/3953.html

It took 4 or 5 clicks, selecting a country, et al, to finally
get to the link above. Very strange. Some sort of redirects that
go on there. I've got to run for now. Catch you guys later.

--
Jim Carlock
You Have More Than Five Senses
http://www.associatedcontent.com/art...ve_senses.html

Oct 11 '08 #20

On Oct 9, 11:10 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
<snip>
> Manuel is just a spammer - virtually every answer he posts refers to
> something on his site. And he doesn't even indicate it's his own site
> when he spams it.
>
> Now I wouldn't mind if he were giving good technical advice. But I've
> looked at some of his scripts. I've seen relatively new PHP programmers
> do better.

So, you never have a better solution except trolling?? I do know
you have better knowledge of DB and OOP, but not of PHP. Manuel is
known for bringing out the first OOP repository for PHP and has been
internationally known for many years. By attacking such great people,
you're proving your stupidity, and it's time to stop.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/
Oct 14 '08 #21

R. Rajesh Jeba Anbiah wrote:
<snip>

Sorry, Rajesh - Manuel has shown none of that "brilliance" here. If he
were that great, he'd be helping here instead of just spamming his site.

He's no better than any other spammer on the internet.

And I would say I know a LOT more about PHP than Manuel does. And so do
a couple of dozen other people in this newsgroup.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Oct 14 '08 #22
