Hello,
I want to create a PHP scraper that will get some information from
e.g. 5 sites simultaneously. I tried the following script: http://www.phpied.com/simultaneuos-h...php-with-curl/
Everything works fine, but what I want is a simultaneous scraper
(something multithreaded, where these 5 websites are loaded not one
after another, but over different sockets).
In addition, I would like to display each result as soon as it has
been scraped. So when the first HTTP POST gets an answer, the script
should show that result and wait for the rest of the pages (rather
than displaying everything only when all scraping is done).
Any ideas how I can achieve this? Thanks!
regards, Mark
mark wrote:
> I want to create a php scraper that will get some information from
> e.g. 5 sites simultaneously.
> <snip>
Sorry, PHP doesn't do multithreading very well. Probably the best you
can do is start multiple background processes to do the work then
communicate via a database, shared memory, etc.
As for displaying the contents immediately - again, not guaranteed
possible. You can flush() the buffers in PHP - but that doesn't
guarantee the data will be sent by the webserver to the client
immediately, nor does it guarantee the client will display the data
before the whole response has been received.
Sounds like Java might be a better fit.
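For reference, the flush() approach mentioned above might look roughly like the sketch below. The helper name emit_chunk() and the sample messages are made up for illustration, and the usleep() call only stands in for the time a real fetch would take. As noted, this only asks PHP to hand the data on; the web server or the browser may still buffer it.

```php
<?php
// Hypothetical helper: print one chunk and ask PHP to push it towards the client.
function emit_chunk(string $text): void
{
    echo $text, "\n";
    // Flush PHP's own output buffer first, if one is active...
    if (ob_get_level() > 0) {
        ob_flush();
    }
    // ...then ask the SAPI / web server layer to send what it has so far.
    // Neither call can force the server or the browser to stop buffering.
    flush();
}

// Sample messages only; a real scraper would emit each result as it arrives.
foreach (['site A done', 'site B done', 'site C done'] as $message) {
    emit_chunk($message);
    usleep(200000); // stand-in for the time a real fetch would take
}
```

Whether the chunks really show up one by one depends on the server setup (output compression and proxy buffering are common culprits), so it is worth testing against the actual deployment.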
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp. js*******@attglobal.net
==================
mark wrote:
> In addition, I would like to display each result as soon as it has
> been scraped. So when the first HTTP POST gets an answer, the script
> should show that result and wait for the rest of the pages (rather
> than displaying everything only when all scraping is done).
> Any ideas how I can achieve this? Thanks!
Either:
- Run in console and use fork() (pcntl_fork() in PHP).
- Use raw HTTP and some socket_select() magic.
- curl_multi_exec().
- Rely on JavaScript, AJAX techniques, and make a web browser launch 5
queries to your web server, each one scraping a site.
- Use ignore_user_abort() and a mix of raw HTTP with sockets to blindly
launch PHP threads. This one's quite tricky to pull off.
There may be more ways to do this, but unless you know what a critical
section is, please stay away from concurrent (AKA multithreaded) programming.
Besides, you want IPC to get the results as they appear - to make your life
easier, you should stick with either curl_multi queries or rely on
JavaScript to individually fetch results as they are ready.
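A minimal sketch of the curl_multi route, under a few assumptions: the function name fetch_as_completed() is made up for this example, the 15-second timeout is arbitrary, the requests are plain GETs (a POST would need CURLOPT_POST and CURLOPT_POSTFIELDS on each handle), and the URLs are taken from the command line. The idea is to call curl_multi_info_read() inside the loop, so each response is handed to a callback as soon as its transfer finishes rather than after all of them are done.

```php
<?php
// Sketch: fetch several URLs in parallel and report each one as it completes.
// fetch_as_completed() is a hypothetical name, not part of any library.
function fetch_as_completed(array $urls, callable $onDone): void
{
    if ($urls === []) {
        return;
    }
    $multi = curl_multi_init();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body, don't print it
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // arbitrary per-request limit
        curl_setopt($ch, CURLOPT_PRIVATE, $url);        // remember which URL this handle is
        curl_multi_add_handle($multi, $ch);
    }
    do {
        $status = curl_multi_exec($multi, $running);
        // Drain the queue of transfers that have finished so far.
        while (($info = curl_multi_info_read($multi)) !== false) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $body = $info['result'] === CURLE_OK ? curl_multi_getcontent($ch) : null;
            $onDone($url, $body);
            curl_multi_remove_handle($multi, $ch);
        }
        if ($running > 0) {
            // Wait for socket activity instead of spinning in a busy loop.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);
    curl_multi_close($multi);
}

// Console usage (URLs are whatever you pass in):
//   php scrape.php https://example.com/ https://example.org/
$args = array_slice($_SERVER['argv'] ?? [], 1);
if (PHP_SAPI === 'cli' && $args !== []) {
    fetch_as_completed($args, function (string $url, ?string $body): void {
        printf("%s: %s\n", $url, $body === null ? 'failed' : strlen($body) . ' bytes');
        flush();
    });
}
```

Everything runs in a single PHP process, so there are no critical sections or IPC to worry about; the parallelism lives entirely inside libcurl.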
--
----------------------------------
Iván Sánchez Ortega -ivan-algarroba-sanchezortega-punto-es-
Now listening to: Deep Forest - Music.Detected_ (2002) - [4] Computer
Machine (5:12) (99.061996%)
Hello,
on 10/04/2008 05:09 PM mark said the following:
> I want to create a php scraper that will get some information from
> e.g. 5 sites simultaneously.
> <snip>
This class can do exactly what you describe: http://www.phpclasses.org/thread
This other class also uses separate HTTP requests to run multiple
parallel tasks but these are started from the browser side using AJAX
requests: http://www.phpclasses.org/phpthreader
--
Regards,
Manuel Lemos
Find and post PHP jobs http://www.phpclasses.org/jobs/
PHP Classes - Free ready to use OOP components written in PHP http://www.phpclasses.org/
Manuel Lemos wrote:
> <snip>
> This class can do exactly what you describe:
> http://www.phpclasses.org/thread
> This other class also uses separate HTTP requests to run multiple
> parallel tasks but these are started from the browser side using AJAX
> requests:
> http://www.phpclasses.org/phpthreader
Why don't you tell him that's your own site you're spamming again, Manuel?
And those are your own classes (which, BTW, aren't worth a damn) you're
spamming?
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp. js*******@attglobal.net
==================
On 4 Oct, 21:09, mark <mkazmier...@gmail.com> wrote:
> Hello,
> I want to create a PHP scraper that will get some information from
> e.g. 5 sites simultaneously. I tried the following script: http://www.phpied.com/simultaneuos-h...php-with-curl/
> Everything works fine, but what I want is a simultaneous scraper
> (something multithreaded, where these 5 websites are loaded not one
> after another, but over different sockets).
That's exactly what curl_multi_* does.
> In addition, I would like to display each result as soon as it has
> been scraped. So when the first HTTP POST gets an answer, the script
> should show that result and wait for the rest of the pages (rather
> than displaying everything only when all scraping is done).
> Any ideas how I can achieve this? Thanks!
This is not a trivial bit of coding. It's not impossible but since you
seem to be relying on cut-and-paste coding, do you think you're
overstretching your abilities?
C.
On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> Manuel Lemos wrote:
> <snip>
>> This class can do exactly what you describe:
>> http://www.phpclasses.org/thread
>> This other class also uses separate HTTP requests to run multiple
>> parallel tasks but these are started from the browser side using AJAX
>> requests:
>> http://www.phpclasses.org/phpthreader
> Why don't you tell him that's your own site you're spamming again, Manuel?
> And those are your own classes (which, BTW, aren't worth a damn) you're
> spamming?
What's your solution? Do you have a better approach?
--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/
R. Rajesh Jeba Anbiah wrote:
> On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>> Manuel Lemos wrote:
>> <snip>
>> Why don't you tell him that's your own site you're spamming again, Manuel?
>> And those are your own classes (which, BTW, aren't worth a damn) you're spamming?
> What's your solution? Do you have a better approach?
Yes, curl_multi_exec(), as Iván indicated.
Manuel is just a spammer - virtually every answer he posts refers to
something on his site. And he doesn't even indicate it's his own site
when he spams it.
Now I wouldn't mind if he were giving good technical advice. But I've
looked at some of his scripts. I've seen relatively new PHP programmers
do better.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp. js*******@attglobal.net
==================
.oO(Jerry Stuckle)
>R. Rajesh Jeba Anbiah wrote:
>>On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>>>And those are your own classes (which, BTW, aren't worth a damn) you're spamming?
>>What's your solution? Do you have better approach?
>Yes, curl_multi_exec(), as Iván indicated.
>Manuel is just a spammer
Wrong.
>virtually every answer he posts refers to something on his site.
Nothing wrong with that. I would also point to my own classes to solve a
given problem if they were freely available.
>And he doesn't even indicate it's his own site when he spams it.
Not necessary.
It would be spam if it were totally OT, but he posts ready-to-use
solutions to PHP problems. It doesn't matter if these solutions are his
own or not. Even if they were commercial, it wouldn't be spam in the
given context.
>Now I wouldn't mind if he were giving good technical advice. But I've looked at some of his scripts.
Some. But surely not all. They might not fit your coding standards, but
this doesn't give you the right to discredit them on every chance you
get. If you have a problem with them, come to the point and post exactly
what you don't like. And _prove_ it by posting code samples.
>I've seen relatively new PHP programmers do better.
If you don't like his solutions, post better ones or simply ignore him.
It's always good to have a choice between various ways to solve a
problem. He's contributing to the community by posting alternatives.
You OTOH are just trolling by attacking him personally on each and every
post. This sucks.
Enough is enough! >:-(
Micha
Jerry Stuckle wrote:
Manuel Lemos wrote:
Jerry Stuckle has a personality problem.
He seems to live on comp.lang.php like a rat addicted to the cocaine
lever in a laboratory cage. He seems to do nothing else. Does his
employer know how much time he spends insulting people, complaining,
posturing? He seems to be a competent hacker. But also a lonely,
friendless, nasty-dispositioned jerk.
Manuel Lemos is a mature, considerate and helpful guy by comparison.