Bytes | Software Development & Data Engineering Community

Automated web browsing

Hi

Does anybody have an idea how to enter some text into an input box on
one page, then press a button on that page, which loads another
page, and finally read the response? Suppose I want to write a price
comparison engine that parses a shop's website for the price each
time a user asks.

I have found a similar feature in the Symfony framework, called sfBrowser
(or sfTestBrowser). These are made for automated functional testing,
but should provide the functionality I am asking for.

The question is: will this be efficient enough? Maybe there are other
ways to achieve this? Of course, I can always do it more
manually: look for a pattern in the URL (search is usually done via
GET) and parse the output HTML.
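
The manual approach can be sketched roughly as follows. This is a minimal sketch in Python (standard library only) rather than PHP, and it only illustrates the two steps: building the GET URL and pulling a price out of the returned HTML. The shop URL, the "q" parameter name, and the "price" class are all assumptions made up for illustration; a real shop's search form and markup would have to be inspected first.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_search_url(base_url, query):
    """Build the GET URL a shop's search form would produce.

    The parameter name "q" is an assumption; inspect the real form to find it.
    """
    return base_url + "?" + urlencode({"q": query})


class PriceParser(HTMLParser):
    """Collect the text of flat elements whose class attribute is "price".

    The class name is hypothetical; each shop marks up prices differently.
    Nested tags inside a price element are not handled by this sketch.
    """

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        # Any closing tag ends the current price element.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Return the price strings found in an HTML document."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


# Fetching is left out so the sketch runs offline; in practice the HTML would
# come from urllib.request.urlopen(build_search_url(...)).read().decode().
sample = '<ul><li>Widget <span class="price">19.99</span></li></ul>'
print(build_search_url("http://shop.example/search", "blue widget"))
print(extract_prices(sample))
```

The weak point of this approach is the parser: it breaks whenever the shop changes its markup, so each shop needs its own small extraction rule.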

Thanks for help
Marcin
Jan 17 '08 #1
on 01/17/2008 07:52 PM mr_marcin said the following:
> Does anybody have an idea how to enter some text into an input box on
> one page, then press a button on that page, which loads another
> page, and finally read the response? Suppose I want to write a price
> comparison engine that parses a shop's website for the price each
> time a user asks.
>
> <snip>

You may want to try this HTTP client class. Basically, it acts like a
browser: accessing pages, submitting forms, collecting cookies, handling
redirection, etc., which seems to be what you need to retrieve the pages
with the prices you want to grab.

http://www.phpclasses.org/httpclient
--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/
Jan 17 '08 #2
mr_marcin wrote:
> Does anybody have an idea how to enter some text into an input box on
> one page, then press a button on that page, which loads another
> page, and finally read the response? Suppose I want to write a price
> comparison engine that parses a shop's website for the price each
> time a user asks.
>
> <snip>

cURL will allow you to get or post to pages, and will return the data.
I much prefer it over the HTTPClient class. It's more flexible.
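
As a rough illustration of the get-or-post idea, here is a minimal sketch that builds a form POST and a session that keeps cookies between requests. It uses Python's standard urllib rather than cURL itself, so it shows the technique and not cURL's API. The URL and the field names are hypothetical, and the request is only built, not sent, so the sketch runs offline.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_form_post(url, fields):
    """Build a POST request carrying url-encoded form fields."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def make_session():
    """Return an opener that stores cookies between requests, plus its jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


# Hypothetical form URL and field names, for illustration only.
request = make_form_post(
    "http://shop.example/search",
    {"q": "blue widget", "submit": "Search"},
)
opener, jar = make_session()
print(request.get_method(), request.full_url)
print(request.data)
# To actually send it: html = opener.open(request).read()
```

Sending every request through the same opener is what lets a login or session cookie set by one page carry over to the next.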

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jan 18 '08 #3
mr_marcin wrote:
> Hi
>
> Does anybody have an idea how to enter some text into an input box on
> one page, then press a button on that page, which loads another
> page, and finally read the response? Suppose I want to write a price
> comparison engine that parses a shop's website for the price each
> time a user asks.

Hi there,

SimpleTest includes a class called SimpleBrowser, which does what
you want, with a very intuitive API. It's not too fast, though...

SimpleTest: http://www.lastcraft.com/simple_test.php

Or, you can interactively set up browsing sessions with the Selenium IDE
and then use the PHP client for Selenium Remote Control to run them...

Selenium IDE: http://www.openqa.org/selenium-ide/
Selenium RC: http://www.openqa.org/selenium-rc/
PHP Client for Selenium: http://pear.php.net/package/Testing_Selenium

Misc:
http://blog.thinkphp.de/archives/133...-Selenium.html
Regards,
Marlin Forbes
Freelance Developer
Data Shaman
datashaman.com
+27 (0)82 501-6647
Jan 18 '08 #4
> Or, you can interactively set up browsing sessions with the Selenium IDE
> and then use the PHP client for Selenium Remote Control to run them...
>
> Selenium IDE: http://www.openqa.org/selenium-ide/
> Selenium RC: http://www.openqa.org/selenium-rc/
> PHP Client for Selenium: http://pear.php.net/package/Testing_Selenium

This sounds like quite an easy-to-use package, but will it be
efficient enough? I will check all the options next week.
Jan 18 '08 #5
> cURL will allow you to get or post to pages, and will return the data.
> I much prefer it over the HTTPClient class. It's more flexible.

I guess this approach requires some manual work, but you are right:
that's the most flexible and probably the most effective way.
Jan 18 '08 #6
On Jan 18, 2:52 am, mr_marcin <mar...@cme.pl> wrote:
> Does anybody have an idea how to enter some text into an input box on
> one page, then press a button on that page, which loads another
> page, and finally read the response? Suppose I want to write a price
> comparison engine that parses a shop's website for the price each
> time a user asks.
>
> <snip>

1. If you're looking for client tools: http://www.iopus.com/imacros/firefox/
2. Web scraping with cURL or the HTTPClient class
3. Look for web services (SOAP, XML, etc.)

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/
Jan 18 '08 #7
Hello,

on 01/17/2008 10:15 PM Jerry Stuckle said the following:
>> <snip original post>
> cURL will allow you to get or post to pages, and will return the data. I
> much prefer it over the HTTPClient class. It's more flexible.

I wonder which HTTP client you are talking about. The HTTP client I
mentioned wraps around Curl or socket functions depending on which is
more convenient to use in each PHP setup. This is the HTTP client class
I meant:

http://www.phpclasses.org/httpclient

As for Curl being flexible, I wonder what you are talking about.

Personally, I find it very odd that you cannot read retrieved pages with
Curl in small chunks at a time without having to use callbacks. This is
bad because it makes it very difficult to retrieve and process large pages
without using external files or exceeding the PHP memory limits.

--

Regards,
Manuel Lemos

Jan 18 '08 #8
Manuel Lemos wrote:
> Hello,
>
> on 01/17/2008 10:15 PM Jerry Stuckle said the following:
>>> <snip original post>
>> cURL will allow you to get or post to pages, and will return the data. I
>> much prefer it over the HTTPClient class. It's more flexible.
>
> I wonder which HTTP client you are talking about. The HTTP client I
> mentioned wraps around Curl or socket functions depending on which is
> more convenient to use in each PHP setup. This is the HTTP client class
> I meant:
>
> http://www.phpclasses.org/httpclient

The same one.

> As for Curl being flexible, I wonder what you are talking about.

I can do virtually anything with it that I can do with a browser, with
the exception of client-side scripting. It also has much less overhead
than the httpclient class.

> Personally, I find it very odd that you cannot read retrieved pages with
> Curl in small chunks at a time without having to use callbacks. This is
> bad because it makes it very difficult to retrieve and process large pages
> without using external files or exceeding the PHP memory limits.

So? I never needed to. First of all, I have no need to retrieve huge
pages. The largest I've ever downloaded (a table with lots of info) was
a little over 3MB, and Curl and PHP handled it just fine.

But if the text were split, you would need to do additional processing to
handle splits at inconvenient locations. It's much easier to add everything
to a temporary file and read it back in the way I need to do it.

But that's one of the advantages of cURL - it gives me the option of
doing the callbacks or not.
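
The additional processing needed for splits at inconvenient locations can be sketched as a small buffering callback. This is a minimal sketch in Python, not cURL or PHP code: write() only has the general shape of a transfer write callback, and the LineAssembler class and its method names are hypothetical. It holds back any trailing partial line until the next chunk arrives, so the consumer only ever sees complete lines.

```python
class LineAssembler:
    """Reassemble complete lines from arbitrarily split chunks.

    write() has the shape of a transfer write callback: it receives whatever
    bytes arrived and keeps any trailing partial line until the next call.
    """

    def __init__(self, on_line):
        self._on_line = on_line  # called once per complete line
        self._tail = b""         # partial line left over from the last chunk

    def write(self, chunk):
        data = self._tail + chunk
        # Everything before the last newline is complete; the rest is kept.
        *complete, self._tail = data.split(b"\n")
        for line in complete:
            self._on_line(line)
        return len(chunk)  # report every byte as consumed

    def close(self):
        """Flush a final line that did not end with a newline."""
        if self._tail:
            self._on_line(self._tail)
            self._tail = b""


lines = []
assembler = LineAssembler(lines.append)
# Chunks deliberately split in the middle of a table row.
for chunk in (b"<tr><td>Wid", b"get</td></tr>\n<tr><td>Gad", b"get</td></tr>\n"):
    assembler.write(chunk)
assembler.close()
print(lines)
```

Memory use stays bounded by the longest single line rather than by the page size, which is the trade-off against writing everything to a temporary file first.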

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jan 19 '08 #9
Manuel Lemos wrote:

<snip junk>

Manuel,

I'm not going to argue with you about whether the HTTPClient class is
easier to use or whatever.

My single point was that cURL is more flexible. You can do anything
with cURL that you can with the HTTPClient class and more. That is
pretty obvious - because the HTTPClient class is built on cURL - so if
cURL can't do it, neither can the HTTPClient class.

But being built on cURL, the HTTPClient class restricts what you can do.
So it is less flexible.

You can sit there and argue all you want as to the other merits of your
class. I won't bite. Because that was not my point.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jan 19 '08 #10
