The subject may sound a little cryptic, so i'll try my best to explain.
Details are unavailable, as i am under a nondisclosure agreement, but
i'm looking for general principles and tips, not necessarily fixes for
existing code.
There is a website that requires me to log in using a web-form.
Obviously, POST vars are sent and verified and on success i'm given a
Session and/or Cookie. Within this logged-in area, there are links
leading to data query result pages. "Click here for your recent
transactions" kind of thing.
Those results pages are what i want to get to, but through some kind of
script that parses the results that get served out, not by user
interaction. i want to send a request for a link within that logged in
area and have the results served to my script, then parse out specific
data from those results and in turn serve them to a user in my own
page.
i know that sounds shady, but the login is legitimate, the data access
is legitimate, and the credentials are also valid. The problem is, i
can't request a direct database link to the server hosting the actual
data because of this nondisclosure agreement. It would require
divulging the reasons for the need for such access, which my employer
is not willing to reveal at this time.
If there's anyone who can offer ideas or help, and wishes to keep
possible answers off the public board, please email me. i realize this
is a long shot, and i doubt that, even if there IS a way, anyone
would be willing/able. But i gotta try.
Thanks all.
-joe
joe t. wrote:
> The subject may sound a little cryptic, so i'll try my best to explain.
> Details are unavailable, as i am under a nondisclosure agreement, but
> i'm looking for general principles and tips, not necessarily fixes for
> existing code.
<snip long winded explanation>
So you want to copy someone else's data and you've only got an HTTP
interface intended for humans to the remote system.
Plenty of companies are doing this already, so there's no need to be shy.
How simple it is depends on how well their site is written. Assuming it is
well written (valid, well-formed markup), you should be able to parse the
pages with an XML parser. How to get the pages? That's rather up to you:
you could use a site ripper like pavuk or write your own spider, e.g.
using Snoopy.
HTH
C.
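A minimal sketch of the "parse it with an XML parser" idea, in Python rather than PHP. It assumes the results page is well-formed XHTML with no namespace declaration and no HTML-only entities such as &nbsp;; the sample page and the name extract_rows are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page as a results link might serve it.
PAGE = """<html><body>
<table id="transactions">
  <tr><th>Date</th><th>Amount</th></tr>
  <tr><td>2006-08-01</td><td>19.99</td></tr>
  <tr><td>2006-08-03</td><td>5.00</td></tr>
</table>
</body></html>"""


def extract_rows(xhtml):
    """Return every table row as a list of cell strings."""
    root = ET.fromstring(xhtml)  # raises ParseError if the markup is not well-formed
    rows = []
    for tr in root.iter("tr"):
        cells = [(cell.text or "").strip() for cell in tr if cell.tag in ("th", "td")]
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in extract_rows(PAGE):
        print(row)
```

If the real pages declare the XHTML namespace, the tag names would need the namespace prefix, and if they are not well-formed at all this approach fails outright and a tolerant HTML parser is the safer choice.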
On 10 Aug 2006 14:25:33 -0700, "joe t." <th*******@gmail.com> wrote:
> There is a website that requires me to log in using a web-form. Obviously, POST vars are sent and verified and on success i'm given a Session and/or Cookie. Within this logged-in area, there are links leading to data query result pages. "Click here for your recent transactions" kind of thing.
> Those results pages are what i want to get to, but through some kind of script that parses the results that get served out, not by user interaction. i want to send a request for a link within that logged in area and have the results served to my script, then parse out specific data from those results and in turn serve them to a user in my own page.
> i know that sounds shady, but the login is legitimate, the data access is legitimate, and the credentials are also valid. The problem is, i can't request a direct database link to the server hosting the actual data because of this nondisclosure agreement. It would require divulging the reasons for the need for such access, which my employer is not willing to reveal at this time.
> If there's anyone who can offer ideas or help, and wishes to keep possible answers off the public board, please email me. i realize this is a long shot, and i doubt that even if there IS a way, that anyone would be willing/able. But i gotta try.
Whilst this sort of situation is never the best way of doing things, sometimes
it's the only way. If you really do have to go down this route then there is a
particularly nice Perl module called WWW::Mechanize.
Obviously it's not PHP, but you can call Perl from PHP.
http://search.cpan.org/search?query=...anize&mode=all
Whilst you're in Perl, it also has various HTML parsing modules, the most
obvious being HTML::Parser, which can deal with HTML even if it's of dubious
quality.
http://search.cpan.org/~gaas/HTML-Parser-3.55/Parser.pm
So combined you can have a Perl script that does all the hard stuff and then
returns its results in an easily machine-readable form to PHP.
--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
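The same division of labour (a helper script does the parsing and hands back something machine-readable) can be sketched in Python instead of Perl. This is only an illustration, not WWW::Mechanize or HTML::Parser: it uses Python's standard html.parser, which also tolerates sloppy markup, and the names LinkCollector and links_as_json are made up here.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/text pairs from anchors, tolerating unclosed or upper-case tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({"href": self._href, "text": "".join(self._text).strip()})
            self._href = None


def links_as_json(html):
    """Return the page's links as a JSON string that a calling script can decode."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.links)


if __name__ == "__main__":
    # Hypothetical page fragment with unclosed <p> and mixed-case tags.
    sample = '<p>Menu<br><a href="/tx?recent=1">Recent transactions</a><p><A HREF="/logout">Log out</A>'
    print(links_as_json(sample))
```

A PHP caller could run such a script and decode what it prints; JSON is just one convenient choice of "easily machine-readable form".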
In article <11*********************@74g2000cwt.googlegroups.com>,
joe t. <th*******@gmail.com> wrote:
> There is a website that requires me to log in using a web-form.
> ...
> Those results pages are what i want to get to, but through some kind of
> script that parses the results that get served out, not by user
> interaction.
I once did this to gather a huge amount of historical data from a
horse-racing web site. I had to write the application in Java. It
would log in with my userID and password, submit queries to forms,
save the HTML result pages sent back, then parse the tabular data in
those pages into comma-delimited text data.
It was a much bigger project than I anticipated. I suspect there
are some macro automation tools out there that will let you do it
more easily.
-Alex
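The last step described above (turning tabular data in saved HTML pages into comma-delimited text) can be sketched as follows. This is Python, not the Java application from the post, and it assumes simple, non-nested tables; TableRows and table_to_csv are illustrative names.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Gather the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table cells found in html as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical saved result page.
    saved_page = """<table>
    <tr><th>Date</th><th>Horse</th><th>Position</th></tr>
    <tr><td>2006-07-01</td><td>Example, The</td><td>1</td></tr>
    </table>"""
    print(table_to_csv(saved_page), end="")
```

Using the csv module rather than joining on commas means cells that themselves contain commas are quoted correctly.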
axlq wrote:
> In article <11*********************@74g2000cwt.googlegroups.com>,
> joe t. <th*******@gmail.com> wrote:
>> There is a website that requires me to log in using a web-form.
>> ...
>> Those results pages are what i want to get to, but through some kind of
>> script that parses the results that get served out, not by user
>> interaction.
> I once did this to gather a huge amount of historical data from a
> horse-racing web site. I had to write the application in Java. It
> would log in with my userID and password, submit queries to forms,
> save the HTML result pages sent back, then parse the tabular data in
> those pages into comma-delimited text data.
> It was a much bigger project than I anticipated. I suspect there
> are some macro automation tools out there that will let you do it
> more easily.
> -Alex
Thanks all of you for the suggestions. i will investigate these options
and try to report back on success.
-joe
joe t. wrote:
<snip>
> There is a website that requires me to log in using a web-form.
> Obviously, POST vars are sent and verified and on success i'm given a
> Session and/or Cookie. Within this logged-in area, there are links
> leading to data query result pages. "Click here for your recent
> transactions" kind of thing.
> Those results pages are what i want to get to, but through some kind of
> script that parses the results that get served out, not by user
> interaction. i want to send a request for a link within that logged in
> area and have the results served to my script, then parse out specific
> data from those results and in turn serve them to a user in my own
> page.
<snip>
Such "web scraping" can be done with cURL <http://in.php.net/curl>
(you need to enable cookie support). Not all sites allow web scraping,
and some will try to block automation with a "CAPTCHA" (google it). Some
sites even use Ajax-based rendering, which makes the cURL approach a bit
tough (though I heard that cURL can work with the Mozilla JavaScript
engine). In that case, it would be better to go for Delphi or VB 6, as
we can use the WebBrowser component and automate clicks, etc. through
the DOM object.
--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/
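The cookie handling mentioned above is the key part: the script has to keep whatever cookie the login response sets and send it back with later requests. A minimal sketch of that idea, using Python's standard library rather than PHP's cURL extension. The form field names ("username", "password") and all URLs are assumptions; the real ones have to be read from the site's login form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Build an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Encode the login form as a POST body (field names are assumed, not known)."""
    body = urllib.parse.urlencode({"username": username, "password": password})
    # Supplying data makes urllib send the request as a POST.
    return urllib.request.Request(login_url, data=body.encode("utf-8"))


def fetch_results(opener, login_request, results_url):
    """Log in, then fetch a page that is only served to a logged-in session."""
    with opener.open(login_request, timeout=30):
        pass  # the cookie jar captures any Set-Cookie headers from this response
    with opener.open(results_url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would look like `opener, jar = make_session()` followed by `fetch_results(opener, build_login_request(login_url, user, password), results_url)`, with the returned HTML handed to a parser. This does nothing about CAPTCHAs or pages rendered by JavaScript.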