Help to automatically traverse a login session

The subject may sound a little cryptic, so i'll try my best to explain.
Details are unavailable, as i am under a nondisclosure agreement, but
i'm looking for general principles and tips, not necessarily fixes for
existing code.

There is a website that requires me to log in using a web-form.
Obviously, POST vars are sent and verified and on success i'm given a
Session and/or Cookie. Within this logged-in area, there are links
leading to data query result pages. "Click here for your recent
transactions" kind of thing.

Those results pages are what i want to get to, but through some kind of
script that parses the results that get served out, not by user
interaction. i want to send a request for a link within that logged in
area and have the results served to my script, then parse out specific
data from those results and in turn serve them to a user in my own
page.

i know that sounds shady, but the login is legitimate, the data access
is legitimate, and the credentials are also valid. The problem is, i
can't request a direct database link to the server hosting the actual
data because of this nondisclosure agreement. It would require
divulging the reasons for the need for such access, which my employer
is not willing to reveal at this time.

If there's anyone who can offer ideas or help, and wishes to keep
possible answers off the public board, please email me. i realize this
is a long shot, and i doubt that even if there IS a way, that anyone
would be willing/able. But i gotta try.

Thanks all.
-joe

Aug 10 '06 #1
joe t. wrote:
> The subject may sound a little cryptic, so i'll try my best to explain.
> Details are unavailable, as i am under a nondisclosure agreement, but
> i'm looking for general principles and tips, not necessarily fixes for
> existing code.
<snip long winded explanation>

So you want to copy someone else's data and you've only got an HTTP
interface intended for humans to the remote system.

There's plenty of companies doing this already - no need to be shy.

How simple it is depends on how well their site is written - assuming it is
well written, and so serves well-formed markup, you should be able to parse
the pages with an XML parser. How to get the pages? That's rather up to you -
you could use a site ripper like pavuk or write your own spider, e.g. using
Snoopy.
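
A minimal sketch of the XML-parser route, in Python rather than PHP since the thread shows no code of its own. It assumes the page really is well-formed markup; the sample page, the `transactions` table id, and the `table_rows` helper are hypothetical stand-ins, not details of the real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed results page, standing in for the real site.
PAGE = """<html><body>
<table id="transactions">
  <tr><th>Date</th><th>Amount</th></tr>
  <tr><td>2006-08-01</td><td>19.99</td></tr>
  <tr><td>2006-08-03</td><td>5.00</td></tr>
</table>
</body></html>"""


def table_rows(markup, table_id):
    """Return the cell text of every row in the table with the given id."""
    root = ET.fromstring(markup)  # raises ET.ParseError on malformed markup
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return []
    # Each <tr> contributes one list holding the text of its <th>/<td> cells.
    return [
        [(cell.text or "").strip() for cell in row]
        for row in table.findall("tr")
    ]


rows = table_rows(PAGE, "transactions")
for row in rows:
    print(row)
```

If the markup is not well-formed, `ET.fromstring` raises `ParseError`, and that is the point where a more forgiving HTML parser is needed instead.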

HTH

C.
Aug 10 '06 #2
On 10 Aug 2006 14:25:33 -0700, "joe t." <th*******@gmail.com> wrote:
> There is a website that requires me to log in using a web-form.
> Obviously, POST vars are sent and verified and on success i'm given a
> Session and/or Cookie. Within this logged-in area, there are links
> leading to data query result pages. "Click here for your recent
> transactions" kind of thing.
>
> Those results pages are what i want to get to, but through some kind of
> script that parses the results that get served out, not by user
> interaction. i want to send a request for a link within that logged in
> area and have the results served to my script, then parse out specific
> data from those results and in turn serve them to a user in my own
> page.
>
> i know that sounds shady, but the login is legitimate, the data access
> is legitimate, and the credentials are also valid. The problem is, i
> can't request a direct database link to the server hosting the actual
> data because of this nondisclosure agreement. It would require
> divulging the reasons for the need for such access, which my employer
> is not willing to reveal at this time.
>
> If there's anyone who can offer ideas or help, and wishes to keep
> possible answers off the public board, please email me. i realize this
> is a long shot, and i doubt that even if there IS a way, that anyone
> would be willing/able. But i gotta try.

Whilst this sort of situation is never the best way of doing things, sometimes
it's the only way. If you really do have to go down this route then there is a
particularly nice Perl module called WWW::Mechanize.

Obviously it's not PHP, but you can call Perl from PHP.

http://search.cpan.org/search?query=...anize&mode=all

Whilst you're in Perl, then it also has various HTML parsing modules, the most
obvious being HTML::Parser, which can deal with HTML even if it's of dubious
quality.

http://search.cpan.org/~gaas/HTML-Parser-3.55/Parser.pm

So combined you can have a Perl script that does all the hard stuff and then
returns its results in an easily machine-readable form to PHP.
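
The same split can be sketched in Python instead of Perl: a forgiving parser does the hard part and hands JSON back to whatever called it. This is only an illustration of the idea, using the standard-library `html.parser` module in place of HTML::Parser; the `LinkCollector` class, the `links_as_json` helper, and the sample markup are hypothetical.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/text pairs from <a> tags; tolerates sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"href": self._href, "text": text})
            self._href = None


def links_as_json(markup):
    """Parse markup and return the collected links as a JSON string."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return json.dumps(parser.links)


# Deliberately dubious HTML: unclosed <b>, upper-case tags, stray <td>.
SLOPPY = '<p>Account<br><A HREF="/recent?id=7">Recent <b>transactions</A><td>stray cell'
print(links_as_json(SLOPPY))
```

A PHP caller could run such a script as a subprocess and decode its standard output, which keeps the scraping logic and the page that displays the results separate.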

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Aug 10 '06 #3
In article <11*********************@74g2000cwt.googlegroups.com>,
joe t. <th*******@gmail.com> wrote:
> There is a website that requires me to log in using a web-form.
> ...
> Those results pages are what i want to get to, but through some kind of
> script that parses the results that get served out, not by user
> interaction.

I once did this to gather a huge amount of historical data from a
horse-racing web site. I had to write the application in Java. It
would log in with my userID and password, submit queries to forms,
save the HTML result pages sent back, then parse the tabular data in
those pages into comma-delimited text data.

It was a much bigger project than I anticipated. I suspect there
are some macro automation tools out there that will let you do it
more easily.
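
For the last step, turning tabular HTML into comma-delimited text, a rough Python sketch (not the Java application described above) might look like this. The `TableRows` class, the `html_table_to_csv` helper, and the sample data are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by sloppy markup


def html_table_to_csv(markup):
    """Return the table rows found in markup as CSV text."""
    parser = TableRows()
    parser.feed(markup)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical saved result page.
SAMPLE = """<table>
<tr><th>Race</th><th>Winner</th></tr>
<tr><td>Ascot 3:15</td><td>Sea Biscuit, Jr.</td></tr>
</table>"""

print(html_table_to_csv(SAMPLE), end="")
```

Using the `csv` module rather than joining cells with commas means a cell that itself contains a comma gets quoted instead of silently splitting into two columns.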

-Alex
Aug 11 '06 #4

axlq wrote:
> In article <11*********************@74g2000cwt.googlegroups.com>,
> joe t. <th*******@gmail.com> wrote:
>> There is a website that requires me to log in using a web-form.
>> ...
>> Those results pages are what i want to get to, but through some kind of
>> script that parses the results that get served out, not by user
>> interaction.
>
> I once did this to gather a huge amount of historical data from a
> horse-racing web site. I had to write the application in Java. It
> would log in with my userID and password, submit queries to forms,
> save the HTML result pages sent back, then parse the tabular data in
> those pages into comma-delimited text data.
>
> It was a much bigger project than I anticipated. I suspect there
> are some macro automation tools out there that will let you do it
> more easily.
>
> -Alex

Thanks all of you for the suggestions. i will investigate these options
and try to report back on success.
-joe

Aug 11 '06 #5
joe t. wrote:
<snip>
> There is a website that requires me to log in using a web-form.
> Obviously, POST vars are sent and verified and on success i'm given a
> Session and/or Cookie. Within this logged-in area, there are links
> leading to data query result pages. "Click here for your recent
> transactions" kind of thing.
>
> Those results pages are what i want to get to, but through some kind of
> script that parses the results that get served out, not by user
> interaction. i want to send a request for a link within that logged in
> area and have the results served to my script, then parse out specific
> data from those results and in turn serve them to a user in my own
> page.
<snip>

Such "web scraping" can be done with cURL <http://in.php.net/curl>
(you need to enable cookie support). Not all sites allow web scraping;
some will try to block automation with a "CAPTCHA" (google it). Some sites
even use Ajax-based rendering, which makes the cURL approach a bit tough
(though I heard that cURL can work with the Mozilla JavaScript engine).
In that case, it will be better to go for Delphi or VB 6, as we can use
the WebBrowser component and automate clicks, etc. through the DOM object.
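
The cookie handling is the key step: log in once, keep the cookie the server sets, and send it back with every later request. A rough sketch using Python's standard library instead of PHP's cURL extension follows. The function names are hypothetical, and no real URLs or form field names appear because the original post does not give any.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def build_login_request(login_url, fields):
    """Build the POST request a browser would send for the login form."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying a body makes urllib issue a POST instead of a GET.
    return urllib.request.Request(login_url, data=body)


def login(opener, login_url, fields):
    """Submit the form; any Set-Cookie in the reply lands in the jar."""
    with opener.open(build_login_request(login_url, fields), timeout=30) as resp:
        return resp.status


def fetch(opener, url):
    """GET a page in the logged-in area, reusing the session cookie."""
    with opener.open(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Typical use would be `opener, jar = make_session()`, then `login(opener, login_url, {"user": "...", "pass": "..."})`, then `fetch(opener, results_url)`. The field names `user` and `pass` are placeholders; the real ones have to be read from the login form's HTML. This does nothing for pages rendered by JavaScript or protected by a CAPTCHA.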

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Aug 13 '06 #6