Help to automatically traverse a login session

The subject may sound a little cryptic, so I'll try my best to explain. Details are unavailable, as I am under a nondisclosure agreement, but I'm looking for general principles and tips, not necessarily fixes for existing code.

There is a website that requires me to log in using a web form. Obviously, POST variables are sent and verified, and on success I'm given a session and/or cookie. Within this logged-in area, there are links leading to data query result pages. "Click here for your recent transactions", that kind of thing.

Those result pages are what I want to get to, but through some kind of script that parses the results as they are served out, not by user interaction. I want to send a request for a link within that logged-in area, have the results served to my script, then parse specific data out of those results and, in turn, serve them to a user in my own page.

I know that sounds shady, but the login is legitimate, the data access is legitimate, and the credentials are also valid. The problem is, I can't request a direct database link to the server hosting the actual data because of this nondisclosure agreement. It would require divulging the reasons for needing such access, which my employer is not willing to reveal at this time.

If there's anyone who can offer ideas or help, and wishes to keep possible answers off the public board, please email me. I realize this is a long shot, and I doubt that even if there IS a way, anyone would be willing or able. But I have to try.

Thanks all.
-joe

Aug 10 '06 #1
joe t. wrote:
The subject may sound a little cryptic, so I'll try my best to explain. Details are unavailable, as I am under a nondisclosure agreement, but I'm looking for general principles and tips, not necessarily fixes for existing code.
<snip long-winded explanation>

So you want to copy someone else's data, and all you've got to the remote system is an HTTP interface intended for humans.

There are plenty of companies doing this already - no need to be shy.

How simple it is depends on how well their site is written - assuming it is well written, you should be able to parse the pages with an XML parser. How to get the pages? That's rather up to you - you could use a site ripper like pavuk or write your own spider, e.g. using Snoopy.
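
For the parsing half, a minimal sketch using PHP 5's DOM extension might look like the following. It assumes the page has already been fetched and saved, and that the data sits in a table with id="transactions"; that id, the file name, and the CSV output are placeholders for illustration, not anything from a real site.

<?php
// Minimal sketch: parse an already-fetched results page with PHP's DOM
// extension. The table id and file name are placeholders for illustration.
$html = file_get_contents('results.html');

$doc = new DOMDocument();
@$doc->loadHTML($html);   // real-world HTML is rarely valid, so silence warnings

$xpath = new DOMXPath($doc);
$rows  = $xpath->query('//table[@id="transactions"]//tr');

foreach ($rows as $row) {
    $cells = array();
    foreach ($row->getElementsByTagName('td') as $cell) {
        $cells[] = trim($cell->textContent);
    }
    if ($cells) {
        echo implode(',', $cells), "\n";   // dump each row as CSV
    }
}
?>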

HTH

C.
Aug 10 '06 #2
On 10 Aug 2006 14:25:33 -0700, "joe t." <th*******@gmail.com> wrote:
> There is a website that requires me to log in using a web form. Obviously, POST variables are sent and verified, and on success I'm given a session and/or cookie. Within this logged-in area, there are links leading to data query result pages. "Click here for your recent transactions", that kind of thing.
>
> Those result pages are what I want to get to, but through some kind of script that parses the results as they are served out, not by user interaction. I want to send a request for a link within that logged-in area, have the results served to my script, then parse specific data out of those results and, in turn, serve them to a user in my own page.
>
> I know that sounds shady, but the login is legitimate, the data access is legitimate, and the credentials are also valid. The problem is, I can't request a direct database link to the server hosting the actual data because of this nondisclosure agreement. It would require divulging the reasons for needing such access, which my employer is not willing to reveal at this time.
>
> If there's anyone who can offer ideas or help, and wishes to keep possible answers off the public board, please email me. I realize this is a long shot, and I doubt that even if there IS a way, anyone would be willing or able. But I have to try.

Whilst this sort of situation is never the best way of doing things, sometimes it's the only way. If you really do have to go down this route, then there is a particularly nice Perl module called WWW::Mechanize.

Obviously it's not PHP, but you can call Perl from PHP.

http://search.cpan.org/search?query=...anize&mode=all

Whilst you're in Perl, it also has various HTML parsing modules, the most obvious being HTML::Parser, which can deal with HTML even if it's of dubious quality.

http://search.cpan.org/~gaas/HTML-Parser-3.55/Parser.pm

So, combined, you can have a Perl script that does all the hard stuff and then returns its results to PHP in an easily machine-readable form.
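
To make that hand-off concrete, here is a rough sketch of the PHP side, assuming a hypothetical Perl script (scrape.pl) that does the WWW::Mechanize work and prints one comma-separated record per line; the script name, its arguments, and its output format are all invented for illustration.

<?php
// Sketch only: run a (hypothetical) Perl scraper and read its output.
// scrape.pl, its arguments, and its CSV output format are assumptions.
$user = escapeshellarg('myuser');
$pass = escapeshellarg('mypassword');

$output = shell_exec("perl scrape.pl $user $pass");
if ($output === null) {
    die('The scraper produced no output');
}

$records = array();
foreach (explode("\n", trim($output)) as $line) {
    $records[] = explode(',', $line);   // one record per line, comma-separated
}

// $records is now an ordinary PHP array for the rest of the page to render.
print_r($records);
?>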

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Aug 10 '06 #3
In article <11*********************@74g2000cwt.googlegroups.com>, joe t. <th*******@gmail.com> wrote:
> There is a website that requires me to log in using a web form.
> ...
> Those result pages are what I want to get to, but through some kind of script that parses the results as they are served out, not by user interaction.

I once did this to gather a huge amount of historical data from a
horse-racing web site. I had to write the application in Java. It
would log in with my userID and password, submit queries to forms,
save the HTML result pages sent back, then parse the tabular data in
those pages into comma-delimited text data.

It was a much bigger project than I anticipated. I suspect there
are some macro automation tools out there that will let you do it
more easily.

-Alex
Aug 11 '06 #4

axlq wrote:
In article <11*********************@74g2000cwt.googlegroups.com>, joe t. <th*******@gmail.com> wrote:
There is a website that requires me to log in using a web form.
...
Those result pages are what I want to get to, but through some kind of script that parses the results as they are served out, not by user interaction.

I once did this to gather a huge amount of historical data from a
horse-racing web site. I had to write the application in Java. It
would log in with my userID and password, submit queries to forms,
save the HTML result pages sent back, then parse the tabular data in
those pages into comma-delimited text data.

It was a much bigger project than I anticipated. I suspect there
are some macro automation tools out there that will let you do it
more easily.

-Alex

Thanks, all of you, for the suggestions. I will investigate these options and try to report back on my success.
-joe

Aug 11 '06 #5
joe t. wrote:
<snip>
There is a website that requires me to log in using a web form. Obviously, POST variables are sent and verified, and on success I'm given a session and/or cookie. Within this logged-in area, there are links leading to data query result pages. "Click here for your recent transactions", that kind of thing.

Those result pages are what I want to get to, but through some kind of script that parses the results as they are served out, not by user interaction. I want to send a request for a link within that logged-in area, have the results served to my script, then parse specific data out of those results and, in turn, serve them to a user in my own page.
<snip>

Such "web scraping" can be done with cURL <http://in.php.net/curl>
(need to set cookie support). Not all sites would allow web scraping
and will try to block automation with "CAPTCHA" (google it). Some sites
will even use Ajax based rendering which will then make the cURL
process a big tough (though I heard that cURL can work with Mozilla
JavaScript engine). In that case, it will be better to go for Delphi or
VB 6 as we can use WebBrowser component and can automate clicks, etc
with DOM object.
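
A bare-bones sketch of that cURL-with-cookies approach might look like the following; the URLs and form field names are placeholders and have to be replaced with whatever the real login form uses.

<?php
// Rough sketch: log in with cURL, keep the session cookie, then fetch a
// page from the logged-in area. All URLs and field names are placeholders.
$cookieJar = tempnam(sys_get_temp_dir(), 'ck');

// 1. POST the login form; the session cookie is written to $cookieJar.
$ch = curl_init('https://example.com/login.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'username=myuser&password=mypassword');
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);   // save cookies here
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);  // and send them back
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow any post-login redirect
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);

// 2. Re-use the same handle so the session cookie goes out with this GET.
curl_setopt($ch, CURLOPT_URL, 'https://example.com/recent_transactions.php');
curl_setopt($ch, CURLOPT_HTTPGET, true);
$html = curl_exec($ch);
curl_close($ch);

// $html now holds the results page, ready to be parsed (e.g. with DOM/XPath).
?>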

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Aug 13 '06 #6
