Hello,
I've got an interesting project at work.
I need to gather data from a sequence of web pages. The data is stored in standard HTML tables.
That part seemed fine: I figured I could write a piece of JavaScript to gather the information from each page and then navigate to the next one.
The problems I'm faced with are:
a) JavaScript running in the browser cannot write to a file, so I have no way to save the collected data.
b) The pages are not linked by hyperlinks; each subsequent page is reached by clicking a form submission button labelled "More".
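Point (a) only applies to JavaScript running inside a browser; a standalone program can write files directly. A minimal Java sketch (Java 11 or later), assuming each table row has already been collected as a list of strings. `CsvSaver`, its methods, and the `results.csv` file name are illustrative, not from any existing code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvSaver {

    // Quotes a field if it contains a comma, double quote, or line break,
    // doubling any embedded quotes (the common CSV convention, as in RFC 4180).
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Turns each row of cells into one comma-separated line.
    static List<String> toCsvLines(List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String cell : row) {
                cells.add(escape(cell));
            }
            lines.add(String.join(",", cells));
        }
        return lines;
    }

    // Writes the rows to a file, creating it or overwriting an existing one.
    static void save(Path file, List<List<String>> rows) throws IOException {
        Files.write(file, toCsvLines(rows), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> rows = List.of(
                List.of("Name", "Price"),
                List.of("Widget, large", "9.99"));
        save(Path.of("results.csv"), rows);  // illustrative output file name
    }
}
```

Running `main` writes `results.csv` with the lines `Name,Price` and `"Widget, large",9.99`; the quoting keeps the comma inside the first cell from being read as a column separator.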
I figure this must be doable (given the number of web crawlers out there), but I'm having difficulty deciding where to start. I can use Java, JavaScript with PHP, or VC#.NET as the platform.
I would like to know how Java can make HTTP requests, move from page to page, access the DOM, and submit forms.
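In Java, `java.net.HttpURLConnection` covers both plain requests and form submissions. Clicking a submit button such as "More" makes the browser send the form's fields (hidden inputs included, plus the button's own name/value if it has a `name` attribute) to the form's `action` URL, usually as a POST with an `application/x-www-form-urlencoded` body (the form's `method` attribute says which), and a program can send the same request itself. A minimal sketch for Java 11 or later: `PageClient` and its methods are illustrative names, the URLs and field names in `main` are placeholders to be copied from the real page's `<form>`, and responses are assumed to be UTF-8. The `CookieManager` is there on the assumption that the site may track your position in the results with a session cookie.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PageClient {

    // Encodes fields as application/x-www-form-urlencoded: name=value pairs joined by '&'.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Fetches a page with an HTTP GET and returns its body as text.
    static String get(String url) throws IOException {
        return readBody(open(url));
    }

    // Submits form fields with an HTTP POST, as a browser does when a submit button is clicked.
    static String post(String url, Map<String, String> fields) throws IOException {
        HttpURLConnection conn = open(url);
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(encodeForm(fields).getBytes(StandardCharsets.UTF_8));
        }
        return readBody(conn);
    }

    private static HttpURLConnection open(String url) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection();
    }

    // Reads the response body, failing on any status other than 200. Assumes UTF-8;
    // a real scraper should honour the charset in the Content-Type header.
    private static String readBody(HttpURLConnection conn) throws IOException {
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " from " + conn.getURL());
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Keep cookies between requests in case the site stores state in a session.
        CookieHandler.setDefault(new CookieManager());

        String first = get("http://example.com/results");  // placeholder URL

        // Hypothetical field names: copy the real ones from the page's <form>.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("start", "20");
        fields.put("more", "More");
        String second = post("http://example.com/results", fields);

        System.out.println(first.length() + " / " + second.length());
    }
}
```

Looping over pages then amounts to calling `post` repeatedly, reading the hidden fields for the next request out of each response.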
I can see that J2SE includes the W3C DOM API (`org.w3c.dom`), but I don't know how to fetch pages in Java or how to load a page I retrieve into the DOM API.
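The J2SE DOM classes (an `org.w3c.dom.Document` built by `javax.xml.parsers.DocumentBuilder`) expect well-formed XML, and most real-world HTML is not well-formed, so parsing a downloaded page that way usually fails. One alternative that ships with the JDK is the Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`), which tolerates sloppy markup but reports tags through callbacks instead of building a DOM tree; it is based on HTML 3.2, so newer markup may be handled imperfectly. A minimal sketch, assuming the page text has already been downloaded as a `String` and the data tables are not nested; `TableExtractor` and `extractRows` are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TableExtractor {

    // Collects the text of every <td>/<th> cell, grouped by <tr>, from the whole document.
    // Nested tables are not handled: an inner <tr> would restart the current row.
    static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> currentRow;
            private StringBuilder currentCell;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    currentRow = new ArrayList<>();
                } else if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && currentRow != null) {
                    currentCell = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (currentCell != null) {
                    currentCell.append(data);  // text inside the cell currently being read
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && currentCell != null) {
                    currentRow.add(currentCell.toString().trim());
                    currentCell = null;
                } else if (tag == HTML.Tag.TR && currentRow != null) {
                    rows.add(currentRow);
                    currentRow = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Name</th><th>Price</th></tr>"
                + "<tr><td>Widget</td><td>9.99</td></tr></table>";
        for (List<String> row : extractRows(html)) {
            System.out.println(row);
        }
    }
}
```

Because the result is a list of rows of strings, it can be passed straight to a CSV writer or any other storage step.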
Any help would be greatly appreciated.
Regards,
Rob.