Difficulties POSTing to RDP Hierarchy Browse Page

Hello,
I'm trying to write a tool to scrape through some of the Ribosomal
Database Project II's (http://rdp.cme.msu.edu/) pages, specifically,
through the Hierarchy Browser. (http://rdp.cme.msu.edu/hierarchy/)
The Hierarchy Browser is accessed first through a page with a form.
There are four fields with several options to be chosen from (Strain,
Source, Size, and Taxonomy) and then a submit button labeled "Browse".
The HTML of the form is as follows (note, I am also including the
Javascript code, as it is called by the submit button):

--------excerpted HTML----------------
<script language="Javascript">

function resetHiddenVar(){
var f_form = document.forms['hierarchyForm'];
f_form.action= "HierarchyControllerServlet/start";
return ;
}

</script>

<form name="hierarchyForm" method="POST"
action="HierarchyControllerServlet/start/">
<input type='hidden' name='printParams' value='no' />

<h1>Hierarchy Browser - Start</h1><div class="cart" style="float:
right">[&nbsp;<a href="hb_help.jsp">help</a>&nbsp;]</div>

<p>&nbsp;</p>
<div id="options">

<table summary="options area" cellpadding="0" cellspacing="0"
border="0"><tr><td align="left" valign="middle">
<table border="0" cellspacing="0" cellpadding="0" summary="Options"
align="left" class="borderup">
<tr>
<th align="right" valign="middle" class="bottom greenbg"
nowrap="nowrap">Strain:</th>
<td class="bottom formtext" nowrap="nowrap"><input id="type"
name="strain" type="radio" value="type">
<label for="type">Type</label></td>
<td class="bottom formtext" nowrap="nowrap"><input id="nontype"
name="strain" type="radio" value="nontype">
<label for="nontype">Non Type</label>&nbsp;</td>
<td class="bottom formtext" nowrap="nowrap"><input name="strain"
type="radio" id="strainboth" value="both" checked>
<label for="strainboth">Both</label>&nbsp;</td>

</tr>
<tr>
<th align="right" valign="middle" class="bottom greenbg">Source:</th>
<td class="bottom formtext" nowrap="nowrap"><input id="environmental"
name="source" type="radio" value="environ">
<label for="environmental">Uncultured&nbsp;</label></td>
<td class="bottom formtext" nowrap="nowrap"><input id="isolates"
name="source" type="radio" value="isolates">
<label for="isolates">Isolates</label></td>
<td class="bottom formtext" nowrap="nowrap"><input name="source"
type="radio" id="sourceboth" value="both" checked >
<label for="sourceboth">Both</label></td>
</tr>

<tr>
<th align="right" valign="middle" class="bottom greenbg">Size:</th>
<td class="bottom formtext" nowrap="nowrap"><input
id="greaterthan1200" name="size" type="radio" value="gt1200" checked>
<label for="greaterthan1200"><u>&gt;</u>1200</label></td>
<td class="bottom formtext" nowrap="nowrap"><input id="lessthan1200"
name="size" type="radio" value="lt1200">
<label for="lessthan1200">&lt;1200</label></td>
<td class="bottom formtext" nowrap="nowrap"><input id="sizeboth"
name="size" type="radio" value="both">
<label for="sizeboth">Both</label></td>
</tr>
<tr>

<th align="right" valign="middle" class="bottom
greenbg">Taxonomy:</th>
<td class="bottom formtext" nowrap="nowrap"><input id="bergeys"
name="taxonomy" type="radio" value="rdpHome" checked>
<label for="bergeys">Bergey's</label></td>
<td colspan="2" class="bottom formtext" nowrap="nowrap"><input
id="ncbi" name="taxonomy" type="radio" value="ncbiHome">
<label for="ncbi">NCBI</label></td>
</tr>
</table>
</td>
<td align="left" valign="middle">&nbsp;&nbsp;&nbsp;
<input name="browse" type="submit" id="browse"
onclick="resetHiddenVar(); return true;" value="Browse">

</td></tr></table></p>
</div>
<!-- end options -->
</form>
----------end excerpted HTML--------------
The options I would like to simulate are browsing by strain=type,
source=both, size = gt1200, and taxonomy = bergeys. I see that the
form method is POST, and I read through the urllib documentation, and
saw that the syntax for POSTing is urllib.urlopen(url, data). Since
the submit button calls HierarchyControllerServlet/start (see the
Javascript), I figure that the url I should be contacting is
http://rdp.cme.msu.edu/hierarchy/Hie...rServlet/start
Thus, I came up with the following test code:

--------Python test code---------------
#!/usr/bin/python

import urllib

options = [("strain", "type"), ("source", "both"),
("size", "gt1200"), ("taxonomy", "bergeys"),
("browse", "Browse")]

params = urllib.urlencode(options)

rdpbrowsepage = urllib.urlopen(
"http://rdp.cme.msu.edu/hierarchy/HierarchyControllerServlet/start",
params)

pagehtml = rdpbrowsepage.read()

print pagehtml
---------end Python test code----------
However, the page that is returned is an error page that says the
request could not be completed. The correct page should show various
bacterial taxonomies, which are clickable to reveal greater detail of
that particular taxon.

I'm a bit stumped, and admittedly, I am in over my head on the subject
matter of networking and web-clients. Perhaps I should be using the
httplib module for connecting to the RDP instead, but I am unsure what
methods I need to use to do this. This is complicated by the fact that
these are JSP generated pages and I'm unsure what exactly the server
requires before giving up the desired page. For instance, there's a
jsessionid that's given and I'm unsure if this is required to access
pages, and if it is, how to place it in POST requests.

If anyone has suggestions, I would greatly appreciate them. If any
more information is needed that I haven't provided, please let me know
and I'll be happy to give what I am able. Thanks very, very much in
advance.

Chris
Jul 18 '05 #1
[Chris Lasher]
I'm trying to write a tool to scrape through some of the Ribosomal
Database Project II's (http://rdp.cme.msu.edu/) pages, specifically,
through the Hierarchy Browser. (http://rdp.cme.msu.edu/hierarchy/)
I'm sure that urllib is the right tool to use. However, there may be one
or two problems with the way you're using it.
--------excerpted HTML----------------
<!-- snip -->
<form name="hierarchyForm" method="POST"
action="HierarchyControllerServlet/start/">
<input type='hidden' name='printParams' value='no' />
This field is missing from the params you are passing to the
HierarchyControllerServlet. Although the "printParams" field is not
visible to you in a browser, the browser still submits its name/value
pair with the form. So you should include it in your code as well, as
shown below.
<input id="bergeys" name="taxonomy" type="radio" value="rdpHome" checked>
Also, you are using the wrong value for the taxonomy field. You are
setting a value of "bergeys", which is the ID of the field, not its
value. The correct value is "rdpHome".
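
If you would rather not read the name/value pairs off the HTML by eye, something like the following rough sketch (written for Python 3, not the Python 2 used elsewhere in this thread) can list what a browser would submit by default. The class name FormFieldParser, the helper default_fields, and the trimmed SAMPLE string are made up for illustration; the sketch only handles hidden inputs and checked radio buttons, which is all this particular form uses, and it leaves out the submit button.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the (name, value) pairs a browser would submit by default.

    Hypothetical helper: only hidden inputs and checked radio buttons
    are handled, since those are the only field types in this form.
    """

    def __init__(self):
        super().__init__()
        self.fields = []  # (name, value) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name is None:
            return
        kind = (attr.get("type") or "text").lower()
        if kind == "hidden":
            # Hidden fields are always submitted, even though they are invisible.
            self.fields.append((name, attr.get("value", "")))
        elif kind == "radio" and "checked" in attr:
            # The submitted string is the value attribute, never the id.
            self.fields.append((name, attr.get("value", "on")))


def default_fields(html):
    """Return the default name/value pairs found in an HTML fragment."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# A trimmed copy of a few inputs from the form quoted in this thread.
SAMPLE = """
<form name="hierarchyForm" method="POST" action="HierarchyControllerServlet/start/">
<input type='hidden' name='printParams' value='no' />
<input id="type" name="strain" type="radio" value="type">
<input name="strain" type="radio" id="strainboth" value="both" checked>
<input id="bergeys" name="taxonomy" type="radio" value="rdpHome" checked>
<input id="ncbi" name="taxonomy" type="radio" value="ncbiHome">
</form>
"""

print(default_fields(SAMPLE))
```

On the trimmed sample this reports printParams alongside the radio groups, and the taxonomy entry carries the value attribute ("rdpHome") rather than the id ("bergeys").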
--------Python test code---------------
#!/usr/bin/python

import urllib

options = [("strain", "type"), ("source", "both"),
("size", "gt1200"), ("taxonomy", "bergeys"),
("browse", "Browse")]
Try this instead:

options = [ ("printParams", "no"), ("strain", "type"),
("source", "both"), ("size", "gt1200"),
("taxonomy", "rdpHome"), ("browse", "Browse"),]

params = urllib.urlencode(options)

rdpbrowsepage = urllib.urlopen(
"http://rdp.cme.msu.edu/hierarchy/HierarchyControllerServlet/start",
params)

pagehtml = rdpbrowsepage.read()

print pagehtml
---------end Python test code----------
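
For anyone reading this with Python 3, where urllib.urlopen and urllib.urlencode have moved into urllib.request and urllib.parse, here is a rough equivalent of the corrected code. It is only a sketch: the helper names build_request and build_cookie_opener are invented for illustration, and the cookie handling assumes the jsessionid is delivered as an ordinary cookie, which nothing in this thread confirms. The sketch builds the request without sending it, so it runs offline; the lines that actually contact the server are left commented out.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Page that hosts the form, and the relative action that the form's
# resetHiddenVar() JavaScript sets before submitting.
BASE_URL = "http://rdp.cme.msu.edu/hierarchy/"
ACTION = "HierarchyControllerServlet/start"


def build_request(options):
    """Build (but do not send) a POST request carrying the form fields."""
    # Resolve the relative action against the page URL, as a browser would.
    url = urllib.parse.urljoin(BASE_URL, ACTION)
    # In Python 3 the POST body must be bytes, not str.
    data = urllib.parse.urlencode(options).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=data)


def build_cookie_opener():
    """Return an opener that stores and resends cookies, plus its jar.

    Assumption: the server tracks its session with a cookie.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


options = [("printParams", "no"), ("strain", "type"),
           ("source", "both"), ("size", "gt1200"),
           ("taxonomy", "rdpHome"), ("browse", "Browse")]

request = build_request(options)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))

# To actually submit the form (needs network access, and the site may have
# changed since this thread was written):
# opener, jar = build_cookie_opener()
# opener.open(BASE_URL)  # assumed: the first visit sets any session cookie
# pagehtml = opener.open(request).read().decode("utf-8", "replace")
# print(pagehtml)
```

Using a single opener for both requests means any cookie set by the first page is sent back with the POST automatically, so there is no need to place a session id into the form data by hand.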


HTH,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan
Jul 18 '05 #2
