Bytes | Software Development & Data Engineering Community

Difficulties POSTing to RDP Hierarchy Browse Page

Hello,
I'm trying to write a tool to scrape through some of the Ribosomal
Database Project II's (http://rdp.cme.msu.edu/) pages, specifically,
through the Hierarchy Browser. (http://rdp.cme.msu.edu/hierarchy/)
The Hierarchy Browser is accessed first through a page with a form.
There are four fields with several options to be chosen from (Strain,
Source, Size, and Taxonomy) and then a submit button labeled "Browse".
The HTML of the form is as follows (note, I am also including the
Javascript code, as it is called by the submit button):

--------excerpted HTML----------------
<script language="Javascript">

function resetHiddenVar( ){
var f_form = document.forms['hierarchyForm'];
f_form.action= "HierarchyControllerServlet/start";
return ;
}

</script>

<form name="hierarchyForm" method="POST"
action="HierarchyControllerServlet/start/">
<input type='hidden' name='printParams' value='no' />

<h1>Hierarchy Browser - Start</h1><div class="cart" style="float:
right">[&nbsp;<a href="hb_help.jsp">help</a>&nbsp;]</div>

<p>&nbsp;</p>
<div id="options">

<table summary="options area" cellpadding="0" cellspacing="0"
border="0"><tr> <td align="left" valign="middle" >
<table border="0" cellspacing="0" cellpadding="0" summary="Options"
align="left" class="borderup">
<tr>
<th align="right" valign="middle" class="bottom greenbg"
nowrap="nowrap" >Strain:</th>
<td class="bottom formtext" nowrap="nowrap" ><input id="type"
name="strain" type="radio" value="type">
<label for="type">Type</label></td>
<td class="bottom formtext" nowrap="nowrap" ><input id="nontype"
name="strain" type="radio" value="nontype" >
<label for="nontype">Non Type</label>&nbsp;</td>
<td class="bottom formtext" nowrap="nowrap" ><input name="strain"
type="radio" id="strainboth" value="both" checked>
<label for="strainboth">Both</label>&nbsp;</td>

</tr>
<tr>
<th align="right" valign="middle" class="bottom greenbg">Source:</th>
<td class="bottom formtext" nowrap="nowrap" ><input id="environmental"
name="source" type="radio" value="environ" >
<label for="environmental">Uncultured&nbsp;</label></td>
<td class="bottom formtext" nowrap="nowrap" ><input id="isolates"
name="source" type="radio" value="isolates">
<label for="isolates">Isolates</label></td>
<td class="bottom formtext" nowrap="nowrap" ><input name="source"
type="radio" id="sourceboth" value="both" checked >
<label for="sourceboth">Both</label></td>
</tr>

<tr>
<th align="right" valign="middle" class="bottom greenbg">Size:</th>
<td class="bottom formtext" nowrap="nowrap" ><input
id="greaterthan1200" name="size" type="radio" value="gt1200" checked>
<label for="greaterthan1200"><u>&gt;</u>1200</label></td>
<td class="bottom formtext" nowrap="nowrap" ><input id="lessthan1200"
name="size" type="radio" value="lt1200">
<label for="lessthan1200">&lt;1200</label></td>
<td class="bottom formtext" nowrap="nowrap" ><input id="sizeboth"
name="size" type="radio" value="both">
<label for="sizeboth">Both</label></td>
</tr>
<tr>

<th align="right" valign="middle" class="bottom
greenbg">Taxonomy:</th>
<td class="bottom formtext" nowrap="nowrap" ><input id="bergeys"
name="taxonomy" type="radio" value="rdpHome" checked>
<label for="bergeys">Bergey's</label></td>
<td colspan="2" class="bottom formtext" nowrap="nowrap" ><input
id="ncbi" name="taxonomy" type="radio" value="ncbiHome">
<label for="ncbi">NCBI</label></td>
</tr>
</table>
</td>
<td align="left" valign="middle" >&nbsp;&nbsp;&nbsp;
<input name="browse" type="submit" id="browse"
onclick="resetHiddenVar(); return true;" value="Browse">

</td></tr></table></p>
</div>
<!-- end options -->
</form>
----------end excerpted HTML--------------
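A useful first step is to list exactly which name/value pairs a browser would send from this form if the user changes nothing. Below is a minimal Python 3 sketch using the standard html.parser module. It assumes that only hidden inputs and checked radio/checkbox inputs matter: text fields, select elements and the submit button are deliberately ignored. FormDefaults, form_defaults and SNIPPET are hypothetical names, and SNIPPET is a hand-trimmed copy of the input tags above, not the live page.

```python
from html.parser import HTMLParser


class FormDefaults(HTMLParser):
    """Collect the name/value pairs a browser submits for a form when the
    user changes nothing: hidden inputs plus checked radio/checkbox inputs.
    Text fields, <select> elements and submit buttons are ignored here."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also end up here, because
        # HTMLParser.handle_startendtag calls handle_starttag by default.
        if tag != "input":
            return
        a = dict(attrs)  # a bare attribute like "checked" maps to None
        name = a.get("name")
        if name is None:
            return  # unnamed controls are never submitted
        kind = (a.get("type") or "text").lower()
        if kind == "hidden":
            self.pairs.append((name, a.get("value") or ""))
        elif kind in ("radio", "checkbox") and "checked" in a:
            # The browser sends the value attribute, not the id.
            self.pairs.append((name, a.get("value") or "on"))


def form_defaults(html):
    parser = FormDefaults()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hand-trimmed copy of the <input> tags from the excerpt above.
SNIPPET = """
<form name="hierarchyForm" method="POST" action="HierarchyControllerServlet/start/">
<input type='hidden' name='printParams' value='no' />
<input id="type" name="strain" type="radio" value="type">
<input name="strain" type="radio" id="strainboth" value="both" checked>
<input name="source" type="radio" id="sourceboth" value="both" checked>
<input id="greaterthan1200" name="size" type="radio" value="gt1200" checked>
<input id="bergeys" name="taxonomy" type="radio" value="rdpHome" checked>
<input id="ncbi" name="taxonomy" type="radio" value="ncbiHome">
<input name="browse" type="submit" id="browse" value="Browse">
</form>
"""

print(form_defaults(SNIPPET))
```

On this snippet the sketch reports a hidden printParams=no pair and taxonomy=rdpHome (the value of the checked "bergeys" radio button). Both details are easy to miss when reading the HTML by eye.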
The options I would like to simulate are browsing by strain=type,
source=both, size=gt1200, and taxonomy=bergeys. I see that the
form method is POST. I read through the urllib documentation and
saw that the syntax for POSTing is urllib.urlopen(url, data). Since
the submit button's onclick handler sets the form action to
HierarchyControllerServlet/start (see the Javascript), I figure that the url I should be contacting is
http://rdp.cme.msu.edu/hierarchy/Hie...rServlet/start
Thus, I came up with the following test code:

--------Python test code---------------
#!/usr/bin/python

import urllib

options = [("strain", "type"), ("source", "both"),
("size", "gt1200"), ("taxonomy", "bergeys"),
("browse", "Browse")]

params = urllib.urlencode(options)

rdpbrowsepage = urllib.urlopen(
"http://rdp.cme.msu.edu/hierarchy/HierarchyControllerServlet/start",
params)

pagehtml = rdpbrowsepage.read()

print pagehtml
---------end Python test code----------
However, the page that is returned is an error page that says the
request could not be completed. The correct page should show various
bacterial taxonomies, which are clickable to reveal greater detail of
that particular taxon.
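When a POST returns an error page, the target URL itself is worth ruling out first. A browser resolves a relative form action against the URL of the page that contains the form. The sketch below, in Python 3 with urllib.parse.urljoin, resolves both the action attribute in the form tag and the one set by resetHiddenVar(). It assumes the form page is served from http://rdp.cme.msu.edu/hierarchy/, as stated in the question. The thread does not show whether the servlet treats the two variants (with and without the trailing slash) identically.

```python
from urllib.parse import urljoin

# Page the form was loaded from, as given in the question (an assumption
# about how the browser reached it; note the trailing slash).
PAGE_URL = "http://rdp.cme.msu.edu/hierarchy/"

FORM_ACTION = "HierarchyControllerServlet/start/"   # action= in the <form> tag
SCRIPT_ACTION = "HierarchyControllerServlet/start"  # value set by resetHiddenVar()

print(urljoin(PAGE_URL, FORM_ACTION))
print(urljoin(PAGE_URL, SCRIPT_ACTION))

# Without the trailing slash, "hierarchy" is treated as a file name and
# replaced, so the action resolves one directory higher.
print(urljoin("http://rdp.cme.msu.edu/hierarchy", SCRIPT_ACTION))
```

The second line matches the URL used in the test code, so the URL is probably not the cause of the error. The last line shows how a missing trailing slash would silently change the target.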

I'm a bit stumped, and admittedly, I am in over my head on the subject
matter of networking and web clients. Perhaps I should be using the
httplib module for connecting to the RDP instead, but I am unsure what
methods I need to use to do this. This is complicated by the fact that
these are JSP-generated pages and I'm unsure what exactly the server
requires before giving up the desired page. For instance, the server issues a
jsessionid, and I'm unsure whether it is required to access
pages and, if it is, how to include it in POST requests.
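Java servlet containers commonly track a session with a JSESSIONID cookie. When cookies are unavailable, they fall back to URL rewriting, appending ";jsessionid=..." to links. This thread does not establish whether RDP's servlet needs the session at all. If it does, a minimal Python 3 sketch is to let a cookie-aware opener (http.cookiejar.CookieJar with urllib.request.HTTPCookieProcessor) carry the cookie automatically, and to recognise the rewritten form in URLs. make_session_opener and session_id_from_url are hypothetical helper names, and the ABC123 id used in testing is made up.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlparse


def make_session_opener():
    """Return (opener, jar). The opener stores cookies such as JSESSIONID
    from responses and sends them back on later requests it makes."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def session_id_from_url(url):
    """Return the value of a ';jsessionid=...' path parameter on the last
    path segment (servlet URL rewriting), or None if there is none."""
    params = urlparse(url).params  # text after ';' in the last path segment
    for part in params.split(";"):
        key, _, value = part.partition("=")
        if key.lower() == "jsessionid":
            return value
    return None
```

Typical use would be to open the form page through the opener first, so the server can set its cookie, and then send the POST through the same opener. If the page's links carry ";jsessionid=..." instead, posting to such a rewritten URL keeps the id in the path.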

If anyone has suggestions, I would greatly appreciate them. If any
more information is needed that I haven't provided, please let me know
and I'll be happy to give what I am able. Thanks very, very much in
advance.

Chris
Jul 18 '05 #1
[Chris Lasher]
I'm trying to write a tool to scrape through some of the Ribosomal
Database Project II's (http://rdp.cme.msu.edu/) pages, specifically,
through the Hierarchy Browser. (http://rdp.cme.msu.edu/hierarchy/)
I'm sure that urllib is the right tool to use. However, there may be one
or two problems with the way you're using it.
--------excerpted HTML----------------
<!-- snip -->
<form name="hierarchyForm" method="POST"
action="HierarchyControllerServlet/start/">
<input type='hidden' name='printParams' value='no' />
This field is missing from the params you are passing to the
HierarchyControllerServlet. Although the "printParams" field is not visible to you
in a browser, the browser still submits its name/value pair as part of the form
submission. So you should include it in your code too, as shown below.
<input id="bergeys" name="taxonomy" type="radio" value="rdpHome" checked>
Also, you are using the wrong value for the taxonomy field. You are
setting a value of "bergeys", which is the ID of the field, not its
value. The correct value is "rdpHome".
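Both corrections come from the same rule: the POST body must carry exactly what the browser would send. That means every named hidden field plus the value attribute (not the id) of each chosen radio button. One way to avoid forgetting a field is to start from the form's defaults and override only what you change. Below is a minimal Python 3 sketch of that idea. FORM_DEFAULTS and build_body are hypothetical names, and the defaults are transcribed by hand from the excerpted HTML, so they would need updating if the form changes.

```python
from urllib.parse import urlencode

# What the browser submits if nothing is changed: the hidden field plus the
# value attribute of each checked radio button, transcribed from the form.
FORM_DEFAULTS = [
    ("printParams", "no"),
    ("strain", "both"),
    ("source", "both"),
    ("size", "gt1200"),
    ("taxonomy", "rdpHome"),
]


def build_body(overrides, submit=("browse", "Browse")):
    """Start from the form defaults, replace the fields named in overrides,
    and append the submit button's name/value, as a browser does when that
    button is clicked. Returns an application/x-www-form-urlencoded string."""
    values = dict(FORM_DEFAULTS)  # dicts keep insertion order (Python 3.7+)
    values.update(overrides)
    pairs = list(values.items())
    if submit is not None:
        pairs.append(submit)
    return urlencode(pairs)


print(build_body({"strain": "type"}))
```

For strain=type this yields the same fields, in the same order, as the corrected options list in the reply.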
--------Python test code---------------
#!/usr/bin/python

import urllib

options = [("strain", "type"), ("source", "both"),
("size", "gt1200"), ("taxonomy", "bergeys"),
("browse", "Browse")]
Try this:

options = [ ("printParams", "no"), ("strain", "type"),
("source", "both"), ("size", "gt1200"),
("taxonomy", "rdpHome"), ("browse", "Browse"),]

params = urllib.urlencode(options)

rdpbrowsepage = urllib.urlopen(
"http://rdp.cme.msu.edu/hierarchy/HierarchyControllerServlet/start",
params)

pagehtml = rdpbrowsepage.read()

print pagehtml
---------end Python test code----------
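The code above is Python 2 (urllib.urlopen, urllib.urlencode, the print statement). In Python 3 these functions live in urllib.request and urllib.parse, and a POST body must be bytes rather than str. Below is a minimal sketch of the same corrected request. It assumes the servlet URL from the thread still behaves as described, which has not been verified here. build_request and fetch are hypothetical helper names.

```python
import urllib.request
from urllib.parse import urlencode

URL = "http://rdp.cme.msu.edu/hierarchy/HierarchyControllerServlet/start"

OPTIONS = [("printParams", "no"), ("strain", "type"),
           ("source", "both"), ("size", "gt1200"),
           ("taxonomy", "rdpHome"), ("browse", "Browse")]


def build_request(url=URL, options=OPTIONS):
    """Build (but do not send) the POST request. Supplying data makes the
    method POST, and urllib adds the application/x-www-form-urlencoded
    Content-Type header itself when the request is sent."""
    body = urlencode(options).encode("ascii")
    return urllib.request.Request(url, data=body)


def fetch(opener=None):
    """Send the request and return the decoded page (needs network access)."""
    opener = opener or urllib.request.build_opener()
    with opener.open(build_request()) as resp:
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset)
```

fetch() is left uncalled so the code can be loaded without network access. Passing it an opener built with urllib.request.HTTPCookieProcessor would also carry any session cookie the server sets.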


HTH,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan
Jul 18 '05 #2


