Bytes | Software Development & Data Engineering Community

Extracting from an XML file

Hello sir,
My aim is to extract the 'id' and 'ac' attribute values from the given XML files and store the results in two different files. The code I wrote can extract the ids and write them to a file, but I can't extract 'ac'. I want to extract all values of ac; for example, for ac="Q708T3" the output file should contain only Q708T3.
Kindly provide a solution.
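As a starting point, the value of a single attribute can be cut out of one line by searching for the attribute name followed by `="` and then for the next double quote. A minimal sketch, assuming the value is double-quoted, the attribute is preceded by a space, and the name appears only once on the line; the class and method names are hypothetical.

```java
// Hypothetical class name, chosen for this sketch.
public class AttributeValue {

    /**
     * Returns the value of attribute {@code name} on one line of XML text,
     * or null if it is not there. Assumes: name="value" with double quotes,
     * preceded by a space, and occurring at most once on the line.
     */
    static String attributeValue(String line, String name) {
        String marker = " " + name + "=\"";
        int start = line.indexOf(marker);
        if (start < 0) {
            return null;
        }
        start += marker.length();            // first character of the value
        int end = line.indexOf('"', start);  // the closing quote
        return end < 0 ? null : line.substring(start, end);
    }

    public static void main(String[] args) {
        String line = "<hit number=\"1\" database=\"swissprot\" id=\"MT_PODSI\" ac=\"Q708T3\" length=\"63\">";
        System.out.println(attributeValue(line, "ac")); // prints Q708T3
        System.out.println(attributeValue(line, "id")); // prints MT_PODSI
    }
}
```

The leading space in the search marker keeps a lookup for `ac` from matching the tail of a longer attribute name.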

The input file (i.e. the XML) is as follows:

<?xml version="1.0" ?>
<EBIApplicationResult xmlns="http://www.ebi.ac.uk/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.ebi.ac.uk/schema/ApplicationResult.xsd">
<Header>
<program name="WU-blastp" version="2.0MP-WashU [01-Jan-2006]" citation="PMID:12824421" />
<parameters>
<sequences total="1">
<sequence number="1" name="Sequence" type="p" length="149" />
</sequences>
<databases total="1" sequences="241242" letters="88541632">
<database number="1" name="swissprot" type="p" created="2006-10-29T23:34:03+00:00" />
</databases>
<scores>100</scores>
<alignments>50</alignments>
<matrix>BLOSUM62</matrix>
<expectationUpper>10</expectationUpper>
<statistics>sump</statistics>
</parameters>
<timeInfo start="2006-10-31T21:11:01+00:00" end="2006-10-31T21:11:03+00:00" search="PT02S" />
</Header>
<SequenceSimilaritySearchResult>
<hits total="31">
<hit number="1" database="swissprot" id="MT_PODSI" ac="Q708T3" length="63" description="Metallothionein (MT).">
<alignments total="2">
<alignment number="1">
<score>48</score>
<bits>22.0</bits>
<expectation>0.051</expectation>
<probability>0.050</probability>
<identity>40</identity>
<positives>40</positives>
<querySeq start="15" end="34">NCHITINASECCLCCL--CCLC</querySeq>
<pattern>NC T CC CC C C</pattern>
<matchSeq start="24" end="45">NCKCTSCKKSCCSCCPAGCAKC</matchSeq>
</alignment>
<alignment number="2">
<score>33</score>
<bits>16.7</bits>
<expectation>0.051</expectation>
<probability>0.050</probability>
<identity>45</identity>
<positives>54</positives>
<querySeq start="58" end="68">RCNTFCXCLEP</querySeq>
<pattern>+C C C EP</pattern>
<matchSeq start="44" end="54">KCAKSCVCKEP</matchSeq>
</alignment>
</alignments>
</hit>
<hit number="2" database="swissprot" id="IBB4_DOLAX" ac="P01059" length="76" description="Bowman-Birk type proteinase inhibitor DE-4.">
<alignments total="1">
<alignment number="1">
<score>62</score>
<bits>26.9</bits>
<expectation>0.19</expectation>
<probability>0.18</probability>
<identity>27</identity>
<positives>44</positives>
<querySeq start="2" end="36">CIDICMAMMALIANCHIT-INASECCLCCLCCLCIL</querySeq>
<pattern>C D+C ++ CH + + C C C+C L</pattern>
<matchSeq start="15" end="50">CCDLCTCTKSIPPQCHCNDMRLNSCHSACKSCICAL</matchSeq>
</alignment>
</alignments>
</hit>
<hit number="3" database="swissprot" id="IBBC2_SOYBN" ac="P01063" length="83" description="Bowman-Birk type proteinase inhibitor C-II precursor.">
<alignments total="1">
<alignment number="1">
<score>61</score>
<bits>26.5</bits>
<expectation>0.26</expectation>
<probability>0.23</probability>
<identity>32</identity>
<positives>44</positives>
<querySeq start="2" end="34">CIDICMAMMALIANCHIT-INASECCLCCLCCLC</querySeq>
<pattern>C D+CM ++ CH I + C C C C</pattern>
<matchSeq start="21" end="54">CCDLCMCTASMPPQCHCADIRLNSCHSACDRCAC</matchSeq>
</alignment>
</alignments>
</hit>
etc...
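Because the file is well-formed XML, another option is to let an XML parser read it instead of scanning text lines. A minimal sketch using the JDK's DOM API (`javax.xml.parsers.DocumentBuilderFactory`, `org.w3c.dom`), assuming every hit is a `<hit>` element carrying `id` and `ac` attributes as in the sample above. The class name and the output file `outACS.txt` are hypothetical; DOM loads the whole document into memory.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, chosen for this sketch.
public class HitAttributes {

    /** Returns attribute {@code name} of every <hit> element, in document order. */
    static List<String> hitAttribute(Document doc, String name) {
        List<String> values = new ArrayList<String>();
        NodeList hits = doc.getElementsByTagName("hit");
        for (int i = 0; i < hits.getLength(); i++) {
            Element hit = (Element) hits.item(i);
            values.add(hit.getAttribute(name)); // "" if the attribute is missing
        }
        return values;
    }

    /** Parses an XML string; checked parser exceptions are rethrown unchecked. */
    static Document parseString(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Writes one value per line to the named file. */
    static void writeLines(List<String> values, String fileName) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(fileName));
        try {
            for (String value : values) {
                out.println(value);
            }
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new File("blast-20061031-21110099.xml"));
        writeLines(hitAttribute(doc, "id"), "outIDS.txt");
        writeLines(hitAttribute(doc, "ac"), "outACS.txt"); // hypothetical file name
    }
}
```

The factory is not namespace-aware by default, so `getElementsByTagName("hit")` matches the plain tag name even though the document declares a default namespace.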


The code I wrote is:

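import java.io.*;

public class NameHandler
{
    public static void main(String[] args)
    {
        new NameHandler().runProgram();
    }

    public void runProgram()
    {
        try
        {
            PrintWriter pw1 = new PrintWriter(new FileWriter("outIDS.txt"));
            BufferedReader br1 = new BufferedReader(new FileReader("blast-20061031-21110099.xml"));
            String line;
            int i = 0;
            while ((line = br1.readLine()) != null)
            {
                // Only the <hit ...> start tags carry the id and ac attributes.
                if (line.startsWith(" <hit number"))
                {
                    i++;
                    if (i <= 10) // keep only the first 10 hits
                    {
                        // Only the length of this string is used by getElement. It is
                        // 5 characters shorter than the real prefix up to the id value
                        // (four quotes and one space are missing); getElement makes up
                        // for that with substring(5, ...).
                        String eleminate = " <hit number=" + i + "database=" + "swissprot" + " " + "id=" + "\"";
                        String valuefrom = getElement(line, eleminate);
                        pw1.println(valuefrom.trim());
                    }
                }
            }
            br1.close();
            pw1.flush();
            pw1.close();
        }
        catch (Exception e)
        {
            // Report problems such as a missing input file instead of hiding them.
            e.printStackTrace();
        }
    }

    // Drops tagName.length() characters from the front of the line, then returns the
    // text from index 5 up to, but not including, the character just before the last
    // " ac" (the closing quote of id). This only works while the spacing, quoting and
    // attribute order match the sample exactly.
    public String getElement(String line, String tagName)
    {
        int length = tagName.length();
        line = line.substring(length);
        System.out.println("index=" + length);
        return line.substring(5, line.lastIndexOf(" ac") - 1);
    }
}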
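The fixed offsets in getElement (the length of eleminate, then index 5, then lastIndexOf(" ac") - 1) depend on the exact layout of each hit line. One way to get both values while keeping the line-by-line reading is to match the attributes by name with a regular expression. A minimal sketch, assuming (as the code above does) that each `<hit ...>` start tag sits on a single line; the class name and the output file `outACS.txt` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, chosen for this sketch.
public class HitLineExtractor {

    // Matches id="..." or ac="..." preceded by whitespace; group 1 is the
    // attribute name, group 2 the value between the double quotes.
    private static final Pattern ID_OR_AC = Pattern.compile("\\s(id|ac)=\"([^\"]*)\"");

    /**
     * Returns {id, ac} for a line holding a <hit ...> start tag, or null if the
     * line is not a hit line. An entry is null if that attribute is absent.
     */
    static String[] idAndAc(String line) {
        if (!line.trim().startsWith("<hit ")) { // "<hits " does not match
            return null;
        }
        String[] result = new String[2];
        Matcher m = ID_OR_AC.matcher(line);
        while (m.find()) {
            int slot = m.group(1).equals("id") ? 0 : 1;
            if (result[slot] == null) {
                result[slot] = m.group(2);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("blast-20061031-21110099.xml"));
        PrintWriter ids = new PrintWriter(new FileWriter("outIDS.txt"));
        PrintWriter acs = new PrintWriter(new FileWriter("outACS.txt")); // hypothetical name
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] pair = idAndAc(line);
                if (pair != null) {
                    if (pair[0] != null) ids.println(pair[0]);
                    if (pair[1] != null) acs.println(pair[1]);
                }
            }
        } finally {
            in.close();
            ids.close();
            acs.close();
        }
    }
}
```

Matching by name means the result no longer depends on the hit number's width, the amount of leading whitespace, or whether ac comes before or after id.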
Nov 15 '06 #1
r035198x
In your XML file:
1) Do id and ac always occur on lines that start with <hit number=...?
2) Does ac always appear immediately after id?
3) Did you say you can already get the ids correctly?
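If the answer to question 1 or 2 is "no", any line-based scan can miss or mis-split values. A streaming XML parser avoids both assumptions, because it reports each element's attributes by name regardless of line breaks or attribute order. A minimal sketch with the JDK's StAX API (`javax.xml.stream`), printing each id/ac pair tab-separated; the class and method names are hypothetical.

```java
import java.io.FileReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name, chosen for this sketch.
public class HitStreamer {

    /** Streams the document and returns {id, ac} for every <hit> element, in order. */
    static List<String[]> idAndAcPairs(Reader source) {
        List<String[]> pairs = new ArrayList<String[]>();
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(source);
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "hit".equals(xml.getLocalName())) {
                    // A null namespace URI means the attribute is looked up by local name only.
                    pairs.add(new String[] {
                        xml.getAttributeValue(null, "id"),
                        xml.getAttributeValue(null, "ac")
                    });
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        Reader in = new FileReader("blast-20061031-21110099.xml");
        try {
            for (String[] pair : idAndAcPairs(in)) {
                System.out.println(pair[0] + "\t" + pair[1]);
            }
        } finally {
            in.close();
        }
    }
}
```

Unlike DOM, StAX keeps only the current event in memory, which may matter for large BLAST result files.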
Nov 15 '06 #2


Similar topics

5
by: Nazgul | last post by:
Hi! I want to implement a small tool in Python for distributing "patches" and I need Your advice. This application should be able to package all files chosen by a user into a self-extracting.exe...
10
by: Calvin FONG | last post by:
Dear all, Are there any utility that can be call by python to create self extracting zip file. I'm now using the powerarchiever. But the command line options aren't flexible enough. Basically, I...
2
by: Avi | last post by:
hi, Can anyone tell me what the problem is and how to solve it The following piece of code resides on an asp page on the server and is used to download files from the server to the machine...
5
by: Astra | last post by:
Hi All Is there an ASP way of extracting the height and width of a swf file so that I can specify these dims when adding the whole OBJECT code to the web page? Thanks Robbie
0
by: k_nil | last post by:
I have a link on my web page for a self extracting executable file placed on the server. When the link is clicked, 1) i could see dialog box with open or save options 2) when open clicked, self...
1
by: Terry Olsen | last post by:
Ok, now that I've got my disk imager program working, I'd like to attach a "self-extractor" to the front end of the image file and make it a self-extracting disk image executable file. The idea...
2
by: bjm | last post by:
I created a self extracting zip file with about 9000 files in it. I extracted it manually from the command line without a problem. However, when I tried to do the same extraction at the same...
6
by: Werner | last post by:
Hi, I try to read (and extract) some "self extracting" zipefiles on a Windows system. The standard module zipefile seems not to be able to handle this. False Is there a wrapper or has...
4
by: dexter48 | last post by:
Hi I'm searching for a string occurance in a text file. I find the string ok and write the results to a log file. But on the line above is also some information I need. How can i get that. The string...
4
by: Ant | last post by:
Hi all, My kids have a bunch of games that have to be run from CD (on Windows XP). Now they're not very careful with them, and so I have a plan. I've downloaded a utility (Daemon Tools) which...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
