Extracting information from String (regex, Beanshell)

Hello,

I am working with Taverna to build a workflow. Taverna has a BeanShell component where I can program in Java. I am having some problems writing a script. I want to extract information from a string whose entries are separated by newlines. For this I am using regex.

The string is:

P48534
EXP value is: e-10
Q0543
EXP value is: 4e-07


My script looks like this in BeanShell:

  1.  import java.util.regex.Matcher;
  2. import java.util.regex.Pattern;
  3.  
  4. Pattern pGI = Pattern.compile("(^.*?$)");
  5. Pattern pEvaLue = Pattern.compile("is: (.*)$");
  6. Matcher mGI;
  7. Matcher mEvalue;
  8. StringBuffer temp = new StringBuffer();
  9. String [] line = BlastReport.split("\n");
  10. int arraysize = line.length; 
  11.  
  12. for (int i=0; i<(arraysize-1); i+=2){
  13.     String sGI = line[i];
  14.     String sEvalue = line[i+1];    
  15.     mGI = pGI.matcher(sGI);
  16.     mEvalue = pEvalue.matcher(sEvalue);
  17.     String gi="";
  18.  
  19.     if (mGI.find()){
  20.         gi =mGI.group(1);
  21.     }
  22.     if (mEvalue.find()){
  23.         String eval = mEvalue.group(1);
  24.         if(eval.startsWith("e")){
  25.             eval= "1".concat(eval);
  26.         }
  27.         Double d = new Double (eval);
  28.         double Evalue = d.doubleValue();
  29.         if (Evalue<=0.02){
  30.             temp.append(gi + "\n");
  31.         }
  32.     }
  33. }
  34.  
  35. String result = temp.toString().trim();

The error message is: "Attempt to resolve method: matcher() on undefined variable or class name: pEvalue: at Line: 16: pEvalue .matcher(sEvalue)"


Can someone tell me why it is giving me this error and how I can fix it?

Thank you in advance,

Mokita
Aug 6 '07 #1
prometheuzz
It looks like you're trying to capture the text after the is:, right?
Try this:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String text = "P48534\nEXP value is: e-10\nQ0543\nEXP value is: 4e-07";
    // Lookbehind: match everything after "is: " up to the end of the line.
    Pattern pattern = Pattern.compile("(?<=is:\\s)[^\\n]+");
    Matcher matcher = pattern.matcher(text);

    while (matcher.find()) {
        System.out.println(matcher.group());
    }
Aug 6 '07 #2
r035198x
Which one is your line 16?
Aug 6 '07 #3
Mokita
My line 16 is:
mEvalue = pEvalue.matcher(sEvalue);

Mokita
Aug 6 '07 #4
r035198x
Java is case sensitive: your script declares the pattern as pEvaLue (line 5) but uses it as pEvalue (line 16).
Now give yourself a kick.
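
To spell the fix out, here is a minimal sketch in plain Java with the pattern declared and used under one spelling. The class name CaseFixSketch and the helper extractEvalue are made up for illustration and are not part of the original script; the regex is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, used only to make the snippet self-contained.
class CaseFixSketch {
    // Declared as pEvalue and used as pEvalue: the spelling must match exactly.
    static final Pattern pEvalue = Pattern.compile("is: (.*)$");

    /** Returns the text after "is: " on one line, or null if the line does not match. */
    static String extractEvalue(String line) {
        Matcher mEvalue = pEvalue.matcher(line);
        return mEvalue.find() ? mEvalue.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractEvalue("EXP value is: 4e-07")); // prints 4e-07
    }
}
```

In the BeanShell script itself, renaming the declaration on line 5 to pEvalue should be enough to make line 16 resolve.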
Aug 6 '07 #5
prometheuzz
Splitting your String and stepping through the lines two at a time in your for loop looks fragile: one missing or extra line throws every later pair out of alignment. Perhaps I (or someone else) can suggest a better approach, but first you need to explain what it is you're trying to do.

So given the String:
    "P48534\nEXP value is: e-10\nQ0543\nEXP value is: 4e-07"
what is it you're trying to extract and/or group?
Aug 6 '07 #6
prometheuzz
Try this:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String text = "P48534\nEXP value is: e-10\nQ0543\nEXP value is: 4e-07";
    // Group 1: the ID (one capital letter followed by digits).
    // Group 2: the text after "is: ".
    Pattern pattern = Pattern.compile("([A-Z]\\d+).*\\n?.*((?<=is:\\s)[^\\n]+)");
    Matcher matcher = pattern.matcher(text);

    System.out.println(text + "\n");

    while (matcher.find()) {
        String id = matcher.group(1);
        String sVal = matcher.group(2);
        // "e-10" cannot be parsed as a double, so prefix it with 1 to get "1e-10".
        sVal = sVal.startsWith("e") ? "1" + sVal : sVal;
        double dVal = Double.parseDouble(sVal);
        System.out.println(id + "\t" + dVal);
    }
Aug 6 '07 #7
Mokita
I am trying to build a workflow with Taverna, which has a BeanShell component where I can write a script.
In my workflow I will have the output of a BLAST search: GI numbers, which look like P23234, Q12344 or A12443 or consist only of digits, each followed by an E-value, as in "EXP value is: e-10".
From that output I want to extract the GI numbers that have an E-value <= 0.02. I thought I could extract them with regular expressions in Java, but the script I wrote is not working.

I hope it is now clearer what I want to do, but if you need more explanation please ask.

Mokita
Aug 6 '07 #8
r035198x
I don't think regex is best for this (I could be wrong of course). You are not searching for a pattern (which is where I mostly use my regex) but you are searching for numbers within some range.
P.S. I hope you managed to correct the spelling mistake in that variable name.
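
For what it's worth, a regex-free version could look roughly like the sketch below. It assumes every E-value line contains the marker "is:" and that any other non-empty line is an ID. The class name PlainParseSketch and the method filterIds are invented for this example, and it is plain Java (generics and all), so it may need adapting before it runs in BeanShell.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class PlainParseSketch {
    /** Returns the IDs whose following "is:" line holds a value <= maxEvalue. */
    static List<String> filterIds(String report, double maxEvalue) {
        List<String> ids = new ArrayList<>();
        String currentId = null;
        for (String raw : report.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines instead of letting them shift the pairing
            }
            int marker = line.indexOf("is:");
            if (marker < 0) {
                currentId = line; // assumption: a line without "is:" is an ID
                continue;
            }
            String value = line.substring(marker + 3).trim();
            if (value.startsWith("e")) {
                value = "1" + value; // "e-10" -> "1e-10" so it parses as a double
            }
            if (currentId != null && Double.parseDouble(value) <= maxEvalue) {
                ids.add(currentId);
            }
            currentId = null; // each ID is paired with at most one E-value line
        }
        return ids;
    }

    public static void main(String[] args) {
        String report = "P48534\nEXP value is: e-10\nQ0543\nEXP value is: 4e-07";
        System.out.println(filterIds(report, 0.02));
    }
}
```

Because the ID is remembered rather than looked up by position, a stray blank line does not throw off the pairing the way stepping through the array two lines at a time would.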
Aug 6 '07 #9
Mokita
Hello

I want to thank you for your help; it is working. I also want to ask a quick question.
How can I change group 1, ([A-Z]\\d+), to catch P00DF3, 1234653, Q5647GJD or A4658DF?
It is not catching the ones with letters in the middle.

Thank you again,

Mokita
Aug 6 '07 #10
prometheuzz
Try replacing "([A-Z]\\d+)" with "(\\w+)"
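
Putting that together with the 0.02 cutoff you mentioned, a rough sketch might look like this. The class name WordIdSketch and the method idsAtOrBelow are made up for the example; the pattern is the earlier one with only group 1 swapped for (\\w+), which matches any run of letters, digits and underscores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class WordIdSketch {
    // Group 1: the ID as a run of word characters. Group 2: the text after "is: ".
    static final Pattern ENTRY =
        Pattern.compile("(\\w+).*\\n?.*((?<=is:\\s)[^\\n]+)");

    /** Returns the IDs whose E-value is <= maxEvalue. */
    static List<String> idsAtOrBelow(String report, double maxEvalue) {
        List<String> ids = new ArrayList<>();
        Matcher m = ENTRY.matcher(report);
        while (m.find()) {
            String sVal = m.group(2);
            if (sVal.startsWith("e")) {
                sVal = "1" + sVal; // "e-10" -> "1e-10"
            }
            if (Double.parseDouble(sVal) <= maxEvalue) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String report = "P00DF3\nEXP value is: e-10\n1234653\nEXP value is: 0.5";
        System.out.println(idsAtOrBelow(report, 0.02));
    }
}
```

One caveat: \\w+ is loose enough to match the word "EXP" as well, so this relies on every E-value line being preceded by an ID line.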
Aug 6 '07 #11
