Bytes IT Community

Another RegEx Problem

Hi Everyone.

Ok I have a problem getting the following regex to work in Java.

<script[^>]*>(.|\r|\n)+?</script>

It works fine in EditPad Pro, but in Java it causes the following error
message when run:

'Exception in thread "main" java.lang.StackOverflowError'

Any ideas why?

Best Regards
Andrew Dixon

Jul 17 '05 #1
5 Replies


"Andrew Dixon" <da******@NOREPLY.yahoo.co.uk> wrote in message news:<zS*********************@news-text.cableinet.net>...
> Hi Everyone.
>
> Ok I have a problem getting the following regex to work in Java.
>
> <script[^>]*>(.|\r|\n)+?</script>
>
> It works fine in EditPad Pro but in Java it causes the following error
> message when ran:-
>
> 'Exception in thread "main" java.lang.StackOverflowError'
>
> Any ideas why?
>
> Best Regards
> Andrew Dixon


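(1) Use the DOTALL flag instead of (.|\r|\n). DOTALL is what lets '.' match line terminators; MULTILINE only changes where ^ and $ match.
(2) Use a non-capturing group, (?:...), wherever you don't need the captured text.
(3) Inside that group, skip over quoted strings whole, so that a </script> inside quotes does not end the match.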
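
To make the first two points concrete, here is a minimal sketch. The class and method names (ScriptBlocks, findScripts) are made up for illustration. It assumes you only need the whole script block, so it drops the group entirely and relies on DOTALL to let '.' match \r and \n. As far as I know, Java matches a lazy quantifier on a single '.' in a loop, whereas a repeated group containing alternation such as (.|\r|\n)+? recurses once per character, which would explain the StackOverflowError on long script bodies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptBlocks {
    // DOTALL lets '.' match line terminators, so (.|\r|\n) is not needed.
    // No group is needed here because only the whole match is used.
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*>.*?</script>", Pattern.DOTALL);

    // Returns every <script>...</script> block found in the given text.
    static List<String> findScripts(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<p>a</p><script type=\"text/javascript\">\n"
                + "var x = 1;\r\n</script><p>b</p>";
        for (String block : findScripts(html)) {
            System.out.println(block);
        }
    }
}
```

This version does not cover point (3): a </script> inside a quoted string would still end the match early.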
Jul 17 '05 #2

Hi.

Sorry, I don't really understand what you mean. Could you show me an
example or rewrite my expression?

Thanks.

--

Best Regards
Andrew Dixon

Jul 17 '05 #3

"Andrew Dixon" <da******@NOREPLY.yahoo.co.uk> wrote in message news:<mc***************@news-binary.blueyonder.co.uk>...
> Hi.
>
> Sorry, I'm not really understanding what you mean, could you show me an
> example or re-write my expression.

Here is a simple example. Hope this helps.
<code>
import java.nio.*;
import java.nio.channels.*;
import java.io.*;
import java.util.regex.*;

public class TagBodyExtractor {

    public static void main(String[] args) {
        String tagId, closingTag, inFileName;
        boolean bodyOnly;

        if (args.length < 1) {
            System.err.println("USAGE:");
            System.err.println("java TagBodyExtractor filename");
            System.err.println("or,");
            System.err.println("java TagBodyExtractor tagtext filename");
            System.exit(1);
        }

        if (args.length == 2) {
            tagId = args[0];
            inFileName = args[1];
        } else {
            tagId = "script"; // lower-case the tags in the file before using this program
            inFileName = args[0];
        }
        closingTag = "</" + tagId + ">";

        bodyOnly = false; // output both the tags and their bodies

        // try-with-resources closes the stream and channel even if an exception is thrown
        try (FileInputStream fis = new FileInputStream(inFileName);
             FileChannel fc = fis.getChannel()) {
            MappedByteBuffer mbf
                = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            byte[] barray = new byte[(int) fc.size()];
            mbf.get(barray);
            String str = new String(barray, "US-ASCII");
            // or: String str = new String(barray); // use the platform default charset

            String match1, match2, match3;
            // here we assume a syntax-error-free HTML file!
            String regex = "(<" + tagId + "[^>]*>)" // 1st capturing group: opening tag
                + "((?:\"[^\"]*\"|\'[^\']*\'|[^\"\'])*?(?="
                + closingTag + "))" // 2nd capturing group: body, quoted strings taken whole
                + "(" + closingTag + ")"; // 3rd capturing group: closing tag
            Pattern pat = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE);
            Matcher mat = pat.matcher(str);
            while (mat.find()) {
                match1 = mat.group(1);
                match2 = mat.group(2);
                match3 = mat.group(3);
                if (bodyOnly) {
                    System.out.println(match2);
                } else {
                    System.out.println(match1 + match2 + match3);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
</code>
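
To see what the quote handling in the second capturing group buys you, here is a small sketch that runs the same regex shape (with the tag fixed to "script") against a string in memory and compares it with a naive lazy match. The names QuoteGuardDemo, GUARDED, NAIVE and firstMatch are only for illustration. One caveat, assuming real-world input: a stray apostrophe in a JavaScript comment would be read as the start of a quoted string and could throw the match off.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteGuardDemo {
    // Same shape as the regex in TagBodyExtractor, with the tag fixed to "script".
    // A quoted string is consumed as one unit, so a </script> inside it is skipped.
    static final Pattern GUARDED = Pattern.compile(
            "(<script[^>]*>)"
            + "((?:\"[^\"]*\"|'[^']*'|[^\"'])*?(?=</script>))"
            + "(</script>)");

    // Naive version for comparison: stops at the first </script>, quoted or not.
    static final Pattern NAIVE =
            Pattern.compile("<script[^>]*>.*?</script>", Pattern.DOTALL);

    // Returns the first match of the pattern in the text, or null if there is none.
    static String firstMatch(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<script>document.write(\"</script>\");</script>";
        System.out.println("guarded: " + firstMatch(GUARDED, html));
        System.out.println("naive:   " + firstMatch(NAIVE, html));
    }
}
```

The guarded pattern returns the whole input, while the naive one stops at the </script> inside the string literal.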
Jul 17 '05 #4

HG******@nifty.ne.jp (hiwa) wrote in message news:<68**************************@posting.google.com>...

Note: This particular regex string happens to contain no dot '.',
line-start '^' or line-end '$' operators, so the DOTALL and MULTILINE
flags aren't necessary in this case. Still, I think specifying them
is good practice when you handle a multi-line document in a single
matcher loop.
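
For reference, a minimal sketch of what each flag actually changes (the class name FlagDemo and the helper found are made up for illustration): DOTALL affects only '.', and MULTILINE affects only '^' and '$'.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // True if the regex, compiled with the given flags, matches anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first\nsecond";

        // DOTALL: '.' also matches line terminators.
        System.out.println(found("first.second", 0, text));              // false
        System.out.println(found("first.second", Pattern.DOTALL, text)); // true

        // MULTILINE: '^' and '$' also match at line boundaries.
        System.out.println(found("^second$", 0, text));                  // false
        System.out.println(found("^second$", Pattern.MULTILINE, text));  // true
    }
}
```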
Jul 17 '05 #5

Hi.

Thanks, I have it working now.

--

Best Regards
Andrew Dixon

Jul 17 '05 #6

This discussion thread is closed
