Another RegEx Problem

Hi Everyone.

OK, I have a problem getting the following regex to work in Java:

<script[^>]*>(.|\r|\n)+?</script>

It works fine in EditPad Pro, but in Java it causes the following error
message when run:

'Exception in thread "main" java.lang.StackOverflowError'

Any ideas why?

Best Regards
Andrew Dixon

Jul 17 '05 #1
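(1) Use the DOTALL flag instead of \r|\n etc.; it lets '.' match line
terminators. (MULTILINE only changes how '^' and '$' behave.)
(2) Use a non-capturing group, (?:...), where you do not need the captured text.
(3) Guard against </script> appearing inside quoted strings within those parentheses.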
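
A minimal sketch of suggestions (1) and (2), not taken from the thread: with DOTALL, a plain lazy `.*?` replaces the `(.|\r|\n)+?` group, so no alternation group is repeated at all. A likely cause of the StackOverflowError is that java.util.regex matches a repeated group containing alternation recursively, so the stack depth grows with the length of the script body; treat that as an assumption about the implementation rather than documented behaviour. The class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptRegexSketch {
    // With DOTALL, '.' also matches \r and \n, so the (.|\r|\n) alternation is unnecessary.
    // The single capturing group holds the script body.
    static final Pattern SCRIPT =
        Pattern.compile("<script[^>]*>(.*?)</script>", Pattern.DOTALL);

    // Returns the body of the first <script> element, or null if there is none.
    static String firstScriptBody(String html) {
        Matcher m = SCRIPT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p>x</p><script type=\"text/javascript\">\nalert(1);\n</script>";
        System.out.println(firstScriptBody(html));
    }
}
```

This version does not address suggestion (3): it still stops at the first `</script>` it sees, even one inside a quoted string.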
Jul 17 '05 #2
Hi.

Sorry, I don't really understand what you mean. Could you show me an
example or rewrite my expression?

Thanks.

--

Best Regards
Andrew Dixon
Jul 17 '05 #3
Here is a simple example. Hope this helps.
<code>
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagBodyExtractor {

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("USAGE:");
            System.err.println("java TagBodyExtractor filename");
            System.err.println("or,");
            System.err.println("java TagBodyExtractor tagtext filename");
            System.exit(1);
        }

        String tagId, inFileName;
        if (args.length == 2) {
            tagId = args[0];
            inFileName = args[1];
        } else {
            tagId = "script"; // lower-case the tags in the file before using this program
            inFileName = args[0];
        }
        String closingTag = "</" + tagId + ">";

        boolean bodyOnly = false; // false: output both the tags and their bodies

        // try-with-resources closes the channel and the stream even if an error occurs
        try (FileInputStream fis = new FileInputStream(inFileName);
             FileChannel fc = fis.getChannel()) {
            MappedByteBuffer mbf = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            byte[] barray = new byte[(int) fc.size()];
            mbf.get(barray);
            String str = new String(barray, "US-ASCII");
            // or: String str = new String(barray); // use the platform default charset

            // Here we assume a syntax-error-free HTML file!
            String regex = "(<" + tagId + "[^>]*>)"           // 1st capturing group: opening tag
                + "((?:\"[^\"]*\"|'[^']*'|[^\"'])*?(?="       // 2nd capturing group: body, with
                + closingTag + "))"                           //   quoted strings consumed whole
                + "(" + closingTag + ")";                     // 3rd capturing group: closing tag
            Pattern pat = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE);
            Matcher mat = pat.matcher(str);
            while (mat.find()) {
                String match1 = mat.group(1);
                String match2 = mat.group(2);
                String match3 = mat.group(3);
                if (bodyOnly) {
                    System.out.println(match2);
                } else {
                    System.out.println(match1 + match2 + match3);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
</code>
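
To see what the quote-guarding alternatives in the second capturing group buy you, here is a small in-memory sketch (hypothetical class and method names, tag fixed to `script`) that compares a plain lazy `.*?` with the guarded pattern from the program above on a script body containing the text `</script>` inside a string literal. Like the original, it assumes well-formed input with balanced quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteGuardSketch {
    // Naive version: stops at the first </script>, even inside a quoted string.
    static final Pattern NAIVE =
        Pattern.compile("<script[^>]*>(.*?)</script>", Pattern.DOTALL);

    // Guarded version: quoted strings are consumed as a unit, so a </script>
    // inside quotes cannot end the match.
    static final Pattern GUARDED = Pattern.compile(
        "(<script[^>]*>)"
        + "((?:\"[^\"]*\"|'[^']*'|[^\"'])*?(?=</script>))"
        + "(</script>)");

    static String naiveBody(String html) {
        Matcher m = NAIVE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    static String guardedBody(String html) {
        Matcher m = GUARDED.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String html = "<script>var s = \"</script>\"; go();</script>";
        System.out.println(naiveBody(html));   // prints: var s = "
        System.out.println(guardedBody(html)); // prints: var s = "</script>"; go();
    }
}
```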
Jul 17 '05 #4
HG******@nifty.ne.jp (hiwa) wrote in message news:<68**************************@posting.google.com>...

Note: This particular regex happens to contain no dot '.', line-start
'^', or line-end '$' operators, so the DOTALL and MULTILINE flags
aren't needed in this case. Still, specifying them is good practice
when you process a multi-line document in a single matcher loop.
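
For reference, a short sketch of what each flag changes (the class and helper names are hypothetical): DOTALL affects only '.', and MULTILINE affects only '^' and '$'.

```java
import java.util.regex.Pattern;

public class FlagSketch {
    // True if the regex, compiled with the given flags, matches somewhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first\nsecond";
        // '.' does not match a line terminator unless DOTALL is set.
        System.out.println(found("first.second", 0, text));              // false
        System.out.println(found("first.second", Pattern.DOTALL, text)); // true
        // '^' and '$' anchor only at the ends of the whole input unless MULTILINE is set.
        System.out.println(found("^second$", 0, text));                  // false
        System.out.println(found("^second$", Pattern.MULTILINE, text));  // true
    }
}
```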
Jul 17 '05 #5
Hi.

Thanks, I have it working now.

--

Best Regards
Andrew Dixon
Jul 17 '05 #6
