
Another RegEx Problem

Hi Everyone.

Ok I have a problem getting the following regex to work in Java.

<script[^>]*>(.|\r|\n)+?</script>

It works fine in EditPad Pro, but in Java it causes the following error
when run:

'Exception in thread "main" java.lang.StackOverflowError'

Any ideas why?

Best Regards
Andrew Dixon

Jul 17 '05 #1
"Andrew Dixon" <da******@NOREPLY.yahoo.co.uk> wrote in message news:<zS*********************@news-text.cableinet.net>...

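(1) Use the DOTALL flag instead of (.|\r|\n), so that . also matches line terminators.
(2) Use a non-capturing group (?:...) instead of a capturing one.
(3) Inside the group, treat quoted strings as a unit and stop at the first </script> that is not inside quotes.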
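A minimal sketch of points (1) and (2), assuming the HTML is already in a String. A likely reason for the StackOverflowError: java.util.regex matches each repetition of a group that contains an alternation, such as (.|\r|\n)+?, through a recursive call, so the stack depth grows with the length of the script body. A lazy .*? under Pattern.DOTALL has no such group and is handled iteratively in current JDKs. Note that it is DOTALL that lets . match \r and \n; MULTILINE only changes how ^ and $ behave. The class name ScriptBlocks, the method find and the extra CASE_INSENSITIVE flag are illustrative assumptions, not part of the original posts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptBlocks {

    // ".*?" with DOTALL replaces "(.|\r|\n)+?": "." now also matches line
    // terminators, and no alternation group is repeated once per character.
    // CASE_INSENSITIVE is an added assumption so that <SCRIPT> matches too.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script[^>]*>.*?</script>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns every <script>...</script> element found in the input.
    public static List<String> find(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Build a long multi-line script body to exercise the pattern.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            body.append("var x").append(i).append(" = ").append(i).append(";\r\n");
        }
        String html = "<p>before</p><script type=\"text/javascript\">"
                + body + "</script><p>after</p>";
        List<String> blocks = find(html);
        System.out.println(blocks.size());                       // 1
        System.out.println(blocks.get(0).endsWith("</script>")); // true
    }
}
```

The sketch uses .*? rather than .+? so that an empty element such as <script src="a.js"></script> ends at its own closing tag instead of running on to the next one. Unlike point (3), it does not skip a </script> that appears inside a quoted string.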
Jul 17 '05 #2
Hi.

Sorry, I don't really understand what you mean. Could you show me an
example or rewrite my expression?

Thanks.

--

Best Regards
Andrew Dixon
"hiwa" <HG******@nifty.ne.jp> wrote in message
news:68**************************@posting.google.c om...

Jul 17 '05 #3
"Andrew Dixon" <da******@NOREPLY.yahoo.co.uk> wrote in message news:<mc***************@news-binary.blueyonder.co.uk>...

Here is a simple example. Hope this helps.
<code>
import java.nio.*;
import java.nio.channels.*;
import java.io.*;
import java.util.regex.*;

public class TagBodyExtractor {

    public static void main(String[] args) {
        String tagId, closingTag, inFileName;
        boolean bodyOnly;

        if (args.length < 1) {
            System.err.println("USAGE:");
            System.err.println("java TagBodyExtractor filename");
            System.err.println("or,");
            System.err.println("java TagBodyExtractor tagtext filename");
            System.exit(1);
        }

        if (args.length == 2) {
            tagId = args[0];
            inFileName = args[1];
        }
        else {
            tagId = "script"; // lowercase the tags in the input before using this program
            inFileName = args[0];
        }
        closingTag = "</" + tagId + ">";

        bodyOnly = false; // output both the tags and their bodies

        try {
            // Map the whole file into memory and decode it as a String.
            FileInputStream fis = new FileInputStream(inFileName);
            FileChannel fc = fis.getChannel();
            MappedByteBuffer mbf
                = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            byte[] barray = new byte[(int) fc.size()];
            mbf.get(barray);
            String str = new String(barray, "US-ASCII");
            // or: String str = new String(barray); // use the platform default charset

            String match1, match2, match3;
            // Here we assume a syntax-error-free HTML file!
            String regex = "(<" + tagId + "[^>]*>)"        // group 1: opening tag
                + "((?:\"[^\"]*\"|\'[^\']*\'|[^\"\'])*?(?="
                + closingTag + "))"                        // group 2: body, quoted strings kept whole
                + "(" + closingTag + ")";                  // group 3: closing tag
            Pattern pat = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE);
            Matcher mat = pat.matcher(str);
            while (mat.find()) {
                match1 = mat.group(1);
                match2 = mat.group(2);
                match3 = mat.group(3);
                if (bodyOnly) {
                    System.out.println(match2);
                }
                else {
                    System.out.println(match1 + match2 + match3);
                }
            }
            fc.close();
            fis.close();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
</code>
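To see what point (3) buys, here is a sketch that applies the same three-group pattern as TagBodyExtractor to an in-memory string instead of a file. TagBodyDemo and tagPattern are hypothetical names; the pattern text is the one from the program above (written with ' instead of \', which is the same character in a Java string).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagBodyDemo {

    // Same three-group pattern as TagBodyExtractor, built for a given tag name.
    static Pattern tagPattern(String tagId) {
        String closingTag = "</" + tagId + ">";
        return Pattern.compile(
                "(<" + tagId + "[^>]*>)"                  // group 1: opening tag
                + "((?:\"[^\"]*\"|'[^']*'|[^\"'])*?(?="
                + closingTag + "))"                        // group 2: body
                + "(" + closingTag + ")");                 // group 3: closing tag
    }

    public static void main(String[] args) {
        // The first "</script>" sits inside a string literal and must not end the match.
        String html = "<script>document.write(\"</script>\");</script><p>after</p>";
        Matcher m = tagPattern("script").matcher(html);
        if (m.find()) {
            System.out.println(m.group(2)); // document.write("</script>");
        }
    }
}
```

A quote character can only be consumed by the quoted-string alternatives, so a </script> between quotes is swallowed whole and the lookahead only succeeds at the real closing tag. One caveat: like the original, this pattern still repeats an alternation group once per body character, so very long bodies may still run into the same stack limit.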
Jul 17 '05 #4
HG******@nifty.ne.jp (hiwa) wrote in message news:<68**************************@posting.google. com>...

Note: this particular regex happens to contain no dot (.), line-start (^)
or line-end ($) operators, so the DOTALL and MULTILINE flags aren't
actually needed in this case; negated classes such as [^>] already match
line terminators. Still, specifying these flags is good practice when you
process a multi-line document in a single matcher loop.
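The note can be checked with a small sketch of what each flag changes. FlagDemo, found and the sample text are hypothetical; only Pattern.DOTALL and Pattern.MULTILINE come from the thread.

```java
import java.util.regex.Pattern;

public class FlagDemo {

    static final String TEXT = "<script>\nvar a;\n</script>";

    // True if the regex, compiled with the given flags, occurs anywhere in TEXT.
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        // "." does not match '\n' by default; DOTALL changes that.
        System.out.println(found("<script>.*</script>", 0));              // false
        System.out.println(found("<script>.*</script>", Pattern.DOTALL)); // true

        // A negated class such as [^<] matches '\n' with or without DOTALL.
        System.out.println(found("<script>[^<]*</script>", 0));           // true

        // "^" anchors only at the start of input unless MULTILINE is set.
        System.out.println(found("^var", 0));                             // false
        System.out.println(found("^var", Pattern.MULTILINE));             // true
    }
}
```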
Jul 17 '05 #5
Hi.

Thanks, I have it working now.

--

Best Regards
Andrew Dixon
"hiwa" <HG******@nifty.ne.jp> wrote in message
news:68**************************@posting.google.c om... HG******@nifty.ne.jp (hiwa) wrote in message news:<68**************************@posting.google. com>...

Jul 17 '05 #6


