Bytes | Software Development & Data Engineering Community

regular expressions in java

I am writing some JavaScript that will make an HTTP request, search the
returned HTML for any links on the page, and then store them for later
processing.

To test it, I pasted the HTML source into my text editor and ran a
grep search using the following regular expression:

(<a [^>]*href\s*=\s*")([^"]+)("[^>]*>)

The links were highlighted correctly. I then used this regular
expression in my JavaScript. However, the script matched only one of
the 10 links on the page I was searching. Can anyone

(a) tell me what's wrong with my regular expression, or
(b) suggest a better one?

Here is my JavaScript code. I have tried it two ways (see the
commented-out code):

// make the page request
var xmlhttp;

if (window.XMLHttpRequest)
{
    xmlhttp = new XMLHttpRequest();
    xmlhttp.open("GET", querystring, false);
    xmlhttp.send(null);

    //just to verify I'm getting something back:
    //document.write(xmlhttp.responseText);
}

var htmltext = xmlhttp.responseText;

//Method 1:
//create a pattern matcher and execute it
var reg = new RegExp("");
reg.compile('(<a class=l [^>]*href\s*=\s*")([^"]+)("[^>]*>)');
var searchresults = reg.exec(htmltext);

//Method 2:
//use the string.match() function
//var searchresults = htmltext.match('(<a class=l [^>]*href\s*=\s*")([^"]*)("[^>]*>)');
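On Method 2: when match() is given a string rather than a RegExp, the string is compiled as a regular expression with no flags, so it behaves like a single non-global exec() call and returns only the first match (with its capture groups). With the g flag, match() returns every full match but no groups. A small illustration; the sample HTML is made up, and the pattern deliberately omits \s so the string-escaping issue does not get in the way:

```javascript
const html = '<a class=l href="a.html">A</a><a class=l href="b.html">B</a>';

// A string argument becomes a RegExp with no flags:
// first match only, capture groups included.
const first = html.match('(<a class=l [^>]*href=")([^"]+)("[^>]*>)');
console.log(first[2]);   // a.html

// With the g flag, match() returns every full match but drops the groups.
const all = html.match(/<a class=l [^>]*href="[^"]+"[^>]*>/g);
console.log(all.length); // 2
```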

Apr 17 '06 #1
Your Subject line doesn't seem to have any relationship to your problem, so
I've changed it.

pa************* ***@hotmail.com wrote:
//Method 1:
//create a pattern matcher and execute it
var reg = new RegExp("");
reg.compile('(<a class=l [^>]*href\s*=\s*")([^"]+)("[^>]*>)');
var searchresults = reg.exec(htmltext);


Inside a string literal, \s is simply another way to write s. If you want a
backslash in a string, you must escape it (write \\s).
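A quick way to see the escaping problem: in a string literal, an unrecognized escape such as \s just drops the backslash, so the RegExp constructor never sees \s at all. Doubling the backslash, or using a regex literal, keeps it. A minimal sketch; the sample tag is made up for illustration:

```javascript
// In a string literal "\s" is not a recognized escape, so it becomes plain "s".
const fromSingle = new RegExp('href\s*=');  // pattern source is: hrefs*=
const fromDouble = new RegExp('href\\s*='); // pattern source is: href\s*=
const literal = /href\s*=/;                 // regex literal: no string escaping

console.log('\s' === 's');      // true
console.log(fromSingle.source); // hrefs*=
console.log(fromDouble.source); // href\s*=

const sample = '<a href = "x.html">';
console.log(fromSingle.test(sample)); // false: "s*" cannot match the space
console.log(fromDouble.test(sample)); // true
console.log(literal.test(sample));    // true
```

Note that hrefs*= still matches href=" when there is no whitespace, because s* can match zero characters; the damage only shows up when spaces surround the equals sign.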

Why are you using strings to create your regular expression at all? Why are
you trying to allow for spaces at a point where they aren't allowed?

var reg = /(<a class=l [^>]*href=")([^"]+)("[^>]*>)/;

should get you a bit further.
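The literal above still has no g flag, and without it exec() reports only the first match no matter how many links the page contains, which is consistent with seeing one link out of ten. A minimal sketch of collecting every href by looping over exec() with a global version of the same pattern; extractLinks and the sample HTML are illustrative names and data, not part of the original code:

```javascript
// Collect the href value (capture group 2) of every matching <a> tag.
// The g flag makes exec() resume from reg.lastIndex, so repeated calls walk
// through all matches; without it, exec() returns the first match every time.
function extractLinks(html) {
  const reg = /(<a class=l [^>]*href=")([^"]+)("[^>]*>)/g;
  const links = [];
  let m;
  while ((m = reg.exec(html)) !== null) {
    links.push(m[2]);
  }
  return links;
}

const page =
  '<a class=l href="one.html">1</a> <a href="skip.html">x</a> ' +
  '<a class=l title="t" href="two.html">2</a>';
console.log(extractLinks(page).join(',')); // one.html,two.html
```

The regex is created inside the function so each call starts with lastIndex at 0. In current engines, String.prototype.matchAll with the same global regex yields the same match objects.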

Final question: do you really need regular expressions here at all? You are
making a lot of assumptions about exactly how the element has been written:
no quotes around the class attribute, double quotes around the href
attribute, class appearing before href. If you parse the responseText into an
HTML DOM, you can just use getElementsByTagName('a') and then pick out
the ones with the right class. In other words, if you want to process HTML,
you might be better off loading it into an iframe rather than using
XMLHttpRequest, or, if you have control over what you are retrieving, sending
back XML instead of HTML.
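For comparison, a minimal sketch of the DOM-based approach. It assumes a modern browser where DOMParser can parse the response text as HTML (an alternative to the iframe mentioned above); collectLinks is a hypothetical helper name. Because it reads attributes from parsed elements, quoting style, whitespace and attribute order no longer matter:

```javascript
// Collect href values from <a> elements whose class list contains className.
// `doc` is any DOM Document, e.g. one produced by DOMParser in a browser.
function collectLinks(doc, className) {
  const links = [];
  // getElementsByTagName returns a live collection; Array.from copies it.
  for (const a of Array.from(doc.getElementsByTagName('a'))) {
    const classes = (a.getAttribute('class') || '').split(/\s+/);
    const href = a.getAttribute('href');
    if (classes.includes(className) && href !== null) {
      links.push(href);
    }
  }
  return links;
}

// Usage in a browser (assumes DOMParser support):
// const doc = new DOMParser().parseFromString(xmlhttp.responseText, 'text/html');
// const hrefs = collectLinks(doc, 'l');
```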
Apr 17 '06 #2


