higabe <higabe@hotmail.com> writes:
> Three questions
>
> 1)
>
> I have a string function that works perfectly but according to W3C.org
> web site is syntactically flawed because it contains the characters </
> in sequence. So how am I supposed to write this function?
>
> String.replace(/</g,'&lt;');
Hmm, I can see that I have some of those too, the most recent of them
written today. Bummer. I never noticed that there was a </-sequence in
that.
Try
  String.replace(/[<]/g,'&lt;');
or
  String.replace(RegExp("<","g"),'&lt;');
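For illustration, here is a minimal sketch of both workarounds side by side. It assumes the intended replacement is the entity &lt; (i.e. HTML-escaping the string); the function names are made up for this example. Neither version contains the characters </ in sequence, and both give the same result.

```javascript
// Hypothetical helpers; the replacement '&lt;' is an assumption about the intent.

// Character class: "<" is followed by "]", so "</" never appears in the source.
function escapeLtClass(text) {
  return text.replace(/[<]/g, '&lt;');
}

// RegExp constructor: the pattern is an ordinary string, so there is no
// "/" delimiter right after the "<".
function escapeLtCtor(text) {
  return text.replace(new RegExp("<", "g"), '&lt;');
}

console.log(escapeLtClass("a < b << c")); // a &lt; b &lt;&lt; c
console.log(escapeLtCtor("a < b << c"));  // a &lt; b &lt;&lt; c
```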
> 2)
>
> While I'm on the subject, anyone know why they implemented replace using
> a slash delimiter instead of quotes? I know it's how it's done in Perl
> but why is it done that way?
They didn't implement "replace" with slash-delimiters. They
implemented *regular expressions* with slash-delimiters. You can use
regular expressions in many other ways than just string-replace.
You could also write
  var myRegExp = /[<]/g;
  String.replace(myRegExp,'&lt;');
These are equivalent uses of regular expressions and strings (as long as
the regular expression has no "g" flag):
  /a*b/i.exec("caabc")
and
  "caabc".match(/a*b/i)
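A small sketch of the point that a regular expression literal is just a value that several methods accept. The variable names are only for illustration. The regular expression deliberately has no "g" flag; with "g", match returns all matches instead, and exec and test keep state in lastIndex.

```javascript
var re = /a*b/i;   // a regular expression literal is an ordinary value

var viaExec  = re.exec("caabc");    // RegExp method, string as argument
var viaMatch = "caabc".match(re);   // String method, RegExp as argument

console.log(viaExec[0], viaExec.index);   // aab 1
console.log(viaMatch[0], viaMatch.index); // aab 1

// The same object works with other methods too:
console.log(re.test("xyz"));      // false, there is no "b" in "xyz"
console.log("caabc".search(re));  // 1
console.log("caabc".split(re));   // [ 'c', 'c' ]
```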
> 3)
>
> One last regexp question:
> is it possible to do something like this:
> String.replace(/<(.*?)>(.*?)</$1>/ig,'<$1>$2</$1>');
Yes, but you need to escape the slash in "</", and inside the pattern a
backreference is written "\1", not "$1" ("$1" only works in the
replacement string). You will also want the backreference to match only
the tag name, not the attributes, and since the pattern contains no
letters, the "i" flag is not necessary. And don't call a variable
"String", since that conflicts with the global variable holding the
constructor of String objects.
So, this should do what you wanted:
  string.replace(/<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g,
                 '<$1$2>$3</$1>');
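To see what the three captures contain, here is a minimal sketch that runs the pattern on a made-up sample string. The bracketed replacement string is chosen only to make the captures visible; it is not what the original poster asked for.

```javascript
var tagPattern = /<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g;

// $1 = tag name, $2 = attributes (with leading space), $3 = content.
// The second element has mismatched tags, so \1 prevents a match.
var sample = "<b class='x'>hi</b> and <i>there</b>";
var shown = sample.replace(tagPattern, "[$1|$2|$3]");

console.log(shown); // [b| class='x'|hi] and <i>there</b>
```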
It gets confused if ">" occurs inside an attribute value, e.g. <tag
attr="foo>bar">. Just don't do that :)
It doesn't handle nested tags either. That is still outside the power
of regular expressions, even with backreferences.
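Both limitations are easy to reproduce. The sketch below uses made-up inputs and a replacement string picked only to show what the content capture grabbed.

```javascript
var tagPattern = /<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g;

// ">" inside an attribute value: the attribute capture stops at the first ">".
var m = /<\s*(\w+)\b(.*?)>(.*?)<\/\1>/.exec('<tag attr="foo>bar">x</tag>');
console.log(m[2]); //  attr="foo
console.log(m[3]); // bar">x

// Nested elements with the same name: the outer start tag is paired with
// the first end tag, and the rest is left over.
var nested = "<b>a<b>c</b>d</b>".replace(tagPattern, "[$3]");
console.log(nested); // [a<b>c]d</b>
```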
There are ways around that, though, using a function as second argument
of replace, allowing us to use recursion:
  function tagify(string) {
    return string.replace(/<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g,
      function(match,sub1,sub2,sub3) {
        return "<"+sub1+sub2+">" +
               tagify(sub3) +
               "</"+sub1+">";
      });
  }
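Since the function above rebuilds the same tags it matched, its effect is hard to see in the output. Here is a sketch of the same recursive technique with a hypothetical variant, bracketify, that emits square brackets instead, so every tag the recursion reached becomes visible.

```javascript
// Hypothetical variant of the recursive approach: same pattern, but the
// callback writes [tag]...[/tag] so processed tags can be told apart.
function bracketify(string) {
  return string.replace(/<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g,
    function (match, name, attrs, content) {
      return "[" + name + attrs + "]" +
             bracketify(content) +       // recurse into the element content
             "[/" + name + "]";
    });
}

console.log(bracketify("<b>x<i>y</i>z</b>")); // [b]x[i]y[/i]z[/b]
console.log(bracketify("<b>a<b>c</b>d</b>")); // [b]a<b>c[/b]d</b>
```

The second call shows that the recursion helps with differently named elements inside each other, but an element nested inside one of the same name is still paired with the wrong end tag.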
This still fails for elements with no closing tag. It could probably
be made to work for XHTML, where all tags have end tags (sometimes
abbreviated to just end in "/>"):
  /<\s*(\w+)\b(|.*?[^/])(?:\/>|>(.*?)<\/\1>)/g
   ^start tag
    ^optional whitespace
       ^tagname
              ^optional attributes, not ending in /
                        ^either >content</tagname> or just />
The XHTML parser would then be:
  function tagify(string) {
    return string.replace(
      /<\s*(\w+)\b(|.*?[^/])(?:\/>|>(.*?)<\/\1>)/g,
      function(match,sub1,sub2,sub3) {
        return "<"+sub1+" "+sub2+
          (sub3 !== undefined ?
            ">" + tagify(sub3) +
            "</"+sub1+">" :
            "/>");
      });
  }
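A sketch of how the XHTML pattern behaves on small made-up inputs. As an illustration only, the hypothetical bracketifyXhtml below writes square brackets and leaves out the extra space that the version above puts after the tag name. It relies on the third capture being undefined for the "/>" form, which is how current engines behave.

```javascript
// Hypothetical variant: same pattern, bracket output, no extra space.
function bracketifyXhtml(string) {
  return string.replace(
    /<\s*(\w+)\b(|.*?[^/])(?:\/>|>(.*?)<\/\1>)/g,
    function (match, name, attrs, content) {
      return "[" + name + attrs +
        (content !== undefined
          ? "]" + bracketifyXhtml(content) + "[/" + name + "]"
          : "/]");                       // self-closing form, no content
    });
}

console.log(bracketifyXhtml("<p>a<br/>b</p>"));    // [p]a[br/]b[/p]
console.log(bracketifyXhtml('<a href="x">t</a>')); // [a href="x"]t[/a]
```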
Hmm. I feel stupid, considering the much larger parser for XHTML that
I made some time ago. Oh well, at least it handled ">" inside
attribute values :).
> This is just an example where a sub-match used in a regular expression
> must sub-match again exactly as it did the first time later in the same
> string.
It works in recent versions of Javascript/ECMAScript. Earlier ones didn't
have non-greedy matches (*?) or backreferences (\1).
> But I don't know how to do that in a regexp although it seems
> like it should be possible.
It is, and you were close.
Adding backreferences to regular expressions gives them more power than
"real" regular expressions, i.e., they can be used to match something that
is not a regular language. Example:
  /^(11+)\1+$/
This regular expression matches any string of 1's that can be written
as two or more repetitions of two or more 1's. That is, the unary
representation of a composite number.
  !/^(11+)\1+$/.test("--string of n 1's--")
is a test for whether n is prime, for n >= 2 (but not a very efficient
one; for n = 0 and n = 1 it also yields true, although neither is prime).
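A runnable sketch of that test. The function name isPrimeUnary is made up, and the explicit check for n < 2 is an addition here, since the bare expression also yields true for 0 and 1. It uses String.prototype.repeat, so it needs an ES2015 or later engine.

```javascript
// Hypothetical wrapper around the unary primality test.
function isPrimeUnary(n) {
  if (n < 2) return false;                    // 0 and 1 are not prime
  return !/^(11+)\1+$/.test("1".repeat(n));   // composite iff the regexp matches
}

var primes = [];
for (var n = 0; n <= 30; n++) {
  if (isPrimeUnary(n)) primes.push(n);
}
console.log(primes.join(" ")); // 2 3 5 7 11 13 17 19 23 29
```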
/L
--
Lasse Reichstein Nielsen  -  lrn@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'