
Convert CDATA expression to Javascript RegExp

Max
Hello everyone!

Can anyone help me convert the CDATA expression "CDATA ::= (Char* -
(Char* ']]>' Char*))" to a JavaScript regular expression?

Thanks,

Max
Feb 13 '07 #1
Translation to English: A CDATA section's value can contain any legal XML
characters except the three-character sequence ]]> (which is used to
terminate the value).

I don't do Javascript, so you'll have to translate it the rest of the
way yourself.
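
Read that way, the rule needs no pattern at all when the candidate text is already isolated. A minimal sketch, assuming a hypothetical helper name isValidCDataContent (not from any library) and leaving the separate check that each character is a legal XML Char to other code:

```javascript
// Hypothetical helper: true when `text` could be the content of a CDATA
// section, i.e. it does not contain the terminator "]]>".
// Assumption: character legality (the XML Char production) is checked elsewhere.
function isValidCDataContent(text) {
  return text.indexOf("]]>") === -1;
}
```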
--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
Feb 13 '07 #2
Writing a regular expression that matches everything up to a
multi-character terminator is slightly involved. You need to do something like:

/([^\]]*|][^\]]|]][^>]|]]?$)*/

Not the easiest thing to see! Maybe the best thing is to break it
into its component parts, e.g.:

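// Backslashes are doubled because these are string literals, not regex literals.
var no_bracket = "[^\\]]*";   // a run of characters other than ]
var one_bracket = "][^\\]]";  // a single ] followed by something other than ]
var two_brackets = "]][^>]";  // ]] followed by something other than >
var end_bracket = "]]?$";     // one or two ] at the very end of the input

// new RegExp() takes the pattern without the surrounding slashes.
var expr = new RegExp("(" + no_bracket + "|" + one_bracket + "|" + two_brackets +
"|" + end_bracket + ")*");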

I'll admit I haven't tested it, but hopefully it gives you an idea!
(The $ anchor may not work where it is. Note that JavaScript has no \Z
anchor; without the m flag, $ already matches only at the very end of
the input.)
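
For reference, a minimal sketch of the same idea using a negative lookahead, which JavaScript regular expressions support. It assumes the input is the candidate CDATA text on its own; cdataContentRe and isCDataContent are hypothetical names. A lookahead also sidesteps one edge case of consuming two or three characters at a time: a run of three or more ] directly before > (as in "]]]>").

```javascript
// Sketch: accept a string only if it never contains "]]>".
// Each step consumes either a character other than "]", or a "]" that is
// not immediately followed by "]>".
var cdataContentRe = /^(?:[^\]]|\](?!\]>))*$/;

// Hypothetical helper name.
function isCDataContent(text) {
  return cdataContentRe.test(text);
}
```

Anchoring with ^ and $ makes this a whole-string check; dropping the anchors would let the pattern match an empty prefix of any input.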

HTH,

Pete.
--
=============================================
Pete Cordell
Tech-Know-Ware Ltd
for XML to C++ data binding visit
http://www.tech-know-ware.com/lmx
http://www.codalogic.com/lmx
(or http://www.xml2cpp.com)
=============================================

Feb 13 '07 #3
I was thinking more about this overnight. The details of the regular
expression depend on what input string you want to apply the matching
to. If you could give an idea of the types of strings you want to
match against (e.g. a whole XML message, or just element text), it
might be possible to come up with a better pattern.
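
As an illustration of why the input matters: when scanning a whole XML message rather than validating isolated text, the first "]]>" after "<![CDATA[" always ends the section, so a non-greedy match is enough. A sketch under that assumption; extractCDataSections is a hypothetical name, and it does not account for "<![CDATA[" appearing inside comments or processing instructions, where it would not be a real section.

```javascript
// Hypothetical helper: collect the content of every CDATA section in `xml`.
// [\s\S] stands in for "any character, including newlines".
function extractCDataSections(xml) {
  var re = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  var sections = [];
  var match;
  // With the g flag, exec() resumes after the previous match on each call.
  while ((match = re.exec(xml)) !== null) {
    sections.push(match[1]);
  }
  return sections;
}
```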

Pete.
Feb 14 '07 #4
Max
Hello Pete!

I have written this regular expression:

<!\\[CDATA\\[(((?:\\u0009|\\u000A|\\u000D|[\\u0020-\\uD7FF]|[\\uE000-\\uFFFD]|[\\u10000-\\u10FFFF])*?)(]]>(?:\\u0009|\\u000A|\\u000D|[\\u0020-\\uD7FF]|[\\uE000-\\uFFFD]|[\\u10000-\\u10FFFF])*?)*)]]>

I break it into these component parts:

XParser.CHAR =
"(?:\\u0009|\\u000A|\\u000D|[\\u0020-\\uD7FF]|[\\uE000-\\uFFFD]|[\\u10000-\\u10FFFF])";
XParser.CDSTART = "<!\\[CDATA\\[";
XParser.CDATA = "((" + XParser.CHAR + "*?)(]]>" + XParser.CHAR + "*?)*)";
XParser.CDEND = "]]>";
XParser.CDSECT = XParser.CDSTART + XParser.CDATA + XParser.CDEND;
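
One detail worth checking in XParser.CHAR: without the u flag, JavaScript reads \u followed by exactly four hex digits, so "\u10000" is parsed as U+1000 followed by a literal 0, and the last range does not cover U+10000 to U+10FFFF. A sketch of the same character class for engines that support the ES2015 u flag, where \u{...} can name code points above U+FFFF; XML_CHAR and isXmlChars are hypothetical names:

```javascript
// Mirrors the XML 1.0 Char production as a single character class.
// Assumption: the engine supports the ES2015 "u" (unicode) flag.
var XML_CHAR =
  "[\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\u{10000}-\\u{10FFFF}]";

// Hypothetical helper: true when every character of `text` is a legal XML Char.
function isXmlChars(text) {
  return new RegExp("^" + XML_CHAR + "*$", "u").test(text);
}
```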

XML code example:

<![CDATA[this child is of <<<>nodeType CDATA]]>

The problem arose when I expanded the simple regular expression for
CDATA ('(" + XParser.CHAR + "*?)') to also allow the markup ']]>'
inside the content.
But this way it also captures two or more CDSECTs as a single match...

Example:
1 Tag: <![CDATA[this child is of <<<>nodeType CDATA]]>
Capture: this child is of <<<>nodeType CDATA

2 Tag: <![CDATA[this child is of <<<>nodeType CDATA]]><![CDATA[this
child is of <<<>nodeType CDATA]]>
Capture: this child is of <<<>nodeType CDATA]]><![CDATA[this child is of
<<<>nodeType CDATA

Is it possible to resolve this?

Thanks in advance,

Max
Feb 14 '07 #5
This sounds like it's really a Javascript programming question rather
than an XML question, since the question is how to express something in
that language's reg-exp syntax rather than what to express. So you might
get better answers by asking in a Javascript newsgroup than here.

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
Feb 14 '07 #6
(After all, most of us just use an existing XML parser and let *it* deal
with syntax.)

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
Feb 14 '07 #7
Hi Max,

In this case I think you need to rework your XParser.CDATA rule along
the lines of the following:

// You could write these using a similar approach to your XParser.CHAR
// if you prefer
var no_bracket = "[^\\]]*";
var one_bracket = "][^\\]]";
var two_brackets = "]][^>]";

XParser.CDATA = "(" + no_bracket + "|" + one_bracket + "|" +
two_brackets + ")*" + "]*";

The logic is basically:

if( current char is not ] ||
current char is ] AND next char is NOT ] ||
current char is ] AND the next char is ] AND the one after that is NOT > )
then OK;

which is more easily understood as:

if( current char is not ] ) then OK;
else if( current char is ] AND next char is NOT ] ) then OK;
else if( current char is ] AND the next char is ] AND the one after
that is NOT > ) then OK;

The trailing "]*" just allows any number of ] characters at the end if
necessary.
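
Put together, the rule can be tried against the two-section example from the question. This is a sketch rather than a tested parser: cdsectRe and captureCData are hypothetical names, every ] is escaped for clarity, and no_bracket is written without its inner * on the assumption that the outer * already supplies the repetition. The alternation can still misread a run of three or more ] followed by > inside the content (for example "]]]]>"), so treat it as an illustration of the logic only.

```javascript
var no_bracket = "[^\\]]";        // a character that is not ]
var one_bracket = "\\][^\\]]";    // ] followed by something other than ]
var two_brackets = "\\]\\][^>]";  // ]] followed by something other than >

// The outer group captures the whole content; the trailing \]* allows
// extra ] characters directly before the closing ]]>.
var cdsectRe = new RegExp(
  "<!\\[CDATA\\[((?:" + no_bracket + "|" + one_bracket + "|" + two_brackets +
  ")*\\]*)\\]\\]>", "g");

// Hypothetical helper: return the content of each CDATA section found in `xml`.
function captureCData(xml) {
  var found = [];
  var m;
  cdsectRe.lastIndex = 0; // reset the shared global regex before scanning
  while ((m = cdsectRe.exec(xml)) !== null) {
    found.push(m[1]);
  }
  return found;
}
```

Called on two adjacent copies of "<![CDATA[this child is of <<<>nodeType CDATA]]>", captureCData returns two entries, one per section, instead of a single capture spanning both.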

HTH,

Pete.
Feb 14 '07 #8
