Regex replace question

Hi:
I have an XML document like this:
<?xml version="1.0" ?>
<object>
<comments>www.site.com/page.aspx?param1=value1&param2=value2</comments>
</object>

Since a bare "&" is invalid in XML, I need to replace every "&" with "&amp;",
but only within the <comments> tag. So I need a Regex pattern that replaces
"&" only between <comments> and </comments>.
Does anybody have an idea how to do this?
Thanks!

--
WWW: http://hardywang.1accesshost.com
ICQ: 3359839
yours Hardy
Nov 15 '05 #1
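I'd try something like:

(?<start>\<comments\>.+?)&(?<end>.+?\</comments\>)

The .+? is a non-greedy match, so each group takes in as little text as
possible: the match picks up the first "&" after <comments> and stops at the
nearest </comments>.
You'll need to refer to the start and end captures in your replacement
string, for example ${start}&amp;${end}, so that the surrounding text ends up
back in the string. Each match consumes the whole element, so a single pass
fixes only one "&" per <comments> element.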
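
For what it's worth, here is a minimal sketch of how the named-group idea could be wired up. The class and method names are made up for illustration, and two things are my own additions rather than part of the pattern above: a negative lookahead so that an "&" that already starts an entity is left alone, and a loop, because each pass fixes at most one "&" per element.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class NamedGroupDemo
{
    // One bare "&" inside a <comments> element that contains no nested tags.
    // The lookahead (an addition, not in the original pattern) skips "&" that
    // already begins an entity such as "&amp;" or a numeric reference "&#...".
    private static readonly Regex BareAmp = new Regex(
        @"(?<start><comments>[^<]*?)&(?!amp;|lt;|gt;|quot;|apos;|#)(?<end>[^<]*</comments>)");

    public static string EscapeComments(string input)
    {
        // Each match swallows the whole element, so repeat until nothing changes.
        string previous;
        do
        {
            previous = input;
            input = BareAmp.Replace(input, "${start}&amp;${end}");
        } while (input != previous);
        return input;
    }

    public static void Main()
    {
        Console.WriteLine(EscapeComments(
            "<comments>www.site.com/page.aspx?param1=value1&param2=value2</comments>"));
    }
}
```

Without the lookahead, a second pass would turn "&amp;" into "&amp;amp;", so the loop would never settle.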

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://weblogs.asp.net/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 15 '05 #2
Hi
Because .NET supports variable-length lookbehind (which most regex engines do
not), you can do something like this:
string output = Regex.Replace(input,
"(?<=\\<comments\\>[^\\<\\>]*?)&(?=[^\\<\\>]*\\</comments\\>)", "&amp;");

This uses lookahead and lookbehind to make sure the & is inside <comments> tags.
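
A small self-contained sketch of the lookaround approach, in case it helps to see it run. The class name and the sample input (including the <other> element) are invented for illustration. Because lookarounds are zero-width, every "&" inside the element is replaced in a single pass.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class LookaroundDemo
{
    public static string EscapeComments(string input)
    {
        // Lookbehind: a <comments> tag with no other tag between it and the "&".
        // Lookahead: no other tag between the "&" and the closing </comments>.
        return Regex.Replace(
            input,
            @"(?<=<comments>[^<>]*)&(?=[^<>]*</comments>)",
            "&amp;");
    }

    public static void Main()
    {
        string input =
            "<object><comments>a.aspx?x=1&y=2&z=3</comments><other>p&q</other></object>";
        // Only the two "&" inside <comments> are rewritten; "p&q" is left alone.
        Console.WriteLine(EscapeComments(input));
    }
}
```

Note that this sketch assumes the input contains only bare "&" characters. If some are already written as "&amp;", they would be escaped a second time.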

HTH,
greetings




Nov 15 '05 #3
Thanks, man. My program needs to receive a text file passed from a
third party, and unfortunately we cannot control the output from the other
side. The text file SHOULD be just an XML document, but sadly there are some
bare "&" characters in it. That is why I need to clean them up.

--
WWW: http://hardywang.1accesshost.com
ICQ: 3359839
yours Hardy
"Nick Malik" <ni*******@hotmail.nospam.com> wrote in message
news:lDCQb.110670$Rc4.804177@attbi_s54...
You stated >> I have a XML like<<
Clearly, you don't have XML, because the string is not well formed. I
assume, therefore, that you are actually CREATING the XML in your code.

So you are creating the XML document in code, and you want to replace all of
the & characters because the resulting XML would otherwise be invalid.

Why not just create an XML object, add the <object> node, under it add the
<comments> node, and in that provide the text? The XML object will escape
the characters for you when you output the document.

Creating the object in a string is the problem.

On the other hand, if you are creating it in code, you can replace all of
the invalid characters BEFORE placing it in the XML tags. I believe that
there is a method similar to HTMLEncode that will do this for you... and
then you can add the resulting string to the tags.

So, two solutions... neither requiring difficult Regex programming.
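
Both suggestions might look roughly like this when you control how the XML is built. This is a sketch only: the class and method names are made up, and SecurityElement.Escape is just one escaping method I know exists in .NET, not necessarily the one meant above.

```csharp
using System;
using System.Security;
using System.Xml;

// Hypothetical helper; names are illustrative only.
public static class BuildXmlDemo
{
    // Option 1: let the XML object model escape the text on output.
    public static string BuildWithDom(string commentText)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("object");
        doc.AppendChild(root);

        XmlElement comments = doc.CreateElement("comments");
        comments.InnerText = commentText; // stored raw, escaped when serialized
        root.AppendChild(comments);

        return doc.OuterXml;
    }

    // Option 2: escape the text first, then place it between the tags.
    public static string BuildWithEscape(string commentText)
    {
        return "<object><comments>"
            + SecurityElement.Escape(commentText)
            + "</comments></object>";
    }

    public static void Main()
    {
        string url = "www.site.com/page.aspx?param1=value1&param2=value2";
        Console.WriteLine(BuildWithDom(url));
        Console.WriteLine(BuildWithEscape(url));
    }
}
```

Neither option helps when the malformed text arrives from someone else, since a parser will reject it before you can touch the nodes.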

Hope this helps,
--- Nick

Nov 15 '05 #4
I think this would do it for you:

string goodXML = Regex.Replace(badXML,
@"(?<=\<comments\>.*)&(?=.*\</comments\>)", "&amp;");

Regular expressions are in a strange but seemingly beautiful domain ;-)
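
Two caveats with the pattern above: the greedy .* can reach from one <comments> element into a later one on the same line, and . does not cross line breaks by default. One possible alternative, sketched here under my own assumptions (hypothetical class name, <comments> elements that contain no nested tags), is to match each element's content and escape inside a callback:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class EvaluatorDemo
{
    // A bare "&": one that does not already start a predefined or numeric entity.
    private static readonly Regex BareAmp =
        new Regex(@"&(?!(?:amp|lt|gt|quot|apos);|#)");

    // The text between <comments> and the nearest </comments>.
    // Singleline lets "." match line breaks as well.
    private static readonly Regex CommentsContent =
        new Regex(@"(?<=<comments>).*?(?=</comments>)", RegexOptions.Singleline);

    public static string EscapeComments(string input)
    {
        // The callback runs once per element and rewrites only that element's text.
        return CommentsContent.Replace(input, m => BareAmp.Replace(m.Value, "&amp;"));
    }

    public static void Main()
    {
        Console.WriteLine(EscapeComments(
            "<comments>a&b</comments>x&y<comments>c&amp;d&e</comments>"));
    }
}
```

Here the "&" in "x&y" sits between two elements and stays untouched, and the existing "&amp;" is not escaped twice.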

cheers,

mortb

Nov 15 '05 #5
