Removing obscure chars

Hi All

I have an ASP function in place to strip invalid chars out of a data store
before I create an XML file of this data, but my function doesn't work on a
certain set of chars.

As far as I can see these are the following:

a) trademark char
b) long hyphen/dash char
c) smart/curly quotes (both left and right)

Even though my function is set up as follows:

Function ReFormatStringForXML(s)
    IF LEN(s) > 0 AND NOT IsNull(s) THEN
        s = Replace(s,"™","&trade;")
        s = Replace(s,"—","-")
        s = Replace(s,"’","&quot;")
        s = Replace(s,"'","&quot;")
        s = Replace(s,"""","&quot;")
        s = Replace(s,"&","&amp;")
        s = Replace(s,"<","&lt;")
        s = Replace(s,">","&gt;")
    END IF
    ReFormatStringForXML = s
End Function

These chars still pass by and foul up my XML file.

I have a feeling that it's down to the fact that my function is looking for
the html equiv rather than the actual char, but I can't possibly get away
with simply copying and pasting these friggin(!!) chars into my function.
Surely this is bad practice?

Does anybody know how I can trap and replace/remove these chars if need be?

Thanks

Apr 3 '07 #1
Gazing into my crystal ball I observed "Yobbo" <in**@NoSpamIt.com>
writing in news:ug**************@TK2MSFTNGP05.phx.gbl:
> c) smart/curly quotes (both left and right)

I detest these "smart" quotes. Are regular quotes dumb by comparison?

> I have a feeling that it's down to the fact that my function is looking
> for the html equiv rather than the actual char, but I can't possibly
> get away with simply copying and pasting these friggin(!!) chars into my
> function. Surely this is bad practice?

You are searching for the HTML entity. You may need to search for the
character itself by its character code instead, for example:

s = Replace(s, Chr(60), "&lt;")
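
A minimal sketch of the same idea for the three problem characters, assuming the data holds the real Unicode characters and that plain ASCII substitutes are acceptable. These characters are outside ASCII, so the sketch uses ChrW with Unicode code points rather than Chr. The function name StripSmartChars and the "(TM)" substitute are made up for illustration:

```vbscript
' Hypothetical helper: swaps typographic characters for plain ASCII.
' ChrW takes a Unicode code point, so no special characters need to be
' pasted into the source file.
Function StripSmartChars(ByVal s)
    If Not IsNull(s) Then
        s = Replace(s, ChrW(8482), "(TM)")  ' U+2122 trade mark sign
        s = Replace(s, ChrW(8211), "-")     ' U+2013 en dash
        s = Replace(s, ChrW(8212), "-")     ' U+2014 em dash
        s = Replace(s, ChrW(8216), "'")     ' U+2018 left single quote
        s = Replace(s, ChrW(8217), "'")     ' U+2019 right single quote
        s = Replace(s, ChrW(8220), """")    ' U+201C left double quote
        s = Replace(s, ChrW(8221), """")    ' U+201D right double quote
    End If
    StripSmartChars = s
End Function
```

If this runs before the XML escaping step, escape & first and the other markup characters afterwards, so that ampersands belonging to entities added later are not escaped a second time.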

HTH

--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Apr 4 '07 #2
Yobbo wrote on Tue, 3 Apr 2007 18:17:59 +0100:
> Does anybody know how I can trap and replace/remove these chars if need
> be?

Your function is quite limited. What happens when a character not in your
list appears? XML itself predefines only five entities (&lt;, &gt;, &amp;,
&quot; and &apos;), so something like &trade; is not valid unless a DTD
declares it.

Here's the function I use in my own XML generation code. It's crude, but it works:

Function XMLEncode(strText)
    ' Loop through the text and replace every character outside the
    ' allowed set with an entity or its numeric character value.
    Dim strNewText, i, j
    strNewText = ""

    For i = 1 To Len(strText)
        j = Asc(Mid(strText, i, 1))

        If j = 10 Then
            ' replace line feed with an escaped <br>
            strNewText = strNewText & "&lt;br&gt;"
        ElseIf j = 13 Or j = 9 Then
            ' carriage return, tab: strip them
        ElseIf j = 34 Then
            strNewText = strNewText & "&quot;"
        ElseIf j = 39 Then
            strNewText = strNewText & "&apos;"
        ElseIf j = 32 Or j = 45 Or (j >= 48 And j <= 57) Or (j >= 65 And j <= 90) Or (j >= 97 And j <= 122) Then
            ' space, hyphen, digits and letters pass through unchanged
            strNewText = strNewText & Mid(strText, i, 1)
        ElseIf j = 38 Then ' &
            strNewText = strNewText & "&amp;"
        ElseIf j = 60 Then ' <
            strNewText = strNewText & "&lt;"
        ElseIf j = 62 Then ' >
            strNewText = strNewText & "&gt;"
        Else
            strNewText = strNewText & "&#" & j & ";"
        End If
    Next

    XMLEncode = strNewText
End Function
This checks each character in the string in turn, replaces some with
entities, and replaces every other character outside the allowed set with a
numeric character reference. You could easily add a few more entity
replacements as required. Just watch out for the first couple of branches,
where I replace line feeds with an escaped <br> and strip out carriage
returns and tabs, as that might not fit what you want to do with the XML
yourself.
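
One caveat, sketched under assumptions: Asc returns the character's code in the current ANSI code page, so on a typical Windows-1252 setup the trademark sign would come out as &#153; rather than the Unicode reference &#8482;. XML numeric references are Unicode code points, so a variant based on AscW may be safer. The helper name XmlCharRef is invented for illustration, and it does not handle surrogate pairs:

```vbscript
' Hypothetical helper: returns an XML numeric character reference
' for a single character, using its Unicode (UTF-16) code unit.
Function XmlCharRef(ch)
    Dim code
    code = AscW(ch)
    ' AscW returns a signed 16-bit value, so code units above 32767
    ' come back negative; shift them into the 0..65535 range.
    If code < 0 Then code = code + 65536
    XmlCharRef = "&#" & code & ";"
End Function
```

The final Else branch of the loop could then append XmlCharRef(Mid(strText, i, 1)) instead of building the reference from j. The earlier branches are unaffected, because Asc and AscW agree for the ASCII range they test.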

Dan
Apr 4 '07 #3

"Yobbo" <in**@NoSpamIt.com> wrote in message
news:ug**************@TK2MSFTNGP05.phx.gbl...
> Does anybody know how I can trap and replace/remove these chars if need
> be?

If you are creating an XML file, can you use a DOMDocument to build it and
save it? That will ensure well-formed XML is created.
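
A rough sketch of that approach in classic ASP, assuming MSXML is installed on the server and the site has write permission to the target folder. The element names, sample text and output file name are placeholders, not anything from the original post:

```vbscript
' Sketch only: element names and the output path are placeholders.
Dim doc, root, item
Set doc = Server.CreateObject("MSXML2.DOMDocument")

' XML declaration; Save writes the file in the encoding named here.
doc.appendChild doc.createProcessingInstruction("xml", "version=""1.0"" encoding=""UTF-8""")

Set root = doc.createElement("items")
doc.appendChild root

Set item = doc.createElement("item")
' Assigning to .text lets the serializer escape markup characters such
' as & and <; the trademark sign is written in the document encoding.
item.text = "Widget" & ChrW(8482) & " <new> & improved"
root.appendChild item

doc.save Server.MapPath("items.xml")
```

With this route there is no hand-written escaping function to maintain, since the DOM takes care of escaping when it serializes the text nodes.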

Apr 5 '07 #4
