Removing obscure chars

Hi All

I have an ASP function in place to strip invalid chars out of a data store
before I create an XML file of this data, but my function doesn't work on a
certain set of chars.

As far as I can see these are the following:

a) trademark char
b) long hyphen/dash char
c) smart/curly quotes (both left and right)

Even though my function is set up as follows:

Function ReFormatStringForXML(s)
IF LEN(s) > 0 AND NOT IsNull(s) THEN
s = Replace(s,"™","&trade;")
s = Replace(s,"—","-")
s = Replace(s,"’","&quot;")
s = Replace(s,"'","&quot;")
s = Replace(s,"""","&quot;")
s = Replace(s,"&","&amp;")
s = Replace(s,"<","&lt;")
s = Replace(s,">","&gt;")
END IF
ReFormatStringForXML = s
End Function

These chars still pass by and foul up my XML file.

I have a feeling that it's down to the fact that my function is looking for
the HTML equivalent rather than the actual char, but I can't possibly get away
with simply copying and pasting these friggin(!!) chars into my function.
Surely this is bad practice?

Does anybody know how I can trap and replace/remove these chars if need be?

Thanks

Apr 3 '07 #1
Gazing into my crystal ball I observed "Yobbo" <in**@NoSpamIt.com>
writing in news:ug**************@TK2MSFTNGP05.phx.gbl:
> Hi All
>
> I have an ASP function in place to strip invalid chars out of a data
> store before I create an XML file of this data, but my function
> doesn't work on a certain set of chars.
>
> As far as I can see these are the following:
>
> a) trademark char
> b) long hyphen/dash char
> c) smart/curly quotes (both left and right)
I detest these "smart" quotes. Are regular quotes dumb by comparison?
>
> Even though my function is set up as follows:
>
> Function ReFormatStringForXML(s)
> IF LEN(s) > 0 AND NOT IsNull(s) THEN
> s = Replace(s,"™","&trade;")
> s = Replace(s,"—","-")
> s = Replace(s,"’","&quot;")
> s = Replace(s,"'","&quot;")
> s = Replace(s,"""","&quot;")
> s = Replace(s,"&","&amp;")
> s = Replace(s,"<","&lt;")
> s = Replace(s,">","&gt;")
> END IF
> ReFormatStringForXML = s
> End Function
>
> These chars still pass by and foul up my XML file.
>
> I have a feeling that its down to the fact that my function is looking
> for the html equiv rather than the actual char, but I can't possibly
> get away with simply copy and pasting these friggin(!!) chars into my
> function. Surely this is bad practise?
You are searching for the HTML entity; you may need to search for the
actual character by its character code instead, for example:
s = Replace(s, Chr(60), "&lt;")
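
One way to follow this suggestion is to match the characters by code point with ChrW rather than pasting them into the source. The sketch below is only an illustration. The name EscapeForXml is hypothetical, and it assumes the data reaches the script as Unicode strings, so that the trademark sign is U+2122, the em dash is U+2014, and the curly quotes are U+2018, U+2019, U+201C and U+201D. It writes a numeric character reference for the trademark sign because XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so &trade; is not available without a DTD that declares it. The ampersand is escaped first so that references added afterwards are not escaped a second time.

```vbscript
' Hypothetical helper, not from the thread.
' ByVal keeps the caller's variable unchanged (VBScript defaults to ByRef).
Function EscapeForXml(ByVal s)
    If IsNull(s) Then
        EscapeForXml = ""
        Exit Function
    End If

    ' Escape markup characters; "&" must go first.
    s = Replace(s, "&", "&amp;")
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    s = Replace(s, """", "&quot;")

    ' Typographic characters, matched by Unicode code point.
    s = Replace(s, ChrW(8482), "&#8482;")  ' trademark sign
    s = Replace(s, ChrW(8212), "-")        ' em dash -> plain hyphen
    s = Replace(s, ChrW(8216), "'")        ' left single quote
    s = Replace(s, ChrW(8217), "'")        ' right single quote
    s = Replace(s, ChrW(8220), "&quot;")   ' left double quote
    s = Replace(s, ChrW(8221), "&quot;")   ' right double quote

    EscapeForXml = s
End Function
```

For example, EscapeForXml("Acme" & ChrW(8482) & " <b>") returns Acme&#8482; &lt;b&gt;.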
>
> Does anybody know how I can trap and replace/remove these chars if
> need be?
>
> Thanks

HTH

--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Apr 4 '07 #2
Yobbo wrote on Tue, 3 Apr 2007 18:17:59 +0100:
> Does anybody know how I can trap and replace/remove these chars if need
> be?
Your function is quite limited. What happens when a character not in your
list appears? The XML supported entity list is pretty small.

Here's the function I use in my own XML generation code, it's crude but it works:

Function XMLEncode(strText)
    'Loop through the text and replace every character that is not a space,
    'hyphen, digit or letter with an entity or its numeric (ANSI) value
    Dim i, j, strNewText
    strNewText = ""

    For i = 1 To Len(strText)
        j = Asc(Mid(strText, i, 1))

        If j = 10 Then
            'replace a line feed with an escaped <br>
            strNewText = strNewText & "&lt;br&gt;"
        ElseIf j = 13 Or j = 9 Then
            'carriage return or tab: strip it
        ElseIf j = 34 Then 'double quote
            strNewText = strNewText & "&quot;"
        ElseIf j = 39 Then 'apostrophe
            strNewText = strNewText & "&apos;"
        ElseIf j = 32 Or j = 45 Or (j >= 48 And j <= 57) Or (j >= 65 And j <= 90) Or _
               (j >= 97 And j <= 122) Then
            'space, hyphen, digits 0-9 and letters pass through unchanged
            strNewText = strNewText & Mid(strText, i, 1)
        ElseIf j = 38 Then '&
            strNewText = strNewText & "&amp;"
        ElseIf j = 60 Then '<
            strNewText = strNewText & "&lt;"
        ElseIf j = 62 Then '>
            strNewText = strNewText & "&gt;"
        Else
            'everything else becomes a numeric character reference
            strNewText = strNewText & "&#" & j & ";"
        End If
    Next

    XMLEncode = strNewText
End Function
This checks each character in the string in turn, replaces some with
entities, and replaces the remaining non-alphanumeric characters with their
numeric value. You could easily add a few more entity replacements as required.
Just watch out for the first couple of branches, where I replace line feeds
with an escaped <br> and strip out carriage returns and tabs, as that might
not fit what you want to do with the XML yourself.
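
Asc returns a code in the server's ANSI code page, so for a character such as the trademark sign the function above may write a reference like &#153; (its position in Windows-1252), which an XML parser will not read as U+2122. A minimal sketch of a variant built on AscW, which returns the Unicode code unit instead, is shown below. The name XMLEncodeW is hypothetical, and keeping tabs and line breaks, dropping other control characters, and passing surrogate halves through unchanged are choices made for this sketch rather than anything taken from the original function.

```vbscript
' Hypothetical variant of the encoder above; not from the thread.
Function XMLEncodeW(ByVal strText)
    Dim i, ch, code, strNewText
    strNewText = ""
    If IsNull(strText) Then strText = ""

    For i = 1 To Len(strText)
        ch = Mid(strText, i, 1)
        code = AscW(ch)
        ' AscW returns a signed 16-bit value; map negatives to 32768..65535.
        If code < 0 Then code = code + 65536

        Select Case True
            Case code = 38
                strNewText = strNewText & "&amp;"
            Case code = 60
                strNewText = strNewText & "&lt;"
            Case code = 62
                strNewText = strNewText & "&gt;"
            Case code = 34
                strNewText = strNewText & "&quot;"
            Case code = 39
                strNewText = strNewText & "&apos;"
            Case code = 9 Or code = 10 Or code = 13 Or (code >= 32 And code <= 126)
                ' tab, line breaks and printable ASCII pass through
                strNewText = strNewText & ch
            Case code < 32
                ' other control characters are not legal in XML 1.0: drop them
            Case code >= 55296 And code <= 57343
                ' surrogate halves cannot be written as references; keep as-is
                strNewText = strNewText & ch
            Case Else
                strNewText = strNewText & "&#" & code & ";"
        End Select
    Next

    XMLEncodeW = strNewText
End Function
```

With this variant a trademark sign comes out as &#8482;, whatever code page the server uses.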

Dan
Apr 4 '07 #3

"Yobbo" <in**@NoSpamIt.com> wrote in message
news:ug**************@TK2MSFTNGP05.phx.gbl...
> Does anybody know how I can trap and replace/remove these chars if need
> be?
>
> Thanks
If you are creating an XML file, can you use a DOMDocument to build it and
save it?
That will ensure well-formed XML is created.
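
A rough sketch of that approach in Classic ASP follows. The element names, the sample value and the output file name are made up for illustration, and the ProgID assumes MSXML 6.0 is installed on the server (an older ProgID such as MSXML2.DOMDocument.3.0 may be what is available). Because the text is assigned through the DOM, the parser escapes markup characters itself when saving, and characters such as the trademark sign are written in the declared encoding, so no hand-written Replace calls are needed.

```vbscript
' Sketch only: element names, sample text and file name are hypothetical.
Dim doc, root, item
Set doc = Server.CreateObject("MSXML2.DOMDocument.6.0")

' XML declaration; the encoding given here is used when the file is saved.
doc.appendChild doc.createProcessingInstruction("xml", "version=""1.0"" encoding=""UTF-8""")

Set root = doc.createElement("products")
doc.appendChild root

Set item = doc.createElement("product")
' Assigning .text lets the DOM escape &, < and > on output.
item.text = "Acme" & ChrW(8482) & " " & ChrW(8220) & "Deluxe" & ChrW(8221) & " <new> & improved"
root.appendChild item

doc.save Server.MapPath("products.xml")
```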

Apr 5 '07 #4
