What's the proper way to remove entity escapes from a string?

I have XML replies in a DOM which contain entity escapes,
like "&amp;". What's the proper way to replace them with
the ordinary characters? Preferably something that will work in
most browsers?

I know about ".innerText", but that's not portable; some
browsers convert escapes when reading from innerText and some
don't.

John Nagle
May 4 '07 #1
John Nagle said the following on 5/4/2007 5:41 PM:
> I have XML replies in a DOM which contain entity escapes,
> like "&amp;". What's the proper way to replace them with
> the ordinary characters? Preferably something that will work in
> most browsers?

Use a regular expression.

> I know about ".innerText", but that's not portable;

True, but not for the reasons you stated. Or, are you referring to
innerHTML?

> some browsers convert escapes when reading from innerText and some
> don't.

IE is the only browser that supports innerText and, AFAIK, all versions
handle innerText the same.

--
Randy
Chance Favors The Prepared Mind
comp.lang.javascript FAQ - http://jibbering.com/faq/index.html
Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
May 4 '07 #2
John Nagle wrote:

Hi,

> I have XML replies in a DOM which contain entity escapes,
> like "&amp;". What's the proper way to replace them with
> the ordinary characters? Preferably something that will work in
> most browsers?

It's best to let the browser do the translation, and get back the
translated value.

An HTML entity is used to represent some character: the HTML parser
converts it while building the DOM tree, after which the entity is
completely gone, since the DOM tree is an object made of typed nodes.
Therefore, a sound approach to your issue is to pass the entity to
the parser (using the innerHTML property, for instance) and read the
value back directly from the DOM tree (using DOM methods).

Tested in IE7, FF2 and O9. Returns the expanded string, or the empty
string if not supported.

---
<script type="text/javascript">
// Expand HTML entities by letting the browser's HTML parser decode them.
String.prototype.expandHtmlEntities=function(){
  var div=null;
  var result="";
  if(document.createElement) {
    div=document.createElement("div");
    if(
      typeof div.innerHTML!="undefined" &&
      typeof div.firstChild!="undefined"
    ){
      result=this.valueOf().replace(
        /&[a-z0-9#]+;/gi,
        function(a){
          // Parse one entity, then read the decoded text node back.
          return div.innerHTML=a, div.firstChild.nodeValue;
        }
      );
    }
  }
  return result;
}

alert("Test : &lt;€&gt;".expandHtmlEntities());
</script>
---
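
A minimal sketch of the same parser-based idea as a standalone function, for cases where extending String.prototype is undesirable. The free-function form and the optional `doc` parameter are assumptions made for illustration and are not part of the post above; `doc` lets the caller supply a document (or a stub when running outside a browser). Unlike the code above, it leaves an entity untouched when the parser produces no text node.

```javascript
// Hypothetical standalone variant (not from the thread).
// `doc` is assumed to be a DOM Document; in a browser it defaults to `document`.
function expandHtmlEntities(text, doc) {
  doc = doc || (typeof document !== "undefined" ? document : null);
  if (!doc || !doc.createElement) {
    return ""; // same "not supported" convention as the post above
  }
  var div = doc.createElement("div");
  return String(text).replace(/&[a-z0-9#]+;/gi, function (entity) {
    div.innerHTML = entity;                // let the HTML parser decode it
    var node = div.firstChild;
    return node ? node.nodeValue : entity; // no text node: keep the input
  });
}

// In a browser this logs: Test : <b>
console.log(expandHtmlEntities("Test : &lt;b&gt;"));
```

Because the regular expression only ever hands `&name;` or `&#number;` tokens to innerHTML, no markup from the input string reaches the parser.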

HTH,
Elegie.
May 5 '07 #3
John Nagle wrote:
> I have XML replies in a DOM which contain entity escapes,
> like "&amp;". What's the proper way to replace them with
> the ordinary characters? Preferably something that will work
> in most browsers?

XML predefines only five named entities: 'quot', 'amp', 'apos', 'lt'
and 'gt'.
So:

// The five predefined XML entities, mapped to their character codes.
var ent = new Object()
ent['quot'] = 34
ent['amp'] = 38
ent['apos'] = 39
ent['lt'] = 60
ent['gt'] = 62
var xml = '<root>a &lt; &apos;</root>'
// Replace each named entity with its character.
for (var i in ent)
  xml = xml.replace(new RegExp('&'+i+';','gi'),
    String.fromCharCode(ent[i]));
// Replace decimal numeric character references such as &#39;
xml = xml.replace(/(&)(#)(\d{1,})(;)/g,
  function (tot,amp,cr,cp,sem) {
    return String.fromCharCode(cp)
  }
)
alert(xml)

You can use more, but those need to be declared in your XML's DTD in
order to remain valid.
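
One thing to watch with the per-entity loop above: it rescans the string once per entity, so text that has already been decoded can be decoded a second time (`&amp;lt;` becomes `&lt;` and then `<`). A minimal sketch of a single-pass alternative for the five predefined entities follows; the names `XML_ENTITIES` and `decodeXmlEntities` are made up for illustration and do not come from the thread.

```javascript
// Hypothetical helper (not from the thread): decode the five predefined
// XML entities in one pass, so decoded text is never scanned again.
var XML_ENTITIES = { quot: '"', amp: '&', apos: "'", lt: '<', gt: '>' };

function decodeXmlEntities(text) {
  return String(text).replace(/&(quot|amp|apos|lt|gt);/g, function (match, name) {
    return XML_ENTITIES[name];
  });
}

var sample = 'a &lt; b &amp;&amp; c &amp;lt; d';
console.log(decodeXmlEntities(sample)); // a < b && c &lt; d
```

The match is case-sensitive on purpose, since XML entity names are case-sensitive; unknown names such as `&nbsp;` are left as they are.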

In practice this boundary is often blurred. I would suggest adding the
HTML 4 character entities as well, to be on the safe side:

var ent = new Object()
ent['apos'] = 39 // from XML, not present in HTML4
ent['quot'] = 34
ent['amp'] = 38
ent['lt'] = 60
ent['gt'] = 62
ent['nbsp'] = 160
ent['iexcl'] = 161
ent['cent'] = 162
ent['pound'] = 163
ent['curren'] = 164
ent['yen'] = 165
ent['brvbar'] = 166
ent['sect'] = 167
ent['uml'] = 168
ent['copy'] = 169
ent['ordf'] = 170
ent['laquo'] = 171
ent['not'] = 172
ent['shy'] = 173
ent['reg'] = 174
ent['macr'] = 175
ent['deg'] = 176
ent['plusmn'] = 177
ent['sup2'] = 178
ent['sup3'] = 179
ent['acute'] = 180
ent['micro'] = 181
ent['para'] = 182
ent['middot'] = 183
ent['cedil'] = 184
ent['sup1'] = 185
ent['ordm'] = 186
ent['raquo'] = 187
ent['frac14'] = 188
ent['frac12'] = 189
ent['frac34'] = 190
ent['iquest'] = 191
ent['Agrave'] = 192
ent['Aacute'] = 193
ent['Acirc'] = 194
ent['Atilde'] = 195
ent['Auml'] = 196
ent['Aring'] = 197
ent['AElig'] = 198
ent['Ccedil'] = 199
ent['Egrave'] = 200
ent['Eacute'] = 201
ent['Ecirc'] = 202
ent['Euml'] = 203
ent['Igrave'] = 204
ent['Iacute'] = 205
ent['Icirc'] = 206
ent['Iuml'] = 207
ent['ETH'] = 208
ent['Ntilde'] = 209
ent['Ograve'] = 210
ent['Oacute'] = 211
ent['Ocirc'] = 212
ent['Otilde'] = 213
ent['Ouml'] = 214
ent['times'] = 215
ent['Oslash'] = 216
ent['Ugrave'] = 217
ent['Uacute'] = 218
ent['Ucirc'] = 219
ent['Uuml'] = 220
ent['Yacute'] = 221
ent['THORN'] = 222
ent['szlig'] = 223
ent['agrave'] = 224
ent['aacute'] = 225
ent['acirc'] = 226
ent['atilde'] = 227
ent['auml'] = 228
ent['aring'] = 229
ent['aelig'] = 230
ent['ccedil'] = 231
ent['egrave'] = 232
ent['eacute'] = 233
ent['ecirc'] = 234
ent['euml'] = 235
ent['igrave'] = 236
ent['iacute'] = 237
ent['icirc'] = 238
ent['iuml'] = 239
ent['eth'] = 240
ent['ntilde'] = 241
ent['ograve'] = 242
ent['oacute'] = 243
ent['ocirc'] = 244
ent['otilde'] = 245
ent['ouml'] = 246
ent['divide'] = 247
ent['oslash'] = 248
ent['ugrave'] = 249
ent['uacute'] = 250
ent['ucirc'] = 251
ent['uuml'] = 252
ent['yacute'] = 253
ent['thorn'] = 254
ent['yuml'] = 255
ent['OElig'] = 338
ent['oelig'] = 339
ent['Scaron'] = 352
ent['scaron'] = 353
ent['Yuml'] = 376
ent['fnof'] = 402
ent['circ'] = 710
ent['tilde'] = 732
ent['Alpha'] = 913
ent['Beta'] = 914
ent['Gamma'] = 915
ent['Delta'] = 916
ent['Epsilon'] = 917
ent['Zeta'] = 918
ent['Eta'] = 919
ent['Theta'] = 920
ent['Iota'] = 921
ent['Kappa'] = 922
ent['Lambda'] = 923
ent['Mu'] = 924
ent['Nu'] = 925
ent['Xi'] = 926
ent['Omicron'] = 927
ent['Pi'] = 928
ent['Rho'] = 929
ent['Sigma'] = 931
ent['Tau'] = 932
ent['Upsilon'] = 933
ent['Phi'] = 934
ent['Chi'] = 935
ent['Psi'] = 936
ent['Omega'] = 937
ent['alpha'] = 945
ent['beta'] = 946
ent['gamma'] = 947
ent['delta'] = 948
ent['epsilon'] = 949
ent['zeta'] = 950
ent['eta'] = 951
ent['theta'] = 952
ent['iota'] = 953
ent['kappa'] = 954
ent['lambda'] = 955
ent['mu'] = 956
ent['nu'] = 957
ent['xi'] = 958
ent['omicron'] = 959
ent['pi'] = 960
ent['rho'] = 961
ent['sigmaf'] = 962
ent['sigma'] = 963
ent['tau'] = 964
ent['upsilon'] = 965
ent['phi'] = 966
ent['chi'] = 967
ent['psi'] = 968
ent['omega'] = 969
ent['thetasym'] = 977
ent['upsih'] = 978
ent['piv'] = 982
ent['ensp'] = 8194
ent['emsp'] = 8195
ent['thinsp'] = 8201
ent['zwnj'] = 8204
ent['zwj'] = 8205
ent['lrm'] = 8206
ent['rlm'] = 8207
ent['ndash'] = 8211
ent['mdash'] = 8212
ent['lsquo'] = 8216
ent['rsquo'] = 8217
ent['sbquo'] = 8218
ent['ldquo'] = 8220
ent['rdquo'] = 8221
ent['bdquo'] = 8222
ent['dagger'] = 8224
ent['Dagger'] = 8225
ent['bull'] = 8226
ent['hellip'] = 8230
ent['permil'] = 8240
ent['prime'] = 8242
ent['Prime'] = 8243
ent['lsaquo'] = 8249
ent['rsaquo'] = 8250
ent['oline'] = 8254
ent['frasl'] = 8260
ent['euro'] = 8364
ent['image'] = 8465
ent['weierp'] = 8472
ent['real'] = 8476
ent['trade'] = 8482
ent['alefsym'] = 8501
ent['larr'] = 8592
ent['uarr'] = 8593
ent['rarr'] = 8594
ent['darr'] = 8595
ent['harr'] = 8596
ent['crarr'] = 8629
ent['lArr'] = 8656
ent['uArr'] = 8657
ent['rArr'] = 8658
ent['dArr'] = 8659
ent['hArr'] = 8660
ent['forall'] = 8704
ent['part'] = 8706
ent['exist'] = 8707
ent['empty'] = 8709
ent['nabla'] = 8711
ent['isin'] = 8712
ent['notin'] = 8713
ent['ni'] = 8715
ent['prod'] = 8719
ent['sum'] = 8721
ent['minus'] = 8722
ent['lowast'] = 8727
ent['radic'] = 8730
ent['prop'] = 8733
ent['infin'] = 8734
ent['ang'] = 8736
ent['and'] = 8743
ent['or'] = 8744
ent['cap'] = 8745
ent['cup'] = 8746
ent['int'] = 8747
ent['there4'] = 8756
ent['sim'] = 8764
ent['cong'] = 8773
ent['asymp'] = 8776
ent['ne'] = 8800
ent['equiv'] = 8801
ent['le'] = 8804
ent['ge'] = 8805
ent['sub'] = 8834
ent['sup'] = 8835
ent['nsub'] = 8836
ent['sube'] = 8838
ent['supe'] = 8839
ent['oplus'] = 8853
ent['otimes'] = 8855
ent['perp'] = 8869
ent['sdot'] = 8901
ent['lceil'] = 8968
ent['rceil'] = 8969
ent['lfloor'] = 8970
ent['rfloor'] = 8971
ent['lang'] = 9001
ent['rang'] = 9002
ent['loz'] = 9674
ent['spades'] = 9824
ent['clubs'] = 9827
ent['hearts'] = 9829
ent['diams'] = 9830

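var xml = '<root>&loz; &lt; &AMP; &lt; a &#2405;</root>'
// Pass 1: exact-case names, so that &alpha; and &Alpha; stay distinct.
for (var i in ent)
  xml = xml.replace(new RegExp('&'+i+';','g'),
    String.fromCharCode(ent[i]));
// Pass 2: case-insensitive fallback for sloppy input such as &AMP;
for (var i in ent)
  xml = xml.replace(new RegExp('&'+i+';','gi'),
    String.fromCharCode(ent[i]));
// Replace decimal numeric character references such as &#2405;
xml = xml.replace(/(&)(#)(\d{1,})(;)/g,
  function (tot,amp,cr,cp,sem) {
    return String.fromCharCode(cp)
  }
)
alert(xml)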
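
The numeric replace above matches decimal references only, and String.fromCharCode works in 16-bit code units, so hexadecimal references (`&#x3C0;`) and characters above U+FFFF are not covered. A minimal sketch that handles both, assuming an ES2015 engine for String.fromCodePoint; the name `decodeNumericRefs` is made up for illustration.

```javascript
// Hypothetical helper (not from the thread): decode decimal (&#960;) and
// hexadecimal (&#x3C0;) character references in a single pass.
function decodeNumericRefs(text) {
  return String(text).replace(/&#(x[0-9a-f]+|\d+);/gi, function (match, body) {
    var isHex = body.charAt(0).toLowerCase() === 'x';
    var codePoint = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    // Leave anything outside the Unicode range untouched.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    // String.fromCodePoint (ES2015) also covers code points above 0xFFFF.
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericRefs('&#960; &#x3C0; &#128512;')); // π π 😀
```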

Info:
http://en.wikipedia.org/wiki/List_of...ity_references

Hope this helps,

--
Bart

May 5 '07 #4
