Handling &quot; entity in attribute value

Hi,

I'd like to ask how XML parsers should handle attributes whose value
contains the &quot; entity. I know XML allows both single and double
quotes as attribute value delimiters. That's clear.
But how should a parser react in this situation:

I have a COORDSYS element with a string attribute whose value contains
many &quot; entities:

<COORDSYS
string="GEOGCS[&quot;GCS_WGS_1984&quot;,DATUM[&quot;WGS84&quot;,SPHEROID[&quot;WGS84&quot;,6378137,298.257223563]],PRIMEM[&quot;Greenwich&quot;,0],UNIT[&quot;Degree&quot;,0.0174532925199433]]"/>

So, when I read it into a DOM and, after some operations, try to save it
to a file, the parser replaces the double-quote value delimiters with
single quotes, as follows:

<COORDSYS
string='GEOGCS[&quot;GCS_WGS_1984&quot;,DATUM[&quot;WGS84&quot;,SPHEROID[&quot;WGS84&quot;,6378137,298.257223563]],PRIMEM[&quot;Greenwich&quot;,0],UNIT[&quot;Degree&quot;,0.0174532925199433]]'/>

Please explain how a parser is expected to handle this element in a
save operation.

Best regards

--
Mateusz Loskot
http://mateusz.loskot.net

Oct 21 '05 #1
"Mateusz Loskot" <ma*****@loskot.net> wrote:
> I'd like to ask how XML parsers should handle attributes whose value
> contains the &quot; entity.

As data that contains the ASCII quotation mark.

> I have a COORDSYS element with a string attribute whose value contains
> many &quot; entities:

OK.

> So, when I read it into a DOM and, after some operations, try to save it
> to a file, the parser replaces the double-quote value delimiters with
> single quotes, as follows:

That's external to XML parsing. You are not processing XML any more but
data constructed by parsing an XML document and representing it as a tree.
What happens then depends on the tools you use. Most probably the internal
representation does not contain the enclosing quotation marks or the entity
references but the parsed attribute values as strings. When you later output
the data in some format, perhaps linearizing it as XML, the results depend
on how you do that.

If all occurrences of ASCII quote and ASCII apostrophe in the attribute
values are "escaped" using entity or character references, it does not
matter whether you use quotes or apostrophes as delimiters when converting
the data back to XML format. (Naturally you need to use matching
delimiters, i.e. the same character as opening and as closing delimiter.)
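
To illustrate the point about delimiters, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The helper name `escape_fully` and the shortened attribute value are assumptions made for the example, not anything from the original document. It escapes both quote characters and then checks that either delimiter parses back to the same data:

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative attribute value containing literal double quotes.
value = 'GEOGCS["GCS_WGS_1984",UNIT["Degree",0.0174532925199433]]'


def escape_fully(text):
    """Hypothetical helper: escape markup characters and both quote characters."""
    return (
        text.replace("&", "&amp;")  # must come first, or later references get re-escaped
        .replace("<", "&lt;")
        .replace('"', "&quot;")
        .replace("'", "&apos;")
    )


escaped = escape_fully(value)

# The same escaped text, once delimited by quotes and once by apostrophes.
with_quotes = '<COORDSYS string="%s"/>' % escaped
with_apostrophes = "<COORDSYS string='%s'/>" % escaped

parsed_with_quotes = ET.fromstring(with_quotes).get("string")
parsed_with_apostrophes = ET.fromstring(with_apostrophes).get("string")
```

Both parsed values should compare equal to the original `value`, which is why the choice of delimiter carries no information once everything inside is escaped.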

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Oct 21 '05 #2
Jukka K. Korpela wrote:
> "Mateusz Loskot" <ma*****@loskot.net> wrote:
>> So, when I read it into a DOM and, after some operations, try to save it
>> to a file, the parser replaces the double-quote value delimiters with
>> single quotes, as follows:
>
> That's external to XML parsing. You are not processing XML any more but
> data constructed by parsing an XML document and representing it as a tree.

Yes, I know.

> What happens then depends on the tools you use.

Yes, I use the TinyXML DOM parser.

> Most probably the internal
> representation does not contain the enclosing quotation marks or the entity
> references but the parsed attribute values as strings. When you later output
> the data in some format, perhaps linearizing it as XML, the results depend
> on how you do that.

I did some investigation and now I know the internals of TinyXML. During
a Save operation, TinyXML checks whether the attribute value contains a
double-quote character ("); if it does, it encloses the attribute value
in single quotes ('). Certainly, that's correct from the XML spec's
point of view.
The check is simply made using (let's say) a find('\"') call on the
attribute value.
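
A minimal sketch of that delimiter check, written in Python rather than TinyXML's C++ and not taken from TinyXML's source; `choose_delimiter` is a hypothetical name. It shows that the outcome depends only on whether the string held in memory contains a literal double quote or still carries the &quot; reference as plain text:

```python
def choose_delimiter(stored_value):
    """Hypothetical helper: pick the attribute delimiter the way the post describes.

    A literal double quote in the stored value switches the delimiter
    to an apostrophe; otherwise a double quote is used.
    """
    if stored_value.find('"') != -1:
        return "'"
    return '"'


# Value held with the reference already resolved to a literal double quote.
resolved = 'GEOGCS["GCS_WGS_1984"]'

# Value held with the reference kept as text; no literal double quote is present.
unresolved = "GEOGCS[&quot;GCS_WGS_1984&quot;]"

delimiter_for_resolved = choose_delimiter(resolved)
delimiter_for_unresolved = choose_delimiter(unresolved)
```

With the resolved string the sketch selects an apostrophe, and with the unresolved one it keeps the double quote, so the same document can be written out with different delimiters depending on the internal representation.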

TinyXML can be compiled in, let's say, C style, in which case it uses
its own string class, or with STL support, in which case it uses
std::string.
When TinyXML is compiled in C style, all &quot; entities are "visible"
to the parser as double quotes, so if you printf the value of my
'string' attribute the way TinyXML holds it, you get double quotes
instead of &quot; entities. But when TinyXML is compiled with STL
support, everything works fine: TinyXML holds the 'string' attribute
with &quot; entities and does not convert them to double quotes
internally.

Here is longer story with some source code:
http://sourceforge.net/forum/forum.p...orum_id=172103

I'm not sure if this approach is correct. I'm also not sure if this is
a TinyXML bug. That's why I've asked this question.
I'm going to discuss this further with the TinyXML development team.

Thanks a lot

--
Mateusz Loskot
http://mateusz.loskot.net

Oct 21 '05 #3
"Mateusz Loskot" <ma*****@loskot.net> wrote:
> During a Save operation, TinyXML checks whether the attribute value
> contains a double-quote character ("); if it does, it encloses the
> attribute value in single quotes ('). Certainly, that's correct from
> the XML spec's point of view.
It is, but if the attribute value contains _both_ an ASCII quotation
mark " _and_ an ASCII apostrophe ' (which is admittedly rare), then
either of them _must_ be "escaped".
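
One way to handle the case where both characters occur, as a sketch only: the function name `serialize_attribute` and the preference for double quotes are assumptions for illustration, not any particular library's behaviour. The round trip is checked with Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET


def serialize_attribute(name, value):
    """Hypothetical serializer: choose a delimiter, escaping a quote only when forced to."""
    text = value.replace("&", "&amp;").replace("<", "&lt;")
    if '"' not in text:
        return '%s="%s"' % (name, text)  # no double quote: delimit with quotes
    if "'" not in text:
        return "%s='%s'" % (name, text)  # only double quotes: delimit with apostrophes
    # Both characters occur, so one of them has to be written as a reference.
    return '%s="%s"' % (name, text.replace('"', "&quot;"))


value = 'say "it\'s fine"'
attribute = serialize_attribute("a", value)

# Parse the result back to confirm the value survives unchanged.
round_tripped = ET.fromstring("<e %s/>" % attribute).get("a")
```

Escaping the apostrophe and delimiting with apostrophes would be equally valid; the sketch simply picks one of the two options.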
> I'm not sure if this approach is correct.


I still don't know what the problem or question is about. You are saying
that the output format is correct. The internal format is not really an XML
issue and mostly a practical question: you need to know the internal format
in order to play with it.

What we _can_ say is that in processing XML data, &quot; and " (assuming a
context where " may appear) must be treated as identical. The distinction
should normally be lost in parsing, but if it is preserved in the internal
format, it should not affect processing of the data as XML. (The
distinction could be retained e.g. in order to be able to print out the
original XML source verbatim for some purpose.)
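
A short check of that equivalence, sketched with Python's xml.etree.ElementTree; the element and attribute names are made up for the example. A literal quotation mark, the &quot; entity reference and the numeric character reference &#34; all parse to the same attribute value:

```python
import xml.etree.ElementTree as ET

# Three spellings of the same attribute value in XML source.
literal_source = "<e a='x\"y'/>"         # literal quotation mark, apostrophe delimiters
entity_source = '<e a="x&quot;y"/>'      # predefined entity reference
char_ref_source = '<e a="x&#34;y"/>'     # numeric character reference

literal_value = ET.fromstring(literal_source).get("a")
entity_value = ET.fromstring(entity_source).get("a")
char_ref_value = ET.fromstring(char_ref_source).get("a")

# After parsing, a search for the quotation mark matches in every case.
all_contain_quote = all('"' in v for v in (literal_value, entity_value, char_ref_value))
```

An application searching the parsed value for a quotation mark therefore does not need to know which spelling the source document used.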

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Oct 23 '05 #4
In article <Xn*****************************@193.229.0.31>,
Jukka K. Korpela <jk******@cs.tut.fi> wrote:
> It is, but if the attribute value contains _both_ an ASCII quotation
> mark " _and_ an ASCII apostrophe ' (which is admittedly rare)


Not that rare: in an XSLT stylesheet an XPath may well contain a
string containing a quote. If you want an XPath string containing
both you're stuck!

-- Richard
Oct 23 '05 #5

Jukka K. Korpela wrote:
> "Mateusz Loskot" <ma*****@loskot.net> wrote:
>> I'm not sure if this approach is correct.
>
> I still don't know what the problem or question is about. You are saying
> that the output format is correct. The internal format is not really an XML
> issue and mostly a practical question: you need to know the internal format
> in order to play with it.
>
> What we _can_ say is that in processing XML data, &quot; and " (assuming a
> context where " may appear) must be treated as identical.


Yes, I understand that. The problem seems to be more technical and
implementation-related:

http://sourceforge.net/forum/forum.p...orum_id=172103

You can see that the TinyXML parser works differently depending on
whether it uses its own C-style string class or std::string internally.

What we can be sure of is that, with any XML parser, if I search an XML
element for ", both &quot; and " (double quote) are expected to match.

Cheers

--
Mateusz Loskot
http://mateusz.loskot.net

Oct 23 '05 #6
