Whitespace in Canonicalized XML

If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is defined so that
two documents that are written in different ways, but contain the same
information, will normalize to the same form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).
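
A minimal sketch of that normalization, assuming Python 3.8+ and its standard-library `xml.etree.ElementTree.canonicalize`. Note that this function implements C14N 2.0 rather than the Canonical XML 1.0 Recommendation the question cites, and the two sample documents are invented for illustration:

```python
from xml.etree.ElementTree import canonicalize

# Two spellings of the same information: attribute order, quote style,
# spacing inside the start tag, and the empty-element shorthand all differ.
doc_a = '<doc b="2" a="1"><empty/></doc>'
doc_b = "<doc a='1'   b='2'><empty></empty></doc>"

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# Both spellings reduce to the same canonical string.
print(canon_a == canon_b)
```

The differences removed here all sit inside the markup itself. Whitespace between tags is character data, which is where the question below comes in.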

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Isn't that whitespace only useful for formatting purposes (i.e., so that
it will look pretty in your text viewer)? Or am I missing something
important?

Thank you for your reply...
Jul 20 '05 #1
"Celedor" <Ce*****@tekken.cc> wrote...
If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
Isn't that whitespace only useful for formatting purposes (ie. so that
it will look pretty on your text viewer)? Or am I missing something
important?


Anything that affects how the image will appear is obviously part of
the information.
Jul 20 '05 #2

"Celedor" <Ce*****@tekken.cc> wrote in message
news:4e*************************@posting.google.com...
If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is in such a form such that
two documents that are written in different ways, but contain the same
information, will normalize towards one form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Hi,

The characteristics and properties of a "presentation" depend very much
on who or what the intended recipient is. In the case of XML, by design,
humans are not the only possible recipients. XML is also intended to convey
data to machines, and these machines should be capable of processing XML
without any ambiguity messing up the works. To accomplish this, XML
defines a very simple rule: anything in "tags" is XML markup, and
everything else is data.

If you look at the data models built on the XML spec (the DOM or XPath,
for example), you can see that there are different XML node types defined.
One of them is the text node. Consider the example below:

<a>This is a text node
<ThisIsAnElementNode x="this is an attribute node">This is also a text
node</ThisIsAnElementNode></a>

This is perfectly well-formed XML. In general, you cannot make any
assumptions about the content of the text nodes. They may or may not
consist entirely of whitespace, and only the receiving application or entity
can tell you whether the whitespace is significant. A spec obviously has to
cater to the general case, and hence pure-whitespace text
nodes cannot be "normalized" away.
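
The point about whitespace-only text nodes can be checked with a short sketch, again assuming Python 3.8+ and the standard-library `canonicalize` (a C14N 2.0 implementation). The sample documents are made up, and `strip_text=True` is that function's opt-in switch, not something canonicalization does by default:

```python
from xml.etree.ElementTree import canonicalize

pretty = "<a>\n  <b>text</b>\n</a>"   # indented for human readers
compact = "<a><b>text</b></a>"        # same elements, no indentation

# By default the whitespace-only text nodes are kept as data,
# so the two documents have different canonical forms.
default_differs = canonicalize(pretty) != canonicalize(compact)

# Stripping text is an explicit decision by the caller, i.e. the
# application declaring that this whitespace carries no information.
stripped_same = (
    canonicalize(pretty, strip_text=True)
    == canonicalize(compact, strip_text=True)
)

print(default_differs, stripped_same)
```

A signature computed over the default canonical form of `pretty` would therefore not verify against `compact`, which is the price of not guessing which whitespace matters.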

That being said, the "xml:space" attribute exists to signal whether
whitespace should be preserved ("preserve") or left to the application's
default handling ("default"). It is only a hint: when the XML processor or a
higher-level application (for example, an XSL processor) encounters
xml:space, it may or may not normalize pure-whitespace nodes;
it depends on the application.
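
As a rough illustration of one application-level policy, the sketch below drops whitespace-only text unless an inherited xml:space="preserve" is in effect. The function name `drop_blank_text` and the sample document are hypothetical; other applications are free to choose differently:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def drop_blank_text(elem, preserve=False):
    """Remove whitespace-only text under elem unless xml:space="preserve" applies."""
    setting = elem.get(XML_SPACE)
    if setting is not None:
        # The nearest declaration wins; "default" switches preservation off again.
        preserve = (setting == "preserve")
    if not preserve and elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        drop_blank_text(child, preserve)
        # A child's tail is text owned by elem, so elem's setting applies to it.
        if not preserve and child.tail is not None and not child.tail.strip():
            child.tail = None

root = ET.fromstring(
    '<r>\n  <keep xml:space="preserve"> <i/> </keep>\n  <drop> <i/> </drop>\n</r>'
)
drop_blank_text(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Only the whitespace inside `keep` survives; the indentation around it and the blanks inside `drop` are removed because this particular application decided they were formatting.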

Regards,
Kenneth
Jul 20 '05 #3
Celedor wrote:
If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is in such a form such that
two documents that are written in different ways, but contain the same
information, will normalize towards one form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Isn't that whitespace only useful for formatting purposes (ie. so that
it will look pretty on your text viewer)? Or am I missing something
important?


Only if you have a DTD or Schema that tells you where PCDATA is allowed.

Without one, you have to assume character data can occur anywhere, which
makes *all* white-space significant.
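
A small sketch of that idea, with the caveat that it does not read a real DTD or Schema: the set `ELEMENT_ONLY` is a hypothetical stand-in for the content model such a document would supply, and the element names are invented:

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: elements declared to contain child elements
# only (no PCDATA). A real application would derive this from a DTD or Schema.
ELEMENT_ONLY = {"order", "items"}

def blank_runs(elem):
    """Yield (owner_tag, text) for each whitespace-only text run in the tree."""
    if elem.text is not None and not elem.text.strip():
        yield elem.tag, elem.text
    for child in elem:
        yield from blank_runs(child)
        # A child's tail is character data belonging to elem.
        if child.tail is not None and not child.tail.strip():
            yield elem.tag, child.tail

root = ET.fromstring(
    "<order>\n  <items>\n    <item> </item>\n  </items>\n</order>"
)

# Whitespace inside element-only content can be treated as ignorable;
# everywhere else it has to be assumed significant.
significant = [tag for tag, _ in blank_runs(root) if tag not in ELEMENT_ONLY]
print(significant)
```

With `ELEMENT_ONLY` emptied, which corresponds to having no DTD or Schema at all, every whitespace run in the document ends up in `significant`.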

///Peter

Jul 20 '05 #4
