whitespace in element content

Hello,

it is often convenient to insert whitespace into an XML document in order to
format it nicely. For example, take this snippet of a notional DocBook XML
document:

<para>
This is a longer paragraph.
With <wordasword>longer</wordasword> I mean that it contains more than
one sentence.
</para>

I want the whitespace in this snippet of code to be handled as follows:

(1) The whitespace between "<para>" and "This" as well as the whitespace
between "sentence." and "</para>" shall be discarded.

(2) Each other sequence of adjacent whitespace characters shall be
transformed into a single space character.
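Here is a minimal sketch of rules (1) and (2). It assumes Python's standard xml.etree.ElementTree is the code reading the parsed document, and the helper names `normalize_mixed` and `_text_slots` are made up for this illustration. Because <para> has mixed content, a whitespace run can straddle a tag (e.g. "With " followed by <wordasword>). The sketch therefore tracks whether the previous piece of text ended in a space, rather than normalizing each text node in isolation. It does not honour xml:space="preserve".

```python
import re
import xml.etree.ElementTree as ET

# XML's four whitespace characters: space, tab, CR, LF (XML 1.0, production S).
_WS_RUN = re.compile(r"[ \t\r\n]+")


def _text_slots(elem):
    """Yield (node, attribute) pairs for every text slot in document order."""
    yield elem, "text"            # text before elem's first child
    for child in elem:
        yield from _text_slots(child)
        yield child, "tail"       # text after </child>, still inside elem


def normalize_mixed(root):
    """Apply rules (1) and (2) in place to root's (possibly mixed) content."""
    prev_was_space = True         # True at the start, so leading whitespace is dropped
    last = None                   # last slot that still holds text
    for node, attr in _text_slots(root):
        s = _WS_RUN.sub(" ", getattr(node, attr) or "")   # rule (2) inside one slot
        if prev_was_space and s.startswith(" "):
            s = s[1:]             # run spans a tag boundary, or is leading (rule 1)
        if s:
            prev_was_space = s.endswith(" ")
            last = (node, attr)
        setattr(node, attr, s or None)
    if last is not None:          # rule (1): drop whitespace before the closing tag
        node, attr = last
        setattr(node, attr, getattr(node, attr).rstrip(" ") or None)
    return root


para = ET.fromstring(
    "<para>\nThis is a longer paragraph.\n"
    "With <wordasword>longer</wordasword> I mean that it contains more than\n"
    "one sentence.\n</para>"
)
print(ET.tostring(normalize_mixed(para), encoding="unicode"))
```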

But how do XML processors and applications deal with this issue?

In section 2.10 of "Extensible Markup Language (XML) 1.0 (Third Edition)",
one can read:

    In editing XML documents, it is often convenient to use "white
    space" (spaces, tabs, and blank lines) to set apart the markup for
    greater readability. Such white space is typically not intended for
    inclusion in the delivered version of the document.

But who decides which whitespace shall be considered as whitespace that is
just used to set apart the markup? And is whitespace just used to indent
lines of text also not intended for inclusion in the delivered version?
What is this "delivered version" of the document?

I'd be thankful for any clarification.

Best wishes,
Wolfgang
Jul 20 '05 #1
In article <2u*************@uni-berlin.de>,
Wolfgang Jeltsch <je*****@tu-cottbus.de> wrote:
> But who decides which whitespace shall be considered as whitespace that is
> just used to set apart the markup? And is whitespace just used to indent
> lines of text also not intended for inclusion in the delivered version?
> What is this "delivered version" of the document?

As far as the XML spec is concerned, deciding which whitespace is
significant or not is a job for the application, which really means
"everything except the parser". A conformant parser must give all the
whitespace to the application, which can then decide what to do with
it.
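To see this in practice, here is a small check with Python's xml.etree.ElementTree, whose parser is Expat. Apart from the end-of-line normalization every XML processor performs (CR LF pairs become LF, section 2.11 of the spec), the newlines and indentation arrive in `.text` and `.tail` exactly as written. The indentation in `DOC` is an assumption added for the demo.

```python
import xml.etree.ElementTree as ET

# The DocBook-style snippet from the question, indented for readability.
DOC = (
    "<para>\n"
    "  This is a longer paragraph.\n"
    "  With <wordasword>longer</wordasword> I mean that it contains more than\n"
    "  one sentence.\n"
    "</para>"
)

para = ET.fromstring(DOC)

# Character data before the first child element, including the newline and indent.
print(repr(para.text))
# Character data after </wordasword>, up to </para>, also untouched.
print(repr(para[0].tail))
```

What to keep is then the application's call, e.g. `" ".join(para.text.split())`.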

Of course, there may be other standard programs or libraries layered
on top of the XML parser which you might not consider to be the
application. XSLT for example allows you to specify that some
whitespace is to be stripped from its input. From the point of view
of the parser, XSLT is the application, but you may regard it as just
a library that you're using.
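As a rough illustration of what xsl:strip-space does, the hypothetical `strip_space` helper below removes text nodes made up entirely of whitespace from the named elements. It uses plain ElementTree rather than a real XSLT processor. Note that this alone would not satisfy rule (1) for <para>. The text node "\nThis is a longer paragraph.\nWith " contains non-whitespace, so XSLT leaves it untouched, and further processing (for example XPath's normalize-space()) would still be needed.

```python
import xml.etree.ElementTree as ET

_XML_WS = " \t\r\n"


def _is_blank(s):
    """True for a text node made only of XML whitespace."""
    return s is not None and s.strip(_XML_WS) == ""


def strip_space(root, names):
    """Rough emulation of <xsl:strip-space elements="..."/> (hypothetical helper).

    Ignores xml:space and xsl:preserve-space, which a real XSLT processor honours.
    """
    for parent in root.iter():
        if parent.tag not in names:
            continue
        if _is_blank(parent.text):        # text node before the first child
            parent.text = None
        for child in parent:
            if _is_blank(child.tail):     # text node after </child>, owned by parent
                child.tail = None
    return root


doc = ET.fromstring("<chapter>\n  <para>\n  Text\n  </para>\n</chapter>")
strip_space(doc, {"chapter"})
print(ET.tostring(doc, encoding="unicode"))
```

Only the whitespace-only nodes directly inside <chapter> disappear; the text inside <para> keeps its newlines because <para> was not listed.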

> I want the whitespace in this snippet of code to be handled as follows:
>
> (1) The whitespace between "<para>" and "This" as well as the whitespace
> between "sentence." and "</para>" shall be discarded.
>
> (2) Each other sequence of adjacent whitespace characters shall be
> transformed into a single space character.

This is a fairly common form of whitespace normalization and often
goes under the name of "tokenization". For example, XML itself treats
tokenized attributes like this. Among other things, you could use an
XML Schema processor to do this normalization.
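For reference, XML Schema's whiteSpace="collapse" facet (the one built into xs:token) is a three-step string operation. `xsd_collapse` below is a hypothetical name for a minimal sketch of it. As far as I know, the facet only applies to simple-type values (attributes and text-only elements), so it would not by itself normalize mixed content like <para> with a <wordasword> child.

```python
import re


def xsd_collapse(value):
    """Sketch of XML Schema's whiteSpace="collapse" (hypothetical helper name)."""
    value = re.sub(r"[\t\n\r]", " ", value)   # 1. tab, LF and CR each become a space
    value = re.sub(r" {2,}", " ", value)      # 2. runs of spaces shrink to one space
    return value.strip(" ")                   # 3. leading/trailing space is removed


print(xsd_collapse("\n  This is a longer paragraph.\n  With longer\n"))
# This is a longer paragraph. With longer
```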

-- Richard
Jul 20 '05 #2

"Wolfgang Jeltsch" <je*****@tu-cottbus.de> wrote in message
news:2u*************@uni-berlin.de...

My parser uses what could be called the "newline whitespace assertion",
namely:

(1) any initial whitespace is ignored;

(2) a newline and any whitespace following it are replaced with a single
space (unless they come at the end of the text).

For example,
<foo>Hello World
Again</foo>

is parsed as:
<foo>Hello World Again</foo>
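The poster does not name the parser, so the function below is only a hypothetical re-creation of the two stated rules on a plain string (the name `newline_whitespace` is made up). Whitespace that does not follow a newline, such as a double space inside a line or spaces just before a newline, is left alone. This is where the rule differs from Wolfgang's rule (2).

```python
import re

# XML whitespace characters; Python's \s would also match other Unicode spaces.
_XML_WS = " \t\r\n"


def newline_whitespace(text):
    """Hypothetical re-creation of the 'newline whitespace assertion'."""
    text = text.lstrip(_XML_WS)                   # any initial whitespace is ignored
    text = re.sub(r"\n[ \t\r\n]*\Z", "", text)    # newline run at the very end: dropped
    return re.sub(r"\n[ \t\r\n]*", " ", text)     # any other newline run: one space


print(newline_whitespace("Hello World\nAgain"))   # Hello World Again
```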

Jul 20 '05 #3
