XML equality

onetitfemme

Hi *,

I have been looking for a definition or at least some workable concept
of "XML equality".

Searching on "XML equality" in comp.text.xml, microsoft.public.xsl and
microsoft.public.xml resulted in no hits.

I also searched for: XML equality schema (single words) on the same
newsgroups, which gave very few links, none of them to the point.

I have read that the commercial "XMLBooster" now
addresses these issues by generating code to:
- Check for equality among XML instances
- Compute the distance between two XML instances
- Compute the minimal set of changes required to go from one instance
to another, similar in spirit to what the diff Unix command does for
text files.

But it is hard to tell what exactly they mean by "equality among
XML instances" and "distance between two XML instances". I spent some
time at their web site and I think they are just using sales pitches. I
couldn't find any docs specifying or at least clarifying their
claims/terminology.

I know XML is basically (structured) text and there aren't such
definitions for texts/natural languages' grammars (their usefulness and
validity is actually more semantic than syntactic).

Do you know of works dealing with the definition of such terms?

Thanks
otf

Dec 21 '05 #1


mgungora

Look for "xml diff" instead...

Dec 21 '05 #2

Richard Tobin

In article <11*********************@g14g2000cwa.googlegroups. com>,
onetitfemme <on*************@yahoo.com> wrote:

> I have been looking for a definition or at least some workable concept
> of "XML equality".

A natural definition would use the infoset. Norm Walsh has a
definition:

http://norman.walsh.name/2004/05/19/infoset-equal

-- Richard

Dec 21 '05 #3

onetitfemme

// - - - - - - - - - - - - - - - - - - - -

> Look for "xml diff" instead...

mgungora, this is how I started. Search comp.text.xml for "OSS,
java-based XML Diff?"

I could not find much either; as a matter of fact, no one replied to me.

// - - - - - - - - - - - - - - - - - - - -
>> I have been looking for a definition or at least some workable concept
>> of "XML equality".

> A natural definition would use the infoset. Norm Walsh has a
> definition: http://norman.walsh.name/2004/05/19/infoset-equal

Richard, thank you for pointing me to Norman Walsh's article:

// __
Infoset Equality
19 May 2004 (modified 11 Sep 2005)
Volume 7, Issue 86
by norman walsh

http://norman.walsh.name/2004/05/19/infoset-equal
// __

The article, in which he approaches the concept from the perspective of
infosets (http://www.w3.org/TR/xml-infoset/), is definitely a good start,
but there are a number of issues that I see right away just by looking at
his definitions. For example:

// __ in def. 2:
2. Element Information Items

Two element information items are equal if the following properties
are equal:

- [namespace name]
- [local name]
- [children]
- [attributes]

Children are compared in order, attributes without respect to order.
// __
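
The quoted definition maps fairly directly onto code. Below is a minimal Python sketch, assuming the standard-library xml.etree.ElementTree model is an acceptable stand-in for the infoset. It is only an approximation: the default parser discards comments and processing instructions, and character data shows up as text/tail strings rather than as child items. The name elements_equal is made up for illustration and does not come from Walsh's article.

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Compare two elements along the lines of the quoted definition 2."""
    # ElementTree stores the tag as "{namespace}local", so a single
    # comparison covers both [namespace name] and [local name].
    if a.tag != b.tag:
        return False
    # attrib is a dict, so attributes are compared without respect to order.
    if a.attrib != b.attrib:
        return False
    # Character data is compared verbatim (assumption: no whitespace
    # normalization is applied).
    if (a.text or "") != (b.text or ""):
        return False
    # Children are compared in order.
    if len(a) != len(b):
        return False
    for child_a, child_b in zip(a, b):
        if (child_a.tail or "") != (child_b.tail or ""):
            return False
        if not elements_equal(child_a, child_b):
            return False
    return True


first = ET.fromstring('<n xmlns="urn:x" p="1" q="2"><c>x</c><c>y</c></n>')
second = ET.fromstring('<n xmlns="urn:x" q="2" p="1"><c>x</c><c>y</c></n>')
third = ET.fromstring('<n xmlns="urn:x" p="1" q="2"><c>y</c><c>x</c></n>')

print(elements_equal(first, second))  # True: only the attribute order differs
print(elements_equal(first, third))   # False: the child order differs
```

Because the function walks both trees in lockstep from the root, two elements are only ever compared when they sit at the same position in their respective documents.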
._ I would also include the path to the element, just the path, NOT
the content of all elements in the path (unless he understands it as
part of the "[namespace name]"). To me, it is very natural to include
the path to an element and I wonder why it escaped his consideration.
._ Also, to even compare documents (and/or dox sections) they should
first have structural and type affinity in their schemas, at least in
the sections that are being compared.
._ The order of similar child elements under the same path should
not really matter (this can be easily/practically solved by sorting
them all). These two sections of XML "instances" should be equal:

<node4>
<children>younger child: Paul</children>
<children>older child: Mary</children>
</node4>

and

<node4>
<children>older child: Mary</children>
<children>younger child: Paul</children>
</node4>

._ if an attribute is not mandatory, should these two sections be the
same?

<node4>
<children>older child: Mary</children>
<children>younger child: Paul</children>
</node4>

and

<node4>
<children adopted="true">older child: Mary</children>
<children>younger child: Paul</children>
</node4>

Also, it would be obvious that you should exclude comments while
comparing XML dox, but why ignore processing instructions, when they
give important type and reference info that defines the included data?

Thanks
otf

Dec 21 '05 #4

Richard Tobin

In article <11**********************@g49g2000cwa.googlegroups .com>,
onetitfemme <on*************@yahoo.com> wrote:

> ._ I would also include the path to the element, just the path, NOT
> the content of all elements in the path

I don't understand why you would do that. If the elements don't have
the same path from the root, you wouldn't be comparing them at all.

Unless you are considering comparison of fragments of documents, in
which case you probably don't care about the position in the document.

> ._ also, to even compare documents (and/or dox sections) they should
> first have structural and type affinity on their schemas, at least on
> the sections that are being compared,

XML documents aren't required to have any kind of schema. This would
be equality on documents+schemas, not documents.

> ._ the order of elements of similar children from the same path should
> not really matter (this can be easily/practically solved by sorting
> them all).

This requires knowledge of the interpretation of the document that is not
inherent in the document itself. Given some kind of schema, it might be
appropriate to interpret the children as a set rather than a sequence,
but in that case you are again not comparing documents themselves, but
the data models resulting from application of a schema to the documents.

> ._ if an attribute is not mandatory, should these two sections be the
> same?

As XML documents, they would be different. According to some
interpretation, they might be the same. Optional attributes
are not always interpreted as supplying optional information: their
absence may be as significant as their presence.

> Also, it would be obvious that you should exclude comments while
> comparing XML dox, but why ignore processing instructions, when they
> give important type and reference info that defines the included data?

Processing instructions are used for many different purposes. But their
obvious canonical use is to specify the processing of (part of) the
document rather than its content.
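
One way to make "equal as XML documents" concrete, offered here as a hedged sketch rather than as something proposed in this thread, is to compare the W3C Canonical XML (C14N) form of each document. Canonical XML normalizes attribute order, quoting and empty-element syntax, keeps processing instructions, and has variants with and without comments. The example assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize is available; the wrapper name canonically_equal is hypothetical.

```python
import xml.etree.ElementTree as ET


def canonically_equal(xml_a, xml_b, with_comments=False):
    """Compare two documents by their canonical (C14N 2.0) serialization.

    Comments are dropped unless with_comments=True. Element order and
    whitespace inside the document element remain significant.
    """
    return (ET.canonicalize(xml_a, with_comments=with_comments)
            == ET.canonicalize(xml_b, with_comments=with_comments))


# Attribute order, quote style and <b/> versus <b></b> do not matter.
print(canonically_equal('<a y="2" x="1"><b/></a>',
                        "<a x='1' y='2'><b></b></a>"))   # True

# Child order does matter.
print(canonically_equal('<a><b/><c/></a>', '<a><c/><b/></a>'))  # False

# Comments are ignored by default, but can be made significant.
print(canonically_equal('<a><!-- note --><b/></a>', '<a><b/></a>'))  # True
print(canonically_equal('<a><!-- note --><b/></a>', '<a><b/></a>',
                        with_comments=True))                          # False
```

This stays entirely at the document level: it uses no schema, so it cannot treat children as a set or decide whether a missing optional attribute is significant.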

-- Richard

Dec 21 '05 #5

onetitfemme

> Richard Tobin wrote ...
Hi *,

>> ._ I would also include the path to the element, just the path, NOT
>> the content of all elements in the path

> I don't understand why you would do that. If the elements don't have
> the same path from the root, you wouldn't be comparing them at all.

"If the elements don't have the same path from the root, you
wouldn't be comparing them at all"
otf: exactly! Here I might be a little biased and/or some intuition
artifacts might be kicking in. We theoretical physicists
"naturally" think this way. You may go LOL, but to us if more
people board a train, it might still reach its end, but the trajectory
will definitely not be the same ;-)
Jokes aside now, to me (in an ontology (a well-structured hierarchical
tree-like dependency)) the path to an element is as important as the
element itself.

> Unless you are considering comparison of fragments of documents, in
> which case you probably don't care about the position in the document.

"fragments of documents"
otf: I am considering them, but I still care about the position in the
document.

>> ._ also, to even compare documents (and/or dox sections) they should
>> first have structural and type affinity on their schemas, at least on
>> the sections that are being compared,

> XML documents aren't required to have any kind of schema. This would
> be equality on documents+schemas, not documents.

"equality on documents+schemas, not documents."
otf: exactly! "structural and type affinity on their schemas ..."
should be very important to even consider any type of comparison.

>> ._ the order of elements of similar children from the same path should
>> not really matter (this can be easily/practically solved by sorting
>> them all).

> This requires knowledge of the interpretation of the document that is not
> inherent in the document itself. Given some kind of schema, it might be
> appropriate to interpret the children as a set rather than a sequence,
> but in that case you are again not comparing documents themselves, but
> the data models resulting from application of a schema to the documents.

otf: granted! But how is it that you would not interpret the children
as a set, if nothing else has been explicitly indicated in the
schema?
Actually, they are the data models resulting from the COMPLIANCE of
documents with a schema, so that they become actionable data for an XML
application.

>> ._ if an attribute is not mandatory, should these two sections be the
>> same?

> As XML documents, they would be different. According to some
> interpretation, they might be the same. Optional attributes
> are not always interpreted as supplying optional information: their
> absence may be as significant as their presence.

otf: OK. I think I have started to see that there might not be such a
thing as "XML equality" (as you have e.g. for mathematical
magnitudes), but degrees thereof.

>> Also, it would be obvious that you should exclude comments while
>> comparing XML dox, but why ignore processing instructions, when they
>> give important type and reference info that defines the included data?

> Processing instructions are used for many different purposes. But their
> obvious canonical use is to specify the processing of (part of) the
> document rather than its content.

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
I am thinking of tons of web pages (and/or any other marked-up dox)
as a huge forest of texts where "links" among them are not only
given through URLs, but through their structure as well.
I understood something from your comments when you talked about the
"position in the document" (of an element), but I think I am missing
something. Even the path to the elements might not be enough for an
accurate description of "equality", but since "degrees thereof"
might be important as well, even the closed graphs up to the point where
an element is should be considered.

Thanks
otf

Dec 22 '05 #6

onetitfemme

just found a really good article which answers my XML diffing doubts to
a large extent

http://www.mulberrytech.com/Extreme/...haffert01.html

Structure-Preserving Difference Search for XML Documents
by E. Schubert, S. Schaffert, and F. Bry
abstract:
Current XML differencing applications usually try to find a minimal
sequence of edit operations that transform one XML document to another
XML document (the so-called "edit script"). In our conviction, this
approach often produces increments that are unintuitive for human
readers and do not reflect the actual changes. We therefore propose in
this article a different approach trying to maximize the retained
structure instead of minimizing the edit sequence. Structure is thereby
not limited to the usual tree structure of XML - any kind of structural
relations can be considered (like parent-child, ancestor-descendant,
sibling, document order). In our opinion, this approach is very
flexible and able to adapt to the user's requirements. It produces more
readable results while still retaining a reasonably small edit
sequence.
Keywords: Web; XML; Difference
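
For contrast with the abstract, here is a rough Python sketch of the conventional "edit script" idea it argues against. It is not the authors' algorithm and not a true tree edit distance: it assumes each document can be flattened into a list of "path=text" strings and then diffed as a plain sequence with the standard-library difflib. The names flatten and edit_script are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Flatten a document into 'path=text' strings in document order."""
    rows = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        rows.append(f"{here}={(elem.text or '').strip()}")
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return rows


def edit_script(old_xml, new_xml):
    """Return the non-equal difflib opcodes as (operation, old rows, new rows)."""
    old, new = flatten(old_xml), flatten(new_xml)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


before = "<r><a>1</a><b>2</b></r>"
after = "<r><a>1</a><b>3</b></r>"

print(edit_script(before, before))  # []
print(edit_script(before, after))   # [('replace', ['/r/b=2'], ['/r/b=3'])]
```

An empty script corresponds to equality under this flattening, and the length of the script gives one crude notion of "distance". Because the flattening throws away sibling and ancestor relations beyond the path string, it illustrates the kind of structure loss the abstract is concerned with.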

Dec 23 '05 #7
