XML equality

Hi *,

I have been looking for a definition or at least some workable concept
of "XML equality".

Searching on "XML equality" in comp.text.xml, microsoft.public.xsl and
microsoft.public.xml resulted in no hits.

I also searched for: XML equality schema (single words). On the same
newsgroups this gave very few links, none of them to the point.

I have read that the commercial "XMLBooster" now addresses these
issues by generating code to:
- Check for equality among XML instances
- Compute the distance between two XML instances
- Compute the minimal set of changes required to go from one instance
to another, similar in spirit to what the diff Unix command does for
text files.

But it is hard to tell what exactly they mean by "equality among
XML instances" and "distance between two XML instances". I spent some
time at their web site and I think they are just using sales pitches. I
couldn't find any docs defining or at least clarifying their
claims/terminology.

I know XML is basically (structured) text and there aren't such
definitions for texts/natural languages' grammars (their usefulness and
validity is actually more semantic than syntactic).

Do you know of works dealing with the definition of such terms?

Thanks
otf

Dec 21 '05 #1
Look for "xml diff" instead...

Dec 21 '05 #2
In article <11*********************@g14g2000cwa.googlegroups.com>,
onetitfemme <on*************@yahoo.com> wrote:

> I have been looking for a definition or at least some workable concept
> of "XML equality".

A natural definition would use the infoset. Norm Walsh has a
definition:

http://norman.walsh.name/2004/05/19/infoset-equal

-- Richard
Dec 21 '05 #3
// - - - - - - - - - - - - - - - - - - - -
> Look for "xml diff" instead...

mgungora, this is how I started: search comp.text.xml for "OSS,
java-based XML Diff?"

I could not find much either; as a matter of fact, no one replied to me.

// - - - - - - - - - - - - - - - - - - - -
>> I have been looking for a definition or at least some workable concept
>> of "XML equality".
>
> A natural definition would use the infoset. Norm Walsh has a
> definition: http://norman.walsh.name/2004/05/19/infoset-equal

Richard, thank you for pointing me to Norman Walsh's article:

// __
Infoset Equality
19 May 2004 (modified 11 Sep 2005)
Volume 7, Issue 86
by norman walsh

http://norman.walsh.name/2004/05/19/infoset-equal
// __

His approach, from the perspective of infosets
(http://www.w3.org/TR/xml-infoset/), is definitely a good start, but
there are a number of issues that I see right away just by looking at
his defs. For example:

// __ in def. 2:
2. Element Information Items

Two element information items are equal if the following properties
are equal:

- [namespace name]
- [local name]
- [children]
- [attributes]

Children are compared in order, attributes without respect to order.
// __
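
A minimal sketch of how def. 2 could be checked in Python with the
standard xml.etree.ElementTree module. This is only an illustration,
not Norman Walsh's code: the function name elements_equal is made up
here, and ElementTree only approximates the infoset (character children
show up as .text/.tail, and comments and processing instructions are
dropped by the default parser).

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Compare two elements roughly along the lines of def. 2 (hypothetical helper)."""
    # ElementTree stores the tag as "{namespace}local", so one comparison
    # covers both [namespace name] and [local name].
    if a.tag != b.tag:
        return False
    # attrib is a dict, so attributes are compared without respect to order.
    if a.attrib != b.attrib:
        return False
    # Character data sits in .text (before the first child) and .tail
    # (after the element's end tag); None is treated as the empty string.
    if (a.text or "") != (b.text or "") or (a.tail or "") != (b.tail or ""):
        return False
    # Child elements are compared in order.
    if len(a) != len(b):
        return False
    return all(elements_equal(x, y) for x, y in zip(a, b))


a = ET.fromstring('<n x="1" y="2"><c>Paul</c><c>Mary</c></n>')
b = ET.fromstring('<n y="2" x="1"><c>Paul</c><c>Mary</c></n>')
c = ET.fromstring('<n x="1" y="2"><c>Mary</c><c>Paul</c></n>')
print(elements_equal(a, b))  # attribute order differs only
print(elements_equal(a, c))  # child order differs
```

Under this reading, swapping attributes changes nothing, while swapping
children makes the elements unequal.
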
._ I would also include the path to the element, just the path, NOT
the content of all elements in the path (unless he understands it as
part of the "[namespace name]"). To me, it is very natural to include
the path to an element and I wonder why it escaped his consideration.
._ also, to even compare documents (and/or document sections) they
should first have structural and type affinity on their schemas, at
least on the sections that are being compared,
._ the order of similar child elements under the same path should
not really matter (this can be easily/practically solved by sorting
them all). These two sections of XML "instances" should be equal

<node4>
<children>younger child: Paul</children>
<children>older child: Mary</children>
</node4>

and

<node4>
<children>older child: Mary</children>
<children>younger child: Paul</children>
</node4>

._ if an attribute is not mandatory, should these two sections be the
same?

<node4>
<children>older child: Mary</children>
<children>younger child: Paul</children>
</node4>

and

<node4>
<children adopted="true">older child: Mary</children>
<children>younger child: Paul</children>
</node4>

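The "sort them all" idea can be sketched like this in Python. It is a
rough illustration under assumptions: the names canonical_key and
equal_ignoring_child_order are made up, text is compared after
stripping surrounding whitespace, and tail text between children is
ignored. It treats children as an unordered collection, which is an
interpretation put on top of the document, not something XML itself
says.

```python
import xml.etree.ElementTree as ET


def canonical_key(elem):
    """Build a nested tuple that does not depend on child or attribute order."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        # Sorting the children's keys makes their document order irrelevant.
        tuple(sorted(canonical_key(child) for child in elem)),
    )


def equal_ignoring_child_order(a, b):
    return canonical_key(a) == canonical_key(b)


paul_first = ET.fromstring(
    "<node4><children>younger child: Paul</children>"
    "<children>older child: Mary</children></node4>"
)
mary_first = ET.fromstring(
    "<node4><children>older child: Mary</children>"
    "<children>younger child: Paul</children></node4>"
)
mary_adopted = ET.fromstring(
    '<node4><children adopted="true">older child: Mary</children>'
    "<children>younger child: Paul</children></node4>"
)
print(equal_ignoring_child_order(paul_first, mary_first))
print(equal_ignoring_child_order(mary_first, mary_adopted))
```

With this key the first two sections compare equal, while the extra
adopted attribute still makes the last pair different. Treating an
absent optional attribute as "no difference" would need yet another
rule.
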
Also, it would seem obvious that you should exclude comments when
comparing XML documents, but why ignore processing instructions, when
they give important type and reference info that defines the included
data?

Thanks
otf

Dec 21 '05 #4
In article <11**********************@g49g2000cwa.googlegroups.com>,
onetitfemme <on*************@yahoo.com> wrote:

> ._ I would also include the path to the element, just the path, NOT
> the content of all elements in the path

I don't understand why you would do that. If the elements don't have
the same path from the root, you wouldn't be comparing them at all.

Unless you are considering comparison of fragments of documents, in
which case you probably don't care about the position in the document.
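
To make "the path from the root" concrete, here is a small Python
sketch. The helper name element_paths and the XPath-like path format
are assumptions chosen for illustration. It lists each element with
its position-qualified path, which is what one would have to carry
along when comparing fragments where position matters.

```python
import xml.etree.ElementTree as ET


def element_paths(root):
    """Yield (path, element) pairs such as '/doc/node4[1]/children[2]'."""
    def walk(elem, path):
        yield path, elem
        seen = {}  # how many children with each tag we have passed so far
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            yield from walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    yield from walk(root, "/" + root.tag)


doc = ET.fromstring(
    "<doc><node4><children>Mary</children><children>Paul</children></node4></doc>"
)
paths = [path for path, _ in element_paths(doc)]
print(paths)
```

When two whole documents are walked in parallel from their roots, the
paths agree automatically for every pair that gets compared, so the
path only carries extra information in the fragment case.
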

> ._ also, to even compare documents (and/or dox sections) they should
> first have structural and type affinity on their schemas, at least on
> the sections that are being compared,

XML documents aren't required to have any kind of schema. This would
be equality on documents+schemas, not documents.

> ._ the order of elements of similar children from the same path should
> not really matter (this can be easily/practically solved by sorting
> them all).

This requires knowledge of the interpretation of the document that is not
inherent in the document itself. Given some kind of schema, it might be
appropriate to interpret the children as a set rather than a sequence,
but in that case you are again not comparing documents themselves, but
the data models resulting from application of a schema to the documents.

> ._ if an attribute is not mandatory, should these two sections be the
> same?

As XML documents, they would be different. According to some
interpretation, they might be the same. Optional attributes
are not always interpreted as supplying optional information: their
absence may be as significant as their presence.

> Also I would be obvious that you should exclude comments while
> comparing XML dox, but why ignoring processing instructions, when they
> give important type and reference info that defines the included data?

Processing instructions are used for many different purposes. But their
obvious canonical use is to specify the processing of (part of) the
document rather than its content.
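
One workable, purely syntactic notion of equality is to compare
canonical forms. A minimal Python sketch, assuming Python 3.8 or later
where xml.etree.ElementTree.canonicalize (C14N 2.0) is available; the
wrapper name c14n_equal is made up here. Canonicalization normalizes
attribute order and empty-element syntax, drops comments unless asked
to keep them, and keeps processing instructions.

```python
from xml.etree.ElementTree import canonicalize


def c14n_equal(xml_a, xml_b, with_comments=False):
    """Compare two XML strings by their canonical (C14N 2.0) serializations."""
    return canonicalize(xml_a, with_comments=with_comments) == canonicalize(
        xml_b, with_comments=with_comments
    )


with_comment = '<doc b="2" a="1"><!-- note --><item/></doc>'
plain = '<doc a="1" b="2"><item></item></doc>'
with_pi = '<doc a="1" b="2"><?render fast?><item/></doc>'

print(c14n_equal(with_comment, plain))                      # comments ignored
print(c14n_equal(with_comment, plain, with_comments=True))  # comments compared
print(c14n_equal(plain, with_pi))                           # PI is kept
```

So under this notion a comment does not affect equality by default,
but a processing instruction does. An application that wants to ignore
them would have to strip them before comparing.
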

-- Richard
Dec 21 '05 #5
> Richard Tobin wrote ...
Hi *,

>> ._ I would also include the path to the element, just the path, NOT
>> the content of all elements in the path
>
> I don't understand why you would do that. If the elements don't have
> the same path from the root, you wouldn't be comparing them at all.

"If the elements don't have the same path from the root, you
wouldn't be comparing them at all"
otf: exactly! Here I might be a little biased and/or some intuition
artifacts might be kicking in. We theoretical physicists
"naturally" think this way. You may go LOL, but to us if more
people board a train, it might still reach its end, but the trajectory
will definitely not be the same ;-)
Jokes aside now, to me (in an ontology, i.e. a well-structured,
hierarchical, tree-like dependency) the path to an element is as
important as the element itself.

> Unless you are considering comparison of fragments of documents, in
> which case you probably don't care about the position in the document.

"fragments of documents"
otf: I am considering them, but I still care about the position in the
document.

>> ._ also, to even compare documents (and/or dox sections) they should
>> first have structural and type affinity on their schemas, at least on
>> the sections that are being compared,
>
> XML documents aren't required to have any kind of schema. This would
> be equality on documents+schemas, not documents.

"equality on documents+schemas, not documents."
otf: exactly! "structural and type affinity on their schemas ..."
should be very important to even consider any type of comparison.

>> ._ the order of elements of similar children from the same path should
>> not really matter (this can be easily/practically solved by sorting
>> them all).
>
> This requires knowledge of the interpretation of the document that is not
> inherent in the document itself. Given some kind of schema, it might be
> appropriate to interpret the children as a set rather than a sequence,
> but in that case you are again not comparing documents themselves, but
> the data models resulting from application of a schema to the documents.

otf: granted! But how is it that you would not interpret the children
as a set, if no other indication has been explicitly given in the
schema?
Actually, I would say the data models resulting from the COMPLIANCE of
documents to a schema, so that they become actionable data for an XML
application.

>> ._ if an attribute is not mandatory, should these two sections be the
>> same?
>
> As XML documents, they would be different. According to some
> interpretation, they might be the same. Optional attributes
> are not always interpreted as supplying optional information: their
> absence may be as significant as their presence.

otf: OK. I think I have started to see that there might not be such a
thing as "XML equality" (as you have e.g. for mathematical
magnitudes), but degrees thereof.

>> Also I would be obvious that you should exclude comments while
>> comparing XML dox, but why ignoring processing instructions, when they
>> give important type and reference info that defines the included data?
>
> Processing instructions are used for many different purposes. But their
> obvious canonical use is to specify the processing of (part of) the
> document rather than its content.

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
I am thinking of tons of web pages (and/or any other marked-up
documents) as a huge forest of texts where "links" among them are
given not only through URLs, but through their structure as well.
I understood something from your comments when you talked about the
"position in the document" (of an element), but I think I am still
missing something. Even the path to an element might not be enough for
an accurate description of "equality", but since "degrees thereof"
might be important as well, even the closed graphs up to the point
where an element sits should be considered.

Thanks
otf

Dec 22 '05 #6
just found a really good article which answers my XML diffing doubts to
a large extent

http://www.mulberrytech.com/Extreme/...haffert01.html

Structure-Preserving Difference Search for XML Documents
by E. Schubert, S. Schaffert, and F. Bry
abstract:
Current XML differencing applications usually try to find a minimal
sequence of edit operations that transform one XML document to another
XML document (the so-called "edit script"). In our conviction, this
approach often produces increments that are unintuitive for human
readers and do not reflect the actual changes. We therefore propose in
this article a different approach trying to maximize the retained
structure instead of minimizing the edit sequence. Structure is thereby
not limited to the usual tree structure of XML - any kind of structural
relations can be considered (like parent-child, ancestor-descendant,
sibling, document order). In our opinion, this approach is very
flexible and able to adapt to the user's requirements. It produces more
readable results while still retaining a reasonably small edit
sequence.
Keywords: Web; XML; Difference
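
To make "distance between two XML instances" and the edit-script idea
more concrete, here is a rough Python sketch of one simple ordered tree
edit distance. It is an assumption chosen for illustration, not the
structure-preserving method of the paper above and not whatever
XMLBooster does. Changing a node's label (tag, attributes, stripped
text) costs 1, whole subtrees are inserted or deleted at a cost equal
to their node count, and children are aligned in order with the usual
dynamic-programming table. The names tree_distance, subtree_size and
node_label are made up.

```python
import xml.etree.ElementTree as ET


def subtree_size(elem):
    """Number of element nodes in the subtree rooted at elem."""
    return 1 + sum(subtree_size(child) for child in elem)


def node_label(elem):
    """What counts as 'the same node' in this sketch: tag, attributes, stripped text."""
    return (elem.tag, tuple(sorted(elem.attrib.items())), (elem.text or "").strip())


def tree_distance(a, b):
    """Top-down ordered edit distance; subtrees are inserted or deleted whole."""
    relabel = 0 if node_label(a) == node_label(b) else 1
    kids_a, kids_b = list(a), list(b)
    # d[i][j]: cost of turning the first i children of a into the first j of b.
    d = [[0] * (len(kids_b) + 1) for _ in range(len(kids_a) + 1)]
    for i in range(1, len(kids_a) + 1):
        d[i][0] = d[i - 1][0] + subtree_size(kids_a[i - 1])
    for j in range(1, len(kids_b) + 1):
        d[0][j] = d[0][j - 1] + subtree_size(kids_b[j - 1])
    for i in range(1, len(kids_a) + 1):
        for j in range(1, len(kids_b) + 1):
            d[i][j] = min(
                d[i - 1][j] + subtree_size(kids_a[i - 1]),                  # delete subtree
                d[i][j - 1] + subtree_size(kids_b[j - 1]),                  # insert subtree
                d[i - 1][j - 1] + tree_distance(kids_a[i - 1], kids_b[j - 1]),  # match pair
            )
    return relabel + d[-1][-1]


x = ET.fromstring("<node4><children>Paul</children><children>Mary</children></node4>")
y = ET.fromstring("<node4><children>Mary</children><children>Paul</children></node4>")
z = ET.fromstring(
    '<node4><children adopted="true">Mary</children><children>Paul</children></node4>'
)
print(tree_distance(x, x), tree_distance(x, y), tree_distance(y, z))
```

Equality then falls out as distance 0, and "degrees" of equality as
small distances. Note that this measure charges 2 for swapping the two
children, which is exactly the kind of result a structure-aware or
order-insensitive approach would judge differently.
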

Dec 23 '05 #7
