Hi *,
I have been looking for a definition or at least some workable concept
of "XML equality".
Searching for "XML equality" in comp.text.xml, microsoft.public.xsl and
microsoft.public.xml returned no hits.
Searching the same newsgroups for the single words XML, equality and
schema returned very few links, none of them to the point.
I have read that the commercial product "XMLBooster" now
addresses these issues by generating code to:
- Check for equality among XML instances
- Compute the distance between two XML instances
- Compute the minimal set of changes required to go from one instance
to another, similar in spirit to what the diff Unix command does for
text files.
But it is hard to tell exactly what they mean by "equality among
XML instances" and "distance between two XML instances". I spent some
time on their web site and I think these are just sales pitches. I
couldn't find any docs defining or at least clarifying their
claims/terminology.
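Whatever the product actually does, one possible reading of "distance" and "set of changes" can be sketched in Python. This is only an assumption for illustration: each element is flattened to a "path=text" line and the standard-library difflib compares the two line lists. The names flatten and xml_distance are made up here, and difflib does not guarantee a minimal edit sequence.

```python
import difflib
import xml.etree.ElementTree as ET


def flatten(elem, path=""):
    """List one 'path[@attrs]=text' line per element, in document order."""
    here = f"{path}/{elem.tag}"
    attrs = "".join(f"[@{k}={v!r}]" for k, v in sorted(elem.attrib.items()))
    lines = [f"{here}{attrs}={(elem.text or '').strip()}"]
    for child in elem:
        lines.extend(flatten(child, here))
    return lines


def xml_distance(xml_a, xml_b):
    """Return (distance in [0, 1], list of non-equal edit operations)."""
    a = flatten(ET.fromstring(xml_a))
    b = flatten(ET.fromstring(xml_b))
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    ops = [(tag, a[i1:i2], b[j1:j2])
           for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]
    # ratio() is 1.0 for identical sequences, so 1 - ratio acts as a distance.
    return 1.0 - matcher.ratio(), ops


old = "<order><item>pen</item><item>ink</item></order>"
new = "<order><item>pen</item><item>paper</item></order>"
distance, ops = xml_distance(old, new)
print(distance)
print(ops)
```

Under this reading, "equality" is just distance 0, which already shows how much depends on what flatten chooses to keep or throw away (here: whitespace around text is stripped and text after child elements is ignored).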
I know XML is basically (structured) text, and there are no such
definitions for texts or natural-language grammars (their usefulness and
validity are actually more semantic than syntactic).
Do you know of works dealing with the definition of such terms?
Thanks
otf
Look for "xml diff" instead...
In article <11*********************@g14g2000cwa.googlegroups.com>,
onetitfemme <on*************@yahoo.com> wrote:
> I have been looking for a definition or at least some workable concept of "XML equality".
A natural definition would use the infoset. Norm Walsh has a
definition: http://norman.walsh.name/2004/05/19/infoset-equal
-- Richard
// - - - - - - - - - - - - - - - - - - - -
> Look for "xml diff" instead...
mgungora, this is how I started: search comp.text.xml for "OSS,
java-based XML Diff?".
I could not find much either; as a matter of fact, no one replied to me.
// - - - - - - - - - - - - - - - - - - - -
>> I have been looking for a definition or at least some workable concept of "XML equality".
> A natural definition would use the infoset. Norm Walsh has a definition:
> http://norman.walsh.name/2004/05/19/infoset-equal
Richard, thank you for pointing me to Norman Walsh's article:
// __
Infoset Equality
19 May 2004 (modified 11 Sep 2005)
Volume 7, Issue 86
by norman walsh http://norman.walsh.name/2004/05/19/infoset-equal
// __
His approach to the concept, from the perspective of infosets
(http://www.w3.org/TR/xml-infoset/), is definitely a good start, but
there are a number of issues that I see right away just by looking at
his defs. For example:
// __ in def. 2:
2. Element Information Items
Two element information items are equal if the following properties
are equal:
- [namespace name]
- [local name]
- [children]
- [attributes]
Children are compared in order, attributes without respect to order.
// __
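A minimal Python sketch of the element rule quoted above, assuming the standard-library xml.etree.ElementTree as the data model. That is not a full infoset: by default it drops comments and processing instructions and keeps character data as text/tail. The function name elements_equal is made up for illustration, and comparing character data exactly (whitespace included) is an assumption.

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Ordered comparison in the spirit of the quoted definition."""
    # ElementTree stores the tag as "{namespace}local", so one comparison
    # covers both [namespace name] and [local name].
    if a.tag != b.tag:
        return False
    # attrib is a dict, so attribute order is ignored automatically.
    if a.attrib != b.attrib:
        return False
    # Assumption: character data is compared exactly, whitespace included.
    if (a.text or "") != (b.text or ""):
        return False
    if len(a) != len(b):
        return False
    # Children are compared pairwise, in document order.
    for child_a, child_b in zip(a, b):
        if (child_a.tail or "") != (child_b.tail or ""):
            return False
        if not elements_equal(child_a, child_b):
            return False
    return True


same = elements_equal(
    ET.fromstring('<node4 a="1" b="2"><c>Mary</c><c>Paul</c></node4>'),
    ET.fromstring('<node4 b="2" a="1"><c>Mary</c><c>Paul</c></node4>'),
)
swapped = elements_equal(
    ET.fromstring("<node4><c>Mary</c><c>Paul</c></node4>"),
    ET.fromstring("<node4><c>Paul</c><c>Mary</c></node4>"),
)
print(same, swapped)  # True False
```

Reordering attributes keeps the elements equal, while swapping two children does not, which is exactly the "children in order, attributes without respect to order" rule.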
._ I would also include the path to the element, just the path, NOT
the content of all elements in the path (unless he understands it as
part of the "[namespace name]"). To me, it is very natural to include
the path to an element and I wonder why it escaped his consideration.
._ also, to even be compared, documents (and/or document sections) should
first have structural and type affinity in their schemas, at least in
the sections that are being compared,
._ the order of similar child elements under the same path should
not really matter (this can be easily/practically solved by sorting
them all). These two sections of XML "instances" should be equal:
<node4>
<children>younger child: Paul</children>
<children>older child: Mary</children>
</node4>
and
<node4>
<children>older child: Mary</children>
<children>younger child: Paul</children>
</node4>
._ if an attribute is not mandatory, should these two sections be the
same?
<node4>
<children>older child: Mary</children>
<children>younger child: Paul</children>
</node4>
and
<node4>
<children adopted="true">older child: Mary</children>
<children>younger child: Paul</children>
</node4>
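The "sort them all" idea can be sketched in Python with the standard-library xml.etree.ElementTree. This is one possible reading, not an established definition: every element is turned into a nested tuple whose child keys are sorted, so sibling order stops mattering. The names canonical_key and equal_unordered are made up here; stripping whitespace around text and ignoring text that follows a child element are assumptions.

```python
import xml.etree.ElementTree as ET


def canonical_key(elem):
    """Nested tuple that is identical for elements differing only in child order."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        # Sorting the children's keys makes sibling order irrelevant.
        tuple(sorted(canonical_key(child) for child in elem)),
    )


def equal_unordered(xml_a, xml_b):
    return canonical_key(ET.fromstring(xml_a)) == canonical_key(ET.fromstring(xml_b))


mary_first = """<node4>
  <children>older child: Mary</children>
  <children>younger child: Paul</children>
</node4>"""
paul_first = """<node4>
  <children>younger child: Paul</children>
  <children>older child: Mary</children>
</node4>"""
adopted = """<node4>
  <children adopted="true">older child: Mary</children>
  <children>younger child: Paul</children>
</node4>"""

reordered_equal = equal_unordered(mary_first, paul_first)
attribute_equal = equal_unordered(mary_first, adopted)
print(reordered_equal, attribute_equal)  # True False
```

With this key the reordered fragments compare equal, while the fragment carrying adopted="true" does not. Treating an optional attribute as ignorable would need schema knowledge that the documents themselves do not carry.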
Also, it would be obvious that you should exclude comments while
comparing XML docs, but why ignore processing instructions, when they
give important type and reference info that defines the included data?
Thanks
otf
In article <11**********************@g49g2000cwa.googlegroups.com>,
onetitfemme <on*************@yahoo.com> wrote:
> ._ I would also include the path to the element, just the path, NOT the content of all elements in the path
I don't understand why you would do that. If the elements don't have
the same path from the root, you wouldn't be comparing them at all.
Unless you are considering comparison of fragments of documents, in
which case you probably don't care about the position in the document.
> ._ also, to even compare documents (and/or dox sections) they should first have structural and type affinity on their schemas, at least on the sections that are being compared,
XML documents aren't required to have any kind of schema. This would
be equality on documents+schemas, not documents.
> ._ the order of elements of similar children from the same path should not really matter (this can be easily/practically solved by sorting them all).
This requires knowledge of the interpretation of the document that is not
inherent in the document itself. Given some kind of schema, it might be
appropriate to interpret the children as a set rather than a sequence,
but in that case you are again not comparing documents themselves, but
the data models resulting from application of a schema to the documents.
> ._ if an attribute is not mandatory, should these two sections be the same?
As XML documents, they would be different. According to some
interpretation, they might be the same. Optional attributes
are not always interpreted as supplying optional information: their
absence may be as significant as their presence.
> Also, it would be obvious that you should exclude comments while comparing XML docs, but why ignore processing instructions, when they give important type and reference info that defines the included data?
Processing instructions are used for many different purposes. But their
obvious canonical use is to specify the processing of (part of) the
document rather than its content.
-- Richard
> Richard Tobin wrote ...
Hi *,
>> ._ I would also include the path to the element, just the path, NOT the content of all elements in the path
> I don't understand why you would do that. If the elements don't have the same path from the root, you wouldn't be comparing them at all.
"If the elements don't have the same path from the root, you
wouldn't be comparing them at all"
otf: exactly! Here I might be a little biased and/or some intuition
artifacts might be kicking in. We theoretical physicists
"naturally" think this way. You may go LOL, but to us if more
people board a train, it might still reach its end, but the trajectory
will definitely not be the same ;-)
Jokes aside now, to me (in an ontology (a well-structured hierarchical
tree-like dependency)) the path to an element is as important as the
element itself.
> Unless you are considering comparison of fragments of documents, in which case you probably don't care about the position in the document.
"fragments of documents"
otf: I am considering them, but I still care about the position in the
document.
>> ._ also, to even compare documents (and/or dox sections) they should first have structural and type affinity on their schemas, at least on the sections that are being compared,
> XML documents aren't required to have any kind of schema. This would be equality on documents+schemas, not documents.
"equality on documents+schemas, not documents."
otf: exactly! "structural and type affinity on their schemas ..."
should be very important to even consider any type of comparison.
>> ._ the order of elements of similar children from the same path should not really matter (this can be easily/practically solved by sorting them all).
> This requires knowledge of the interpretation of the document that is not inherent in the document itself. Given some kind of schema, it might be appropriate to interpret the children as a set rather than a sequence, but in that case you are again not comparing documents themselves, but the data models resulting from application of a schema to the documents.
otf: granted! But why would you not interpret the children
as a set, if nothing else has been explicitly indicated in the
schema?
Actually, these are the data models resulting from the COMPLIANCE of
documents with a schema, so that they become actionable data for an XML
application.
>> ._ if an attribute is not mandatory, should these two sections be the same?
> As XML documents, they would be different. According to some interpretation, they might be the same. Optional attributes are not always interpreted as supplying optional information: their absence may be as significant as their presence.
otf: OK. I think I have started to see that there might not be such a
thing as "XML equality" (as you have e.g. for mathematical
magnitudes), but degrees thereof.
>> Also, it would be obvious that you should exclude comments while comparing XML docs, but why ignore processing instructions, when they give important type and reference info that defines the included data?
> Processing instructions are used for many different purposes. But their obvious canonical use is to specify the processing of (part of) the document rather than its content.
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
I am thinking of tons of web pages (and/or any other marked-up docs)
as a huge forest of texts where "links" among them are given not only
through URLs, but through their structure as well.
I understood something from your comments when you talked about the
"position in the document" (of an element), but I think I am missing
something. Even the path to the elements might not be enough for an
accurate description of "equality", but since "degrees thereof"
might be important as well, even the closed graphs up to the point where
an element sits should be considered.
Thanks
otf
I just found a really good article which answers my XML diffing doubts to
a large extent: http://www.mulberrytech.com/Extreme/...haffert01.html
Structure-Preserving Difference Search for XML Documents
by E. Schubert, S. Schaffert, and F. Bry
abstract:
Current XML differencing applications usually try to find a minimal
sequence of edit operations that transform one XML document to another
XML document (the so-called "edit script"). In our conviction, this
approach often produces increments that are unintuitive for human
readers and do not reflect the actual changes. We therefore propose in
this article a different approach trying to maximize the retained
structure instead of minimizing the edit sequence. Structure is thereby
not limited to the usual tree structure of XML - any kind of structural
relations can be considered (like parent-child, ancestor-descendant,
sibling, document order). In our opinion, this approach is very
flexible and able to adapt to the user's requirements. It produces more
readable results while still retaining a reasonably small edit
sequence.
Keywords: Web; XML; Difference