Implement sorting in an XML file?

Hi all,

I am confused about how to sort an XML file. I mean how to *actually*
sort the data in the physical file, not how to display sorted data. I
am using a large XML file as a back-end database, and am making many
inserts and updates using the XmlDocument class. But I need to make the
XML file human readable too, and want to physically sort the data in
the file, every time an insert is made. At present I'm having to use a
tool like Stylus Studio to manually sort the data. Is there a way to do
it programmatically?

My XML file is something like :

<BOOKDATA>
<BOOK>
<NAME>Book 1</NAME>
<AUTHOR>Tom</AUTHOR>
<PRICE>20.00</PRICE>
</BOOK>
<BOOK>
<NAME>Book 2</NAME>
<AUTHOR>Fred</AUTHOR>
<PRICE>30.00</PRICE>
</BOOK>
</BOOKDATA>

Thanks in advance,
Regards,
--------------------------------
From: Cerebrus99

Feb 6 '06 #1


Cerebrus99 wrote:

> I am confused about how to sort an XML file. I mean how to *actually*
> sort the data in the physical file, not how to display sorted data. I
> am using a large XML file as a back-end database,

But there are databases which give you a lot of power, like indexing
columns.

> At present I'm having to use a
> tool like Stylus Studio to manually sort the data. Is there a way to do
> it programmatically?


XSLT is a programming language suitable to transform XML to another XML
structure where sorting is one possible restructuring directly supported
with the xsl:sort element.
So you could write an XSLT stylesheet to sort your XML as needed and
then save the transformation result back.
.NET 1.x has XSLT support with XslTransform, .NET 2.0 with
XslCompiledTransform.
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Feb 6 '06 #2
Cerebrus99 wrote:
> Hi all,
>
> I am confused about how to sort an XML file. I mean how to *actually*
> sort the data in the physical file, not how to display sorted data.

You can't sort it in-place. You have to write some code which sorts it
to another file and then copy it back (or the equivalent in whatever
environment you are using it).
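
A minimal sketch of that "sort to another file, then copy it back" pattern. It uses Python's standard library rather than XSLT or .NET, purely as an illustration; the function name `sort_file` and its parameters are made up here and are not from the thread. It assumes the records are direct children of the root element, as in the BOOKDATA sample.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def sort_file(path, record_tag="BOOK", key_tag="AUTHOR"):
    """Rewrite the XML file at `path` with its records sorted by `key_tag`."""
    tree = ET.parse(path)
    root = tree.getroot()

    # Collect the record elements and sort them by the text of the key child.
    records = root.findall(record_tag)
    records.sort(key=lambda rec: rec.findtext(key_tag, default=""))

    # Detach the records and re-append them in sorted order.
    # (Any other children of the root would end up before the records.)
    for rec in root.findall(record_tag):
        root.remove(rec)
    root.extend(records)

    # Write the result to a temporary file in the same directory, then
    # swap it in, so the original is never left half-written.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".xml", dir=directory)
    os.close(fd)
    try:
        tree.write(tmp_path, encoding="utf-8", xml_declaration=True)
        os.replace(tmp_path, path)
    except Exception:
        os.remove(tmp_path)
        raise
```

Calling `sort_file("books.xml")` (a hypothetical file name) would leave the same records in the file, ordered by AUTHOR. The whole document is still read and rewritten on every call, so the efficiency concern for large files applies here just as it does to an XSLT run.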

> I am using a large XML file as a back-end database, and am making many
> inserts and updates using the XmlDocument class.

This sounds as if it may have serious efficiency implications if the
file really is "large". (How large is large for you? Some people would
consider a 500 GB file small. Others think 32 KB is big.)

> But I need to make the
> XML file human readable too, and want to physically sort the data in
> the file, every time an insert is made. At present I'm having to use a
> tool like Stylus Studio to manually sort the data.

As Martin said, there are databases which will offer you ordered access.

> Is there a way to do it programmatically?
>
> My XML file is something like :
>
> <BOOKDATA>
> <BOOK>
> <NAME>Book 1</NAME>
> <AUTHOR>Tom</AUTHOR>
> <PRICE>20.00</PRICE>
> </BOOK>
> <BOOK>
> <NAME>Book 2</NAME>
> <AUTHOR>Fred</AUTHOR>
> <PRICE>30.00</PRICE>
> </BOOK>
> </BOOKDATA>


The following XSLT code will sort that file on AUTHOR.

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Default: copy every element, its attributes and its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- BOOKDATA: copy it, then output its BOOK children sorted by AUTHOR -->
  <xsl:template match="BOOKDATA">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy/>
      </xsl:for-each>
      <xsl:for-each select="BOOK">
        <xsl:sort select="AUTHOR"/>
        <xsl:apply-templates select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

But you really, *really* don't want to be running this after every
update: the overhead would be horrendous. Overnight, perhaps.

There are other, faster, non-XML ways to sort files, but they rely
on the file having a specific physical layout, and because they are
non-XML, if you break the format they expect, your data is trash.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
Feb 6 '06 #3
Hi Martin and Peter,

Firstly, thanks for your prompt and helpful replies.

My reason for using XML here, instead of a database like SQL
Server, was more a learning experiment than performance
considerations.

I get your point. I need to use XSLT to sort the XML and then copy the
result back. I cannot sort it in-place. That really clears up my
doubts. Actually I wasn't aware that I could use XSLT to transform XML
into another XML structure. I will however have to study deeper as to
how to implement that. I am using .NET Framework 1.1. Since my XSLT
isn't very strong, could you suggest any links that give a step-by-step
guide on how to do this (copy an XML file to another)? I'm still trying
to understand the sample code you've attached, Peter. :-(

As to the size of the XML file, I presume that it will grow to a
maximum of 1-2 MB. I figure that would not impose serious performance
implications to discourage the use of XML. But thanks Peter, for
reminding me that "large file" is a relative term, and I should have
been more precise.

P.S.: Peter, I found the "WTF" section on your site, very interesting !
;-)

Thanks a ton,
Regards,
Cerebrus99

Feb 7 '06 #4
Cerebrus99 wrote:
> I get your point. I need to use XSLT to sort the XML and then copy the
> result back. I cannot sort it in-place. That really clears up my
> doubts. Actually I wasn't aware that I could use XSLT to transform XML
> into another XML structure.

That's XSLT's primary purpose... T for Transformations.

> I will however have to study deeper as to
> how to implement that. I am using .NET Framework 1.1. Since my XSLT
> isn't very strong, could you suggest any links that give a step-by-step
> guide on how to do this (copy an XML file to another)? I'm still trying
> to understand the sample code you've attached, Peter. :-(

I can't help with .NET, I'm afraid, but as for the code:

  <xsl:template match="*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

This is an "identity transform" template: it matches all element nodes
which are not matched by any other template (the * is the wildcard),
and copies them exactly as they stand into the destination tree. In
each element thus processed, it then loops through any attributes and
copies them to the destination tree as well. You aren't using any
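attributes in your example, but just in case. Finally it applies
templates to all child nodes: child elements are matched again by this
or a more specific template, and text nodes are copied by XSLT's
built-in rule.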

  <xsl:template match="BOOKDATA">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:copy/>
      </xsl:for-each>
      <xsl:for-each select="BOOK">
        <xsl:sort select="AUTHOR"/>
        <xsl:apply-templates select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

This template matches a BOOKDATA element node. It again copies itself
to the destination tree, plus its attributes, but then it goes through
each BOOK element within it, sorts them by AUTHOR (you could specify a
different element type name if needed), and then performs an
"apply-templates" on itself ("." refers to the current context, in this
case each BOOK element node as it is handled), which makes the processor
search for a matching template... which in all cases will be matched by
the * template above.

The effect is to output everything as-is except that the BOOK elements
will be serialized (written out) in the sorted order.

It's probably a lousy way to sort the data in an XML document, but it
works.

> As to the size of the XML file, I presume that it will grow to a
> maximum of 1-2 MB.

That's pretty small, but it will take a measurable number of seconds
to process if there's the overhead of running XSLT afresh each time.

> I figure that would not impose serious performance
> implications to discourage the use of XML.


Not to discourage XML, but perhaps to discourage using XSLT to sort a
file repetitively in real time at the kind of rate you seem to be
indicating.

The real question is, why do you want to sort the file each time? If
it's so that the data appears sorted when the next access comes up,
that could be handled in many other different ways: it doesn't have
to be done by physically re-sorting the XML on disk afresh each time.
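
One of those other ways, sketched under stated assumptions: leave the file in whatever order the inserts produced and sort only when the data is read. This is Python's standard library used for illustration, not .NET, and the helper name `books_sorted_by` is hypothetical; it may not be the approach Peter had in mind.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<BOOKDATA>
<BOOK><NAME>Book 1</NAME><AUTHOR>Tom</AUTHOR><PRICE>20.00</PRICE></BOOK>
<BOOK><NAME>Book 2</NAME><AUTHOR>Fred</AUTHOR><PRICE>30.00</PRICE></BOOK>
</BOOKDATA>"""


def books_sorted_by(root, key_tag):
    """Return the BOOK elements ordered by the text of `key_tag`.

    Only a new list is built; the tree (and so the file it was loaded
    from) keeps its original order.
    """
    return sorted(
        root.findall("BOOK"),
        key=lambda book: book.findtext(key_tag, default=""),
    )


root = ET.fromstring(SAMPLE)
for book in books_sorted_by(root, "AUTHOR"):
    print(book.findtext("AUTHOR"), "-", book.findtext("NAME"))
```

The sort key is compared as text, which is fine for AUTHOR or NAME; sorting on PRICE would need the key converted to a number first. The trade-off is that the file on disk stays unsorted, so this only helps if "sorted" matters to the program and not to someone reading the raw file.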

///Peter
--
XML FAQ: http://xml.silmaril.ie/
Feb 9 '06 #5
Hi Peter,

Thanks for that awesome explanation of the intricacies of XSLT. The good
news is that I managed to write an XSL file to sort my data completely as I
wanted. Thanks for your pointers ! I was able to use the XslTransform class
to transform my XML file into another sorted version.

However, I came up against another couple of problems:
1. My new "sorted.xml" file had its XML declaration missing!

The first 2 lines in my original XML doc were:
<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="HtmlView.xslt" ?>

In sorted.xml, the first line now is:
<?xml-stylesheet type="text/xsl" href="HtmlView.xslt" ?>

My XSLT file has the following first 3 lines:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>

However, this happens only when I use my XmlDocument object as a parameter
for the XslTransform.Transform() method:
-> xslt.Transform(MyXmlDoc, Nothing, MyXmlTextWriter)

If I simply use the file names as:
-> xslt.Transform("Main.xml", "sorted.xml"), then I don't get this problem.

Upon googling for this problem, I found this:
http://msdn.microsoft.com/library/de...us/cpguide/htm
l/cpconinputsoutputstoxsltransform.asp

It does mention a discouraging statement: "The <xsl:output> statement is
ignored when the output of the XslTransform.Transform method is an XmlReader
or XmlWriter." !!!
But it offers no solution or workaround in this regard. Any ideas on this?

Thanks again,
Warm regards,
Cerebrus.

----------------------------------------------------------


Feb 9 '06 #6
Oops! Forgot to answer your questions:

> It's probably a lousy way to sort the data in an XML document, but it
> works.
> That's pretty small, but it will take a measurable number of seconds
> to process if there's the overhead of running XSLT afresh each time.
> Not to discourage XML, but perhaps to discourage using XSLT to sort a
> file repetitively in real time at the kind of rate you seem to be
> indicating.

I'm only implementing the sorting feature in my application as a database
maintenance mechanism that would probably be run rarely anyway. What other
more efficient ways would you suggest to implement sorting?

> The real question is, why do you want to sort the file each time? If it's
> so that the data appears sorted when the next access comes up, that could be
> handled in many other different ways: it doesn't have to be done by
> physically re-sorting the XML on disk afresh each time.

Again, what other ways would you suggest?

BTW, the MSDN link in my prev. post got broken up into 2 lines. So direct
clicking won't help. :-(

Thanks and Regards,

Cerebrus.
Feb 9 '06 #7
Cerebrus99 wrote:
> Oops! Forgot to answer your questions:
>
>> It's probably a lousy way to sort the data in an XML document, but it
>> works.
>> That's pretty small, but it will take a measurable number of seconds
>> to process if there's the overhead of running XSLT afresh each time.
>> Not to discourage XML, but perhaps to discourage using XSLT to sort a
>> file repetitively in real time at the kind of rate you seem to be
>> indicating.
>
> I'm only implementing the sorting feature in my application as a database
> maintenance mechanism that would probably be run rarely anyway.

Ah, OK. I had assumed from your original post that this was something
that would be happening every few seconds.

> What other more efficient ways would you suggest to implement sorting?


For a simple structure, the fastest is a non-XML sorter, but it means
the physical layout of the file becomes important, which goes against
the philosophy of XML, where some white-space can be irrelevant.
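
To make the layout dependence concrete, here is a rough sketch of such a non-XML sorter in Python. It is not anything Peter posted: the function name `sort_record_lines` and the one-record-per-line layout are assumptions chosen for illustration. Nothing in it parses XML; it just reorders lines of text.

```python
import re

# Pulls the author out of a line of text; this is pattern matching, not parsing.
AUTHOR = re.compile(r"<AUTHOR>(.*?)</AUTHOR>")


def sort_record_lines(lines):
    """Sort the <BOOK> lines by author; leave every other line where it is.

    Assumes each record sits entirely on one line that starts with <BOOK>.
    """
    is_record = [line.lstrip().startswith("<BOOK>") for line in lines]
    records = [line for line, rec in zip(lines, is_record) if rec]
    records.sort(key=lambda line: AUTHOR.search(line).group(1))
    ordered = iter(records)
    # Drop the sorted records back into the slots the records came from.
    return [next(ordered) if rec else line for line, rec in zip(lines, is_record)]


flat = [
    "<BOOKDATA>",
    "<BOOK><NAME>Book 1</NAME><AUTHOR>Tom</AUTHOR><PRICE>20.00</PRICE></BOOK>",
    "<BOOK><NAME>Book 2</NAME><AUTHOR>Fred</AUTHOR><PRICE>30.00</PRICE></BOOK>",
    "</BOOKDATA>",
]
print("\n".join(sort_record_lines(flat)))
```

Run against the layout shown in the original post, where each BOOK spans five lines, the `<BOOK>` line contains no AUTHOR, the pattern finds nothing and the call fails with an AttributeError. That is the fragility being described: the file is still perfectly good XML, but the sorter no longer understands it.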

///Peter
Feb 9 '06 #8
