About Xml serialization scalability and persistence

I am developing a Windows Service that is resident on the machine. The
program needs to keep a list of objects in memory (typed as List<Foo>)
synchronized with disk, serializing to and deserializing from XML.

The first and simplest technique, which I am using at the moment, is the following:

- At program start, if the XML file exists, deserialize it and create an
object with its content.
- Every time the list changes (an element is added or removed), I
serialize the entire object again and write a new file (overwriting
the old one).

The problem is that I am a little bit concerned about this going to
production, because of the following matters:

- Scalability: What would happen if the list grows to, for example,
1,000 or 10,000 elements? (I suppose that serializing the data would
take much longer and the program would run much slower whenever the
list is modified.)

- Data loss: What would happen if the computer loses power at the
moment when the serializer is writing data to disk? (I suppose that I
would lose the last list written and end up with a new, corrupt file,
am I right?)

Do you have any ideas to improve this behaviour?

Thanks in advance.
Andrés [ knocte ]

--
Jan 18 '07 #1
Andres,

For performance reasons, the biggest question you need to ask is how
frequently this list will change, and therefore how often you'll be writing
it to disk. If it's only occasionally, then your current approach will work;
otherwise, I would look for a different solution.

To ensure that this file is always written out, you might look at Messaging
(MSMQ). It can ensure that a message is processed, and can therefore ensure
that the change the message carries is written to disk.

I'd do this in the following manner:

- When a synchronization request is made, write a message to the queue with
the new value.
- Update the list with the new value.

On a separate thread, you have a queue monitor that looks for new messages
in the queue.

- When it sees a new message, it peeks the queue (or reads the queue in a
transaction) and starts the file write process to a temporary file.
- Upon completion of the file write process, the new temp file is copied over
the top of the old file and the temp file deleted.
- The process then consumes the message from the queue (or commits the
transaction) and looks for the next message to process.

If the power fails while the write process is in progress, worst case is
that you end up with a partial temp file, which wastes disk space. Because
the message hasn't been consumed from the queue, the thread that monitors
the queue will see it after the failure and immediately write out the file
again.
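
A minimal C# sketch of the temp-file step on its own, without MSMQ, in case it helps. It assumes a hypothetical `Foo` with a single `Name` property and a hypothetical `FooStore` helper; neither comes from the original post. Instead of copying the temp file over the old one, it uses `File.Replace`, which swaps the files in one call and keeps the previous version as a backup. `FileStream.Flush(true)` needs .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical element type; the real Foo is not shown in the thread.
public class Foo
{
    public string Name { get; set; }
}

// Hypothetical helper that persists a List<Foo> as XML.
public static class FooStore
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(List<Foo>));

    public static void Save(List<Foo> items, string path)
    {
        string tempPath = path + ".tmp";

        // Write the complete document to a temp file first, so the
        // existing file is never left half written.
        using (var stream = new FileStream(tempPath, FileMode.Create,
                                           FileAccess.Write, FileShare.None))
        {
            Serializer.Serialize(stream, items);
            stream.Flush(true); // ask the OS to push its buffers to disk
        }

        if (File.Exists(path))
        {
            // Swap the new file in and keep the old one as "<path>.bak".
            File.Replace(tempPath, path, path + ".bak");
        }
        else
        {
            File.Move(tempPath, path);
        }
    }

    public static List<Foo> Load(string path)
    {
        if (!File.Exists(path))
        {
            return new List<Foo>();
        }
        using (var stream = File.OpenRead(path))
        {
            return (List<Foo>)Serializer.Deserialize(stream);
        }
    }
}
```

If the power fails while `Save` is running, the worst case should be a leftover `.tmp` file next to an intact previous version, which matches the failure mode described above.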

Additionally, if you have multiple threads changing the list, you can ensure
that only one change is processed at a time (you could consume all of the
outstanding messages in one file update as well). You could also batch the
updates to the file, for example once at startup and then once every 5
minutes or so. This would limit the impact on the overall performance of the
application.
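
The batching idea can be sketched in C# roughly as follows. This is an illustration under assumptions, not a finished design: `BatchedSaver<T>` and its `save` callback are hypothetical names, and the callback stands in for whatever routine actually writes the file. Changes only mark the list as dirty; a timer writes a snapshot at a fixed interval, and only if something changed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: collects changes in memory and saves periodically.
public sealed class BatchedSaver<T> : IDisposable
{
    private readonly object _gate = new object();     // guards _items and _dirty
    private readonly object _saveGate = new object(); // one save at a time
    private readonly List<T> _items = new List<T>();
    private readonly Action<List<T>> _save;           // hypothetical persist callback
    private readonly Timer _timer;
    private bool _dirty;

    public BatchedSaver(Action<List<T>> save, TimeSpan interval)
    {
        _save = save;
        // A real service would catch and log exceptions in this callback;
        // an unhandled exception on a timer thread ends the process.
        _timer = new Timer(_ => Flush(), null, interval, interval);
    }

    public void Add(T item)
    {
        lock (_gate) { _items.Add(item); _dirty = true; }
    }

    public bool Remove(T item)
    {
        lock (_gate)
        {
            bool removed = _items.Remove(item);
            if (removed) _dirty = true;
            return removed;
        }
    }

    // Saves a snapshot if anything changed since the last save.
    public void Flush()
    {
        lock (_saveGate)
        {
            List<T> snapshot;
            lock (_gate)
            {
                if (!_dirty) return;
                snapshot = new List<T>(_items); // copy, so saving does not block writers
                _dirty = false;
            }
            _save(snapshot);
        }
    }

    public void Dispose()
    {
        _timer.Dispose();
        Flush(); // write any pending changes on shutdown
    }
}
```

The trade-off is that a power failure can lose up to one interval's worth of changes, so the interval has to be chosen with that in mind.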

Make sense?

Robert
Jan 18 '07 #2
Thanks for your comment.
I had already thought about the temp file solution but wasn't sure if it was
the most correct one. Perhaps I was dreaming about a more elegant one that
involved some transactional way of accessing the disk (without deleting
the original file until the next one has been fully written). I am
not going to use MSMQ because it's not portable, but the idea is
interesting.

Well, the application won't update the list very frequently, but I am
still concerned about rewriting the whole object on each modification
(even if I create a thread to write the data at longer intervals).
Isn't there a way to write and remove the data incrementally?
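
One possible direction for incremental writes, sketched in C# under stated assumptions, is an append-only change log: each add or remove is appended as one line, and the list is rebuilt at startup by replaying the log. This is a different technique from XmlSerializer, not something proposed earlier in the thread. `JournaledList` is a hypothetical name, the items are plain strings that are assumed not to contain newlines, and `FileStream.Flush(true)` needs .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical list of strings backed by an append-only log file.
public sealed class JournaledList
{
    private readonly string _journalPath;
    private readonly List<string> _items = new List<string>();

    public JournaledList(string journalPath)
    {
        _journalPath = journalPath;
        if (!File.Exists(journalPath)) return;

        byte[] bytes = File.ReadAllBytes(journalPath);
        // Everything after the last newline is a record cut short by a
        // crash; truncate it so later appends start on a clean boundary.
        int end = Array.LastIndexOf(bytes, (byte)'\n') + 1;
        if (end < bytes.Length)
        {
            using (var fs = new FileStream(journalPath, FileMode.Open, FileAccess.Write))
            {
                fs.SetLength(end);
            }
        }

        // Replay the complete records in order.
        string text = Encoding.UTF8.GetString(bytes, 0, end);
        foreach (string line in text.Split(new[] { '\n' },
                                           StringSplitOptions.RemoveEmptyEntries))
        {
            if (line.Length < 2) continue;
            string value = line.Substring(2);
            if (line[0] == '+') _items.Add(value);
            else if (line[0] == '-') _items.Remove(value);
        }
    }

    public IList<string> Items
    {
        get { return _items.AsReadOnly(); }
    }

    public void Add(string value)
    {
        Append('+', value); // log first, then change memory
        _items.Add(value);
    }

    public bool Remove(string value)
    {
        if (!_items.Contains(value)) return false;
        Append('-', value);
        return _items.Remove(value);
    }

    private void Append(char op, string value)
    {
        using (var stream = new FileStream(_journalPath, FileMode.Append,
                                           FileAccess.Write, FileShare.Read))
        using (var writer = new StreamWriter(stream))
        {
            writer.Write(op + " " + value + "\n");
            writer.Flush();
            stream.Flush(true); // ask the OS to push its buffers to disk
        }
    }
}
```

Each change then costs one short append instead of a full rewrite. The log grows without bound, so it would need to be compacted from time to time, for example by writing a full snapshot and starting a new log.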

Regards,

Andrés [ knocte ]

--
Jan 18 '07 #3
