Design Advice: XML and Database Comparison VB.Net

Hello Everyone,

I'm trying to build a web-based application for a client of mine, and I keep finding holes in my design, so I could use some guidance if anyone has any to offer. Let me explain what I'm trying to do; hopefully someone has an idea that won't take long to implement and isn't beyond my not-so-advanced skill level. My attempts with DataTables and ArrayLists have failed so far. Looks like my Sunday just got ruined, so here we go...

I have an XML file that is created from real estate data pulled from a database. This database feed comes through a RETS connector, so I have to set up an "admin" section where a single connection is made, the data is read, and the XML file is created. Because of connection and compatibility issues, I can neither run the entire website off the RETS connection nor create a new database, so I'm left with XML.

Currently, the RETS connection is made and a new XML file is created, overwriting the existing one. My problem is finding only the new or old data so I can either A) download it and its associated images if it's new data, or B) remove it and delete its images if it's data that is no longer in the database. Like I said, I've tried a bunch of things with DataTables and ArrayLists, but none of my attempts have worked.

XML layout:

    <properties>
      <details>
        <id>13515</id>
        <type>Single Family</type>
        <bedrooms>2</bedrooms>
        .....
      </details>
      <details>
        <id>1534</id>
        <type>Condo</type>
        <bedrooms>1</bedrooms>
        .....
      </details>
    </properties>

My attempt to remedy the issues I'm having was to read the current XML file at start (before writing the new one) and add the unique "id" field to a data table or arraylist. Then when I read the database, create another data table or arraylist as I'm writing the new XML. This would leave me with two sets of data: the old ids (real estate properties that were in the XML before connecting) and the new ids (real estate properties that were found when connecting to the database). Now once I have this data, here is what I need to do...

1. Remove any ids that have duplicates, along with their originals. This would leave me with only the properties that have changed (been added or removed). (If you've seen my other posts recently, this is what I've been trying to do, but last night it hit me that the next two tasks are going to be more difficult to implement.)

2. Once I'm left with the remaining ids, I know that each one is either A) old data that has been removed from the database or B) new data that has been added to the database since we last updated. Distinguishing between the two is my main problem. I already have the code set up to do the work (download/delete images, etc.), but my problem is making sure it does the correct thing with each id.
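Step 1 above (dropping every id that appears in both lists) is a symmetric difference, and a symmetric difference throws away which list each id came from. Computing the two one-sided differences separately keeps that information, which settles the new-versus-removed question. A minimal sketch in Python, used here only as a stand-in for VB.NET; `classify_ids` is a hypothetical helper, and the same idea maps to two `HashSet(Of String)` objects and `ExceptWith` in .NET 3.5 and later:

```python
# Sketch only: hypothetical helper, Python standing in for VB.NET.

def classify_ids(old_ids, new_ids):
    """Return (added, removed, unchanged) sets of listing ids.

    old_ids: ids read from yesterday's XML before it is overwritten
    new_ids: ids read from the RETS feed while writing today's XML
    Ids are compared as strings, exactly as they appear in <id>.
    """
    old_set, new_set = set(old_ids), set(new_ids)
    added = new_set - old_set      # only in today's feed: download photos
    removed = old_set - new_set    # only in yesterday's file: delete photos
    unchanged = old_set & new_set  # in both: no photo work needed
    return added, removed, unchanged


if __name__ == "__main__":
    old = ["13515", "1534", "2001"]
    new = ["13515", "1534", "4410"]
    added, removed, unchanged = classify_ids(old, new)
    print(sorted(added))      # ['4410']
    print(sorted(removed))    # ['2001']
    print(sorted(unchanged))  # ['13515', '1534']
```

Ids in `added` would go to the existing download code and ids in `removed` to the delete code; because the direction is known from which difference produced them, there is nothing left to guess.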

A little more on my RETS connection: it has 1,700+ properties, and I'm pulling a lot more data than the example XML above shows. The client wants to run the app in the morning, updating the XML for the site so it has all the latest properties. I've already spent many hours getting the "base" data into my directories; now I have to make sure the update only finds new properties and removes old ones, so it runs in minutes rather than the hours it took to get the 10,000+ photos.

One idea that hit me as I was typing this out is some kind of XML comparison. Would I be able to create two separate XML files and compare them efficiently?
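Comparing two XML files can be efficient if each file is first reduced to a dictionary keyed by id, since each lookup is then a hash lookup instead of a scan of the other file. A minimal sketch in Python (standing in for VB.NET), assuming the flat `<properties>/<details>` layout shown above; `record_signature`, `index_by_id` and `compare_files` are hypothetical names. It also reports listings whose id is the same but whose fields differ, which an id-only comparison cannot detect:

```python
import xml.etree.ElementTree as ET

# Sketch only: hypothetical helpers, assuming flat <details> records.

def record_signature(details):
    # Field names and trimmed text, sorted so field order and indentation don't matter.
    return tuple(sorted((child.tag, (child.text or "").strip()) for child in details))

def index_by_id(xml_text):
    """Map each <details> record's id to its signature."""
    root = ET.fromstring(xml_text)
    return {d.findtext("id"): record_signature(d) for d in root.findall("details")}

def compare_files(old_xml, new_xml):
    """Return (added, removed, changed) id lists, each sorted."""
    old, new = index_by_id(old_xml), index_by_id(new_xml)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(i for i in old.keys() & new.keys() if old[i] != new[i])
    return added, removed, changed

if __name__ == "__main__":
    old_xml = ("<properties><details><id>13515</id><bedrooms>2</bedrooms></details>"
               "<details><id>1534</id><bedrooms>1</bedrooms></details></properties>")
    new_xml = ("<properties><details><id>13515</id><bedrooms>3</bedrooms></details>"
               "<details><id>9000</id><bedrooms>4</bedrooms></details></properties>")
    print(compare_files(old_xml, new_xml))  # (['9000'], ['1534'], ['13515'])
```

The sample ids and field values in the demo are made up for illustration.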

Thanks a million (again) if anyone can help me.
Sep 21 '08 #1
mldisibio
It's been a while since I've manipulated XML data, but your synchronization task is common enough that I can give some general pointers.
1. I hope that whatever searching you do through the new or old XML, you are using XPath queries with either the .NET XML libraries or the MSXML DOM. Given that your XML schema is fairly straightforward, you should be able to read each id from one file and check whether it exists in the second file.
Using System.Xml.XPath.XPathNavigator.Select(...) (Framework) or IXMLDOMNode.selectNodes(...) (MSXML) should eliminate any need for helper arrays or other structures... not that it was a bad idea, it's simply not necessary. Essentially, selecting nodes with an XPath query returns a node list of ids and/or as many other fields as you want to store or compare.
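The per-id lookup described in point 1 can be sketched with Python's `xml.etree.ElementTree`, standing in here for `XPathNavigator.Select`. ElementTree supports only a subset of XPath, but that subset includes the `details[id='...']` predicate this needs. `ids_missing_from` is a hypothetical helper, and the sample documents are made up in the question's layout:

```python
import xml.etree.ElementTree as ET

# Sketch only: hypothetical helper, Python standing in for the .NET XPath classes.

def ids_missing_from(source_root, other_root):
    """Return ids of <details> records in source_root with no match in other_root.

    "details/id" selects every id element under the root, and
    "details[id='...']" finds a record whose <id> child has exactly that text.
    """
    missing = []
    for id_node in source_root.findall("details/id"):
        listing_id = (id_node.text or "").strip()
        # Assumes ids contain no quote characters (the sample ids are numeric).
        if other_root.find(f"details[id='{listing_id}']") is None:
            missing.append(listing_id)
    return missing

if __name__ == "__main__":
    yesterday = ET.fromstring(
        "<properties><details><id>13515</id></details>"
        "<details><id>1534</id></details></properties>")
    today = ET.fromstring(
        "<properties><details><id>13515</id></details>"
        "<details><id>2208</id></details></properties>")
    print(ids_missing_from(today, yesterday))  # new listings: ['2208']
    print(ids_missing_from(yesterday, today))  # removed listings: ['1534']
```

Calling it in both directions gives the two lists the question asks for. Each `find` scans the other document, so the cost grows with the product of the two file sizes; at roughly 1,700 listings that is probably still tolerable for a once-a-morning job, but a dictionary keyed by id would avoid the repeated scans.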

2. Consider this path for your synchronization:
a. Remove any nodes from yesterday's file whose id is not found in today's file.
b. Create a third XmlDocument of new inserts: any node from today's file that is not in yesterday's file. As you add these nodes to the new XML document, remove them completely from today's source file.
c. At this point, yesterday's file and today's file should contain the same nodes and ids. I am assuming you have the option of comparing the data before doing any additional time-consuming photo retrieval; if you already have all the data, you are done.
- If you can compare before retrieving more data:
You need to work out the most efficient way to compare the two files to see whether today's data represents an update, but that comparison has to happen somehow. I suggest that as you finish comparing each node, you remove it from today's file, which reduces search time on that file; optionally, you can also remove it from yesterday's file and add it to the third "new" document to shrink yesterday's file as well.
d. Finally, combine the "new data" XML file with the "updated" file. If you had to do a comparison, yesterday's data is now the updated data (or you have already merged it node by node into the new file). If you skipped the comparison because everything was already in today's file, you just merge today's "updates" with the new "inserts".
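Steps a through d can be sketched on two parsed documents as follows. This uses Python's `xml.etree.ElementTree` as a stand-in for the .NET XML classes, and it assumes that `<properties>` contains only `<details>` children, that every record has a unique `<id>`, and that "updated" means any field's text differs; `synchronize` and `_fields` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Sketch only: hypothetical helpers following steps a-d above.

def _fields(details):
    # Compare field text only, ignoring indentation whitespace and field order.
    return sorted((c.tag, (c.text or "").strip()) for c in details)

def synchronize(yesterday, today):
    """Mutate both <properties> roots; return (merged, inserted, deleted, updated)."""
    today_by_id = {d.findtext("id"): d for d in today.findall("details")}
    yesterday_ids = {d.findtext("id") for d in yesterday.findall("details")}

    # a. Remove yesterday's records whose id is not in today's file.
    deleted = []
    for d in yesterday.findall("details"):
        if d.findtext("id") not in today_by_id:
            yesterday.remove(d)
            deleted.append(d.findtext("id"))

    # b. Move records found only in today's file into a third "inserts" document.
    inserts = ET.Element("properties")
    inserted = []
    for listing_id, d in list(today_by_id.items()):
        if listing_id not in yesterday_ids:
            today.remove(d)
            inserts.append(d)
            inserted.append(listing_id)
            del today_by_id[listing_id]

    # c. Both roots now hold the same ids: swap in today's copy where fields
    #    differ, removing each compared node from today's file as we go.
    updated = []
    for index, d in enumerate(list(yesterday)):
        fresh = today_by_id[d.findtext("id")]
        if _fields(fresh) != _fields(d):
            yesterday[index] = fresh
            updated.append(d.findtext("id"))
        today.remove(fresh)

    # d. Merge the inserts into the updated file.
    yesterday.extend(list(inserts))
    return yesterday, inserted, deleted, updated
```

The returned id lists could drive the existing photo code: `inserted` for downloads, `deleted` for deletions, and `updated` only if a changed record should also trigger a fresh photo download.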
I realize this is rather high level, but I hope it helps somewhat. Once you start writing node comparisons, you can tweak performance by using the correct node readers/writers. If any of this is unfamiliar, you can read about it starting at: Process Xml Data Using the XPath Data Model
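On the readers/writers point: a streaming reader (XmlReader in .NET) avoids loading a whole document into memory when only the ids are needed. A minimal Python sketch of the same idea using `ElementTree.iterparse`; `stream_ids` is a hypothetical helper, and the sample document mirrors the question's layout:

```python
import io
import xml.etree.ElementTree as ET

# Sketch only: hypothetical helper, Python standing in for a .NET XmlReader loop.

def stream_ids(source):
    """Yield each <details> id while parsing, one record at a time.

    source: a filename or a binary file object. Each <details> element is
    cleared once its id has been read, so its child elements are released
    instead of accumulating in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "details":
            yield elem.findtext("id")
            elem.clear()

if __name__ == "__main__":
    sample = (b"<properties>"
              b"<details><id>13515</id><type>Single Family</type></details>"
              b"<details><id>1534</id><type>Condo</type></details>"
              b"</properties>")
    print(list(stream_ids(io.BytesIO(sample))))  # ['13515', '1534']
```

Wrapping the generator in `set(...)` gives an id set that can feed an id comparison without building the full document tree first.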
Sep 23 '08 #2
