Merging 2 different XML files...

Hi,

I have two files to merge using Java based on a similar text identifier:

File 1:
  1. <ListRecords>
  2. <record> 
  3. <header> 
  4. <identifier>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</identifier> 
  5. <datestamp>2007-05-29T15:55:00Z</datestamp> 
  6. <datestampasdatetime>2007-05-29T17:55:00+02:00</datestampasdatetime> 
  7. </header> 
  8. <metadata> 
  9. <lom xsi:schemaLocation="http://dpc.uba.uva.nl/schema/lom/triplel http://dpc.uba.uva.nl/schema/lom/triplel/lom.xsd"> 
  10. <general > 
  11. <identifier>
  12. <catalog>oai</catalog>
  13. <entry>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</entry>
  14. </identifier>
  15. <title> 
  16. <langstring> 
  17. <value>Graduation mw. S. de Caralt</value> 
  18. <language>en</language> 
  19. </langstring> 
  20. </title> 
  21. <catalogentry> 
  22. <catalog>nl.wur.wurtv</catalog> 
  23. <entry> 
  24. <langstring> 
  25. <value>2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</value> 
  26. <language>x-none</language> 
  27. </langstring> 
  28. </entry> 
  29. </catalogentry> 
  30. <grouplanguage>en</grouplanguage> 
  31. <description> 
  32. <langstring> 
  33. <value>Sponge Culture: Learning from Biology and Ecology</value> 
  34. <language>en</language> 
  35. </langstring> 
  36. </description> 
  37. </general> 
  38. <lifecycle xmlns="" /> 
  39. <metametadata > 
  40. <metadatascheme>LORENET</metadatascheme> 
  41. </metametadata> 
  42. </lom> 
  43. </metadata> 
  44. </record>
  45. <….More Records here…..!>
  46. </ListRecords>
  47.  
File 2:
  1. <ListRecords>
  2.  <record>
  3.  <header>
  4.   <identifier>some value here</identifier> 
  5.   <datestamp>2008-07-14T09:23:25Z</datestamp> 
  6.   </header>
  7.  <metadata>
  8.  <group xsi:schemaLocation="http://dpc.uba.uva.nl/schema/lom/triplel http://dpc.uba.uva.nl/schema/lom/triplel/lom.xsd">
  9.   <title>User manipulating this</title> 
  10.  <feed>
  11.   <title>My feed</title> 
  12.   <url>http://no.url.available</url> 
  13.  <item>
  14.   <guid>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</guid> 
  15.  <events>
  16.  <event>
  17.   <dateTime>2008-03-26T13:27:49.00</dateTime> 
  18.  <action>
  19.   <actionType>doSomeAtcion</actionType> 
  20.   </action>
  21.   </event>
  22.   </events>
  23.   </item>
  24.   </feed>
  25.   </group>
  26.   </metadata>
  27.   </record>
  28. <....More Records here....!>
  29. </ListRecords>
I want to merge the <metadata> element and all of its sub-elements from file 1 into the <metadata> element of file 2, matching records on the unique text of <identifier>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</identifier> in file 1 and the same ID in <guid>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</guid> in file 2.

Any suggestions and guidelines will be highly appreciated.

Thanks.
Nov 26 '08 #1
Sorry, I forgot to mention: I have to use Java to merge them.
Nov 26 '08 #2
jkmyoung
2,057 Expert 2GB
Need to define:
  • Rows/identifiers
  • Fields to be merged
  • Merging rules


Please correct if any of the following is wrong.
Assumptions from looking at the code:

Fields summarized in xpaths:
File 1
rows: /ListRecords/record
row id: header/identifier

File 2
rows: /ListRecords/record/metadata/group/feed/item
row id: guid


Let's look at the separate XML sections to be merged:
File 2:
<item>
  <guid>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</guid>
  <events>
    <event>
      <dateTime>2008-03-26T13:27:49.00</dateTime>
      <action>
        <actionType>doSomeAtcion</actionType>
      </action>
    </event>
  </events>
</item>
Is this technically a 'join'? E.g., are you just adding fields from one file to another, or are you overwriting existing fields?


Since you're merging into a file I would recommend either:
1. DOM. Open both files with DOM. Add nodes to File1 DOM. Save back to file.
2. XSLT. Performance may be less than optimal, but code is much more maintainable.
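
For option 1, a minimal DOM sketch could look like the following. It is only an illustration under assumptions taken from the sample files: the class name `DomMerge` and its helper methods are made up for this example, file 1 is keyed on `header/identifier`, file 2 on any `<guid>` inside a `<record>`, and the matching `<metadata>` from file 1 is appended to the end of the file 2 `<record>`. It parses strings to stay self-contained; with real files you would parse a `File` and write the result back to disk.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class DomMerge {

    // Parse an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // First direct child element with the given tag name, or null if absent.
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Append each file 1 <metadata> to the file 2 <record> holding the matching <guid>.
    static void merge(Document file1, Document file2) {
        // Index file 1 <metadata> elements by the text of header/identifier.
        Map<String, Element> metadataById = new HashMap<>();
        NodeList records1 = file1.getElementsByTagName("record");
        for (int i = 0; i < records1.getLength(); i++) {
            Element record = (Element) records1.item(i);
            Element header = child(record, "header");
            Element metadata = child(record, "metadata");
            Element id = header == null ? null : child(header, "identifier");
            if (id != null && metadata != null) {
                metadataById.put(id.getTextContent().trim(), metadata);
            }
        }
        // Walk file 2 records and look up every <guid> they contain.
        NodeList records2 = file2.getElementsByTagName("record");
        for (int i = 0; i < records2.getLength(); i++) {
            Element record = (Element) records2.item(i);
            NodeList guids = record.getElementsByTagName("guid");
            for (int j = 0; j < guids.getLength(); j++) {
                Element match = metadataById.get(guids.item(j).getTextContent().trim());
                if (match != null) {
                    // importNode deep-copies the subtree into file 2's document.
                    record.appendChild(file2.importNode(match, true));
                }
            }
        }
    }

    // Serialize a document with an identity transform.
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }
}
```

Building the id-to-metadata map once keeps the merge to a single pass over each file, which matters little for 10 records but avoids a nested scan if the files grow.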
Nov 26 '08 #3
Thanks a lot for your reply. I was very worried about this, as I have a deadline.
Actually, I want to join the record with the matching ID from file 2 into file 1, after the file 1 record for that ID ends. The output might look like:
<ListRecords>
  <record>
    <header>
      <identifier>some value here</identifier>
      <datestamp>2008-07-14T09:23:25Z</datestamp>
    </header>
    <metadata>
      <group xsi:schemaLocation="http://dpc.uba.uva.nl/schema/lom/triplel http://dpc.uba.uva.nl/schema/lom/triplel/lom.xsd">
        <title>User manipulating this</title>
        <feed>
          <title>My feed</title>
          <url>http://no.url.available</url>
          <item>
            <guid>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</guid>
            <events>
              <event>
                <dateTime>2008-03-26T13:27:49.00</dateTime>
                <action>
                  <actionType>doSomeAtcion</actionType>
                </action>
              </event>
            </events>
          </item>
        </feed>
      </group>
      <header>
        <identifier>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</identifier>
        <datestamp>2007-05-29T15:55:00Z</datestamp>
        <datestampasdatetime>2007-05-29T17:55:00+02:00</datestampasdatetime>
      </header>
      <metadata>
        <lom xsi:schemaLocation="http://dpc.uba.uva.nl/schema/lom/triplel http://dpc.uba.uva.nl/schema/lom/triplel/lom.xsd">
          <general>
            <identifier>
              <catalog>oai</catalog>
              <entry>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</entry>
            </identifier>
            <title>
              <langstring>
                <value>Graduation mw. S. de Caralt</value>
                <language>en</language>
              </langstring>
            </title>
            <catalogentry>
              <catalog>nl.wur.wurtv</catalog>
              <entry>
                <langstring>
                  <value>2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</value>
                  <language>x-none</language>
                </langstring>
              </entry>
            </catalogentry>
            <grouplanguage>en</grouplanguage>
            <description>
              <langstring>
                <value>Sponge Culture: Learning from Biology and Ecology</value>
                <language>en</language>
              </langstring>
            </description>
          </general>
          <lifecycle xmlns="" />
          <metametadata>
            <metadatascheme>LORENET</metadatascheme>
          </metametadata>
        </lom>
      </metadata>
    </metadata>
  </record>
</ListRecords>
I have to do this for about 10 records with matching IDs in both files.
Nov 26 '08 #4
jkmyoung
2,057 Expert 2GB
Considering this, I would probably use xslt.

Driving xslt in Java (sample):

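import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class MergeDriver { // class name is arbitrary
    public static void main(String[] args) throws Exception {
        // set file names
        File file1 = new File("Filename1.xml");
        String filename2 = "Filename2.xml";
        File xslt = new File("FileXSLT.xslt");
        File dest = new File("resultFile.xml");

        // build transformer
        TransformerFactory xformFactory = TransformerFactory.newInstance();
        Transformer transformer = xformFactory.newTransformer(new StreamSource(xslt));

        // set file2 filename parameter; document() resolves it relative to the stylesheet
        transformer.setParameter("file2", filename2);

        // Parsing file1 into a DOM first looks redundant, but actually makes it perform better.
        DocumentBuilderFactory docBuildFactory = DocumentBuilderFactory.newInstance();
        docBuildFactory.setNamespaceAware(true); // XSLT expects a namespace-aware DOM
        DocumentBuilder parser = docBuildFactory.newDocumentBuilder();
        Document document = parser.parse(file1);

        transformer.transform(new DOMSource(document), new StreamResult(dest));
    }
}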
For more info, google "java xslt transformation"

XSLT: Starting with a copy template, add templates for the fields to be merged. I'm having trouble seeing which fields need to be merged, so I hope you can figure it out from the example.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="file2" select="''"/><!-- defaults to empty string -->
  <xsl:variable name="doc2" select="document($file2)"/><!-- convert to nodes -->

  <xsl:template match="*"><!-- copy template -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="record">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
      <!-- add in other stuff here -->
      <xsl:copy-of select="$doc2/ListRecords/record/metadata/group/feed/item[guid = current()/header/identifier]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Key line in all of this is:
<xsl:copy-of select="$doc2/ListRecords/record/metadata/group/feed/item[guid = current()/header/identifier]"/>
Copy the item nodes which match the current node's id.

Customize this to merge as you need.
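
If you want to check the lookup on its own before wiring it into a stylesheet, the same predicate can be evaluated from Java with the standard `javax.xml.xpath` API. This is just a sketch: the class `GuidLookup` and method `itemsFor` are names invented for the example, and the path assumes the `<item>` elements sit under `group/feed` as in the posted file 2.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class GuidLookup {

    // Return the <item> elements of file 2 whose <guid> equals the given id.
    static NodeList itemsFor(String file2Xml, String id) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $id through a variable resolver instead of pasting it into the expression.
            xpath.setXPathVariableResolver(
                    name -> "id".equals(name.getLocalPart()) ? id : null);
            return (NodeList) xpath.evaluate(
                    "/ListRecords/record/metadata/group/feed/item[guid = $id]",
                    new InputSource(new StringReader(file2Xml)),
                    XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }
}
```

An empty result for an id you expect to match usually means the path or the id text (for example stray whitespace) differs from what the stylesheet predicate assumes.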
Nov 26 '08 #5
Thank you very much for the reply. At least I have the idea now, but I am totally new to XSLT and have no time to work through tutorials because of the deadline. I am still trying and hope to solve it; if I run into problems, I will post them.
Nov 30 '08 #6
Hi,
Thanks a lot for your help. I tried (and am still trying) but couldn't manage to write the XSLT file correctly, and with the deadline tomorrow I can't start an XSLT tutorial from the beginning. Please help me get this first task done, and then I will be able to read more about it. :-(

The top-level elements in both files are <ListRecords> and then <record>. Each <record> is one unique record, identified by the <identifier> value in file 1 (line 4) and the <guid> value in file 2 (line 14). The records with the same ID have different data elements (different fields) in the two files. I want to merge the record with the given ID from file 2 into file 1.

There is also a <metadata> element in both files: file 1 (lines 8 to 42) and file 2 (lines 7 to 26). I want to simply copy this <metadata> element and its sub-elements (up to line 43) from file 2 into file 1, after file 1's <metadata> element ends at line 26; after that, the last element would simply be </record>.
There are 10 unique records in both files and the final file should include all of them in the same way, so I hope that once one is merged correctly the others will follow the same template match.
Please help me, as I am really worried; the first task in a new language is always such a headache.

Best Regards
Nov 30 '08 #7
"There is also a <metadata> element in both files: file 1 (lines 8 to 42) and file 2 (lines 7 to 26). I want to simply copy this <metadata> element and its sub-elements (up to line 43) from file 2 into file 1, after file 1's <metadata> element ends at line 26; after that, the last element would simply be </record>."

Sorry, a small mistake in the paragraph above: I want to merge the record from file 1 into file 2, not the other way around.
Nov 30 '08 #8
jkmyoung
2,057 Expert 2GB
Could you show us what you have so far? If you can get the first few fields copying correctly, then it'll be easier to figure out mistakes you're making with the rest.
Dec 1 '08 #9
Thanks for the reply.
Actually, I only changed the XPath you provided. I made a mistake when I said which file to copy from: I have to copy data from file 1 into file 2, under the record element, based on that unique ID. So I only changed the XPath in your sample (I am not sure I did it right, as I am confused), and it is now:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:param name="file1" select="''"/><!-- defaults to empty string -->
  <xsl:variable name="doc1" select="document($file1)"/><!-- convert to nodes -->
  <xsl:template match="*"><!-- copy template -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="record">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
      <!-- copy data from file 1 into file 2 based on guid in file 2 -->
      <!-- I don't know where this will copy the data, or under which element of file 2 -->
      <xsl:copy-of select="$doc1/ListRecords/record/header[identifier = current()/item//feed/guid]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
So I don't know how to copy all the metadata from file 1 into file 2 exactly after file 2's metadata element ends. I know I didn't do much...
I hope you can help me solve it.
Dec 1 '08 #10
jkmyoung
2,057 Expert 2GB
The easiest way I can think of (not the best programmatically) is to have a template for the last metadata element. Use an XPath like: "metadata[not(following::metadata)]"
<xsl:template match="metadata[not(following::metadata)]">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
  <!-- add rest from other file -->
  <xsl:copy-of select="$doc1//metadata"/>
</xsl:template>
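
The template above copies every `<metadata>` from file 1 in one place, so it fits the single-record case. For several records, one possible variant is to key the copy on the id inside the `record` template instead. The sketch below is untested against the real data and makes some assumptions: the class `PerRecordMerge` and its methods are invented for the example, file 2 is the main input, file 1 is loaded through `document()`, and the matching `<metadata>` from file 1 is appended as the last child of each file 2 `<record>`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of any library.
class PerRecordMerge {

    // Identity copy of file 2, plus a per-record lookup into file 1.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:param name='file1'/>",
        "  <xsl:variable name='doc1' select='document($file1)'/>",
        "  <xsl:template match='@*|node()'>",
        "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>",
        "  </xsl:template>",
        "  <xsl:template match='record'>",
        "    <xsl:copy>",
        "      <xsl:apply-templates select='@*|node()'/>",
        "      <!-- metadata of the file 1 record whose identifier equals a guid in this record -->",
        "      <xsl:copy-of select='$doc1/ListRecords/record[header/identifier = current()//guid]/metadata'/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Transform file 2, passing file 1 as an absolute URI for document().
    static String mergeFiles(Path file1, Path file2) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setParameter("file1", file1.toUri().toString());
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(file2.toFile()), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("merge failed", e);
        }
    }

    // Write XML to a temporary file; only used to try the sketch without real files.
    static Path writeTemp(String xml) {
        try {
            Path path = Files.createTempFile("merge", ".xml");
            Files.write(path, xml.getBytes(StandardCharsets.UTF_8));
            path.toFile().deleteOnExit();
            return path;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With real data you would call `mergeFiles` with the paths of the two files directly. Records in file 2 without a matching identifier in file 1 are copied unchanged.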
Dec 1 '08 #11
Thanks a lot for the help.
Yes, it works, but only when each file has a single record; merging more records would require a more concrete approach...
Dec 2 '08 #12