Bytes | Software Development & Data Engineering Community
Problem with parsing XML

Hello,

I found a script (at http://www.thescripts.com/forum/thread84554.html) that does something similar to what I want to do.
I want to parse my XML file and modify the value of an attribute,
for example change this:
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>
into this:
<nom name="pivot">
<information valeur="Niveau" type="Mauvais"/>
</nom>

So I modified the script:

  1. #!/usr/bin/perl -w
  2.  
  3. use strict;
  4. use XML::XPath;
  5. use XML::XPath::XMLParser;
  6. use XML::Twig;
  7.  
  8. # create an object to parse the file and field XPath queries
  9. # my $xpath = XML::XPath->new( filename => shift @ARGV );
  10. my $xpath = XML::XPath->new( filename => "client.xml" );
  11.  
  12. # apply the path from the command line and get back a list matches
  13. my $field;
  14. my @field = 'string';
  15.  
  16.  
  17. my $old_value = $xpath->find("//nom[\@name='pivot']/information/\@type" );
  18. #find("//nom[\@name='pivot']/information[\@type]/text()" );
  19.  
  20. print $old_value."\n";
  21.  
  22.  
  23. #qq{$field\[string() = "$old_value"]}
  24. my $new_value = 'Tres BOB';
  25. my $t = new XML::Twig( TwigRoots =>
  26. qq{$field\[string() = "$old_value"] => \&update} ,
  27. TwigPrintOutsideRoots => 1,);
  28. $t->parsefile( 'client2.xml' );
  29. $t->flush;
  30.  
  31. sub update
  32. {
  33. my( $t, $field_elt)= @_;
  34. $field_elt->set_text( $new_value);
  35. $field_elt->print;
  36. }



My XML file:


<?xml version="1.0" encoding="windows-1250"?>
<root value="x">
<entreprise>some text</entreprise>
<info></info>
<client>
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>

<nom name="paul">
<information valeur="Niveau" type="Bon">xxx</information>
<information valeur="Solvable" type="Mauvais">zoooooooo</information>
</nom>
</client>
<client>
<nom name="albine">
<information valeur="Solvable" type="Bon">azer</information>
</nom>
</client>
<client>
<nom name="Terence">
<information valeur="Niveau" type="Tres bon"/>
<information valeur="Solvable" type="Bon"/>
<information valeur="Ancien" type="Oui"/>
</nom>
</client>
</root>



I get these errors and I don't understand why; the value of the type attribute should have been changed:



Bon
Use of uninitialized value in concatenation (.) or string at C:\Documents and Settings\donny\Bureau\bigs\parsr.pl line 25.
Can't use string ("[string() = "Bon"] => &update") as a HASH ref while "strict refs" in use at C:/Perl/site/lib/XML/Twig.pm line 1303.


Thanks
Oct 19 '06 #1
miller
Greetings,

I used your post as an excuse to learn a little more about XML. I have three solutions to your problem, each using a different CPAN module. I don't claim that any of my implementations are especially efficient, or that they take advantage of all the features these modules offer. Nevertheless, I dug through the limited manuals, source code, and outside references, and came up with working code using the following:

1) XML::Simple
2) XML::XPath
3) XML::Twig

Here is my code. I've left any debugging output and intermediate attempts in comments marked '##'. The code is separated into three sections, one for each CPAN module used.

    #!/usr/bin/perl

    # Goal:
    #   From:
    #     <nom name="pivot">
    #     <information valeur="Niveau" type="Bon"/>
    #     </nom>
    #   To:
    #     <nom name="pivot">
    #     <information valeur="Niveau" type="Mauvais"/>
    #     </nom>

    use strict;
    use warnings;

    my $file = 'client.xml';


    ###
    # Use XML::Simple

    my $fileSimple = 'clientSimple.xml';
    print "XML::Simple to $fileSimple\n\n";

    use XML::Simple;

    # With default options, XMLin folds <nom> elements into a hash keyed on "name"
    my $ref = XMLin($file);
    ##$ref->{client}[0]{nom}{pivot}{information}{type} = 'Mauvais';
    foreach my $client (@{$ref->{client}}) {
        if (exists $client->{nom}{pivot}) {
            $client->{nom}{pivot}{information}{type} = 'Mauvais';
            last;
        }
    }
    XMLout($ref,
        OutputFile => $fileSimple,
    );


    ###
    # Use XML::XPath

    my $fileXPath = 'clientXPath.xml';
    print "XML::XPath to $fileXPath\n\n";

    use XML::XPath;
    use XML::XPath::XMLParser;
    use File::Slurp qw(write_file);

    # Create an object to parse the file and field XPath queries
    my $xp = XML::XPath->new( filename => $file );

    # Pull nodes: the type="Bon" attribute of <information valeur="Niveau" type="Bon" />
    my $nodeset = $xp->find("//nom[\@name='pivot']/information[\@type='Bon']/\@type");

    foreach my $node ($nodeset->get_nodelist) {
    ##    print "FOUND\n",
    ##        "\n",
    ##        XML::XPath::XMLParser::as_string($node),"\n",
    ##        ref($node),"\n",
    ##        "\n";

        $node->setNodeValue("Mauvais");
    }

    ##my @nodes = $xp->findnodes("//nom[\@name='pivot']/information");
    ##print XML::XPath::XMLParser::as_string($nodes[0]), "\n\n";

    # Output results. The XML declaration is not a node in the XPath
    # data model, so it does not carry over and is written out by hand.
    my ($root) = $xp->findnodes('/');
    write_file($fileXPath,
        q{<?xml version="1.0" encoding="windows-1250"?>}, "\n",
        XML::XPath::XMLParser::as_string($root)
    );


    ###
    # Use XML::Twig

    my $fileTwig = 'clientTwig.xml';
    print "XML::Twig to $fileTwig\n\n";

    use XML::Twig;

    open(my $twig_fh, '>', $fileTwig) or die "open >$fileTwig: $!";

    my $t = XML::Twig->new(
        twig_handlers => {
    ##        qq{nom[\@name="pivot"]} => \&update, # process
            qq{nom/information} => \&update, # process
    ##        __default__ => sub { $_[0]->flush; }, # not a special trigger name, so it never fired
        },
        pretty_print => 'nice',
    );
    $t->parsefile($file);
    $t->print($twig_fh);
    ##$t->flush;

    close($twig_fh);

    ##my $i = 0;
    sub update
    {
        my($t, $field_information) = @_;

        ##print "Found " .++$i . "\n";

        # Only touch <information> elements whose parent is <nom name="pivot">
        my $field_nom = $field_information->parent;
        return unless $field_nom->att("name") eq "pivot";

        return unless $field_information->att("valeur") eq "Niveau";

        ##print "Before:\n";
        ##$field_nom->print;
        ##print "\n\n";

        $field_information->set_att("type" => "Mauvais");

        ##print "After:\n";
        ##$field_nom->print;
        ##print "\n\n";
    }


    1;

    __END__
Now, I'll discuss each of the results.

The XML::Simple module is the easiest one to use, in my opinion. It doesn't take any real knowledge of XML terminology or syntax; it simply parses the XML file into a Perl data structure that you can navigate on your own. This is the method I personally use for my projects, but honestly, I don't do much with XML, hence this experiment.

You'll notice that when the modified data is written out, it is very different from the original XML. It might be possible to add options (to XMLin as well as XMLout) to make the output closer to the input, but I leave that endeavor up to you:
http://search.cpan.org/search?query=XML::Simple

clientSimple.xml

    <opt entreprise="some text" value="x">
      <client name="nom">
        <paul>
          <information type="Bon" valeur="Niveau">xxx</information>
          <information type="Mauvais" valeur="Solvable">zoooooooo</information>
        </paul>
        <pivot name="information" type="Mauvais" valeur="Niveau" />
      </client>
      <client name="albine">
        <information type="Bon" valeur="Solvable">azer</information>
      </client>
      <client name="Terence">
        <information type="Tres bon" valeur="Niveau" />
        <information type="Bon" valeur="Solvable" />
        <information type="Oui" valeur="Ancien" />
      </client>
      <info></info>
    </opt>
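A minimal sketch of the "add options" route, assuming XML::Simple's documented `KeyAttr`, `ForceArray`, `RootName` and `XMLDecl` options. Turning off key folding keeps `<nom>` as an element instead of promoting its `name` attribute to an element name, `ForceArray => 1` keeps the structure the same however many children an element has, and `RootName` restores `<root>` in place of `<opt>`. The XML is a shortened, inlined copy of client.xml so the example runs on its own; element and attribute order may still differ from the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Shortened copy of client.xml, inlined so the sketch is self-contained.
my $xml = <<'XML';
<root value="x">
<client>
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>
<nom name="paul">
<information valeur="Niveau" type="Bon">xxx</information>
</nom>
</client>
</root>
XML

# KeyAttr => []   : do not fold <nom> elements into a hash keyed on "name"
# ForceArray => 1 : every child element becomes an array ref
my $ref = XMLin($xml, KeyAttr => [], ForceArray => 1);

# Set type="Mauvais" on the "Niveau" information of the "pivot" nom only
for my $client (@{ $ref->{client} }) {
    for my $nom (@{ $client->{nom} }) {
        next unless $nom->{name} eq 'pivot';
        for my $info (@{ $nom->{information} }) {
            $info->{type} = 'Mauvais' if $info->{valeur} eq 'Niveau';
        }
    }
}

# RootName restores <root> (the default is <opt>); a string XMLDecl
# is written out verbatim as the declaration
my $out = XMLout($ref,
    KeyAttr  => [],
    RootName => 'root',
    XMLDecl  => '<?xml version="1.0" encoding="windows-1250"?>',
);
print $out;
```

Passing `OutputFile => $fileSimple` to XMLout, as in the script above, would write the result to a file instead of returning it.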
The next CPAN module was XML::XPath. I spent a long time researching this one to figure out how to get it working. Better documentation would definitely help, but as I learned, most of the docs are in the form of the XPath specification, which is long and arduous. The best help came from the .t test scripts included with the module. Unfortunately, none of these scripts describe the best way of outputting results, so the XML::XPath::XMLParser::as_string call still feels a little like a hack.

Nevertheless, the method above does work, and if you understand XPath, this appears to be a very powerful way of accessing XML documents. It also came closest to reproducing the exact format of the original file.

clientXPath.xml

    <?xml version="1.0" encoding="windows-1250"?>
    <root value="x">
    <entreprise>some text</entreprise>
    <info />
    <client>
    <nom name="pivot">
    <information valeur="Niveau" type="Mauvais" />
    </nom>
    <nom name="paul">
    <information valeur="Niveau" type="Bon">xxx</information>
    <information valeur="Solvable" type="Mauvais">zoooooooo</information>
    </nom>
    </client>
    <client>
    <nom name="albine">
    <information valeur="Solvable" type="Bon">azer</information>
    </nom>
    </client>
    <client>
    <nom name="Terence">
    <information valeur="Niveau" type="Tres bon" />
    <information valeur="Solvable" type="Bon" />
    <information valeur="Ancien" type="Oui" />
    </nom>
    </client>
    </root>
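A small variation on the XML::XPath code, offered as a sketch: the predicate picks the `type` attribute by its sibling `valeur="Niveau"` rather than by its current value, so the update works whatever `type` held before. It relies on the same calls as the script above (`findnodes`, `setNodeValue`, `XML::XPath::XMLParser::as_string`) plus the `xml` constructor argument for parsing from a string; the XML is a shortened, inlined copy of client.xml.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

# Shortened copy of client.xml, inlined so the sketch is self-contained.
my $xml = <<'XML';
<root value="x">
<client>
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>
<nom name="paul">
<information valeur="Niveau" type="Bon">xxx</information>
</nom>
</client>
</root>
XML

my $xp = XML::XPath->new(xml => $xml);

# The type attribute of pivot's "Niveau" information, whatever its value
my $path = q{//nom[@name='pivot']/information[@valeur='Niveau']/@type};
$_->setNodeValue('Mauvais') for $xp->findnodes($path);

# The XML declaration is not a node, so it is prepended by hand
my ($root) = $xp->findnodes('/');
my $out = qq{<?xml version="1.0" encoding="windows-1250"?>\n}
        . XML::XPath::XMLParser::as_string($root);
print $out, "\n";
```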
Finally we come to XML::Twig. This was the method I am both most hopeful about and most frustrated by. It appears that twig_handlers triggers do not allow the full XPath syntax. I was able to use expressions like q{nom[@name="pivot"]} and q{nom/information}, but I was not able to combine the two. Instead I was forced to select all nom/information elements, check the parent's name attribute, and only then make the update. This felt awkward after the very direct XPath query, but it works.

It is very possible that I just haven't read enough about this module yet, but I leave any further research to you.

clientTwig.xml

    <?xml version="1.0" encoding="windows-1250"?>
    <root value="x">
    <entreprise>some text</entreprise>
    <info></info>
    <client>
    <nom name="pivot">
    <information type="Mauvais" valeur="Niveau"/>
    </nom>
    <nom name="paul">
    <information type="Bon" valeur="Niveau">xxx</information>
    <information type="Mauvais" valeur="Solvable">zoooooooo</information>
    </nom>
    </client>
    <client>
    <nom name="albine">
    <information type="Bon" valeur="Solvable">azer</information>
    </nom>
    </client>
    <client>
    <nom name="Terence">
    <information type="Tres bon" valeur="Niveau"/>
    <information type="Bon" valeur="Solvable"/>
    <information type="Oui" valeur="Ancien"/>
    </nom>
    </client>
    </root>
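A sketch of a more direct XML::Twig route, assuming the `tag[@att="value"]` trigger syntax from the XML::Twig documentation: the handler fires on `nom[@name="pivot"]` itself and then walks its `information` children, so no parent check is needed. This also bears on the errors in the original question: `twig_roots`/`twig_handlers` (`TwigRoots`/`TwigHandlers`) take a hash ref mapping a trigger to a handler, which is why passing the `qq{...}` string gave "Can't use string ... as a HASH ref", and `$field` was never assigned, hence the uninitialized-value warning. Note too that `set_text` replaces an element's text content, while `set_att` changes an attribute. The XML is a shortened, inlined copy of client.xml, and `sprint` returns the document as a string instead of printing it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Shortened copy of client.xml, inlined so the sketch is self-contained.
my $xml = <<'XML';
<root value="x">
<client>
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>
<nom name="paul">
<information valeur="Niveau" type="Bon">xxx</information>
</nom>
</client>
</root>
XML

my $new_value = 'Mauvais';

# twig_handlers takes a hash ref: trigger expression => code ref.
# The handler runs once the whole <nom name="pivot"> element is parsed.
my $t = XML::Twig->new(
    twig_handlers => { 'nom[@name="pivot"]' => \&update },
);
$t->parse($xml);

my $out = $t->sprint;
print $out, "\n";

sub update {
    my ($twig, $nom) = @_;
    for my $info ($nom->children('information')) {
        # set_att changes the attribute; set_text would replace the content
        $info->set_att(type => $new_value) if $info->att('valeur') eq 'Niveau';
    }
    return 1;
}
```

For large files, the same trigger could be given to `twig_roots` together with `twig_print_outside_roots => 1`, as the original script attempted, with the handler calling `$nom->print` after the update.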
That's all the XML play I intend to do for now. Hopefully one of the solutions will strike your fancy. Enjoy!
Oct 23 '06 #2
