
problem with parsing xml

Newbie
 
Join Date: Oct 2006
Posts: 1
#1: Oct 19 '06
Hello,

I've found a script (at http://www.thescripts.com/forum/thread84554.html) that does something similar to what I want to do.
I want to parse my XML file and modify the value of an attribute.
For example, change this
Quote:
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>
into this:
Quote:
<nom name="pivot">
<information valeur="Niveau" type="Mauvais"/>
</nom>

So, I modified the script:

  1. #!/usr/bin/perl -w
  2.  
  3. use strict;
  4. use XML::XPath;
  5. use XML::XPath::XMLParser;
  6. use XML::Twig;
  7.  
  8. # create an object to parse the file and field XPath queries
  9. # my $xpath = XML::XPath->new( filename => shift @ARGV );
  10. my $xpath = XML::XPath->new( filename => "client.xml" );
  11.  
  12. # apply the path from the command line and get back a list matches
  13. my $field;
  14. my @field = 'string';
  15.  
  16.  
  17. my $old_value = $xpath->find("//nom[\@name='pivot']/information/\@type" );
  18. #find("//nom[\@name='pivot']/information[\@type]/text()" );
  19.  
  20. print $old_value."\n";
  21.  
  22.  
  23. #qq{$field\[string() = "$old_value"]}
  24. my $new_value = 'Tres BOB';
  25. my $t = new XML::Twig( TwigRoots =>
  26. qq{$field\[string() = "$old_value"] => \&update} ,
  27. TwigPrintOutsideRoots => 1,);
  28. $t->parsefile( 'client2.xml' );
  29. $t->flush;
  30.  
  31. sub update
  32. {
  33. my( $t, $field_elt)= @_;
  34. $field_elt->set_text( $new_value);
  35. $field_elt->print;
  36. }



My XML file:


Quote:
<?xml version="1.0" encoding="windows-1250"?>
<root value="x">
<entreprise>some text</entreprise>
<info></info>
<client>
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>

<nom name="paul">
<information valeur="Niveau" type="Bon">xxx</information>
<information valeur="Solvable" type="Mauvais">zoooooooo</information>
</nom>
</client>
<client>
<nom name="albine">
<information valeur="Solvable" type="Bon">azer</information>
</nom>
</client>
<client>
<nom name="Terence">
<information valeur="Niveau" type="Tres bon"/>
<information valeur="Solvable" type="Bon"/>
<information valeur="Ancien" type="Oui"/>
</nom>
</client>
</root>



I get these errors, and I don't understand why.
Normally the value of the type attribute should be changed.



Quote:
Bon
Use of uninitialized value in concatenation (.) or string at C:\Documents and Settings\donny\Bureau\bigs\parsr.pl line 25.
Can't use string ("[string() = "Bon"] => &update") as a HASH ref while "strict refs" in use at C:/Perl/site/lib/XML/Twig.pm line 1303.


thanks
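Reading the two messages above: the warning comes from `$field`, which is declared on line 13 but never assigned (line 14 sets a different variable, the array `@field`), so interpolating it into the `qq{}` string produces an empty string plus a warning. The fatal error comes from passing that whole string as the value of `TwigRoots`; XML::Twig expects a hash reference mapping each trigger expression to a handler. Below is a minimal sketch of the same streaming idea with the hash written out. It assumes the goal is the `type` attribute (so it calls `set_att` rather than `set_text`) and parses an inline excerpt of the file so it runs on its own; the `@changed` array is a hypothetical addition used only to check the result.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $new_value = 'Mauvais';
my @changed;    # hypothetical: new type values, recorded only for checking

# twig_roots takes a hash reference: { trigger expression => code ref }.
# The trigger matches <nom> elements whose name attribute is "pivot".
my $t = XML::Twig->new(
    twig_roots               => { 'nom[@name="pivot"]' => \&update },
    twig_print_outside_roots => 1,    # copy everything else to STDOUT as is
);

# The real script would call $t->parsefile('client.xml') instead.
$t->parse(<<'XML');
<root value="x">
<client>
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>
<nom name="paul">
<information valeur="Niveau" type="Bon">xxx</information>
</nom>
</client>
</root>
XML

sub update {
    my ($twig, $nom) = @_;
    for my $info ($nom->children('information')) {
        # Change the attribute; set_text would replace the element's content.
        $info->set_att(type => $new_value);
        push @changed, $info->att('type');
    }
    $nom->print;    # root elements are not echoed automatically, so print here
}
```

Only the `pivot` element goes through `update`; the `paul` element is copied to the output untouched.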
Moderator
 
Join Date: Oct 2006
Location: San Francisco, CA
Posts: 830
#2: Oct 23 '06

re: problem with parsing xml


Greetings,

I used your post as an excuse to learn a little more about XML. I have three solutions to your problem, each using a different CPAN module. I don't claim that any of my implementations are particularly efficient, or that they take advantage of all the features these modules offer. Nevertheless, I dug through the limited manuals, source code, and outside references, and came up with working code using the following:

1) XML::Simple
2) XML::XPath
3) XML::Twig

I will now include my code. I've left in the debugging output and intermediate attempts, commented out with '##'. You'll notice that the code is separated into three sections, one for each CPAN module.

    #!/usr/bin/perl

    # Goal:
    #   From:
    #     <nom name="pivot">
    #     <information valeur="Niveau" type="Bon"/>
    #     </nom>
    #   To:
    #     <nom name="pivot">
    #     <information valeur="Niveau" type="Mauvais"/>
    #     </nom>

    use strict;
    use warnings;

    my $file = 'client.xml';


    ###
    # Use XML::Simple

    my $fileSimple = 'clientSimple.xml';
    print "XML::Simple to $fileSimple\n\n";

    use XML::Simple;

    my $ref = XMLin($file);
    ##$ref->{client}[0]{nom}{pivot}{information}{type} = 'Mauvais';
    # With XMLin's default KeyAttr, a client's multiple <nom> elements are
    # folded into a hash keyed by their name attribute.
    foreach my $client (@{$ref->{client}}) {
        if (exists $client->{nom}{pivot}) {
            $client->{nom}{pivot}{information}{type} = 'Mauvais';
            last;
        }
    }
    XMLout($ref,
        OutputFile => $fileSimple,
    );


    ###
    # Use XML::XPath

    my $fileXPath = 'clientXPath.xml';
    print "XML::XPath to $fileXPath\n\n";

    use XML::XPath;
    use XML::XPath::XMLParser;
    use File::Slurp qw(write_file);

    # Create an object to parse the file and field XPath queries
    my $xp = XML::XPath->new( filename => $file );

    # Pull Nodes: q{ type="Bon"} of q{<information valeur="Niveau" type="Bon" />}
    my $nodeset = $xp->find("//nom[\@name='pivot']/information[\@type='Bon']/\@type");

    foreach my $node ($nodeset->get_nodelist) {
    ##    print "FOUND\n",
    ##        "\n",
    ##        XML::XPath::XMLParser::as_string($node),"\n",
    ##        ref($node),"\n",
    ##        "\n";

        $node->setNodeValue("Mauvais");
    }

    ##my @nodes = $xp->findnodes("//nom[\@name='pivot']/information");
    ##print XML::XPath::XMLParser::as_string($nodes[0]), "\n\n";

    # Output Results
    my ($root) = $xp->findnodes('/');
    write_file($fileXPath,
        # The XML declaration does not carry over through as_string, so add it back.
        q{<?xml version="1.0" encoding="windows-1250"?>}, "\n",
        XML::XPath::XMLParser::as_string($root)
    );


    ###
    # Use XML::Twig

    my $fileTwig = 'clientTwig.xml';
    print "XML::Twig to $fileTwig\n\n";

    use XML::Twig;

    open(my $twig_fh, '>', $fileTwig) or die "open >$fileTwig: $!";

    # Each handler is called once its element has been completely parsed.
    my $t = XML::Twig->new(
        twig_handlers => {
    ##        qq{nom[\@name="pivot"]} => \&update, # process
            qq{nom/information} => \&update, # process
        },
        pretty_print => 'nice',
    );
    $t->parsefile($file);
    $t->print($twig_fh);
    ##$t->flush;

    close($twig_fh);

    ##my $i = 0;
    sub update
    {
        my ($t, $field_information) = @_;

        ##print "Found " .++$i . "\n";

        # Only touch <information> elements whose parent is <nom name="pivot">
        my $field_nom = $field_information->parent;
        return unless $field_nom->att("name") eq "pivot";

        return unless $field_information->att("valeur") eq "Niveau";

        ##print "Before:\n";
        ##$field_nom->print;
        ##print "\n\n";

        $field_information->set_att("type" => "Mauvais");

        ##print "After:\n";
        ##$field_nom->print;
        ##print "\n\n";
    }


    1;

    __END__
Now, I'll discuss each of the results.

The XML::Simple module is the easiest one to use, in my opinion. It doesn't take any real knowledge of XML terminology or syntax; it simply parses the XML file into a Perl data structure that you can navigate on your own. This is the method I personally use in my projects. But honestly, I don't do much with XML, hence this experiment.

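You'll notice that when the modified data is written back out, it is very different from the original XML: the root element is renamed to <opt>, some child elements become attributes (entreprise="some text"), and elements are renamed after their name attributes (<client name="nom">, <pivot name="information" ...>). It might be possible to add options to XMLout to format the document a little closer to the input, but I leave such an endeavor up to you:
http://search.cpan.org/search?query=XML::Simple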
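Those differences come from XML::Simple's defaults: XMLin folds repeated elements that carry a `name`, `key` or `id` attribute into a hash keyed by that attribute (the KeyAttr option), and XMLout names the root element `opt` unless told otherwise (the RootName option). A minimal sketch, assuming an inline excerpt of client.xml instead of the file, that switches folding off with `KeyAttr => []`, forces every nested element into an array with `ForceArray => 1`, and restores the root name. Attribute order and whitespace may still differ from the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Inline sample; XMLin also accepts a file name such as 'client.xml'.
my $xml = <<'XML';
<root value="x">
<client>
<nom name="pivot">
<information valeur="Niveau" type="Bon"/>
</nom>
<nom name="paul">
<information valeur="Niveau" type="Bon">xxx</information>
</nom>
</client>
</root>
XML

# KeyAttr => []: keep <nom> elements as a list instead of a hash keyed by name.
# ForceArray => 1: every nested element is an array ref, even if it occurs once.
my $ref = XMLin($xml, KeyAttr => [], ForceArray => 1);

for my $client (@{ $ref->{client} }) {
    for my $nom (@{ $client->{nom} }) {
        next unless $nom->{name} eq 'pivot';
        $_->{type} = 'Mauvais' for @{ $nom->{information} };
    }
}

# RootName restores <root>; XMLout would otherwise call it <opt>.
my $out = XMLout($ref, KeyAttr => [], RootName => 'root');
print $out;
```

Because every element is now an array, the same nested loop works whether a client has one <nom> or several, so the exists check on a folded hash is no longer needed.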

clientSimple.xml
    <opt entreprise="some text" value="x">
      <client name="nom">
        <paul>
          <information type="Bon" valeur="Niveau">xxx</information>
          <information type="Mauvais" valeur="Solvable">zoooooooo</information>
        </paul>
        <pivot name="information" type="Mauvais" valeur="Niveau" />
      </client>
      <client name="albine">
        <information type="Bon" valeur="Solvable">azer</information>
      </client>
      <client name="Terence">
        <information type="Tres bon" valeur="Niveau" />
        <information type="Bon" valeur="Solvable" />
        <information type="Oui" valeur="Ancien" />
      </client>
      <info></info>
    </opt>
The next CPAN module was XML::XPath. I took a long time researching this one to figure out how to get it to work. Better documentation would definitely help, but as I learned, most of the docs are in the form of the XPath specification, which is long and arduous. The best help was provided by the .t test scripts included with the module. Unfortunately, none of these scripts describe the best way of outputting results, so the as_string call I used still feels a little like a hack.

Nevertheless, the above method does work, and if you understand XPath, this appears to be a very powerful way of accessing XML documents. This method also came closest to reproducing the exact format of the original file.
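For comparison, both conditions (the parent's name and the child's valeur) fit into a single XPath expression, so no Perl-side check is needed. A minimal sketch, assuming an inline excerpt of client.xml passed through XML::XPath's `xml` constructor argument instead of `filename`; it relies on the same setNodeValue and as_string calls as the script above, and calls findnodes in list context to get the matching attribute nodes directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

my $xml = <<'XML';
<root value="x">
<client>
<nom name="pivot"><information valeur="Niveau" type="Bon"/></nom>
<nom name="paul"><information valeur="Niveau" type="Bon">xxx</information></nom>
</client>
</root>
XML

my $xp = XML::XPath->new(xml => $xml);

# One expression: the type attribute of each <information valeur="Niveau">
# whose parent is <nom name="pivot">.
my $query = q{//nom[@name='pivot']/information[@valeur='Niveau']/@type};
$_->setNodeValue('Mauvais') for $xp->findnodes($query);

# Serialize the whole document (the XML declaration is not included).
my ($root) = $xp->findnodes('/');
my $out = XML::XPath::XMLParser::as_string($root);
print $out, "\n";
```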

clientXPath.xml
    <?xml version="1.0" encoding="windows-1250"?>
    <root value="x">
    <entreprise>some text</entreprise>
    <info />
    <client>
    <nom name="pivot">
    <information valeur="Niveau" type="Mauvais" />
    </nom>
    <nom name="paul">
    <information valeur="Niveau" type="Bon">xxx</information>
    <information valeur="Solvable" type="Mauvais">zoooooooo</information>
    </nom>
    </client>
    <client>
    <nom name="albine">
    <information valeur="Solvable" type="Bon">azer</information>
    </nom>
    </client>
    <client>
    <nom name="Terence">
    <information valeur="Niveau" type="Tres bon" />
    <information valeur="Solvable" type="Bon" />
    <information valeur="Ancien" type="Oui" />
    </nom>
    </client>
    </root>
Finally we come to XML::Twig. This is the module I am both most hopeful about and most frustrated by. It appears that twig_handlers triggers do not support the full syntax allowed by XPath. I was able to use triggers like q{nom[@name="pivot"]} and q{nom/information}, but I was not able to combine the two into one expression. Instead I was forced to pull all nom/information elements, check the parent's name attribute, and only then make an update. This felt awkward after the very direct XPath query, but it works.

It is very possible that I just haven't read enough about this module yet, but I leave any further research to you.
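One possible way around the trigger limitation, if keeping the whole document in memory is acceptable: parse without handlers and then query the finished tree with XML::Twig's get_xpath method, passing a combined expression. This is a sketch under two assumptions: that get_xpath's XPath subset covers `//nom[@name="pivot"]/information[@valeur="Niveau"]`, and an inline excerpt stands in for client.xml. Unlike the handler-based script, it cannot flush as it goes, which only matters for large files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<root value="x">
<client>
<nom name="pivot"><information valeur="Niveau" type="Bon"/></nom>
<nom name="paul"><information valeur="Niveau" type="Bon">xxx</information></nom>
</client>
</root>
XML

my $t = XML::Twig->new(pretty_print => 'indented');
$t->parse($xml);    # builds the whole tree; no handlers needed

# Both conditions in one expression, evaluated on the finished tree.
for my $info ($t->get_xpath('//nom[@name="pivot"]/information[@valeur="Niveau"]')) {
    $info->set_att(type => 'Mauvais');
}

my $out = $t->sprint;
print $out, "\n";
```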

clientTwig.xml
    <?xml version="1.0" encoding="windows-1250"?>
    <root value="x">
    <entreprise>some text</entreprise>
    <info></info>
    <client>
    <nom name="pivot">
    <information type="Mauvais" valeur="Niveau"/>
    </nom>
    <nom name="paul">
    <information type="Bon" valeur="Niveau">xxx</information>
    <information type="Mauvais" valeur="Solvable">zoooooooo</information>
    </nom>
    </client>
    <client>
    <nom name="albine">
    <information type="Bon" valeur="Solvable">azer</information>
    </nom>
    </client>
    <client>
    <nom name="Terence">
    <information type="Tres bon" valeur="Niveau"/>
    <information type="Bon" valeur="Solvable"/>
    <information type="Oui" valeur="Ancien"/>
    </nom>
    </client>
    </root>
That's all the XML play I intend to do for now. Hopefully one of the solutions will strike your fancy. Enjoy!