
dereferencing complex XML data structure

Howdy.

I've been trying to make sense of the data structure that XML::TreePP returns when parsing an xBRL file, using Perl and Data::Dumper.

The resulting Data::Dumper output looks like this:
    $VAR1 = {
      'xbrl' => {
        'dow:ComprehensiveIncomeloss' => [
          {
            '-decimals' => 'INF',
            '-contextRef' => 'YTD3Q06_unaudited',
            '#text' => '2960000000',
            '-unitRef' => 'USD'
          },
          {
            '-decimals' => 'INF',
            '-contextRef' => 'Q3Sept302006_unaudited',
            '#text' => '426000000',
            '-unitRef' => 'USD'
          },
          {
            '-decimals' => 'INF',
            '-contextRef' => 'YTD3Q07_unaudited',
            '#text' => '3090000000',
            '-unitRef' => 'USD'
          },
          {
            '-decimals' => 'INF',
            '-contextRef' => 'Q3Sept302007_unaudited',
            '#text' => '829000000',
            '-unitRef' => 'USD'
          }
        ],
        'dow:DistributionsFromNonconsolidatedAffiliates' => [
          {
            '-decimals' => 'INF',
            '-contextRef' => 'YTD3Q06_unaudited',
            '#text' => '4000000',
            '-unitRef' => 'USD'
          },
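Data::Dumper's default layout ($Data::Dumper::Indent = 2) aligns each nested level under its opening bracket, which pushes deeply nested data far to the right. A small sketch, using only core Data::Dumper and a hand-copied fragment of the dump in place of the real parse result, of two configuration variables that make the nesting easier to follow:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-copied fragment of the dump, standing in for the real parse result.
my $tree = {
    xbrl => {
        'dow:ComprehensiveIncomeloss' => [
            {
                '-decimals'   => 'INF',
                '-contextRef' => 'YTD3Q06_unaudited',
                '#text'       => '2960000000',
                '-unitRef'    => 'USD',
            },
        ],
    },
};

# Indent = 1: a fixed amount of indentation per nesting level instead of
# the default column alignment (Indent = 2).
$Data::Dumper::Indent = 1;

# Sortkeys = 1: hash keys are printed in sorted order, so runs are comparable.
$Data::Dumper::Sortkeys = 1;

my $dump = Dumper($tree);
print $dump;
```

With Indent = 1 the hash, array, hash nesting shows up as plain indentation, which makes it easier to see where a `{` (hash reference) or `[` (array reference) begins.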



My code looks like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    use XML::TreePP;

    my $file = shift @ARGV or die "usage: $0 file.xml\n";

    my $tpp = XML::TreePP->new();
    $tpp->set( force_hash => [ '*' ] );
    my $tree = $tpp->parsefile( $file );
    print Dumper($tree);

    # The top-level hash has a single key: the root element ('xbrl').
    my ($root) = keys %{$tree};
    print "root->$root\n";

    # $tree->{$root} is a hash reference; %{ } copies the hash it points to.
    my %hash = %{ $tree->{$root} };

    foreach my $key1 (sort keys %hash) {
        print "key1->$key1\n";
    }
This prints the root ('xbrl'), and the foreach loop gives me the hash keys of the next level ('dow:ComprehensiveIncomeloss' and 'dow:DistributionsFromNonconsolidatedAffiliates' in the example above). But I can't find a way to dereference the lower levels of the tree properly to reach sections like this one:
    [
      {
        '-decimals' => 'INF',
        '-contextRef' => 'YTD3Q06_unaudited',
        '#text' => '2960000000',
        '-unitRef' => 'USD'
      },
      {
        '-decimals' => 'INF',
        '-contextRef' => 'Q3Sept302006_unaudited',
        '#text' => '426000000',
        '-unitRef' => 'USD'
      },
      {
        '-decimals' => 'INF',
        '-contextRef' => 'YTD3Q07_unaudited',
        '#text' => '3090000000',
        '-unitRef' => 'USD'
      },
      {
        '-decimals' => 'INF',
        '-contextRef' => 'Q3Sept302007_unaudited',
        '#text' => '829000000',
        '-unitRef' => 'USD'
      }
    ]
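One way to find out what each level actually is: Perl's built-in `ref()` returns 'HASH' or 'ARRAY' for a reference (and an empty string for a plain scalar). A minimal sketch, assuming a literal copy of part of the dump stands in for the parsed file; the names $level1 to $level3 are only for illustration:

```perl
use strict;
use warnings;

# Stand-in for the tree returned by parsefile(), trimmed to two entries.
my $tree = {
    xbrl => {
        'dow:ComprehensiveIncomeloss' => [
            { '-decimals' => 'INF', '-contextRef' => 'YTD3Q06_unaudited',
              '#text' => '2960000000', '-unitRef' => 'USD' },
            { '-decimals' => 'INF', '-contextRef' => 'Q3Sept302006_unaudited',
              '#text' => '426000000', '-unitRef' => 'USD' },
        ],
    },
};

my ($root) = keys %$tree;                               # 'xbrl'
my $level1 = $tree->{$root};                            # hash reference
my $level2 = $level1->{'dow:ComprehensiveIncomeloss'};  # array reference
my $level3 = $level2->[0];                              # hash reference

# ref() reports what kind of reference each level holds.
printf "%-8s %s\n", 'level1', ref($level1);   # HASH
printf "%-8s %s\n", 'level2', ref($level2);   # ARRAY
printf "%-8s %s\n", 'level3', ref($level3);   # HASH
```

The rule of thumb: a `{` in the dump means `->{key}` to go down a level, and a `[` means `->[index]` (or `@{ ... }` to get the whole list).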
By fooling around, I can get the hash keys (-decimals, -contextRef, etc.) from one of the hashes in the list, but I can't get at all of them.
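Getting the keys of one hash usually means indexing a single array element such as `[0]`; reaching every hash means dereferencing the whole array with `@{ ... }` and looping over it. A sketch with a trimmed, hand-written stand-in for the 'dow:ComprehensiveIncomeloss' array (the variable name $facts is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical, trimmed stand-in for $tree->{xbrl}{'dow:ComprehensiveIncomeloss'}.
my $facts = [
    { '-contextRef' => 'YTD3Q06_unaudited',      '#text' => '2960000000' },
    { '-contextRef' => 'Q3Sept302006_unaudited', '#text' => '426000000'  },
    { '-contextRef' => 'YTD3Q07_unaudited',      '#text' => '3090000000' },
];

# Indexing with [0] only reaches the first hash in the list ...
my @first_keys = sort keys %{ $facts->[0] };

# ... whereas dereferencing the whole array with @{ } visits every hash.
my @contexts;
for my $fact (@{$facts}) {
    push @contexts, $fact->{'-contextRef'};
}

print "keys of first hash: @first_keys\n";
print "number of hashes:   ", scalar(@{$facts}), "\n";
print "all contexts:       @contexts\n";
```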

What am I missing? This looks like the root of a tree ('xbrl') holding a hash (keyed by 'dow:ComprehensiveIncomeloss', etc.) whose values are arrays of hashes. Am I decoding the structure correctly?
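That reading matches the dump: a hash keyed by the root name, whose value is a hash keyed by element name, whose values are arrays of hashes. Assuming that shape, a single chained lookup reaches any one value; between subscripts the arrow is optional. A sketch with a literal copy of two entries from the dump:

```perl
use strict;
use warnings;

my $tree = {
    xbrl => {
        'dow:ComprehensiveIncomeloss' => [
            { '-decimals' => 'INF', '-contextRef' => 'YTD3Q06_unaudited',
              '#text' => '2960000000', '-unitRef' => 'USD' },
            { '-decimals' => 'INF', '-contextRef' => 'Q3Sept302006_unaudited',
              '#text' => '426000000', '-unitRef' => 'USD' },
        ],
    },
};

# Each subscript dereferences one level:
#   {xbrl}                          -> hash of element names
#   {'dow:ComprehensiveIncomeloss'} -> array of facts
#   [1]                             -> one fact (a hash)
#   {'#text'} / {'-contextRef'}     -> a plain string
my $value   = $tree->{xbrl}{'dow:ComprehensiveIncomeloss'}[1]{'#text'};
my $context = $tree->{xbrl}{'dow:ComprehensiveIncomeloss'}[1]{'-contextRef'};

print "$context = $value\n";   # Q3Sept302006_unaudited = 426000000
```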

And then, how do I dereference the thing 'dow:ComprehensiveIncomeloss' points to, so that I can walk down the tree into the list of hashes with key/value pairs such as '-decimals' => 'INF', '-contextRef' => 'YTD3Q06_unaudited', etc., as in:
    'dow:ComprehensiveIncomeloss' => [
      {
        '-decimals' => 'INF',
        '-contextRef' => 'YTD3Q06_unaudited',
        '#text' => '2960000000',
        '-unitRef' => 'USD'
      },
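Since the value stored under 'dow:ComprehensiveIncomeloss' is an array reference, `@$ref` (or `@{ $ref }`) turns it back into a list, and each element is a hash reference whose keys can be sorted and printed. A minimal sketch, where %hash plays the role of the %hash built in the question's code and the data is copied from the dump:

```perl
use strict;
use warnings;

# Stand-in for %hash from the question: element name => array reference.
my %hash = (
    'dow:ComprehensiveIncomeloss' => [
        { '-decimals' => 'INF', '-contextRef' => 'YTD3Q06_unaudited',
          '#text' => '2960000000', '-unitRef' => 'USD' },
        { '-decimals' => 'INF', '-contextRef' => 'Q3Sept302006_unaudited',
          '#text' => '426000000', '-unitRef' => 'USD' },
        { '-decimals' => 'INF', '-contextRef' => 'YTD3Q07_unaudited',
          '#text' => '3090000000', '-unitRef' => 'USD' },
        { '-decimals' => 'INF', '-contextRef' => 'Q3Sept302007_unaudited',
          '#text' => '829000000', '-unitRef' => 'USD' },
    ],
);

# The hash value is an array reference ...
my $facts = $hash{'dow:ComprehensiveIncomeloss'};

# ... so @$facts yields the list, and each element is a hash reference.
my @rows;
for my $fact (@$facts) {
    my $row = join ', ', map { $_ . ' => ' . $fact->{$_} } sort keys %$fact;
    push @rows, $row;
    print "$row\n";
}
```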

I've been playing with this for more than a day without success, which makes me think I'm missing something basic about the structure.

Any hints?

Thanks.


Richard
Jan 3 '08 #1


rickumali
. . . how do I de-reference the thing 'dow:ComprehensiveIncomeloss' points to, to walk the tree down into the list of hashes with key/value pairs of '-decimals' => 'INF', '-contextRef' => 'YTD3Q06_unaudited' . . .
After a very (very) quick glance, the data structure looks like a hash of arrays, where the arrays (and the hashes inside them) are anonymous. Look at perldsc and grok the "Access and Printing of a HASH OF ARRAYS" examples.
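Following the perldsc pattern for a hash of arrays, extended one level for the hashes inside each array, the whole tree can be walked with three nested loops. A sketch under these assumptions: the data is a literal copy of the part of the dump shown in the question; the %value_of lookup keyed by element name and context is purely illustrative; and a lone hash is treated as a one-element list because, as far as I know, XML::TreePP only builds an array when an element repeats unless its force_array option says otherwise.

```perl
use strict;
use warnings;

# Literal copy of the dump shown in the question. The second element's list
# is cut off there, so only its first entry is included.
my $tree = {
    xbrl => {
        'dow:ComprehensiveIncomeloss' => [
            { '-decimals' => 'INF', '-contextRef' => 'YTD3Q06_unaudited',
              '#text' => '2960000000', '-unitRef' => 'USD' },
            { '-decimals' => 'INF', '-contextRef' => 'Q3Sept302006_unaudited',
              '#text' => '426000000', '-unitRef' => 'USD' },
            { '-decimals' => 'INF', '-contextRef' => 'YTD3Q07_unaudited',
              '#text' => '3090000000', '-unitRef' => 'USD' },
            { '-decimals' => 'INF', '-contextRef' => 'Q3Sept302007_unaudited',
              '#text' => '829000000', '-unitRef' => 'USD' },
        ],
        'dow:DistributionsFromNonconsolidatedAffiliates' => [
            { '-decimals' => 'INF', '-contextRef' => 'YTD3Q06_unaudited',
              '#text' => '4000000', '-unitRef' => 'USD' },
        ],
    },
};

# Illustrative lookup table: "element|contextRef" => amount.
my %value_of;

for my $root (sort keys %$tree) {                  # level 1: hash
    my $elements = $tree->{$root};
    for my $name (sort keys %$elements) {          # level 2: hash of arrays
        my $entry = $elements->{$name};

        # Assumption: an element that occurs only once may come back as a
        # single hash rather than an array, so wrap it in a one-element list.
        my @facts = ref($entry) eq 'ARRAY' ? @$entry : ($entry);

        for my $fact (@facts) {                    # level 3: array of hashes
            # Skip plain strings (e.g. attributes) and entries with no context.
            next unless ref($fact) eq 'HASH' && exists $fact->{'-contextRef'};
            printf "%s / %s / %s = %s\n",
                $root, $name, $fact->{'-contextRef'}, $fact->{'#text'};
            $value_of{"$name|$fact->{'-contextRef'}"} = $fact->{'#text'};
        }
    }
}
```

Once the walk has filled %value_of, a single figure can be read back with, for example, `$value_of{'dow:ComprehensiveIncomeloss|YTD3Q07_unaudited'}`.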
Jan 4 '08 #2
