Need help in parsing the special characters using XML::Parser

P: 55
Dear All,

I have some data stored in XML format that needs to be parsed with the XML::Parser and XML::Parser::Expat modules. The data contains special characters such as "Ý, Š, Ū, ť, », ě, ý, ż".
When I try to parse a record containing these characters with the parse() method, I get the error "not well-formed (invalid token)".

Could anyone please help me solve this?

Thanks a lot.
Oct 24 '07 #1
6 Replies


numberwhun
Expert Mod 2.5K+
P: 3,503
Without seeing your code or the sample data, we have no way of knowing what you are doing. Please post your code (in the appropriate code tags) and a sample of the data you are parsing, and we will have a look.

Regards,

Jeff
Oct 24 '07 #2

P: 55
Here I am posting my code:

    sub parse {
      my $self = shift;
      my $arg  = shift;    # the XML data to parse

      # Pass every option through to Expat except the ones kept for XML::Parser itself.
      my @expat_options = ();
      my ($key, $val);
      while (($key, $val) = each %{$self}) {
        push(@expat_options, $key, $val)
          unless exists $self->{Non_Expat_Options}->{$key};
      }

      my $expat = XML::Parser::Expat->new(@expat_options, @_);

      # Init and Final run before and after the parse; the rest go to Expat.
      my %handlers = %{$self->{Handlers}};
      my $init  = delete $handlers{Init};
      my $final = delete $handlers{Final};

      $expat->setHandlers(%handlers);

      if ($self->{Base}) {
        $expat->base($self->{Base});
      }

      $init->($expat)
        if defined($init);

      my @result = ();
      my $result;
      eval {
        $result = $expat->parse($arg);
      };
      my $err = $@;
      if ($err) {
        # Release the Expat object before re-throwing the parse error.
        $expat->release;
        die $err;
      }

      # Call Final in the same context (list or scalar) as the caller used.
      if ($result and defined($final)) {
        if (wantarray) {
          @result = $final->($expat);
        }
        else {
          $result = $final->($expat);
        }
      }

      $expat->release;

      return unless defined wantarray;
      return wantarray ? @result : $result;
    }

Here $arg contains the XML data to be parsed, including the special characters.

The XML data looks like this:

    <record>
      <source-app >ABC</source-app>
      <ref-type>6</ref-type>
      <contributors>
        <authors>
          <author>
            <style face="normal" font="default" size="100%">Dvořák, Petr</style>
          </author>
        </authors>
      </contributors>
      <titles>
        <title>
          <style face="normal" font="default" size="100%">Systematická teologie I : římskokatolická perspektiva</style>
        </title>
      </titles>
      <pages>
        <style>285 s.</style>
      </pages>
      <edition>
        <style>1. vyd.</style>
      </edition>
      <keywords>
        <keyword>
          <style>učení katolické církve</style>
        </keyword>
      </keywords>
      <dates>
        <year>
          <style>1996</style>
        </year>
      </dates>
      <pub-location>
        <style>Brno&#xD;Praha</style>
      </pub-location>
      <publisher>
        <style>Centrum pro studium demokracie a kultury ;&#xD;Česká křesťanská akademie</style>
      </publisher>
      <notes>
        <style>uspořádali Francis S. Fiorenza a John P. Galvin ; [z angličtiny přeložili Petr Dvořák ... et al.]&#xD;20 cm&#xD;Pozn.&#xD;Pozn. o autorech traktátů&#xD;Zkratky&#xD;Bibliogr.&#xD;Odkazy na lit.&#xD;Jmenný a věcný rejstřík</style>
      </notes>
    </record>
Please have a look at it and help me.
Oct 25 '07 #3

P: 55
Can anybody help me out on this?
Oct 30 '07 #4

eWish
Expert 100+
P: 971
What language uses the special characters you are encountering? The problem is the encoding. You more than likely need to use the UTF-8 encoding in your XML document. This module does support UTF-8, as well as other encodings, except for Japanese, I believe.
Oct 30 '07 #5
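
One way to check whether encoding really is the problem is to test whether the raw bytes are valid UTF-8, since that is what the parser assumes when the document does not declare anything else. The sketch below uses only the core Encode module. The helper name is_valid_utf8 is hypothetical, and the sample bytes assume the Czech data is stored as Windows-1250, which the thread does not confirm.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of XML::Parser): returns 1 if the byte
# string is well-formed UTF-8 and 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its input when a CHECK flag is given
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# "Dvořák" as Windows-1250 bytes (an assumption): ř is 0xF8, á is 0xE1.
my $cp1250_bytes = "Dvo\xF8\xE1k";

# The same name as UTF-8 bytes: ř is 0xC5 0x99, á is 0xC3 0xA1.
my $utf8_bytes = "Dvo\xC5\x99\xC3\xA1k";

printf "Windows-1250 bytes valid as UTF-8? %s\n",
    is_valid_utf8($cp1250_bytes) ? "yes" : "no";
printf "UTF-8 bytes valid as UTF-8?        %s\n",
    is_valid_utf8($utf8_bytes) ? "yes" : "no";
```

If the check fails on the real data, the "invalid token" error is most likely the parser rejecting bytes that are not UTF-8.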

P: 55
The data inside the tags of the XML above is Czech. My problem is that Parser.pm is not able to parse these characters.

Can you please give me a script that can parse these special characters, i.e. a method for handling them with the XML::Parser module?
Oct 31 '07 #6

P: 55
Could anybody help me out on this issue please?
Nov 15 '07 #7
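
A minimal sketch of the kind of script asked for above, under the assumption that the data is stored as Windows-1250 bytes (the thread never confirms the actual encoding): decode the bytes with the core Encode module, re-encode them as UTF-8, and only then hand them to XML::Parser. The helper name extract_text and the shortened sample record are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use XML::Parser;

# Hypothetical helper: returns all character data found in an XML document
# whose bytes are in $source_encoding (an assumption supplied by the caller).
sub extract_text {
    my ($xml_bytes, $source_encoding) = @_;

    # Transcode to UTF-8 before parsing, so the parser never sees legacy bytes.
    my $utf8_xml = encode('UTF-8', decode($source_encoding, $xml_bytes));

    my $text   = '';
    my $parser = XML::Parser->new(
        ProtocolEncoding => 'UTF-8',
        Handlers         => {
            # The Char handler receives the Expat object and a chunk of text.
            Char => sub { my ($expat, $chunk) = @_; $text .= $chunk; },
        },
    );
    $parser->parse($utf8_xml);
    return $text;
}

# Shortened sample record with "Dvořák" written as Windows-1250 bytes.
my $xml  = "<record><author><style>Dvo\xF8\xE1k, Petr</style></author></record>";
my $text = extract_text($xml, 'cp1250');

binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

If the documents already start with an XML declaration that names their real encoding, the transcoding step may be unnecessary, provided XML::Parser has an encoding map installed for that name.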
