Bytes | Software Development & Data Engineering Community

Need help parsing special characters with XML::Parser

Dear All,

I have some data stored in XML format that needs to be parsed with the XML::Parser and XML::Parser::Expat modules. The data contains some special characters.
When I try to parse a record containing these characters with the parse() method, I get the error "not well-formed (invalid token)".

Could anyone please help me solve this?

Thanks a lot.
Oct 24 '07 #1
numberwhun
3,509 Expert Mod 2GB
Without seeing your code or a sample of the data, we have no way of knowing what you are doing. Please post your code (in the appropriate code tags) and a sample of the data you are parsing, and we will have a look.

Regards,

Jeff
Oct 24 '07 #2
Here is my code:

    sub parse {
      my $self = shift;
      my $arg  = shift;
      my @expat_options = ();
      my ($key, $val);
      while (($key, $val) = each %{$self}) {
        push(@expat_options, $key, $val)
          unless exists $self->{Non_Expat_Options}->{$key};
      }

      my $expat = new XML::Parser::Expat(@expat_options, @_);
      my %handlers = %{$self->{Handlers}};
      my $init = delete $handlers{Init};
      my $final = delete $handlers{Final};

      $expat->setHandlers(%handlers);

      if ($self->{Base}) {
        $expat->base($self->{Base});
      }

      &$init($expat)
        if defined($init);

      my @result = ();
      my $result;
      eval {
        $result = $expat->parse($arg);
      };
      my $err = $@;
      if ($err) {
        $expat->release;
        die $err;
      }

      if ($result and defined($final)) {
        if (wantarray) {
          @result = &$final($expat);
        }
        else {
          $result = &$final($expat);
        }
      }

      $expat->release;

      return unless defined wantarray;
      return wantarray ? @result : $result;
    }
Here, $arg contains the XML data to be parsed, including the special characters.
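A minimal sketch of why parse($arg) can fail with "not well-formed (invalid token)", assuming the record was saved in windows-1250 (an inference from the Czech text; the thread never says so). With no encoding declaration and no ProtocolEncoding option, Expat reads the bytes as UTF-8, and a byte such as 0xF8 (ř in windows-1250) is never valid UTF-8. The core Encode module shows the same mismatch on a short, hypothetical sample:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical sample: "Dvorak" with Czech accents, stored as windows-1250
# bytes (0xF8 is r-caron and 0xE1 is a-acute in that code page).
my $bytes = "Dvo\xF8\xE1k";

# Strict UTF-8 decoding dies on these bytes, which is roughly what Expat
# runs into when a document without an encoding declaration is read as UTF-8.
my $utf8_ok = eval { decode('UTF-8', my $copy = $bytes, FB_CROAK); 1 };

# Decoding as windows-1250 (Encode's name for it is cp1250) succeeds.
my $text = decode('cp1250', $bytes);

print 'valid UTF-8: ', ($utf8_ok ? 'yes' : 'no'), "\n";
print 'code points: ', join(' ', map { sprintf('U+%04X', ord($_)) } split //, $text), "\n";
```

Run as-is, it should report that the sample is not valid UTF-8 and that the windows-1250 decoding turns byte 0xF8 into U+0159 (ř).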

The XML data looks like this:

    <record>
      <source-app >ABC</source-app>
      <ref-type>6</ref-type>
      <contributors>
        <authors>
          <author>
            <style face="normal" font="default" size="100%">Dvořák, Petr</style>
          </author>
        </authors>
      </contributors>
      <titles>
        <title>
          <style face="normal" font="default" size="100%">Systematická teologie I : římskokatolická perspektiva</style>
        </title>
      </titles>
      <pages>
        <style>285 s.</style>
      </pages>
      <edition>
        <style>1. vyd.</style>
      </edition>
      <keywords>
        <keyword>
          <style>učení katolické církve</style>
        </keyword>
      </keywords>
      <dates>
        <year>
          <style>1996</style>
        </year>
      </dates>
      <pub-location>
        <style>Brno&#xD;Praha</style>
      </pub-location>
      <publisher>
        <style>Centrum pro studium demokracie a kultury ;&#xD;Česká křesťanská akademie</style>
      </publisher>
      <notes>
        <style>uspořádali Francis S. Fiorenza a John P. Galvin ; [z angličtiny přeložili Petr Dvořák ... et al.]&#xD;20 cm&#xD;Pozn.&#xD;Pozn. o autorech traktátů&#xD;Zkratky&#xD;Bibliogr.&#xD;Odkazy na lit.&#xD;Jmenný a věcný rejstřík</style>
      </notes>
    </record>
Please have a look at it and help me.
Oct 25 '07 #3
Can anybody help me out with this?
Oct 30 '07 #4
eWish
971 Expert 512MB
What language uses the special characters you are encountering? The problem is the encoding: you more than likely need to use UTF-8 encoding in your XML document. This module supports UTF-8, as well as other encodings except, I believe, Japanese.
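One way to follow the UTF-8 advice, sketched under the assumption that the source files are windows-1250: decode the bytes with the core Encode module and re-encode them as UTF-8 with a matching XML declaration before parsing. The helper name cp1250_xml_to_utf8 is hypothetical, and its regex only handles a declaration at the very start of the document.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a windows-1250 XML document (bytes) into
# UTF-8 bytes with a matching XML declaration. Any declaration at the very
# start of the input is replaced.
sub cp1250_xml_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('cp1250', $bytes);    # bytes -> Perl characters
    $chars =~ s/\A<\?xml[^>]*\?>\s*//;       # drop an existing declaration
    return encode('UTF-8', qq{<?xml version="1.0" encoding="UTF-8"?>\n} . $chars);
}

# Hypothetical input: one element whose text is windows-1250 bytes.
my $utf8 = cp1250_xml_to_utf8(
    qq{<?xml version="1.0"?>\n<author>Dvo\xF8\xE1k, Petr</author>}
);
print $utf8, "\n";
```

The returned bytes can be passed to parse() unchanged. Converting once when the data is loaded keeps everything downstream in UTF-8, which is the encoding Expat assumes by default.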
Oct 30 '07 #5
The text inside the elements of the XML above is Czech. My problem is that Parser.pm is not able to parse these characters.

Could you please give me a script that can parse these special characters, i.e. a way to handle them with the XML::Parser module?
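A short script along those lines, offered as a sketch rather than a tested fix. It assumes the data is windows-1250 and that the windows-1250 encoding map distributed with XML::Parser is installed. The ProtocolEncoding option tells Expat which encoding the input uses and overrides any encoding named in an XML declaration; the parse() method quoted earlier in the thread forwards constructor options like this one to XML::Parser::Expat. The sample record is hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical cut-down record, stored as windows-1250 bytes.
my $xml = "<record><author>Dvo\xF8\xE1k, Petr</author></record>";

my $text = '';
my $parser = XML::Parser->new(
    # Tell Expat the input bytes are windows-1250. This overrides any
    # encoding in an XML declaration and relies on the windows-1250
    # encoding map being installed alongside XML::Parser.
    ProtocolEncoding => 'windows-1250',
    Handlers         => {
        # Collect character data; the Char handler may be called several
        # times per text node, so append rather than assign.
        Char => sub { my ($expat, $string) = @_; $text .= $string },
    },
);
$parser->parse($xml);

# Handler strings are Perl character strings, so encode them on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

Putting `<?xml version="1.0" encoding="windows-1250"?>` on the first line of each record should have a similar effect without the option, again assuming the encoding map is available.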
Oct 31 '07 #6
Could anybody help me out on this issue please?
Nov 15 '07 #7
