xml::dom and character merging

Hiya,

I have a simple Perl script that reads in an XML file (in UTF-8) using XML::DOM::Parser and writes it out again, as follows:

use strict;
use warnings;
use XML::DOM;

my $inFile = $ARGV[0];
-f $inFile or die "$0: the input file $inFile could not be opened.\n";
my $outFile = $ARGV[1];
# Three-argument open with a lexical filehandle
open(my $out, '>', $outFile) or die "$0: the output file $outFile could not be created.\n";
binmode $out, ":utf8";
my $parser = XML::DOM::Parser->new();
my $inDoc = $parser->parsefile($inFile);
print $out $inDoc->toString;
$inDoc->dispose;
close($out);

The input file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="highlight.xsl"?>
<PAPER>
<METADATA>
<FILENO>W06-3117</FILENO>
<APPEARED> <CONFERENCE>Workshop</CONFERENCE> <YEAR>2006</YEAR> </APPEARED>
</METADATA>
<BODY> ...blah... </BODY>
</PAPER>

My problem is that, while the input file is displayed correctly, the output is not: some characters seem to have been merged and are displayed differently. For example, `aacute' followed by `n' becomes a box glyph containing 4 characters (in Firefox) and \u386e (in Emacs). Can anyone explain what's happening here? Is there something I'm neglecting to do when I create my parser that would prevent this? As far as I know, my files are all correctly encoded in UTF-8.
Thanks,
Anna
Feb 9 '07 #1
dorinbogdan
Hi,
Did you manage to solve the problem?
If so, please let me know so that I can close the thread.
Thanks,
Dorin.
Mar 21 '07 #2
