July 23rd, 2006, 12:55 PM
kellner
perl, XML::LibXML: encoding problems while changing attributes on an XML string

Hello,

I'm parsing a chunk of XML and would like to add an id attribute to
individual elements that lack one. This is with Perl 5.8.6,
libxml2 2.6.17, XML::LibXML 1.58.

Basically, I have the parser add the attribute values to the respective
nodes and then call the toString method on the root element (as returned
by getDocumentElement) to write the modified text to a scalar. Both the
original and the modified text pass my decode_utf8 check, but the
modified text doesn't print properly on the console, nor does it get
stored as utf8 in a MySQL database.

I don't really understand what's going on, and on what level the
error(s) could be located (console encoding, perl encoding, XML
encoding), and would appreciate any help I can get ...

Here's the code:
------------------------------------------------

#!/usr/bin/perl

use strict;
use XML::LibXML;
use Encode 'decode_utf8';
use vars qw($parser $p);

$parser = XML::LibXML->new();
my $version = XML::LibXML::LIBXML_DOTTED_VERSION;
print "libxml2 $version\n-------------\nXML::LibXML $XML::LibXML::VERSION\n-------------------\n";

# Sample input: two <q> elements, neither of which has an id attribute.
$p->{text} = qq|
<p>
<q who="Blabla">pramāṇavārttikasvavṛtti*īkā </q>And this is
some further text.<br/>And even more text.<br/>And more.
<q who="Blabla2">The second quotation!</q>.
pramāṇavārttikasvavṛtti*īkā.
</p>|;

my $a = &validate_text($p->{text});
print "$a \n";

# Adds a numeric id to every <q> child of the root element that lacks one
# and returns the serialized root element.
sub validate_text {
    my $text = shift;
    if (decode_utf8($text)) { print "TEXT is utf8\n"; }
    else                    { print "is not utf8\n"; }
    print "TESTING $text\n";
    my $id   = 1;
    my $doc  = $parser->parse_string($text);
    my $root = $doc->getDocumentElement;

    my @quotations = $root->findnodes('q');
    foreach my $q (@quotations) {
        unless ($q->hasAttribute('id')) {
            print "NO ID\n";
            $q->setAttribute('id', "$id");
            ++$id;
        }
        else { print "HAS ID\n"; }
        my $id_new = $q->getAttribute('id');
        print "NEW ID: $id_new\n";
    }

    # Serialize only the root element, so no XML declaration is emitted.
    my $newtext = $root->toString;
    if (decode_utf8($newtext)) { print "NEW TEXT is utf8\n"; }
    else                       { print "is not utf8\n"; }
    return ($newtext);
}
------------------------------------------------------------

I know that I can set a document encoding by creating a new $doc
altogether, but I don't want to do this in this case, as the
createDocument method prepends an xml version string to the created
document, and this messes up the routines which process the code
afterwards.
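
To narrow down on which level the problem sits, it may help to check whether a scalar holds a Perl character string or raw UTF-8 bytes, since the two print differently on a console that expects UTF-8. The following is only a minimal sketch using the core Encode module; the helper names describe_string and to_utf8_bytes are hypothetical and not part of the script above, and to_utf8_bytes assumes that any scalar without the UTF8 flag already holds UTF-8 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# Hypothetical helper: report what kind of string a scalar holds.
sub describe_string {
    my ($s) = @_;
    # The UTF8 flag is on: Perl treats the scalar as a character string.
    return 'character string' if utf8::is_utf8($s);
    # Otherwise test whether the bytes form valid UTF-8. decode() with a
    # CHECK value may modify its argument, so work on a copy.
    my $copy = $s;
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 'UTF-8 bytes' : 'bytes, not valid UTF-8';
}

# Hypothetical helper: return UTF-8 bytes for output or storage.
# Assumption: a scalar without the UTF8 flag already holds UTF-8 bytes.
sub to_utf8_bytes {
    my ($s) = @_;
    return utf8::is_utf8($s) ? encode_utf8($s) : $s;
}

my $bytes = "pram\xC4\x81\xE1\xB9\x87a";    # UTF-8 bytes for "pramāṇa"
my $chars = decode('UTF-8', $bytes);        # the same text as characters

print describe_string($bytes), "\n";
print describe_string($chars), "\n";
print to_utf8_bytes($chars), "\n";
```

Running describe_string on both the original text and the result of toString would show whether the two are the same kind of string. If they differ, passing the modified text through something like to_utf8_bytes before printing or storing it is one option to try.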

Thanks in advance,

Birgit Kellner

 
