perl, XML::LibXML: encoding problems while changing attributes on an XML string

Hello,

I'm parsing a chunk of XML code and would like to add attribute values
to individual tags if these are lacking. This is with perl 5.8.6,
libxml2 2.6.17, XML::LibXML 1.58.

Basically, I have the parser add the attribute values to the respective
nodes and then use the toString method of XML::LibXML::Document to
write the modified text to a scalar. Both the original and the modified
text pass the decode_utf8 check in the script below, but the modified
text is not displayed correctly on the console, nor is it stored as
UTF-8 in a MySQL database.

I don't really understand what's going on, or at what level the
error(s) might be located (console encoding, Perl encoding, XML
encoding), and would appreciate any help I can get ...

Here's the code:
------------------------------------------------

#!/usr/bin/perl

use strict;
use XML::LibXML;
use Encode 'decode_utf8';
use vars qw ($parser $p);
$parser = XML::LibXML->new();
my $version = XML::LibXML::LIBXML_DOTTED_VERSION;
print "libxml2 $version\n-------------\nXML::LibXML $XML::LibXML::VERSION\n-------------------\n";
$p->{text} = qq|
<p>
<q who="Blabla">pramāṇavārttikasvavṛtti*īkā </q>And this is
some further text.<br/>And even more text.<br/>And more.
<q who="Blabla2">The second quotation!</q>.
pramāṇavārttikasvavṛtti*īkā.
</p>|;

my $a = &validate_text($p->{text});
print "$a \n";

sub validate_text {
    my $text = shift;
    if (decode_utf8($text)) { print "TEXT is utf8\n"; } else { print "is not utf8\n"; }
    print "TESTING $text\n";
    my $id = 1;
    my $doc = $parser->parse_string($text);
    my $root = $doc->getDocumentElement;

    my @quotations = $root->findnodes('q');
    foreach my $q (@quotations) {
        unless ($q->hasAttribute('id')) {
            print "NO ID\n";
            $q->setAttribute('id', "$id");
            ++$id;
        }
        else { print "HAS ID\n"; }
        my $id_new = $q->getAttribute('id');
        print "NEW ID: $id_new\n";
    }

    my $newtext = $root->toString;
    if (decode_utf8($newtext)) { print "NEW TEXT is utf8\n"; } else { print "is not utf8\n"; }
    return ($newtext);
}
------------------------------------------------------------
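
A minimal sketch of how the Perl-level part of the question might be
narrowed down, assuming only the core Encode module. It distinguishes a
byte string from a decoded character string, which a bare decode_utf8
call cannot do: without a CHECK argument it returns a true value for
almost any non-empty input. The helper name describe and the sample
string are made up for illustration, and the sketch does not settle
whether toString in XML::LibXML 1.58 hands back bytes or characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of the original script): report whether
# a scalar is a decoded character string or raw bytes, and its length.
sub describe {
    my ($s) = @_;
    return sprintf "flag=%s length=%d",
        (utf8::is_utf8($s) ? 'characters' : 'bytes'), length $s;
}

# "pram" followed by the two UTF-8 bytes of U+0101 (a with macron),
# as they might arrive from a file or a database handle.
my $bytes = "pram\xC4\x81";

# Decoding turns the 6 bytes into 5 characters; encoding reverses it.
my $chars = decode('UTF-8', $bytes);
my $again = encode('UTF-8', $chars);

print describe($bytes), "\n";
print describe($chars), "\n";

# Character strings need an output layer (or an explicit encode)
# before they are written to the console.
binmode(STDOUT, ':encoding(UTF-8)');
print "$chars\n";
```

If describe reports characters for the modified text but bytes for the
original, the two scalars would probably need the same treatment (for
example an explicit encode before printing or before the database
insert) in order to behave alike.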

I know that I can set a document encoding by creating a new $doc
altogether, but I don't want to do this in this case, as the
createDocument method prepends an xml version string to the created
document, and this messes up the routines which process the code
afterwards.

Thanks in advance,

Birgit Kellner

Jul 23 '06 #1