Bytes IT Community

LibXML UTF8 - Input is not proper UTF-8, indicate encoding !

I'm not sure what is going on here, but it has something to do with the way
UTF-8 is handled in Perl and/or LibXML.

The script below:

- accepts a value from a form text field;
- builds an XML document around it;
- serializes the document to a string using toString();
- parses the string back into an XML document using parse_string();
- transforms the XML document into an HTML document using an XSL
transformation.

Everything works well until a non-ASCII character (for example é) is entered
in the text field. In that case the script dies in parse_string() with the
message:

=====================================================================
:2: parser error : Input is not proper UTF-8, indicate encoding !
<test><test_text>abcé</test_text></test>
^
:2: error: Bytes: 0xE9 0x3C 0x2F 0x74
<test><test_text>abcé</test_text></test>
^ at C:/_work/vsurvey/site/test1.cgi line 24
=====================================================================
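The "Bytes: 0xE9 0x3C 0x2F 0x74" line is the key clue: 0xE9 is é in ISO-8859-1 (Latin-1), but in UTF-8 it is the lead byte of a three-byte sequence and must be followed by two continuation bytes (0x80 to 0xBF), not by "<" (0x3C). A minimal sketch that reproduces that check with only the core Encode module; the helper name is_valid_utf8 is hypothetical and not part of LibXML.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of LibXML): returns 1 if $bytes is
# well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; eval traps the die.
    # LEAVE_SRC stops decode() from consuming its argument in place.
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# The byte run from the error (0xE9 0x3C 0x2F 0x74), then the UTF-8 spelling of é.
my $latin1 = "abc\xE9</t";        # é as the single ISO-8859-1 byte 0xE9
my $utf8   = "abc\xC3\xA9</t";    # é as the UTF-8 byte pair 0xC3 0xA9

printf "ISO-8859-1 bytes are valid UTF-8: %s\n", is_valid_utf8($latin1) ? 'yes' : 'no';
printf "UTF-8 bytes are valid UTF-8:      %s\n", is_valid_utf8($utf8)   ? 'yes' : 'no';
```

It prints "no" for the Latin-1 string and "yes" for the UTF-8 one, which matches what libxml2 complains about.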

I know that the code below does not make much sense, but it is a stripped-down
version of much more complex code. The environment is Perl 5.8, Apache and
Windows XP.

Hints and/or an explanation of what was coded wrong and how it should be fixed
would be very much appreciated.

Vlajko Knezic,

Toronto, Ontario

---------------------------------------------------------------------------------------------------------------------

test.cgi

#! c:/Perl/bin/Perl.exe
use strict;
use warnings;
use CGI;
use CGI::Carp qw( fatalsToBrowser );
use XML::LibXML;
use XML::LibXSLT;
use Encode;

my $mDocument = XML::LibXML::Document->new();
my $parser    = XML::LibXML->new();
$mDocument->setEncoding("UTF8");

my $mCGI = CGI->new;
# Send the HTTP header (CGI.pm's default charset here is ISO-8859-1).
print $mCGI->header;

# Value of the "test" text field, exactly as the browser submitted it.
my $mTest_text = $mCGI->param('test');

# Build <test><test_text>...</test_text></test> around the submitted value.
my $mTest     = $mDocument->createElement("test");
my $mTestText = $mDocument->createElement("test_text");
$mTestText->appendTextNode($mTest_text);
$mTest->appendChild($mTestText);
$mDocument->setDocumentElement($mTest);
$mDocument->setEncoding("UTF8");

# Serialize the document to a string, then parse that string back in.
# parse_string() is the call that dies when é is submitted.
my $mTestXML       = $mDocument->toString();
my $mParsedTestXML = $parser->parse_string($mTestXML);

# Load the stylesheet and transform the XML document into HTML.
my $mParsedXMLXSL = $parser->parse_file('test.xsl');
my $mParserXSL    = XML::LibXSLT->new();
my $mParsedXSL    = $mParserXSL->parse_stylesheet($mParsedXMLXSL);
my $mPageHTML     = $mParsedXSL->transform($mParsedTestXML);

my $mPrintPageHTML = $mParsedXSL->output_string($mPageHTML);
print $mPrintPageHTML;

test.xsl

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html" encoding="UTF-8" indent="yes"
              omit-xml-declaration="yes"/>

  <xsl:template match="//test">
    <html>
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
      </head>
      <body>
        <xsl:value-of select="test_text"/>
        <form name="test" method="post" target="_self">
          <input type="text" name="test" /><input type="submit" name="button"/>
        </form>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>


Jul 19 '05 #1


Vlajko Knezic wrote:
> $mDocument->setEncoding("UTF8");
> my $mCGI = new CGI;
> my $mTest_text = $mCGI->param('test');;


This is the point: you need to encode $mTest_text into
UTF-8 before doing anything with that string. You have
promised the XML library that you will be working with
UTF-8, so it is up to you to ensure that everything
is UTF-8 (not ISO-8859-1).
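A minimal sketch of that advice, using only the core Encode module. It assumes, as the 0xE9 byte in the error suggests, that the browser may submit ISO-8859-1; the helper to_utf8_bytes is hypothetical. Input that is already valid UTF-8 is kept, and anything else is reinterpreted as Latin-1 and re-encoded as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn a raw form value into UTF-8 bytes.
# ASSUMPTION: the browser sent either UTF-8 or ISO-8859-1 (Latin-1).
# Input that is already valid UTF-8 is kept; anything else is treated as Latin-1.
sub to_utf8_bytes {
    my ($raw) = @_;
    # Try strict UTF-8 first; FB_CROAK makes decode() die on bad bytes,
    # so $chars stays undef if the input is not UTF-8.
    my $chars = eval {
        decode('UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    # Fall back to Latin-1, where every byte maps to exactly one character.
    $chars = decode('ISO-8859-1', $raw) unless defined $chars;
    # Re-encode the Perl character string as UTF-8 bytes.
    return encode('UTF-8', $chars);
}

for my $raw ("abc\xE9", "abc\xC3\xA9") {
    my $out = to_utf8_bytes($raw);
    # Both inputs print: 61 62 63 C3 A9
    print join(' ', map { sprintf '%02X', ord } split //, $out), "\n";
}
```

In test.cgi the call would go where the form value is read, e.g. my $mTest_text = to_utf8_bytes($mCGI->param('test'));. Guessing between two encodings is only a heuristic; it is sturdier to know which charset the form page was served with, and worth checking whether the installed XML::LibXML version prefers UTF-8 bytes or decoded Perl character strings.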

Any further questions should be posted to comp.lang.perl.misc
and not this newsgroup (comp.lang.perl is defunct).
-Joe
Jul 19 '05 #2
