
LibXML UTF8 - Input is not proper UTF-8, indicate encoding !

 
#1
July 19th, 2005, 05:17 AM
Vlajko Knezic

I'm not sure what is going on here, but it has something to do with the way UTF-8
is handled in Perl and/or LibXML.



The script below:

- accepts a value from a form text field;
- builds an XML document around it;
- serializes the document to a string using toString();
- parses the string back into an XML document using parse_string();
- transforms the XML document into an HTML document using an XSL transformation.



Everything works until a non-ASCII character (for example é) is entered in the
text field. In that case the call to parse_string() dies with the message:

==========================================================================
:2: parser error : Input is not proper UTF-8, indicate encoding !
<test><test_text>abcé</test_text></test>
                    ^
:2: error: Bytes: 0xE9 0x3C 0x2F 0x74
<test><test_text>abcé</test_text></test>
                    ^
 at C:/_work/vsurvey/site/test1.cgi line 24
==========================================================================
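The bytes in the error message point at the cause: 0xE9 is é in ISO-8859-1, but on its own it is not valid UTF-8, where é is the two-byte sequence 0xC3 0xA9. The sketch below uses only the core Encode module (shipped with Perl 5.8) to show both facts; the string literals are taken from the error message, and nothing here touches XML::LibXML.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "abc" followed by the single byte 0xE9, as in the parser error
my $latin1_bytes = "abc\xE9";

# Read the bytes as ISO-8859-1 characters, then encode those characters as UTF-8
my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));
printf "%vX\n", $utf8_bytes;    # 61.62.63.C3.A9

# A strict UTF-8 decode of the bytes the parser saw (0xE9 followed by '<')
# fails, which is what libxml2 is complaining about
my $ok = eval {
    decode('UTF-8', "abc\xE9</test_text>", Encode::FB_CROAK | Encode::LEAVE_SRC);
    1;
};
print $ok ? "valid UTF-8\n" : "not valid UTF-8\n";
```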



I know that the code below does not make much sense, but it is a stripped-down
version of much more complex code. Environment: Perl 5.8, Apache, Windows XP.



Any hints or explanation of what is coded wrong, and how it should be fixed,
would be very much appreciated.



Vlajko Knezic,

Toronto, Ontario



---------------------------------------------------------------------------------------------------------------------

test.cgi



#! c:/Perl/bin/Perl.exe

use CGI;
use XML::LibXML;
use XML::LibXSLT;
use CGI::Carp qw( fatalsToBrowser );
use Encode;

my $mDocument = XML::LibXML::Document->new();
my $parser = XML::LibXML->new();

$mDocument->setEncoding("UTF8");
my $mCGI = CGI->new;
print $mCGI->header;
my $mTest_text = $mCGI->param('test');    # raw bytes, exactly as the browser sent them

my $mTest = $mDocument->createElement("test");
my $mTestText = $mDocument->createElement("test_text");
$mTestText->appendTextNode($mTest_text);
$mTest->appendChild($mTestText);
$mDocument->setDocumentElement( $mTest );
$mDocument->setEncoding("UTF8");
my $mTestXML = $mDocument->toString();
my $mParsedTestXML = $parser->parse_string($mTestXML);    # line 24: the parser error is raised here

my $mParsedXMLXSL = $parser->parse_file('test.xsl');
my $mParserXSL = XML::LibXSLT->new();
my $mParsedXSL = $mParserXSL->parse_stylesheet($mParsedXMLXSL);
my $mPageHTML = $mParsedXSL->transform($mParsedTestXML);
my $mPrintPageHTML = $mParsedXSL->output_string($mPageHTML);
print $mPrintPageHTML;



test.xsl



<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="html" encoding="UTF-8" indent="yes"
      omit-xml-declaration="yes"/>
  <xsl:template match="//test">
    <html>
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
      </head>
      <body>
        <xsl:value-of select="test_text"/>
        <form name="test" method="post" target="_self">
          <input type="text" name="test" /><input type="submit" name="button"/>
        </form>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>







#2
July 19th, 2005, 05:17 AM
Joe Smith
Re: LibXML UTF8 - Input is not proper UTF-8, indicate encoding !

Vlajko Knezic wrote:
> $mDocument->setEncoding("UTF8");
> my $mCGI = new CGI;
> my $mTest_text = $mCGI->param('test');;

This is the problem: you need to encode $mTest_text into UTF-8 before
doing anything with that string. You have promised the XML library
that you will be working with UTF-8, so it is up to you to ensure
that everything is UTF-8 (not ISO-8859-1). The byte 0xE9 in the error
message is é in ISO-8859-1; in UTF-8, é is the two-byte sequence 0xC3 0xA9.

Any further questions should be posted to comp.lang.perl.misc
and not this newsgroup (comp.lang.perl is defunct).
-Joe
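A minimal sketch of Joe's suggestion, assuming the form value arrives as raw bytes (as it does from CGI.pm under Perl 5.8) in either UTF-8 or ISO-8859-1. `param_to_utf8` is a hypothetical helper, not part of CGI.pm or XML::LibXML; it returns UTF-8 bytes to match the UTF-8 encoding declared on the document.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn a raw form value into UTF-8 bytes.
# Assumption: the browser sent either UTF-8 or ISO-8859-1. Strict UTF-8 is
# tried first; ISO-8859-1, which maps every byte to a character, is the fallback.
sub param_to_utf8 {
    my ($raw) = @_;
    return '' unless defined $raw;

    # undef if $raw is not well-formed UTF-8 (FB_CROAK dies inside the eval)
    my $chars = eval {
        decode('UTF-8', $raw, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    $chars = decode('ISO-8859-1', $raw) unless defined $chars;

    return encode('UTF-8', $chars);
}

# "abc" plus 0xE9 (é in ISO-8859-1), the value from the error message
printf "%vX\n", param_to_utf8("abc\xE9");        # 61.62.63.C3.A9
# Input that is already UTF-8 comes back unchanged
printf "%vX\n", param_to_utf8("abc\xC3\xA9");    # 61.62.63.C3.A9
```

In test.cgi the `param('test')` line would become `my $mTest_text = param_to_utf8(scalar $mCGI->param('test'));`. The UTF-8-first fallback is only a heuristic: Latin-1 text whose bytes happen to form valid UTF-8 will be misread. A more reliable fix is probably on the form side: CGI.pm's header() labels the page `charset=ISO-8859-1` by default, and that HTTP header takes precedence over the meta tag in test.xsl, so the browser submits the field as ISO-8859-1 (hence the single 0xE9 byte). Printing `$mCGI->header(-charset => 'UTF-8')` should make browsers send UTF-8 instead.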
 
