LibXML UTF8 - Input is not proper UTF-8, indicate encoding !

I am not sure what is going on here, but it seems to have something to do
with the way UTF-8 is handled in Perl and/or LibXML.

The script below:

- accepts a value from a form text field;

- builds an XML document around it;

- serializes the document to a string using toString();

- parses the string back into an XML document using parse_string();

- transforms the XML document into an HTML document using an XSL transformation.

Everything works well until a non-ASCII character (for example é) is entered
in the text field. In that case parse_string() dies with the message:

=====================================================================

:2: parser error : Input is not proper UTF-8, indicate encoding !
<test><test_text>abcé</test_text></test>
^
:2: error: Bytes: 0xE9 0x3C 0x2F 0x74
<test><test_text>abcé</test_text></test>
^
 at C:/_work/vsurvey/site/test1.cgi line 24

=====================================================================

I know that the code below does not make much sense on its own, but it is a
reduced version of much more complex code. The environment is Perl 5.8,
Apache, Windows XP.

Hints and/or an explanation of what was coded wrong and how it should be
fixed are very much appreciated.

Vlajko Knezic,

Toronto, Ontario

---------------------------------------------------------------------------------------------------------------------

test.cgi

#! c:/Perl/bin/Perl.exe

use CGI;

use XML::LibXML;

use XML::LibXSLT;

use CGI::Carp qw( fatalsToBrowser );

use Encode;

my $mDocument = XML::LibXML::Document-> new();

my $parser = XML::LibXML->new();

$mDocument->setEncoding("UTF8");

my $mCGI = new CGI;

print $mCGI->header;

my $mTest_text = $mCGI->param('test');;

my $mTest = $mDocument-> createElement("test");

my $mTestText = $mDocument-> createElement("test_text");

$mTestText->appendTextNode($mTest_text);

$mTest->appendChild($mTestText);

$mDocument->setDocumentElement( $mTest );

$mDocument->setEncoding("UTF8");

my $mTestXML = $mDocument->toString();

my $mParsedTestXML = $parser->parse_string($mTestXML);

my $mParsedXMLXSL = $parser->parse_file('test.xsl');

my $mParserXSL = XML::LibXSLT->new();

my $mParsedXSL = $mParserXSL->parse_stylesheet($mParsedXMLXSL);

my $mPageHTML = $mParsedXSL->transform($mParsedTestXML);

my $mPrintPageHTML = $mParsedXSL->output_string($mPageHTML);

print $mPrintPageHTML;

test.xsl

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output method="html" encoding="UTF-8" indent="yes"
omit-xml-declaration="yes"/>

<xsl:template match="//test">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

</head>

<html>

<body>

<xsl:value-of select="test_text"/>

<form name="test" type="post" target="_self">

<input type="text" name="test" /><input type="submit" name="button"/>

</form>

</body>

</html>

</xsl:template>

</xsl:stylesheet>


Jul 19 '05 #1
Vlajko Knezic wrote:
> $mDocument->setEncoding("UTF8");
> my $mCGI = new CGI;
> my $mTest_text = $mCGI->param('test');;


This is the point: you need to encode $mTest_text as UTF-8
before doing anything else with that string. You have
promised the XML library that you will be working with
UTF-8, so it is up to you to ensure that everything
really is UTF-8 (not ISO-8859-1). The byte 0xE9 in your
error message is é in ISO-8859-1; in UTF-8 the same
character is the two bytes 0xC3 0xA9.
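
A minimal sketch of that conversion, using the core Encode module the script already loads. The helper name form_value_to_chars is made up for illustration, and the fallback to ISO-8859-1 is an assumption based on the 0xE9 byte in the error message; what the browser really sends depends on the charset of the page that contains the form.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of the original script): turn the raw bytes
# of a form value into a decoded Perl character string.
# Assumption: input that is not valid UTF-8 was submitted as ISO-8859-1.
sub form_value_to_chars {
    my ($bytes) = @_;
    return '' unless defined $bytes;

    # decode() with a CHECK argument may modify its input, so work on a copy.
    my $copy  = $bytes;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $chars if defined $chars;

    # Not valid UTF-8: fall back to the assumed legacy encoding.
    return decode('ISO-8859-1', $bytes);
}

# The bytes from the error message: "abc" followed by 0xE9.
my $raw   = "abc\xE9";
my $chars = form_value_to_chars($raw);

# UTF-8 bytes, which is what a document declared as UTF-8 must contain.
my $utf8 = encode('UTF-8', $chars);

print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";
# prints "61 62 63 C3 A9"
```

With a current XML::LibXML you would probably pass the decoded character string ($chars) to appendTextNode() and let the library handle the encoding. The release that shipped around Perl 5.8 may instead have expected bytes in the document encoding, which is what $utf8 holds, so check the documentation of the version you have installed.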

Any further questions should be posted to comp.lang.perl.misc
and not this newsgroup (comp.lang.perl is defunct).
-Joe
Jul 19 '05 #2
