LibXML UTF8 - Input is not proper UTF-8, indicate encoding !

I am not sure what is going on here, but it has something to do with the way
UTF-8 is handled in Perl and/or LibXML.

The script below:

- accepts a value from a form text field;

- builds an XML document around it;

- serializes the document to a string using toString();

- parses the string back into an XML document using parse_string();

- transforms the XML document into an HTML document using XSL.

Everything works well until a non-ASCII character (for example é) is entered
in the text field. In that case the script dies in parse_string() with the
message:

==========================================================================

:2: parser error : Input is not proper UTF-8, indicate encoding !
<test><test_text>abcé</test_text></test>
                    ^
:2: error: Bytes: 0xE9 0x3C 0x2F 0x74
<test><test_text>abcé</test_text></test>
                    ^
 at C:/_work/vsurvey/site/test1.cgi line 24

==========================================================================
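
A side note on the error itself: the reported bytes 0xE9 0x3C 0x2F 0x74 are é as a single byte (as in ISO-8859-1) followed by the start of the closing tag. In UTF-8, 0xE9 opens a three-byte sequence and must be followed by two continuation bytes in the range 0x80-0xBF; 0x3C is not one, hence the complaint. A minimal sketch with the core Encode module that checks this; the sample string and variable names are illustrative and not part of the original script:

```perl
use strict;
use warnings;
use Encode ();

# Illustrative sample: "abc", the single byte 0xE9, then the closing tag,
# mirroring the bytes quoted in the parser error.
my $bytes = "abc\xE9</test_text>";

# Strict decoding with FB_CROAK dies on malformed input. It may modify its
# argument, so work on a copy.
my $copy = $bytes;
my $is_utf8 = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;

print $is_utf8 ? "valid UTF-8\n" : "not valid UTF-8\n";
```

The same check applied to the value returned by param() would show whether the form data arrives as UTF-8 at all.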

I know that the code below does not make much sense, but it is a reduction
of much more complex code. The environment is Perl 5.8, Apache, Windows XP.

Hints and/or an explanation of what was coded wrong and how it should be
fixed are very much appreciated.

Vlajko Knezic,

Toronto, Ontario



#! c:/Perl/bin/Perl.exe

use CGI;
use XML::LibXML;
use XML::LibXSLT;
use CGI::Carp qw( fatalsToBrowser );
use Encode;

my $mDocument = XML::LibXML::Document->new();
my $parser = XML::LibXML->new();
$mDocument->setEncoding("UTF8");
my $mCGI = new CGI;
print $mCGI->header;
my $mTest_text = $mCGI->param('test');

my $mTest = $mDocument->createElement("test");
my $mTestText = $mDocument->createElement("test_text");
$mTestText->appendTextNode($mTest_text);
$mTest->appendChild($mTestText);
$mDocument->setDocumentElement( $mTest );
$mDocument->setEncoding("UTF8");

my $mTestXML = $mDocument->toString();
my $mParsedTestXML = $parser->parse_string($mTestXML);

my $mParsedXMLXSL = $parser->parse_file('test.xsl');
my $mParserXSL = XML::LibXSLT->new();
my $mParsedXSL = $mParserXSL->parse_stylesheet($mParsedXMLXSL);
my $mPageHTML = $mParsedXSL->transform($mParsedTestXML);
my $mPrintPageHTML = $mParsedXSL->output_string($mPageHTML);

print $mPrintPageHTML;


<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="html" encoding="UTF-8" indent="yes"
omit-xml-declaration="yes"/>

<xsl:template match="//test">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<xsl:value-of select="test_text"/>
<form name="test" type="post" target="_self">
<input type="text" name="test" /><input type="submit" name="button"/>
</form>
</body>
</html>
</xsl:template>

</xsl:stylesheet>

Jul 19 '05 #1
Vlajko Knezic wrote:
$mDocument->setEncoding("UTF8");
my $mCGI = new CGI;
my $mTest_text = $mCGI->param('test');

This is the point: you need to convert $mTest_text to UTF-8
before doing anything else with that string. You have
promised the XML library that you will be working with
UTF-8, therefore it is up to you to ensure that everything
you hand it really is UTF-8 (not ISO-8859-1).
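
The conversion described above can be sketched with the core Encode module. This is only a sketch under one assumption: that the form value arrives as ISO-8859-1 bytes, which the 0xE9 in the error message suggests but does not prove. $raw is a hypothetical stand-in for the value returned by $mCGI->param('test').

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for $mCGI->param('test'): "abc" plus the byte 0xE9,
# i.e. "abcé" as ISO-8859-1 (an assumption, see above).
my $raw = "abc\xE9";

# Turn the ISO-8859-1 bytes into a Perl character string...
my $chars = decode('ISO-8859-1', $raw);

# ...and, where raw UTF-8 bytes are needed, encode explicitly.
my $utf8 = encode('UTF-8', $chars);

# Show the resulting bytes in hex.
print join(' ', map { sprintf '0x%02X', ord } split //, $utf8), "\n";
# prints: 0x61 0x62 0x63 0xC3 0xA9
```

Whether appendTextNode() should receive the character string or the UTF-8 bytes depends on the XML::LibXML version in use, so it is worth checking its documentation before picking one.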

Any further questions should be posted to comp.lang.perl.misc
and not this newsgroup (comp.lang.perl is defunct).
Jul 19 '05 #2
