
utf8 pragma - strange behavior

I am trying to understand how to work with Unicode in Perl. I have
read the relevant man pages (perluniintro, perlunicode, etc.) and have
written several scripts to test/verify my understanding. However, I
created a script that has unexpected output. The script is below and
it contains some UTF-8 encoded characters which represent all five
Spanish accented vowels plus the enye (n with a tilde over it) in upper
and lower case. I hope that this post comes through as UTF-8 encoded
as the source code is. I am posting from Google groups which does use
UTF-8 encoding.

BEGIN CODE >>
#!/usr/bin/perl

use warnings;
use strict;
#use utf8;
use Encode;

# using utf8 causes the characters to be printed in Latin-1 encoding

my %table = (
    # Spanish
    # hexadecimal UTF-8 => actual UTF-8
    '0xc381' => chr(hex('c3')) . chr(hex('81')), # 'Á',
    '0xc389' => encode("utf8", "\x{00c9}"), # 'É',
    '0xc38d' => 'Í',
    '0xc393' => 'Ó',
    '0xc391' => 'Ñ',
    '0xc39a' => 'Ú',
    '0xc3a1' => 'á',
    '0xc3a9' => 'é',
    '0xc3ad' => 'í',
    '0xc3b3' => 'ó',
    '0xc3b1' => 'ñ',
    '0xc3ba' => 'ú',
);

foreach (sort keys %table) {
    print "$_ = $table{$_}\n";
}
<< END CODE

When the 'use utf8' line is commented out, the script outputs the UTF-8
characters correctly. However, when the utf8 pragma is used, the
characters that are actually hard coded into the hash as UTF-8 (not the
Á or É) are printed in Latin-1. To my understanding, in Perl 5.8.x,
the only effect of the utf8 pragma is to tell the parser that literals
and variables may contain UTF-8 encoded characters. However, in
practice, the utf8 pragma is affecting the script's output.
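
The behavior described above can be probed with a small sketch. It assumes the source file is saved as UTF-8 and uses Í as a sample character; the variable names are illustrative only. It compares a literal parsed under the pragma with the two byte-oriented constructions used in the table, then prints the literal to an in-memory filehandle that has no encoding layer, which appears to be the situation STDOUT is in by default.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # source literals are decoded from UTF-8 into characters
use Encode qw(encode);

# Under "use utf8" this literal is ONE character (U+00CD).
my $literal = 'Í';

# Built from raw byte values: TWO bytes (0xC3 0x8D), untouched by the pragma.
my $bytes = chr(hex('c3')) . chr(hex('8d'));

# encode() also returns a byte string holding the UTF-8 encoding.
my $encoded = encode('UTF-8', "\x{00cd}");

printf "literal: length %d\n", length($literal);    # 1
printf "bytes:   length %d\n", length($bytes);      # 2
printf "encoded: length %d\n", length($encoded);    # 2

# Print the literal to a handle with no encoding layer and inspect
# what was actually written.
my $buf = '';
open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
print {$fh} $literal;
close($fh);

# A character below U+0100 is written as the single byte with the same
# value, which a reader sees as Latin-1 rather than UTF-8.
printf "written: %d byte(s), first byte 0x%02X\n", length($buf), ord($buf);
```

If this matches what the script does, the pragma is not changing the output routine itself. It changes what the literals are (characters instead of bytes), and a handle without an encoding layer then writes those characters as single Latin-1 bytes, while the chr() and encode() entries remain UTF-8 byte sequences.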

I have tested the script on Mac OS X 10.3.8 with Perl 5.8.1 and on
Fedora Core (not sure which version) running Perl 5.8.3.

Can anyone explain why the utf8 pragma affects the output of the script?

Jul 19 '05 #1
1 Reply


ryang wrote:
I am trying to understand how to work with Unicode in Perl. I have
read the relevant man pages (perluniintro, perlunicode, etc.) and have
written several scripts to test/verify my understanding. However, I
created a script that has unexpected output. The script is below and
Welcome to the club. :-)
Can anyone explain why the utf8 pragma affects the output of the script?


My problem (different post) is slightly different, but
I'm going to try commenting out the pragma to see what happens.
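
An alternative to commenting out the pragma, sketched here under the assumption that the terminal expects UTF-8, is to keep 'use utf8', hold every value as a character string, and declare the output encoding on STDOUT with binmode. The reduced table and the table_lines helper are hypothetical names used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # literals in this file are decoded into character strings

# Assumption: the terminal expects UTF-8. Declare that on the handle so
# Perl encodes characters on output instead of writing them as Latin-1.
binmode(STDOUT, ':encoding(UTF-8)') or die "binmode failed: $!";

# Hypothetical reduced table: every value is a character string,
# so nothing is pre-encoded with encode().
my %table = (
    '0xc381' => chr(0xC1),     # Á from its code point
    '0xc389' => "\x{00c9}",    # É from an escape
    '0xc38d' => 'Í',           # Í as a literal, decoded by "use utf8"
);

# Build the output lines as character strings (illustrative helper).
sub table_lines {
    my (%t) = @_;
    return map { "$_ = $t{$_}\n" } sort keys %t;
}

# The encoding layer turns each character into its UTF-8 bytes here.
print table_lines(%table);
```

The design choice is to keep characters inside the program and encode only at the output boundary, so byte strings and character strings are never mixed in one table.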
Jul 19 '05 #2
