473,385 Members | 1,890 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

utf8 pragma - strange behavior

I am trying to understand how to work with Unicode in Perl. I have
read the relevant man pages (perluniintro, perlunicode, etc.) and have
written severl scripts to test/verifiy my understanding. However, I
created a script that has unexpected output. The script is below and
it contains some UTF-8 encoded characters which represent all five
Spanish accented vowels plus the enye (n with a tilde over it) in upper
and lower case. I hope that this post comes through as UTF-8 encoded
as the source code is. I am posting from Google groups which does use
UTF-8 encoding.

BEGIN CODE >>
#!/usr/bin/perl

use warnings;
use strict;
#use utf8;
use Encode;

# using utf8 causes the characters to be printed in latin-1 encoding

my %table = (
# spanish
# hexidecimal UTF-8 => actual UTF-8
'0xc381' => chr(hex('c3')) . chr(hex('81')), # 'Á',
'0xc389' => encode("utf8", "\x{00c9}"), # 'É',
'0xc38d' => 'Í',
'0xc393' => 'Ó',
'0xc391' => 'Ñ',
'0xc39a' => 'Ú',
'0xc3a1' => 'á',
'0xc3a9' => 'é',
'0xc3ad' => 'í',
'0xc3b3' => 'ó',
'0xc3b1' => 'ñ',
'0xc3ba' => 'ú',
);

foreach (sort keys %table) {
print "$_ = $table{$_}\n";
}
<< END CODE

When the 'use utf8' line is commented out, the script outputs the UTF-8
characters correctly. However, when the utf8 pragma is used, the
characters that are actually hard coded into the hash as UTF-8 (not the
Á or É) are printed in Latin-1. To my understanding, in Perl 5.8.x,
the only effect of the utf8 pragma is to tell the parser that literals
and variables may contain UTF-8 encoded characters. However in
practice, the utf8 pragma is effecting the script's output.

I have tested the script on Mac OSX 10.3.8 with Perl 5.8.1 and on
Fedora Core (not sure which version) running perl 5.8.3.

Can anyone explain why the utf8 pragma effects the output of the script?

Jul 19 '05 #1
1 3945
ryang wrote:
I am trying to understand how to work with Unicode in Perl. I have
read the relevant man pages (perluniintro, perlunicode, etc.) and have
written severl scripts to test/verifiy my understanding. However, I
created a script that has unexpected output. The script is below and
Welcome to the club. :-)
Can anyone explain why the utf8 pragma effects the output of the script?


My problem (different post) is slightly different, but
I'm going to try commenting out the pragma to see what happens.
Jul 19 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: sinasalek | last post by:
i have a problem with MySQL 4.1.x and UTF8. in version 4.0, i'm using html forms with utf8 charset for inserting unicode strings. but in version 4.1.x it is not working! if i change the charset of...
12
by: Chris Mullins | last post by:
I'm implementing RFC 3491 in .NET, and running into a strange issue. Step 1 of RFC 3491 is performing a set of mappings dicated by tables B.1 and B.2. I'm having trouble with the following...
10
by: Steven T. Hatton | last post by:
Stroustrup says this: http://www.research.att.com/~bs/bs_faq2.html#macro "So, what's wrong with using macros?" "And yes, I do know that there are things known as macros that doesn't...
6
by: Joseph Geretz | last post by:
Writing an Outlook AddIn with C#. For the user interface within Outlook I'm adding matching pairs of Toolbar buttons and Menu items. All of the buttons and menu items are wired up to send events to...
15
by: muttaa | last post by:
Hello all, I'm a beginner in C...May i like to know the difference between a #pragma and a #define.... Also,yet i'm unclear what a pragma is all about as i can find topics on it only in...
4
by: EmeraldShield | last post by:
(Dot Net 2 C# application - using Encoding.UTF8 with a StreamReader) I have a very strange problem that I cannot explain with a UTF8 Readline() although this could exist in other types of encoding,...
26
by: Rick | last post by:
I'm told that "#pragma once" has made it into the ISO standard for either C or C++. I can't find any reference to that anywhere. If it's true, do any of you have a reference I can use? ...
2
by: aleemakhtar1 | last post by:
wat is use of pragma directive in embedded sys ??
5
by: raashid bhatt | last post by:
What is #pragma once used for and what is #WIN#@_LEN_AND_MEAN
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.