Bytes | Software Development & Data Engineering Community

Problems with utf8, locale and regex

I have made this testcase:
-----------------------
#!/usr/bin/perl
#use locale;
#use encoding 'iso-8859-1';
use utf8;
binmode(STDOUT, ":utf8");

print "\\x{00D8}:\n";
test("\x{00D8}");

print "\nØ:\n";
test("Ø");

sub test {
    my $chr = shift;
    print "ord: " . ord($chr) . ", '$chr', lc: " . lc($chr) . "\n";
    print "isutf8: " . utf8::is_utf8($chr) . "\n";
    $chr =~ /$chr/i && print "Caseinsensitive matches\n";
    $chr =~ /$chr/ && print "Casesensitive matches\n";
}

-----------------------

The weirdest thing here is that if "use locale" is enabled, the
case-insensitive match in the last test() call fails. Without "use encoding"
it works in the first call (where the string does not get the utf8 flag),
but not in the last. Without "use locale", both work.
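
One way to narrow this down is to take the locale out of the picture and force Unicode rules on the match. The following is only a sketch, assuming Perl 5.14 or later (for the /u regex modifier, which did not exist when this test case was written); describe() is a hypothetical helper, not part of the test case above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                  # string literals in this file are UTF-8
binmode(STDOUT, ":encoding(UTF-8)");

# Hypothetical helper: report the utf8 flag of $upper and whether $lower
# matches it with and without /i. The /u modifier asks for Unicode rules
# regardless of the utf8 flag or the current locale.
sub describe {
    my ($label, $upper, $lower) = @_;
    my $flag = utf8::is_utf8($upper) ? 1 : 0;
    my $ci   = ($lower =~ /$upper/iu) ? 1 : 0;   # case-insensitive
    my $cs   = ($lower =~ /$upper/u)  ? 1 : 0;   # case-sensitive
    return sprintf("%s: ord=%d utf8=%d ci=%d cs=%d",
                   $label, ord($upper), $flag, $ci, $cs);
}

print describe('escape',  "\x{00D8}", "\x{00F8}"), "\n";
print describe('literal', "Ø",        "ø"),        "\n";
```

If both lines report the same ci value here but the original test differs between the two calls, that would point at the interaction between "use locale" and the utf8 flag rather than at the strings themselves.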

If I run the program with "use encoding ..." enabled, both versions have the
utf8 flag, and both fail. The output is also printed in ISO-8859-1, even
though binmode() is called afterwards.
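
To see what is actually stacked on STDOUT, the PerlIO layers can be listed before and after the binmode() call. This is a diagnostic sketch only; layers_of() is a hypothetical helper, and the exact layer names depend on the platform and Perl build. As far as I understand its documentation, "use encoding" also sets layers on STDIN and STDOUT, so adding that line to the sketch should show whether an ISO-8859-1 layer is still present underneath.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the output-side PerlIO layers of a
# filehandle as a single space-separated string.
sub layers_of {
    my ($fh) = @_;
    return join ' ', PerlIO::get_layers($fh, output => 1);
}

# Report on STDERR so the diagnostics are not affected by STDOUT's layers.
print STDERR "before binmode: ", layers_of(\*STDOUT), "\n";
binmode(STDOUT, ":encoding(UTF-8)");
print STDERR "after binmode:  ", layers_of(\*STDOUT), "\n";
```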

It doesn't seem to matter which locale is used. I have tried no_NO.UTF-8 and
en_US.UTF-8. lc($chr) works in both cases, but arrays only sort correctly
with no_NO.
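
For the sorting part, a locale-independent alternative is the Unicode::Collate::Locale module. This is a sketch under assumptions: it uses the 'nb' (Norwegian Bokmål) tailoring, which I believe the module ships, instead of the system's no_NO locale, and the sample letters are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                  # string literals in this file are UTF-8
use Unicode::Collate::Locale;
binmode(STDOUT, ":encoding(UTF-8)");

# Collator with Norwegian tailoring; this does not depend on "use locale"
# or on the LC_* environment variables.
my $collator = Unicode::Collate::Locale->new(locale => 'nb');

my @letters = qw(å ø æ z a);               # illustrative sample data
my @sorted  = $collator->sort(@letters);

print join(", ", @sorted), "\n";
```

Because the collation rules come from the module rather than from the system locale, the result should be the same whether the program runs under no_NO.UTF-8 or en_US.UTF-8.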

I save the file in UTF-8 mode.
Dec 5 '07 #1
