
Ubuntu - Linux - Unicode - encoding

Hello NG, a somewhat longer question.

I'm working on our project DrPython and trying to fix bugs on Linux
(on Windows it now works very well with latin-1 encoding).

On Windows it works well now, using setappdefaultencoding and opening
the files in the styled text control with the right encoding.
(I see the German umlauts äöü and the "strong 's'" "ß".)

The case:
I have a file on a WindowsXP partition which has as contents german
umlauts and the filename itself has umlauts like iÜüäßk.txt

If I want to append this file to a list, I get somehow latin-1, cannot
decode 'utf-8'.

sys.setappdefaultencoding(self.prefs.defaultencoding) would be the
easiest solution, and it should be the same as sys.setdefaultencoding on
Linux.

Why is there a setappdefaultencoding on Windows and a
sys.setdefaultencoding on Linux?

I googled and found a strange solution (sys.setdefaultencoding is not
available otherwise):

import sys
reload(sys)

Only then is this function available.
Why is this setdefaultencoding otherwise not working on linux?

(Also Filemanagers like Nautilus or Krusader cannot display the files
correctly).

Is there a system wide linux language setting (encoding), which I have
to install and adjust?

I know there are the methods encode, unicode and decode, but how do I
know when they are needed? I don't want to go through all the source
adding encode, ... for every string access.
So setappdefaultencoding would be the easiest way.

Should I also (or instead) use wx.SetDefaultPyEncoding in DrPython?

This would be the easiest solution, setappdefaultencoding (getting it
from the preferences), but it doesn't work.

Besides, I tried other editors like spe, pype, boa and ulipad, but none
of them correctly displayed the files that have German umlauts in their
filenames.

Thank you verrrry much in advance for a possible solution.

Feb 1 '07 #1
On Thu, 01 Feb 2007 16:02:52 +0100, Franz Steinhaeusler wrote:
> The case:
> I have a file on a WindowsXP partition which has as contents german
> umlauts and the filename itself has umlauts like iÜüäßk.txt
Could you please tell us a) which filesystem is that partition using (winxp
may be installed on fat32 or ntfs partitions) and b) which driver are you
using to read that partition (may be vfat, ntfs or fuse/ntfs-3g) and, last
but not least, c) which options are passed to that driver?

--
Alan Franzoni <al***************@gmail.com>
-
Togli .xyz dalla mia email per contattarmi.
Remove .xyz from my address in order to contact me.
-
GPG Key Fingerprint (Key ID = FE068F3E):
5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
Feb 1 '07 #2
On 1 Feb, 16:02, Franz Steinhaeusler <franz.steinhaeus...@gmx.at>
wrote:
> The case:
> I have a file on a WindowsXP partition which has as contents german
> umlauts and the filename itself has umlauts like iÜüäßk.txt
>
> If I want to append this file to a list, I get somehow latin-1, cannot
> decode 'utf-8'.
You mean that you expect the filename in UTF-8, but it arrives as
ISO-8859-1 (Latin1)? How do you get the filename? Via Python standard
library functions or through a GUI toolkit? What does
sys.getfilesystemencoding report?

[...]
> Why is this setdefaultencoding otherwise not working on linux?
My impression was that you absolutely should not change the default
encoding. Instead, you should react to encoding information provided
by your sources of data. For example, sys.stdin.encoding tells you
about the data from standard input.
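
A minimal sketch of this advice, assuming Python 3 (which postdates this thread): there the default encoding is fixed at UTF-8 and sys.setdefaultencoding no longer exists, so bytes are decoded explicitly at the point where they enter the program. The helper name to_text is hypothetical and only serves the illustration.

```python
import sys

# Python 3 has no sys.setdefaultencoding; the default is always UTF-8.
print("default encoding:   ", sys.getdefaultencoding())
print("filesystem encoding:", sys.getfilesystemencoding())
# stdin can be missing or replaced, so read its encoding defensively.
print("stdin encoding:     ", getattr(sys.stdin, "encoding", None))


def to_text(raw, source_encoding):
    """Decode bytes using the encoding reported by the data source."""
    return raw.decode(source_encoding)


# Latin-1 bytes for the German characters mentioned in the thread.
latin1_bytes = "äöüß".encode("latin-1")
print(to_text(latin1_bytes, "latin-1") == "äöüß")
```

The point of the design is that the encoding travels with the data source (file, stdin, filesystem) instead of being a process-wide switch.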
> (Also Filemanagers like Nautilus or Krusader cannot display the files
> correctly).
This sounds like a locale issue...
> Is there a system wide linux language setting (encoding), which I have
> to install and adjust?
I keep running into this problem when installing various
distributions. Generally, the locale needs to agree with the encoding
of the filenames in your filesystem, so that if you've written files
with UTF-8 filenames, you'll only see them with their proper names if
the locale you're using is based on UTF-8 - things like en_GB.utf8 and
de_AT.utf8 would be appropriate. Such locales are often optional
packages, as I found out very recently, and you may wish to look at
the language-pack-XX and language-pack-XX-base packages for Ubuntu
(substituting XX for your chosen language). Once they are installed,
typing "locale -a" will let you see available locales, and I believe
that changing /etc/environment and setting the LANG variable there to
one of the available locales may offer some kind of a solution.
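
To see from Python which locale a process actually runs under, something like the following could be used. It is only a sketch, assuming Python 3; locale_uses_utf8 is a hypothetical helper that inspects the codeset part of a LANG-style value such as de_AT.utf8.

```python
import locale
import os


def locale_uses_utf8(lang_value):
    """Return True if a LANG-style value such as 'de_AT.utf8' names UTF-8."""
    # A locale name looks like language_TERRITORY.codeset@modifier.
    _, _, codeset = lang_value.partition(".")
    codeset = codeset.split("@", 1)[0]
    return codeset.lower().replace("-", "") == "utf8"


lang = os.environ.get("LANG", "")
print("LANG:", repr(lang), "UTF-8 based:", locale_uses_utf8(lang))
# Encoding Python derives from the locale settings of this process.
print("preferred encoding:", locale.getpreferredencoding(False))
```

If LANG is empty or names a non-UTF-8 codeset while the filenames on disk are UTF-8, the mismatch described above is the likely cause of wrongly displayed names.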

Another thing I also discovered very recently, after doing a
debootstrap installation of Ubuntu, was that various terminals
wouldn't reproduce non-ASCII characters without an appropriate (UTF-8)
locale being set up, even though other desktop applications were happy
to accept and display the characters. I thought this was a keyboard
issue, compounded by the exotic nested X server plus User Mode Linux
solution I was experimenting with, but I think locales were the main
problem.

Paul

Feb 1 '07 #3
On Thu, 1 Feb 2007 16:42:02 +0100, Alan Franzoni
<al*******************@geemail.invalid> wrote:
> On Thu, 01 Feb 2007 16:02:52 +0100, Franz Steinhaeusler wrote:
>> The case:
>> I have a file on a WindowsXP partition which has as contents german
>> umlauts and the filename itself has umlauts like iÜüäßk.txt
>
> Could you please tell us a) which filesystem is that partition using (winxp
> may be installed on fat32 or ntfs partitions) and b) which driver are you
> using to read that partition (may be vfat, ntfs or fuse/ntfs-3g) and, last
> but not least, c) which options are passed to that driver?
Hello Alan, thank you for answering.

a) FAT32
b) Hm, I don't know; I mounted it in fstab as FAT32.
c) In fstab there is:
/dev/hda1 /winxp auto rw,user,auto 0

One problem remains, and it is perhaps a little OT here, but
nevertheless:

If I copy files with german umlauts (äöü and strong 's' ß), these
filenames are not copied properly, and that characters are replaces
by little square symbols.

Is there anything to set up on Ubuntu to copy this properly?
I have only installed the English language, maybe that is the problem.
--
Franz Steinhaeusler
Feb 1 '07 #4
On 1 Feb 2007 08:24:07 -0800, "Paul Boddie" <pa**@boddie.org.uk>
wrote:
> On 1 Feb, 16:02, Franz Steinhaeusler <franz.steinhaeus...@gmx.at>
> wrote:
>> The case:
>> I have a file on a WindowsXP partition which has as contents german
>> umlauts and the filename itself has umlauts like iÜüäßk.txt
>>
>> If I want to append this file to a list, I get somehow latin-1, cannot
>> decode 'utf-8'.
>
> You mean that you expect the filename in UTF-8, but it arrives as
> ISO-8859-1 (Latin1)? How do you get the filename? Via Python standard
> library functions or through a GUI toolkit? What does
> sys.getfilesystemencoding report?
Hello Paul,

I already set the sysencoding to 'latin-1', but apparently the value
is ignored and 'utf-8' is used (?)

I get the filenames with
thelist = os.listdir(directory)
where directory is a string, not unicode.
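
One way to make the decoding step explicit is sketched below, assuming Python 3, where os.listdir returns raw bytes names when it is given a bytes path. The helpers decode_filename and list_names are hypothetical, and the UTF-8-then-Latin-1 order is an assumption chosen to match the encodings named in this thread.

```python
import os


def decode_filename(raw, encodings=("utf-8", "latin-1")):
    """Try each encoding in turn and return the first successful decode."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if no listed encoding fits (latin-1 itself never fails).
    return raw.decode("ascii", errors="replace")


def list_names(directory):
    """List a directory as bytes, then decode each name explicitly."""
    # Passing a bytes path makes os.listdir return raw bytes names.
    return [decode_filename(name) for name in os.listdir(os.fsencode(directory))]


# The example name from the thread, as Latin-1 and as UTF-8 bytes.
print(decode_filename("iÜüäßk.txt".encode("latin-1")) == "iÜüäßk.txt")
print(decode_filename("iÜüäßk.txt".encode("utf-8")) == "iÜüäßk.txt")
```

Latin-1 can decode any byte sequence, so it has to come last in the list; a stricter encoding placed after it would never be tried.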
> [...]
>> Why is this setdefaultencoding otherwise not working on linux?
>
> My impression was that you absolutely should not change the default
> encoding.
Aha.

> Instead, you should react to encoding information provided
> by your sources of data. For example, sys.stdin.encoding tells you
> about the data from standard input.
>> (Also Filemanagers like Nautilus or Krusader cannot display the files
>> correctly).
>
> This sounds like a locale issue...
Hm, a setting in linux.
>> Is there a system wide linux language setting (encoding), which I have
>> to install and adjust?
>
> I keep running into this problem when installing various
> distributions. Generally, the locale needs to agree with the encoding
> of the filenames in your filesystem, so that if you've written files
> with UTF-8 filenames, you'll only see them with their proper names if
> the locale you're using is based on UTF-8 - things like en_GB.utf8 and
> de_AT.utf8 would be appropriate. Such locales are often optional
> packages, as I found out very recently, and you may wish to look at
> the language-pack-XX and language-pack-XX-base packages for Ubuntu
> (substituting XX for your chosen language). Once they are installed,
> typing "locale -a" will let you see available locales, and I believe
> that changing /etc/environment and setting the LANG variable there to
> one of the available locales may offer some kind of a solution.
Ah, thank you very much for that enlightenment!
> Another thing I also discovered very recently, after doing a
> debootstrap installation of Ubuntu, was that various terminals
> wouldn't reproduce non-ASCII characters without an appropriate (UTF-8)
> locale being set up, even though other desktop applications were happy
> to accept and display the characters.
That sounds familiar to me! ;)
> I thought this was a keyboard
> issue, compounded by the exotic nested X server plus User Mode Linux
> solution I was experimenting with, but I think locales were the main
> problem.
>
> Paul
So that is not exactly simple. :)

Thank you very much for this precise answer!
--
Franz Steinhaeusler
Feb 1 '07 #5
On Thu, 01 Feb 2007 20:57:53 +0100, Franz Steinhäusler wrote:
> If I copy files with german umlauts (äöü and strong 's' ß), these
> filenames are not copied properly, and that characters are replaces
> by little square symbols.
Yes... I, myself, am Italian, and I found no problem in using accented
letters (òèàìù). Since you say there's a problem as well in Nautilus and
other Ubuntu software, I suppose there's something wrong with your Linux
setup, not with Python.

Or, at least: you should try solving that problem first, then check what
happens with python.

Try appending this option to the hda1 mount options in your fstab:

iocharset=iso8859-15

Unmount and remount, then check what happens.
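
The "little square symbols" are what an encoding mismatch typically looks like. The sketch below illustrates this in Python 3 under an assumption: the filename bytes are ISO-8859-15 while the reader expects UTF-8. It does not reproduce what the filesystem driver itself does; it only shows the effect of decoding with the wrong charset.

```python
name = "äöüß.txt"

# Bytes as an ISO-8859-15 (Latin-9) source would hand them out.
raw = name.encode("iso8859-15")

# A UTF-8 reader cannot decode them; undecodable bytes become U+FFFD.
garbled = raw.decode("utf-8", errors="replace")
print(ascii(garbled))

# Decoding with the encoding that was actually used restores the name.
restored = raw.decode("iso8859-15")
print(restored == name)
```

Once both sides agree on the charset (here via the mount option, elsewhere via the locale), the replacement characters disappear.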

--
Alan Franzoni <al***************@gmail.com>
-
Togli .xyz dalla mia email per contattarmi.
Remove .xyz from my address in order to contact me.
-
GPG Key Fingerprint (Key ID = FE068F3E):
5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
Feb 1 '07 #6
On Fri, 2 Feb 2007 00:12:45 +0100, Alan Franzoni
<al*******************@geemail.invalid> wrote:
> On Thu, 01 Feb 2007 20:57:53 +0100, Franz Steinhäusler wrote:
>> If I copy files with german umlauts (äöü and strong 's' ß), these
>> filenames are not copied properly, and that characters are replaces
>> by little square symbols.
>
> Yes... I, myself, am Italian, and I found no problem in using accented
> letters (òèàìù). Since you say there's a problem as well in Nautilus and
> other Ubuntu software, I suppose there's something wrong with your Linux
> setup, not with Python.
>
> Or, at least: you should try solving that problem first, then check what
> happens with python.
>
> Try appending this option to the hda1 mount options in your fstab:
>
> iocharset=iso8859-15
>
> Unmount and remount, then check what happens.

Thank you again, I will give it a try!
--
Franz Steinhaeusler
Feb 2 '07 #7
