Foreign Character Problems In Python 2.5 and Tkinter

Hi,

I'm writing a small text editor type application with Python 2.5 and
Tkinter. I'm using the Tk text widget for input and output, and the
problem is that when I try to save its contents to a .txt file, any
Scandinavian letters such as "äöå ÄÖÅ" are saved incorrectly and show up
as a mess when I open the .txt file in Windows Notepad.

It seems that the characters will only get mixed if the user has typed
them into the widget, but if the program has outputted them, they are
saved correctly.

The function that is saving the file is as follows:

try:
    file = open(self.currentSaveFile, 'w+')
    file.write(self.text.get(0.0, END))
except IOError:
    tkMessageBox.showwarning(
        'Save File',
        'An error occurred while trying to save \"'
        + self.currentSaveFile + '\"',
        parent=self.frame)
finally:
    file.close()

Sometimes the output in the file is garbled text in place of "äöå ÄÖÅ", and
sometimes it gives me the error: "UnicodeEncodeError: 'ascii' codec
can't encode characters in position 33-35: ordinal not in range(128)".
I have tried changing it to:

try:
    file = codecs.open(savefilename, 'w+', 'utf-8', 'ignore')
    file.write(unicode(self.text.get(0.0, END), 'utf-8', 'ignore'))
    self.currentSaveFile = savefilename
except IOError:
    tkMessageBox.showwarning(
        'Save File',
        'An error occurred while trying to save \"'
        + self.currentSaveFile + '\"',
        parent=self.frame)
finally:
    file.close()

Which does save the user-typed characters correctly, but loses any
newlines and "äöå" characters outputted by the program.

I have no idea how to solve this problem, and would appreciate any help.
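
The two symptoms described above can be reproduced without Tkinter. The following is a minimal sketch, assuming a current Python 3 (where every str is Unicode) rather than the Python 2.5 used in this thread, and assuming that Notepad reads the file with the cp1252 code page. It shows the ASCII codec rejecting the letters, and UTF-8 bytes turning into a "mess" when they are read back with a single-byte codec.

```python
text = "äöå ÄÖÅ"

# Encoding with the ASCII codec fails, because none of these letters
# has a code point below 128.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 works and yields two bytes per letter.
utf8_bytes = text.encode("utf-8")

# Reading those UTF-8 bytes back with a single-byte Windows code page
# (cp1252 is an assumption here) gives two wrong characters per letter.
mojibake = utf8_bytes.decode("cp1252")

# Decoding with the same codec that was used for writing restores the text.
round_trip = utf8_bytes.decode("utf-8")

print(ascii_failed, len(utf8_bytes), len(mojibake), round_trip == text)
```

The point of the sketch is only that the writer and the reader have to agree on one codec; which codec is the right one depends on the program that opens the file.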
Oct 13 '07 #1
Juha S. wrote:
> problem is that when I try to save its contents to a .txt file, any
> Scandinavian letters such as "äöå ÄÖÅ" are saved incorrectly and show up
> as a mess when I open the .txt file in Windows Notepad.
>
> It seems that the characters will only get mixed if the user has typed
> them into the widget, but if the program has outputted them, they are
> saved correctly.

Did you define the encoding for the source file and
put u (for unicode) in front of your strings? The
following piece produces proper UTF-8. I couldn't test it with
Notepad though; no Windows here.

Note that this message is also encoded in UTF-8, and so should
your editor be. I can't believe we are still messing with this
stuff in 2007. In the bad old days it was easy: you only had
to learn to read { as ä, | as ö, etc. (and vice versa)
on localized terminals. C code looked rather exotic
with a-umlauts everywhere ;)

#!/usr/bin/python
# -*- coding: utf-8 -*-

from Tkinter import *
import codecs

class Application(Frame):
    def save(self):
        FILE = codecs.open("outfile.txt", "w", "utf-8")
        FILE.write(u"START - åäöÅÄÖ\n")
        FILE.write(self.text_field.get(0.0, END))
        FILE.write(u"END - åäöÅÄÖ\n")
        FILE.close()
        self.quit()

    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.grid()

        self.text_field = Text(self, width=40, height=10)
        self.text_field.grid()

        self.save_button = Button(self, text="save and exit", command=self.save)
        self.save_button.grid()

if __name__ == "__main__":
    app = Application()
    app.mainloop()

Oct 13 '07 #2
Thanks for the reply. I made changes to my code according to your
example. Now any Scandinavian characters that are outputted by the
program are missing in the Tk text box.

I'm using a loading function like this to load the data that is to be
outputted by the program:

def loadWords(self, filename):
    ret = []

    try:
        file = codecs.open(filename, 'r', 'utf-8', 'ignore')
        for line in file:
            # Must skip blank lines (read only lines that contain text).
            if line.isspace() == False:
                line = line.replace(u'\n', u'')
                ret.append(line)
    except IOError:
        tkMessageBox.showwarning(
            u'Open File',
            u'An error occurred while trying to load \"' + filename + u'\"',
            parent=self.frame)
    finally:
        file.close()

    return ret

Also, the newlines are still lost when saving the text widget contents
to a file. I'm inserting the program generated text to the text widget
through "text.insert(END, txt + u'\n\n')".
Oct 13 '07 #3
Sat, 13 Oct 2007 16:13:21 +0300, Juha S. wrote:
> Thanks for the reply. I made changes to my code according to your
> example. Now any Scandinavian characters that are outputted by the
> program are missing in the Tk text box.
>
> file = codecs.open(filename, 'r', 'utf-8', 'ignore')

Remove that 'ignore'. If you then get an error complaining
that the utf-8 codec can't handle the file, you've found the culprit.
The file might be in iso-8859-1.

JanneT
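
To illustrate the advice above, here is a minimal sketch, assuming a current Python 3 instead of the Python 2.5 used in this thread. The helper read_text and its fallback order are hypothetical and not part of anyone's code here. It shows that decoding iso-8859-1 bytes as UTF-8 with 'ignore' silently drops the letters, while strict decoding raises an error that can be used to fall back to iso-8859-1.

```python
import os
import tempfile

def read_text(path, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding), trying each codec strictly in turn."""
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable if iso-8859-1 is left out of `encodings`, because
    # that codec maps every possible byte to a character.
    raise ValueError("none of the codecs could decode " + path)

# Create a small iso-8859-1 file to play with.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("äöå ÄÖÅ\n".encode("iso-8859-1"))

with open(path, "rb") as f:
    raw = f.read()

# With 'ignore', the undecodable bytes vanish without any error.
ignored = raw.decode("utf-8", "ignore")

# Strict decoding fails for utf-8, so the helper falls back.
text, used = read_text(path)
os.remove(path)

print(repr(ignored), repr(text), used)
```

Because iso-8859-1 never fails to decode, it only makes sense as the last entry in the list; a file in some other single-byte encoding would be accepted too, just with the wrong characters.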

Oct 13 '07 #4
Thanks! Opening and saving the file with the iso-8859-1 codec seems to
handle the characters correctly. Now the only problem left is the
missing newlines in the output file. I tried googling for the ISO code
for newline and entering it in a Python string as '\x0A', but it doesn't
work: the output file still loses the newlines.

Oct 13 '07 #5
On Oct 13, 5:22 pm, "Juha S." <jusa...@gmail.com> wrote:
> Thanks! Opening and saving the file with the iso-8859-1 codec seems to
> handle the characters correctly. Now the only problem left are the
> missing newlines in the output file. I tried googling for the iso code
> for newline and entering it in a Python string as '\x0A' but it doesn't
> work in the output file which still loses the newlines.

As a noob I've struggled a bit, but basically what I've come up with
is this: if the information is strings, and especially strings stored in
any kind of list or dict, it takes a loop to write the lines to the file,
appending a newline to each one (myfile[i] + '\n') to keep the lines
separate for Python I/O purposes. If you're done with the Python
manipulation and want Windows, Mac, or Unix programs to read the file,
then you need to use the platform's newline sequence from the os module
(os.linesep), or code it in yourself, e.g. '\r\n'. The fact that you
are using the iso-latin-1 (or iso-8859-1) codec doesn't change the '\n'
from Python's viewpoint; that is, '\n' is still '\n'. When you do
I/O on binary, encoded data, it's all Python's viewpoint, so nothing
converts the newlines for you.
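
The loop described above could look roughly like this. It is a sketch under assumptions: a current Python 3 rather than 2.5, and a hypothetical helper name write_lines. codecs.open opens the underlying file in binary mode, so a plain '\n' is written as a single LF byte and os.linesep has to be added explicitly to get '\r\n' on Windows.

```python
import codecs
import os
import tempfile

def write_lines(path, lines, encoding="iso-8859-1"):
    """Write each string in `lines` followed by the platform line ending."""
    # No newline translation happens here, because codecs.open works on
    # a file opened in binary mode.
    with codecs.open(path, "w", encoding) as f:
        for line in lines:
            f.write(line + os.linesep)

# Write two lines to a temporary file and look at the raw bytes.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
write_lines(path, ["äöå", "ÄÖÅ"])
with open(path, "rb") as f:
    written = f.read()
os.remove(path)

print(repr(written))
```

On Windows the bytes contain '\r\n' after each line, which is what Notepad of that era needed in order to show line breaks; on Unix os.linesep is just '\n'.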

Oct 13 '07 #6
ni********************@yahoo.com wrote:
> [...] you need the consideration of <newline-char>
> from the os module, or code it in yourself, e.g. '\r\n'. [...]

Ah, it was so simple. I replaced any '\n' characters with 'os.linesep'
in the source as you suggested, and now everything works beautifully.
Thanks for the help, guys!
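
For anyone finding this later, the replacement might be sketched as follows. This assumes a current Python 3; the helper name to_platform_newlines and the sample string standing in for the Text widget contents are hypothetical. The extra normalisation step is an addition that is not in the thread: it keeps an existing '\r\n' from becoming '\r\r\n' on Windows.

```python
import os

def to_platform_newlines(text):
    """Convert every line ending in `text` to os.linesep."""
    # Normalise first so that an existing '\r\n' is not doubled up.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", os.linesep)

# Stand-in for what the Text widget returns: '\n' between lines.
widget_text = "first line äöå\n\nsecond line ÄÖÅ\n"

converted = to_platform_newlines(widget_text)

# Bytes ready to be written to a file opened in binary mode.
payload = converted.encode("iso-8859-1")

print(repr(converted))
```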
Oct 15 '07 #7
