
utf-8 read/write file

Hi!

I have a big .txt file which I want to read, process, and write to another .txt file.
I have written a script for that, but I'm having a problem with Croatian characters
(Š, Đ, Ž, Č, Ć).
How can I read from and write to a file in UTF-8 encoding?
I read the file with fileinput.input.

thanks
Oct 8 '08 #1
4 Replies


On Oct 8, 12:49 pm, Bruno <Br...@hi.t-com.hr> wrote:
> Hi!
>
> I have a big .txt file which I want to read, process, and write to another .txt file.
> I have written a script for that, but I'm having a problem with Croatian characters
> (Š, Đ, Ž, Č, Ć).

Can you show us what you have so far?

> How can I read from and write to a file in UTF-8 encoding?

import codecs
data = codecs.open("my-utf8-file.txt", encoding="utf-8").read()

> I read the file with fileinput.input.
>
> thanks
Oct 8 '08 #2
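
For readers on Python 3, the codecs.open suggestion above has a built-in counterpart: open() accepts an encoding argument directly. What follows is only a sketch of the read, process, write loop under that assumption. The function name copy_utf8, its transform parameter, and the file names are invented for illustration and do not come from the thread.

```python
import os
import tempfile


def copy_utf8(src_path, dst_path, transform=lambda line: line):
    """Read src_path as UTF-8 line by line, apply transform, write dst_path as UTF-8."""
    # newline="" leaves the original line endings untouched on both sides.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # streams the file, so a big input is not loaded at once
            dst.write(transform(line))


# Small demonstration in a temporary directory (hypothetical file names).
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "file1.txt")
dst_file = os.path.join(workdir, "file2.txt")

with open(src_file, "w", encoding="utf-8", newline="") as f:
    f.write("život\nŠ Đ Ž Č Ć\n")

# The "processing" step here is just upper-casing each line.
copy_utf8(src_file, dst_file, transform=str.upper)

with open(dst_file, "r", encoding="utf-8", newline="") as f:
    result = f.read()
```

Naming the encoding on both the reading and the writing side is the point of the sketch: the text is decoded once on input and encoded once on output, so the Croatian letters survive whatever the platform default encoding happens to be.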

Benjamin wrote:
> On Oct 8, 12:49 pm, Bruno <Br...@hi.t-com.hr> wrote:
>> Hi!
>>
>> I have a big .txt file which I want to read, process, and write to another .txt file.
>> I have written a script for that, but I'm having a problem with Croatian characters
>> (Š, Đ, Ž, Č, Ć).
>
> Can you show us what you have so far?
>
>> How can I read from and write to a file in UTF-8 encoding?
>
> import codecs
> data = codecs.open("my-utf8-file.txt", encoding="utf-8").read()
>
>> I read the file with fileinput.input.
>>
>> thanks
I have tried codecs, but when I use encoding="utf-8" I get this error on the
word "život":

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\getcontent.py", line 43, in <module>
    encoding="utf-8").readlines()
  File "C:\Python25\Lib\codecs.py", line 626, in readlines
    return self.reader.readlines(sizehint)
  File "C:\Python25\Lib\codecs.py", line 535, in readlines
    data = self.read()
  File "C:\Python25\Lib\codecs.py", line 424, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0: unexpected code byte
I just need to read from file1.txt, process some words (it's simple text
processing), and write them to file2.txt without losing the Croatian characters.
Oct 8 '08 #3

On Oct 8, 5:55 pm, gigs <g...@hi.t-com.hr> wrote:
> Benjamin wrote:
>> On Oct 8, 12:49 pm, Bruno <Br...@hi.t-com.hr> wrote:
>>> Hi!
>>> I have big .txt file which i want to read, process and write to another .txt file.
>>> I have done script for that, but im having problem with croatian characters
>>> (Š, Đ, Ž, Č, Ć).
>
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0:
> unexpected code byte

Are you sure you have UTF-8 data? I guess your file is encoded in
CP1250 or CP1252; in both of these charsets 0x9e represents LATIN
SMALL LETTER Z WITH CARON.

Kent
Oct 8 '08 #4
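
The diagnosis above can be checked directly. The sketch below (Python 3 assumed) feeds the failing byte 0x9e, followed by the ASCII letters "ivot", to the three codecs mentioned. The helper name try_decode is made up for this example.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if data is not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Byte 0x9e at position 0, as in the reported traceback, then ASCII "ivot".
raw = b"\x9eivot"

results = {enc: try_decode(raw, enc) for enc in ("utf-8", "cp1250", "cp1252")}

for enc, text in results.items():
    # ascii() keeps the output printable even on a console that lacks these letters.
    print(enc, ascii(text))

# The two Windows code pages agree on 0x9e but not on every Croatian letter:
# byte 0xe8 is "č" in CP1250 and "è" in CP1252.
c_caron_cp1250 = b"\xe8".decode("cp1250")
e_grave_cp1252 = b"\xe8".decode("cp1252")
```

In UTF-8 a byte in the range 0x80 to 0xBF can only continue a multi-byte sequence, never start one, which is why the decoder rejects 0x9e at position 0. Since CP1252 has no č, ć, or đ, a Croatian text file that fails as UTF-8 is more plausibly CP1250, though that remains a guess until the file is inspected.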

Kent Johnson wrote:
> On Oct 8, 5:55 pm, gigs <g...@hi.t-com.hr> wrote:
>> Benjamin wrote:
>>> On Oct 8, 12:49 pm, Bruno <Br...@hi.t-com.hr> wrote:
>>>> Hi!
>>>> I have big .txt file which i want to read, process and write to another .txt file.
>>>> I have done script for that, but im having problem with croatian characters
>>>> (Š, Đ, Ž, Č, Ć).
>>
>> UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0:
>> unexpected code byte
>
> Are you sure you have UTF-8 data? I guess your file is encoded in
> CP1250 or CP1252; in both of these charsets 0x9e represents LATIN
> SMALL LETTER Z WITH CARON.
>
> Kent
That data probably wasn't in UTF-8. Today I got another file, this one in UTF-8, and it's working.

thanks
Oct 9 '08 #5
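
If a file in the old encoding turns up again, one option is to convert it to UTF-8 once and then process the converted copy. The sketch below assumes Python 3 and assumes the legacy encoding is CP1250, which the thread suggests but never confirms. The function names, the candidate list, and the file names are all hypothetical.

```python
import os
import tempfile


def first_working_encoding(path, candidates=("utf-8", "cp1250")):
    """Return the first candidate that decodes the whole file without error."""
    for enc in candidates:
        try:
            with open(path, "r", encoding=enc, newline="") as f:
                for _ in f:  # decode everything, one line at a time
                    pass
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits: %r" % (candidates,))


def transcode_to_utf8(src_path, dst_path, src_encoding):
    """Copy src_path to dst_path, decoding as src_encoding and encoding as UTF-8."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demonstration: create a CP1250 file, detect its encoding, convert it.
text = "život, šuma, đak, čaj, ćup\n"
workdir = tempfile.mkdtemp()
legacy_file = os.path.join(workdir, "legacy.txt")
utf8_file = os.path.join(workdir, "converted.txt")

with open(legacy_file, "wb") as f:
    f.write(text.encode("cp1250"))

detected = first_working_encoding(legacy_file)
transcode_to_utf8(legacy_file, utf8_file, detected)
```

UTF-8 is tried first because it is strict: text in a single-byte code page that contains accented letters rarely happens to be valid UTF-8. The reverse is not true, since CP1250 accepts almost any byte sequence, so a successful CP1250 decode shows only that the file could be CP1250, not that it is.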
