Bytes | Developer Community
utf-8 read/write file

Hi!

I have a big .txt file that I want to read, process, and write to another .txt file.
I wrote a script for that, but I'm having a problem with Croatian characters
(Š,Đ,Ž,Č,Ć).
How can I read from and write to a file in UTF-8 encoding?
I read the file with fileinput.input.

Thanks
Oct 8 '08 #1
4 Replies
On Oct 8, 12:49 pm, Bruno <Br...@hi.t-com.hr> wrote:
> Hi!
>
> I have a big .txt file that I want to read, process, and write to another .txt file.
> I wrote a script for that, but I'm having a problem with Croatian characters
> (Š,Đ,Ž,Č,Ć).

Can you show us what you have so far?

> How can I read from and write to a file in UTF-8 encoding?

import codecs
# Pass the encoding explicitly so the file is decoded to unicode on read
data = codecs.open("my-utf8-file.txt", encoding="utf-8").read()

> I read the file with fileinput.input.
>
> Thanks
Oct 8 '08 #2
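
For readers on a current Python 3, the read/process/write loop asked about above might look roughly like the sketch below. It is only an illustration: `process_line` and `convert` are hypothetical names, the thread never shows the real processing step, and the built-in `open()` with an `encoding` argument is assumed instead of `fileinput` (on the Python 2.5 used in this thread, `codecs.open` with the same `encoding` argument would play that role).

```python
def process_line(line):
    # Hypothetical processing step: the thread does not say what the real
    # script does, so this one only strips trailing whitespace.
    return line.rstrip() + "\n"


def convert(src_path, dst_path, encoding="utf-8"):
    # Iterating over the file object yields one decoded line at a time,
    # so a big file is never loaded into memory all at once.
    with open(src_path, "r", encoding=encoding) as src, \
         open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(process_line(line))
```

Calling `convert("file1.txt", "file2.txt")` would then copy the text across with Š, Đ, Ž, Č and Ć intact, provided the input really is UTF-8.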
Benjamin wrote:
> On Oct 8, 12:49 pm, Bruno <Br...@hi.t-com.hr> wrote:
>> Hi!
>>
>> I have a big .txt file that I want to read, process, and write to another .txt file.
>> I wrote a script for that, but I'm having a problem with Croatian characters
>> (Š,Đ,Ž,Č,Ć).
>
> Can you show us what you have so far?
>
>> How can I read from and write to a file in UTF-8 encoding?
>
> import codecs
> data = codecs.open("my-utf8-file.txt", encoding="utf-8").read()
>
>> I read the file with fileinput.input.
>>
>> Thanks

I have tried codecs, but when I use encoding="utf-8" I get this error on the
word "život":

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\getcontent.py", line 43, in <module>
    encoding="utf-8").readlines()
  File "C:\Python25\Lib\codecs.py", line 626, in readlines
    return self.reader.readlines(sizehint)
  File "C:\Python25\Lib\codecs.py", line 535, in readlines
    data = self.read()
  File "C:\Python25\Lib\codecs.py", line 424, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0:
unexpected code byte

I just need to read from file1.txt, process some words (it's simple text
processing), and write them to file2.txt without losing the Croatian characters.
Oct 8 '08 #3
On Oct 8, 5:55 pm, gigs <g...@hi.t-com.hr> wrote:
> Benjamin wrote:
>> On Oct 8, 12:49 pm, Bruno <Br...@hi.t-com.hr> wrote:
>>> Hi!
>>> I have a big .txt file that I want to read, process, and write to another .txt file.
>>> I wrote a script for that, but I'm having a problem with Croatian characters
>>> (Š,Đ,Ž,Č,Ć).
>
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0:
> unexpected code byte

Are you sure you have UTF-8 data? I guess your file is encoded in
CP1250 or CP1252; in both of these charsets 0x9e represents LATIN
SMALL LETTER Z WITH CARON.

Kent
Oct 8 '08 #4
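
Kent's diagnosis can be checked directly. The sketch below (Python 3) decodes the byte from the traceback under each candidate encoding, then re-encodes a legacy file as UTF-8. `decodes_as` and `transcode` are hypothetical helper names, and the `cp1250` default is an assumption about the asker's file: it is chosen because CP1250 covers all five Croatian letters, while CP1252 lacks Č, Ć and Đ.

```python
RAW = b"\x9e"  # the byte named in the traceback


def decodes_as(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def transcode(src_path, dst_path, src_encoding="cp1250", dst_encoding="utf-8"):
    # Decode with the legacy code page and re-encode as UTF-8, line by line.
    # The cp1250 default is an assumption, not something the thread confirms.
    with open(src_path, "r", encoding=src_encoding) as src, \
         open(dst_path, "w", encoding=dst_encoding) as dst:
        for line in src:
            dst.write(line)
```

Here `decodes_as(RAW, "cp1250")` and `decodes_as(RAW, "cp1252")` both give "ž" (U+017E), while `decodes_as(RAW, "utf-8")` gives `None`, because 0x9e is a continuation byte and cannot start a UTF-8 character.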
Kent Johnson wrote:
> On Oct 8, 5:55 pm, gigs <g...@hi.t-com.hr> wrote:
>> Benjamin wrote:
>>> On Oct 8, 12:49 pm, Bruno <Br...@hi.t-com.hr> wrote:
>>>> Hi!
>>>> I have a big .txt file that I want to read, process, and write to another .txt file.
>>>> I wrote a script for that, but I'm having a problem with Croatian characters
>>>> (Š,Đ,Ž,Č,Ć).
>>
>> UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0:
>> unexpected code byte
>
> Are you sure you have UTF-8 data? I guess your file is encoded in
> CP1250 or CP1252; in both of these charsets 0x9e represents LATIN
> SMALL LETTER Z WITH CARON.
>
> Kent

That data probably wasn't in UTF-8. Today I got another file that is UTF-8, and
it's working.

Thanks
Oct 9 '08 #5
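
Since the root cause was a file that was not UTF-8 after all, a quick check before processing can save a failed run. The following is a minimal Python 3 sketch, not something from the thread: `is_valid_utf8` is a hypothetical helper and the chunk size is an arbitrary choice. It decodes the file incrementally so that a big file is not read into memory at once.

```python
import codecs


def is_valid_utf8(path, chunk_size=64 * 1024):
    # An incremental decoder keeps state between chunks, so a multi-byte
    # character split across a chunk boundary is still handled correctly.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                decoder.decode(chunk)
        # final=True raises if the file ends in the middle of a character.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

A `True` result only says the bytes are well-formed UTF-8; a `False` result does not say which legacy encoding the file actually uses.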
