Hi Fredrik and Terry,
Well, I got this in IDLE; I think I have done something wrong.
>>> import codecs
>>> f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
TypeError: an integer is required
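A possible reading of the traceback above, as a minimal sketch: the built-in open() expects a buffer size (an integer) as its third positional argument, so an encoding name in that slot is rejected. Separately, in a non-raw string literal "\a" is the escape for the BEL character, so "\admin" in the path does not mean what it looks like. The temporary file and the short path fragment below are stand-ins for illustration, not the original corpus path.

```python
import os
import tempfile

# Hypothetical stand-in file so the example does not depend on the real path.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
open(path, "w").close()

# The third positional argument of the built-in open() is the buffer size,
# so a string such as "utf_8" raises TypeError.
try:
    open(path, "r", "utf_8")
    builtin_error = None
except TypeError as exc:
    builtin_error = exc

# "\a" is an escape sequence (BEL, 0x07); a raw string keeps the backslash.
escaped = "Settings\admin"
raw_str = r"Settings\admin"
```

Doubling the backslashes, using a raw string, or using forward slashes in the path should each avoid the escape problem.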
After that I tried binary read mode, read the first 32 bytes, and this
is what I got.
>>> f = open("C:\Documents and Settings\\admin\\My Documents\\corpus\\dainaikAikya collected by sushant.txt","rb")
>>> f.read(32)
'\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80 \xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d\xe0\xa4\xb2\xe0\xa5\x80,'
Now, based on my knowledge of Unicode, I think this is a UTF-8 file (the
first 3 bytes, \xef\xbb\xbf, look like the UTF-8 byte order mark); please
correct me if I am wrong. How do I read this?
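One way to check this guess, as a sketch rather than a definitive answer: compare the leading bytes with codecs.BOM_UTF8 and decode the rest. The byte string below is the f.read(32) output joined into one literal; the single space after \x80 is an assumption (it sits where the first line wrap falls in the post, and it brings the total to 32 bytes).

```python
import codecs

# The bytes from f.read(32), joined. The space after \x80 is assumed.
raw = (b"\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80 "
       b"\xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d"
       b"\xe0\xa4\xb2\xe0\xa5\x80,")

# codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
has_bom = raw.startswith(codecs.BOM_UTF8)

# "utf_8_sig" decodes UTF-8 and drops a leading BOM if present;
# plain "utf_8" keeps the BOM as the character U+FEFF.
text = raw.decode("utf_8_sig")
with_bom = raw.decode("utf_8")

# Code points of the decoded text, e.g. 0x928 is DEVANAGARI LETTER NA.
code_points = [hex(ord(ch)) for ch in text]
```

Under that assumption the bytes decode cleanly as UTF-8 to Devanagari code points in the U+0900 block, which supports the UTF-8 guess.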
Atul.
PS: I wrote the above code using the information in section 4.8,
"Codecs", of the Library Reference PDF. Am I doing something wrong?
Please do let me know.
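For comparison with the attempt above, here is a hedged sketch of how the codecs module might be used for this: codecs.open(), unlike the built-in open(), takes the encoding as its third argument, and the "utf_8_sig" codec skips a leading byte order mark. The sample file written here is a hypothetical stand-in for the corpus file.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for the corpus file: a BOM followed by UTF-8 text.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as out:
    out.write(codecs.BOM_UTF8 + u"\u0928\u0935\u0940\n".encode("utf_8"))

# codecs.open(filename, mode, encoding): the encoding is the third argument.
f = codecs.open(path, "r", "utf_8_sig")
try:
    first_line = f.readline()   # a Unicode string, BOM already removed
finally:
    f.close()
```

With plain "utf_8" instead of "utf_8_sig", the first line would start with the character U+FEFF, which would then have to be stripped by hand.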
On Jul 25, 6:21 am, Terry Reedy <tjre...@udel.edu> wrote:
Atul. wrote:
Hello All,
I wanted to know which encoding I should use to open files with
Devanagari characters. I was thinking of UTF-8 but was not sure. Any
leads on this? Has anyone used it before?
You cannot hurt your machine by giving that a try.
This is a general comment for all beginners. Before posting, open the
interactive interpreter (or IDLE) and try something(s). If the result
puzzles you, copy and paste into a post. Or if more appropriate, open
the Python manuals and search a bit, or try a search engine.