Encoding for Devanagari Script.

Hello All,

I wanted to know what encoding I should use to open files containing
Devanagari characters. I was thinking of UTF-8 but was not sure. Any
leads on this? Has anyone used it before?

Thanks in advance.

Regards,
Atul.
Jul 24 '08 #1

Atul. wrote:
> I wanted to know what encoding I should use to open files containing
> Devanagari characters. I was thinking of UTF-8 but was not sure. Any
> leads on this? Has anyone used it before?
Are we talking about existing files? If you don't know what encoding
the files use, you could always try the UTF-8 codec; it's very
likely to complain if you attempt to decode something that isn't
UTF-8.
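
A minimal Python 3 sketch of that "try UTF-8 and see whether it complains" test, assuming you can read the whole file into memory. `looks_like_utf8` is a hypothetical helper name, not a library function. It decodes the whole file rather than a fixed-size prefix, because cutting the bytes at an arbitrary offset can split a multi-byte character and cause a spurious error. A failed decode is strong evidence against UTF-8; a clean decode is only suggestive (plain ASCII passes too).

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (a leading BOM is allowed)."""
    try:
        # "utf-8-sig" accepts the bytes with or without a UTF-8 byte order mark.
        data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical usage on one of the files in question:
# with open("myfile.txt", "rb") as f:
#     print(looks_like_utf8(f.read()))
```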

If that doesn't work, it's a bit trickier: there are several ways to
encode Unicode, and then there's ISCII as well. If you cannot sort it
out, try running this:
>>> f = open("myfile.txt", "rb")
>>> f.read(32)
on one of your files, and post the result, and chances are that someone
will be able to identify the encoding.
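
Several of the Unicode encodings can announce themselves with a byte order mark (BOM) at the very start of the file, so the first few bytes are often enough to tell them apart. The Python 3 sketch below (`sniff_bom` is a hypothetical name) checks for the BOM constants defined in the standard `codecs` module. It assumes a `None` result only means "no BOM": many UTF-8 files have none, and ISCII has no BOM at all, so those cases still need the trial decode or a human eye.

```python
import codecs
from typing import Optional

# Check longer BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes) -> Optional[str]:
    """Return a codec name that will consume the BOM, or None if there is no BOM."""
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None


# Hypothetical usage:
# with open("myfile.txt", "rb") as f:
#     print(sniff_bom(f.read(4)))
```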

</F>

Jul 24 '08 #2

Atul. wrote:
> Hello All,
>
> I wanted to know what encoding I should use to open files containing
> Devanagari characters. I was thinking of UTF-8 but was not sure. Any
> leads on this? Has anyone used it before?
You cannot hurt your machine by giving that a try.

This is a general comment for all beginners. Before posting, open the
interactive interpreter (or IDLE) and try something. If the result
puzzles you, copy and paste it into a post. Or, if more appropriate, open
the Python manuals and search a bit, or try a search engine.

Jul 25 '08 #3

Hi Fredrik and Terry,

Well, I got this in IDLE; I think I have done something wrong.
>>> import codecs
>>> f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
TypeError: an integer is required

After that I opened the file in binary mode, read the first 32
bytes, and got this:
>>> f = open("C:\Documents and Settings\\admin\\My Documents\\corpus\\dainaikAikya collected by sushant.txt","rb")
>>> f.read(32)
'\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80 \xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d\xe0\xa4\xb2\xe0\xa5\x80,'

Based on my knowledge of Unicode, I think this is a UTF-8 file (the
first three bytes, \xef\xbb\xbf, are the UTF-8 byte order mark). Please
correct me if I am wrong. How do I read this?
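
Decoding those 32 bytes in Python 3 bears this reading out. The sketch rejoins the byte string that the mail client wrapped, assuming the first wrap replaced a space (which is what makes the total exactly 32 bytes), and shows the difference between the `utf-8` codec, which keeps the BOM as the character U+FEFF, and `utf-8-sig`, which strips it.

```python
# The first 32 bytes of the file, as posted above.
raw = (b"\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80 "
       b"\xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d"
       b"\xe0\xa4\xb2\xe0\xa5\x80,")

# Plain UTF-8 keeps the byte order mark as the character U+FEFF ...
with_bom = raw.decode("utf-8")
# ... while "utf-8-sig" strips a leading BOM if one is present.
text = raw.decode("utf-8-sig")

print(repr(with_bom[0]))  # '\ufeff'
print(text)               # नवी दिल्ली,
```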

Atul.

PS: I wrote the code above using the information in section 4.8,
"Codecs", of the Library Reference PDF. Am I doing something wrong?
Please let me know.

Jul 28 '08 #4

Atul. wrote:
> Hi Fredrik and Terry,
>
> Well, I got this in IDLE; I think I have done something wrong.
> >>> import codecs
> >>> f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
>
> Traceback (most recent call last):
>   File "<pyshell#1>", line 1, in <module>
>     f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
> TypeError: an integer is required
>
> PS: I wrote the code above using the information in section 4.8,
> "Codecs", of the Library Reference PDF. Am I doing something wrong?
> Please let me know.

Only slightly. You're importing the codecs module
but you're not using it. So you're *actually* using
the built-in open function, which doesn't have an
encoding parameter. It does have a third parameter,
but that one sets the buffer size.
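
The difference between the two functions is easy to reproduce. The following Python 3 sketch writes a throwaway BOM-prefixed UTF-8 file as a stand-in for the corpus file (the temporary file and the `sample` text are assumptions for the demo), triggers the `TypeError` from putting an encoding name in `open()`'s third position (Python 3 words the message differently from the Python 2 one above), and then reads the file correctly two ways. It uses `utf_8_sig` rather than `utf_8` so the leading BOM is not returned as U+FEFF. In Python 3 the built-in `open()` accepts `encoding=` directly, and recent releases deprecate `codecs.open()` in its favour.

```python
import codecs
import os
import tempfile

# Create a throwaway UTF-8 file (with BOM) so the example is self-contained.
sample = "\u0928\u0935\u0940 \u0926\u093f\u0932\u094d\u0932\u0940,"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8-sig") as f:
    f.write(sample)

# Built-in open(): the third positional parameter is `buffering` (an int),
# so an encoding name in that position raises TypeError.
try:
    open(path, "r", "utf_8")
except TypeError as exc:
    error = exc

# Python 3 built-in open(): pass the encoding as a keyword argument.
with open(path, "r", encoding="utf-8-sig") as f:
    via_open = f.read()

# codecs.open(): here the third positional parameter *is* the encoding.
with codecs.open(path, "r", "utf_8_sig") as f:
    via_codecs = f.read()

os.remove(path)
```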

Just change your code to use codecs.open("...")
and, I suggest, either use raw strings for your
filename (r"c:\docume...") or use the other kind
of slash ("c:/documen..."). Otherwise backslash
sequences in the path, such as the \a in \admin,
may be read as escape codes.
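
To illustrate the escape problem: in an ordinary Python string literal, `\a` is the escape for the BEL control character, so the `\admin` part of the original path silently becomes BEL followed by `dmin`. (Unrecognized sequences such as `\D` are left alone, though recent Python 3 versions warn about them.) A short Python 3 sketch using a shortened, hypothetical path compares the three spellings; `pathlib.PureWindowsPath` is used only to show that forward slashes denote the same Windows path.

```python
from pathlib import PureWindowsPath

plain = "C:\admin"     # "\a" is an escape: the backslash and "a" become BEL
raw = r"C:\admin"      # raw string: the backslash is kept literally
slashes = "C:/admin"   # forward slashes need no escaping at all

print(len(plain), len(raw))  # 7 8

# Windows path handling treats "/" and "\" as the same separator.
print(PureWindowsPath(slashes) == PureWindowsPath(raw))  # True
```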

TJG
Jul 28 '08 #5

Thanks, Tim, that worked. I will carry on experimenting now.

Thanks a ton.

Atul.
Jul 28 '08 #6
