Encoding for Devanagari Script.

Hello All,

I wanted to know what encoding I should use to open files with
Devanagari characters. I was thinking of UTF-8 but was not sure. Any
leads on this? Has anyone used it before?

Thanks in Advance.

Regards,
Atul.
Jul 24 '08 #1
Atul. wrote:
> I wanted to know what encoding I should use to open files with
> Devanagari characters. I was thinking of UTF-8 but was not sure. Any
> leads on this? Has anyone used it before?

Are we talking about existing files? If you don't know what encoding
the files use, you could always try using the UTF-8 codec; it's very
likely to complain if you're attempting to decode something that isn't
UTF-8.
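
The "try the UTF-8 codec and see if it complains" idea can be sketched as below. This is a minimal illustration, assuming a current Python 3 interpreter (the thread itself uses Python 2); the helper name is_valid_utf8 is hypothetical and not part of any library.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8.

    Hypothetical helper: bytes.decode uses the 'strict' error handler
    by default, so invalid input raises UnicodeDecodeError.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same Devanagari text in two encodings, for contrast.
utf8_bytes = "नवी".encode("utf-8")
utf16_bytes = "नवी".encode("utf-16")

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(utf16_bytes))
```

A clean decode does not prove the file is UTF-8, but a failure is strong evidence that it is something else.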

If that doesn't work, it's a bit trickier -- there are several ways to
encode Unicode, and then there's ISCII as well. If you cannot sort it
out, try running this:

>>> f = open("myfile.txt", "rb")
>>> f.read(32)

on one of your files, and post the result, and chances are that someone
will be able to identify the encoding.
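
One thing a reader can look for in those first bytes is a byte order mark (BOM). The following is a rough sketch, assuming Python 3; the function name guess_encoding_from_bom and the table _BOMS are made up for illustration, while the BOM constants come from the standard codecs module. Files without a BOM (including ISCII files) simply return None here.

```python
import codecs

# BOM signatures from the standard library. The UTF-32 entries come first
# because the UTF-32 LE BOM begins with the same bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(head: bytes):
    """Return a codec name if head starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


print(guess_encoding_from_bom(b"\xef\xbb\xbf\xe0\xa4\xa8"))
print(guess_encoding_from_bom(b"no bom here"))
```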

</F>

Jul 24 '08 #2


Atul. wrote:
> Hello All,
>
> I wanted to know what encoding I should use to open files with
> Devanagari characters. I was thinking of UTF-8 but was not sure. Any
> leads on this? Has anyone used it before?

You cannot hurt your machine by giving that a try.

This is a general comment for all beginners. Before posting, open the
interactive interpreter (or IDLE) and try something(s). If the result
puzzles you, copy and paste into a post. Or if more appropriate, open
the Python manuals and search a bit, or try a search engine.

Jul 25 '08 #3
Hi Fredrik and Terry,

Well, I got this in IDLE; I think I have done something wrong.

>>> import codecs
>>> f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
TypeError: an integer is required

After that I tried binary read mode and read the first 32
bytes, and this is what I got:

>>> f = open("C:\Documents and Settings\\admin\\My Documents\\corpus\\dainaikAikya collected by sushant.txt","rb")
>>> f.read(32)
'\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80\xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d\xe0\xa4\xb2\xe0\xa5\x80,'

Now, based on my knowledge of Unicode, I think this is a UTF-8 file (the
first 3 bytes, \xef\xbb\xbf, look like the UTF-8 byte order mark). Please
correct me if I am wrong. How do I read this?

Atul.

PS: I wrote the above code using the information from the Library
Reference PDF, section 4.8 "Codecs". Am I doing something wrong? Please
do let me know.
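
As a quick check of the guess above, the bytes posted in this message can be decoded directly. This sketch assumes Python 3 and uses the standard "utf-8-sig" codec, which decodes UTF-8 and drops a leading BOM if one is present; the variable names are illustrative only.

```python
import codecs

# The bytes exactly as posted in the f.read(32) output above.
sample = (
    b"\xef\xbb\xbf\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa5\x80"
    b"\xe0\xa4\xa6\xe0\xa4\xbf\xe0\xa4\xb2\xe0\xa5\x8d"
    b"\xe0\xa4\xb2\xe0\xa5\x80,"
)

# True when the data begins with the UTF-8 byte order mark.
has_bom = sample.startswith(codecs.BOM_UTF8)

# "utf-8-sig" strips the BOM; plain "utf-8" would keep it as U+FEFF.
text = sample.decode("utf-8-sig")

print(has_bom)
# Print code points rather than the text itself, so the output does not
# depend on the console being able to display Devanagari.
print([hex(ord(ch)) for ch in text])
```

All code points except the trailing comma fall in the Devanagari block (U+0900 to U+097F), which supports the reading that the file is UTF-8 with a BOM.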

Jul 28 '08 #4
Atul. wrote:
> Hi Fredrik and Terry,
>
> Well, I got this in IDLE; I think I have done something wrong.
>
> >>> import codecs
> >>> f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
> Traceback (most recent call last):
>   File "<pyshell#1>", line 1, in <module>
>     f = open("C:\Documents and Settings\admin\My Documents\corpus\dainaikAikya collected by sushant.txt","r", "utf_8")
> TypeError: an integer is required
>
> PS: I wrote the above code using the information from the Library
> Reference PDF, section 4.8 "Codecs". Am I doing something wrong? Please
> do let me know.

Only slightly. You're importing the codecs module
but you're not using it. So you're *actually* using
the built-in open function, which doesn't have an
encoding parameter. It does have a third param
which is to do with the buffer size.
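
The same mistake can be reproduced in a self-contained way. This is a sketch assuming Python 3, where the built-in open still takes the buffer size as its third positional parameter but also accepts an encoding keyword (the Python 2 built-in discussed in this thread does not). The temporary file and variable names are illustrative only.

```python
import os
import tempfile

# Create a small UTF-8 file with a BOM so the example does not depend
# on any file from the thread.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbf" + "नवी दिल्ली".encode("utf-8"))

# Passing the codec name positionally puts it in the buffering slot,
# which must be an integer, so the call fails with TypeError.
try:
    open(path, "r", "utf_8")
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Passing it by keyword works; "utf-8-sig" also drops the leading BOM.
with open(path, "r", encoding="utf-8-sig") as f:
    text = f.read()

os.remove(path)

print(error_name)
print(len(text))
```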

Just change your code to use codecs.open ("...")
and, I suggest, either use raw strings for your
filename (r"c:\docume...") or use the other kind
of slash ("c:/documen..."). Otherwise you might
run into some problems.
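
To illustrate the filename advice: in an ordinary string literal, sequences such as \a and \t are escape codes, so a Windows path can silently change. A small sketch, assuming Python 3 and a made-up path (C:\temp\admin) chosen because both of its backslash sequences are real escapes:

```python
# Ordinary literal: "\t" becomes a tab and "\a" becomes the bell character,
# so this is no longer the intended path.
plain = "C:\temp\admin"

# Raw string: backslashes are kept literally.
raw = r"C:\temp\admin"

# Forward slashes need no escaping at all.
forward = "C:/temp/admin"

print(len(plain), len(raw), len(forward))
print("\t" in plain, "\a" in plain)
print(raw.replace("\\", "/") == forward)
```

The path in the original post contains "\admin", so it is affected by the same \a escape.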

TJG
Jul 28 '08 #5
Thanks, Tim, that did work. I will proceed with my playing around now.

Thanks a ton.

Atul.
Jul 28 '08 #6
