How to read space separated file in python?

Hi all,

I want to read a mapping file that is used to map characters from a TTF font
to Unicode.

For example, the map file contains data like this:

0 ०
1 १
2 २
3 ३
4 ४
5 ५
6 ६
7 *
8 ८
9 ९

Please use a Unicode-aware editor to view the text if it is not shown
properly.

Now I want to read the two characters on each line separately, like:

str[0]=0 and str2[0]=०

How can I do this?

please give me solution?

Regards,
Ginovation
Nov 21 '08 #1
7 Replies


On Fri, 21 Nov 2008 14:16:13 +0530, ganesh gajre wrote:
> [...]
> Now i want to read both the character separately like:
>
> str[0]=0 and str2[0]=०
>
> How can i do this?
>
> please give me solution?

Well, because you said please...

I assume the encoding of the second column is utf-8. You need something
like this:

# Untested.
column0 = []
column1 = []
for line in open('somefile', 'r'):
    a, b = line.split()
    column0.append(a)
    column1.append(b.decode('utf-8'))
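
The snippet above is Python 2 style: in Python 3, str has no decode() method, and the usual approach is to let open() do the decoding. A minimal sketch under that assumption follows; the helper name read_columns and the in-memory sample data are made up for illustration and stand in for the real map file.

```python
import io

def read_columns(fileobj):
    """Return two parallel lists: first-column and second-column strings."""
    column0, column1 = [], []
    for line in fileobj:
        parts = line.split()
        if len(parts) != 2:  # skip blank or malformed lines
            continue
        column0.append(parts[0])
        column1.append(parts[1])
    return column0, column1

# Assumed sample data; with a real file you would pass
# open('somefile', encoding='utf-8') instead of the StringIO object.
sample = io.StringIO("0 \u0966\n1 \u0967\n2 \u0968\n")
column0, column1 = read_columns(sample)
print(column0)
print(column1)
```

Skipping lines that do not have exactly two fields is a design choice for this sketch; raising an error instead may be preferable if the map file is expected to be strictly well formed.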

--
Steven
Nov 21 '08 #2

ganesh gajre wrote:
> [...]
> How can i do this?
>
> please give me solution?

Read the file:

>>> import codecs
>>> pairs = [line.split() for line in codecs.open("ganesh.txt",
...          encoding="utf-8")]
>>> pairs[0]
[u'0', u'\u0966']

Create the conversion dictionary:

>>> trans = dict((ord(s), t) for s, t in pairs)

Do the translation:

>>> print u"01109876".translate(trans)
०११०९८*६

You may have to use int(s) instead of ord(s) in your actual conversion code:

>>> trans = dict((int(s), t) for s, t in pairs)
>>> print u"\x00\x01\x09".translate(trans)
०१९
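
For readers on Python 3, the same idea can be sketched as below. This is only an illustration: the three mapping lines are assumed sample data rather than the contents of ganesh.txt, and the u"" prefixes and print statement from the session above are no longer needed.

```python
# Assumed sample lines standing in for the decoded contents of the map file.
lines = ["0 \u0966", "1 \u0967", "9 \u096f"]
pairs = [line.split() for line in lines]

# str.translate expects a table keyed by code point (an int), hence ord().
trans = {ord(src): dst for src, dst in pairs}

result = "0119x".translate(trans)
print(result)
```

Characters with no entry in the table (the "x" here) pass through unchanged.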

Peter
Nov 21 '08 #3

On Nov 21, 2008, at 2:08 AM, Steven D'Aprano wrote:
> a, b = line.split()

Note that in a case like this, you may want to consider using
partition instead of split:

a, sep, b = line.partition(' ')

This way, if there happens to be more than one space (for example,
because the Unicode character you're mapping to happens to be a
space), it'll still work. It also better encodes the intention, which
is to split only on the first space in the line, rather than on every
space.

(It so happens I ran into exactly this issue yesterday, though my
delimiter was a colon.)
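
A small sketch of the partition approach described above, written for Python 3. The helper name parse_line is hypothetical, and stripping only the trailing newline is an assumption made so that a mapped space character survives.

```python
def parse_line(line):
    """Split on the first space only; everything after it is the value."""
    key, sep, value = line.rstrip("\n").partition(" ")
    return key, value

print(parse_line("0 \u0966\n"))          # ordinary two-field line
print(parse_line("5  \n"))               # the value is itself a space
print(parse_line("3 x and extra\n"))     # the value contains spaces
```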

Cheers,
- Joe

Nov 21 '08 #4

Joe Strout wrote:
> On Nov 21, 2008, at 2:08 AM, Steven D'Aprano wrote:
>> a, b = line.split()
>
> Note that in a case like this, you may want to consider using partition
> instead of split:
>
> a, sep, b = line.partition(' ')
>
> This way, if there happens to be more than one space (for example,
> because the Unicode character you're mapping to happens to be a space),
> it'll still work. It also better encodes the intention, which is to
> split only on the first space in the line, rather than on every space.
>
> (It so happens I ran into exactly this issue yesterday, though my
> delimiter was a colon.)

Joe:

In the special case of the None first argument (the default for the
str.split() method) runs of whitespace *are* treated as single
delimiters. So line.split() is not the same as line.split(' ').
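
The difference is easy to see on a line with a run of spaces. A brief sketch with made-up input:

```python
line = "1   x"  # three spaces between the fields

default_fields = line.split()     # no argument: runs of whitespace collapse
space_fields = line.split(" ")    # explicit ' ': every single space splits

print(default_fields)
print(space_fields)

# The no-argument form also treats tabs and newlines as separators.
tab_fields = "1\tx\n".split()
print(tab_fields)
```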

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Nov 21 '08 #5

On Nov 21, 2008, at 9:00 AM, Steve Holden wrote:
> [...]
> In the special case of the None first argument (the default for the
> str.split() method) runs of whitespace *are* treated as single
> delimiters. So line.split() is not the same as line.split(' ').

Right -- so using split() gives you the wrong answer for two different
reasons. Try these:

>>> line = "1 x"
>>> a, b = line.split()   # b == "x", which is correct
>>> line = "2  "
>>> a, b = line.split()   # correct answer would be b == " "
ValueError: need more than 1 value to unpack
>>> line = "3 x and here is some extra stuff"
>>> a, b = line.split()   # correct answer would be b == "x and here is some extra stuff"
ValueError: too many values to unpack

Partition handles these cases correctly (at least, within the OP's
specification that the value of "b" should be whatever comes after the
first space).

Cheers,
- Joe

Nov 21 '08 #6

On Fri, 21 Nov 2008 14:13:23 -0200, Joe Strout <jo*@strout.net> wrote:
> Right -- so using split() gives you the wrong answer for two different
> reasons. [...]
>
> Partition handles these cases correctly (at least, within the OP's
> specification that the value of "b" should be whatever comes after the
> first space).

split takes an additional argument too:

py> line = "3 x and here is some extra stuff"
py> a, b = line.split(None, 1)
py> a
'3'
py> b
'x and here is some extra stuff'

But it still fails if the line contains no spaces. partition is more
robust in those cases.
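
To illustrate the no-space case, here is a short Python 3 sketch with an assumed one-field line:

```python
line = "7"  # no separator at all

# split(None, 1) returns a one-element list, so unpacking into two names fails.
try:
    a, b = line.split(None, 1)
    split_failed = False
except ValueError:
    split_failed = True

# partition always returns three strings; sep and value are empty here.
key, sep, value = line.partition(" ")

print(split_failed, repr(key), repr(sep), repr(value))
```

An empty sep gives the caller a simple way to detect a line that had no separator.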

--
Gabriel Genellina

Nov 21 '08 #7

Joe Strout wrote:
> [...]
> Partition handles these cases correctly (at least, within the OP's
> specification that the value of "b" should be whatever comes after the
> first space).

I believe if you read the OP's post again you will see that he specified
two non-space items per line.

You really *love* being right, don't you? ;-) You say partition "...
better encodes the intention, which is to split only on the first space
in the line, rather than on every space". Your mind-reading abilities
are clearly superior to mine.

Anyway, sorry to have told you something you already knew. It's true
that partition has its place, and is too often overlooked. Particularly
by me.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Nov 21 '08 #8
