Bytes IT Community

Parsing/Splitting Line

P: n/a
I have a text file and each line is a list of values. The values are
not delimited, but every four characters is a value. How do I get
python to split this kind of data? Thanks.

Nov 21 '06 #1
9 Replies


P: n/a
ac*****@gmail.com wrote:
> I have a text file and each line is a list of values. The values are
> not delimited, but every four characters is a value. How do I get
> python to split this kind of data? Thanks.
1. Look for "slicing" or "slice" or "slices" in the Python tutorial.
2. Write some code.
3. Run it.
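
For anyone who wants a starting point for step 2, here is a minimal sketch of the slicing idea. It is written in Python 3 syntax; the sample line, the width of 4, and the name `fields` are assumptions made up for illustration, not something from the original question.

```python
# Minimal sketch: cut a string into fixed-width pieces with slices.
# The sample data and names below are illustrative assumptions.
line = "12340001   2  -3"
width = 4

# line[i:i + width] takes `width` characters starting at index i;
# range(0, len(line), width) steps through the start of every field.
fields = [line[i:i + width] for i in range(0, len(line), width)]
print(fields)  # ['1234', '0001', '   2', '  -3']
```

The pieces keep their padding spaces; call `.strip()` or `int()` on each one if you need cleaned-up values.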

Nov 21 '06 #2

P: n/a
At Tuesday 21/11/2006 02:59, ac*****@gmail.com wrote:
> I have a text file and each line is a list of values. The values are
> not delimited, but every four characters is a value. How do I get
> python to split this kind of data? Thanks.

>>> line = "12340001   2  -3"
>>> for j in range(0,len(line),4):
...     print line[j:j+4], int(line[j:j+4])
...
1234 1234
0001 1
   2 2
  -3 -3
>>>
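
The same loop can be applied to every line of a file. The following is only a sketch in Python 3 syntax: the function name `parse_fixed_width` is hypothetical, `io.StringIO` stands in for a real file object, and it assumes every field holds an integer, as in the session above.

```python
import io

def parse_fixed_width(lines, width=4):
    """Yield one list of ints per input line, reading `width` characters per value."""
    for raw in lines:
        # Drop the line terminator so it does not end up inside the last field.
        line = raw.rstrip("\r\n")
        yield [int(line[i:i + width]) for i in range(0, len(line), width)]

# io.StringIO stands in for a file opened with open(path); the data is made up.
sample = io.StringIO("12340001   2  -3\n   5   6\n")
rows = list(parse_fixed_width(sample))
print(rows)  # [[1234, 1, 2, -3], [5, 6]]
```

`int()` ignores the surrounding spaces, so right-aligned fields such as `"   2"` convert cleanly. A field that is entirely blank would raise `ValueError`, which this sketch does not handle.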

--
Gabriel Genellina
Softlab SRL

Nov 21 '06 #3

P: n/a
On 2006-11-21, ac*****@gmail.com <ac*****@gmail.com> wrote:
> I have a text file and each line is a list of values. The
> values are not delimited, but every four characters is a value.
> How do I get python to split this kind of data? Thanks.
Check out _Text Processing in Python_, Chapter 2, "PROBLEM:
Column statistics for delimited or flat-record files".
URL:http://gnosis.cx/TPiP/

--
Neil Cerutti
Nov 21 '06 #4

P: n/a
On Tuesday, 21 November 2006 02:59, ac*****@gmail.com wrote:
> I have a text file and each line is a list of values. The values are
> not delimited, but every four characters is a value. How do I get
> python to split this kind of data? Thanks.
You can easily define a function to do this. For example:

# split-line in 'n' characters
import sys

def splitLine(line, n):
    """split line in 'n' characters"""
    x = 0
    y = 0
    while line >= n:
        y = x + n
        if line[x:y] == '':
            break
        yield line[x:y]
        x += n

if __name__ == '__main__':
    # i get the line-split from the command line
    # but you can get it from a file
    for x in splitLine(sys.argv[1], int(sys.argv[2])):
        print x

--
Kaufmann Manuel
Nov 21 '06 #5

P: n/a

Neil Cerutti wrote:
> On 2006-11-21, ac*****@gmail.com <ac*****@gmail.com> wrote:
>> I have a text file and each line is a list of values. The
>> values are not delimited, but every four characters is a value.
>> How do I get python to split this kind of data? Thanks.
>
> Check out _Text Processing in Python_, Chapter 2, "PROBLEM:
> Column statistics for delimited or flat-record files".
> URL:http://gnosis.cx/TPiP/
Hmmmm ... the elementary notion "do line[start:end] in a loop" is well
buried, just behind this:

# Adjust offsets to Python zero-based indexing,
# and also add final position after the line
num_positions = len(self.column_positions)
offsets = [(pos-1) for pos in self.column_positions]
offsets.append(len(line))

Folk who are burdened with real-world flat files (example: several
hundred thousand lines, each 996 bytes wide) might want to consider
moving the set-up of "offsets" out of the once-per-line splitter()
method and into the __init__() method :-)
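
A rough sketch of that suggestion, in Python 3 syntax. The class name `FixedWidthSplitter` and its method names are hypothetical and are not the code from the book; the point is only that the one-based column positions are turned into slice objects once, in `__init__()`, instead of on every line.

```python
class FixedWidthSplitter:
    """Split lines at fixed one-based column positions (illustrative sketch)."""

    def __init__(self, column_positions):
        # Convert one-based start columns to zero-based offsets, once.
        starts = [pos - 1 for pos in column_positions]
        # Each field ends where the next one starts; None means "to end of line",
        # so nothing here depends on the length of any particular line.
        ends = starts[1:] + [None]
        self._slices = [slice(s, e) for s, e in zip(starts, ends)]

    def split(self, line):
        # Per-line work is now just applying the precomputed slices.
        return [line[s] for s in self._slices]


splitter = FixedWidthSplitter([1, 5, 9, 13])
print(splitter.split("12340001   2  -3"))  # ['1234', '0001', '   2', '  -3']
```

Using `None` as the last end point avoids appending `len(line)` for each line, which is the part that forced the set-up to live inside the per-line method.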

Cheers,
John

Nov 21 '06 #6

P: n/a
Manuel Kaufmann wrote:
> On Tuesday, 21 November 2006 02:59, ac*****@gmail.com wrote:
>> I have a text file and each line is a list of values. The values are
>> not delimited, but every four characters is a value. How do I get
>> python to split this kind of data? Thanks.
>
> You can easily define a function to do this. For example:
>
> # split-line in 'n' characters
> import sys
>
> def splitLine(line, n):
>     """split line in 'n' characters"""
>     x = 0
>     y = 0
>     while line >= n:
The intent appears to be that "line" refers to a str object, while "n"
refers to an int object. Comparison of such disparate objects is
guaranteed to produce a reproducible (but not necessarily meaningful)
result.

For example:

| >>> '' > 2
| True

You need to reconsider what you really want to have happen when there
is a trailing short slice. Possibilities are:

(a) silently ignore it -- what I guess your intent was, but the least
attractive IMO
(b) raise an exception -- overkill IMO
(c) just tack it on the end (which is what your code is currently doing
*accidentally*) -- and mention this in the docs and let the caller do
what they want with it.
>         y = x + n
>         if line[x:y] == '':
>             break
>         yield line[x:y]
>         x += n
>
> if __name__ == '__main__':
>     # i get the line-split from the command line
>     # but you can get it from a file
>     for x in splitLine(sys.argv[1], int(sys.argv[2])):
>         print x
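
The three options listed above for a trailing short slice can be sketched as a single generator. This is an illustration in Python 3 syntax, not Manuel's code: the name `split_line` and the `trailing` parameter are made up here, and the loop runs over `range()` rather than comparing a string with an integer.

```python
def split_line(line, n, trailing="keep"):
    """Yield successive n-character slices of line.

    trailing selects what happens to a final slice shorter than n:
    "keep" yields it, "ignore" drops it, "error" raises ValueError.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if trailing not in ("keep", "ignore", "error"):
        raise ValueError("unknown trailing policy: %r" % (trailing,))
    for start in range(0, len(line), n):
        chunk = line[start:start + n]
        if len(chunk) < n:
            # Only the last chunk can be short.
            if trailing == "ignore":
                return
            if trailing == "error":
                raise ValueError("trailing short slice: %r" % (chunk,))
        yield chunk


print(list(split_line("abcdefghij", 4)))                     # ['abcd', 'efgh', 'ij']
print(list(split_line("abcdefghij", 4, trailing="ignore")))  # ['abcd', 'efgh']
```

Because this is a generator, the argument checks only run once iteration starts, so a bad `n` shows up on the first `next()` call rather than when `split_line()` is called.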
HTH,
John

Nov 21 '06 #7

P: n/a
ac*****@gmail.com wrote:
> I have a text file and each line is a list of values. The values are
> not delimited, but every four characters is a value. How do I get
> python to split this kind of data? Thanks.
I'm a nut for regular expressions and obfuscation...

import re
def splitline(line, size=4):
    return re.findall(r'.{%d}' % size, line)

>>> splitline("helloiamsuperman")
['hell', 'oiam', 'supe', 'rman']

or if you care about remainders...

import re
def splitline(line, size=4):
    return re.findall(r'.{%d}|.+$' % size, line)

>>> splitline("helloiamsupermansd")
['hell', 'oiam', 'supe', 'rman', 'sd']
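
One caveat with the patterns above: by default `.` does not match a newline, so a newline inside the data is silently skipped and the fields after it shift. A possible variant, sketched in Python 3 syntax with the hypothetical name `splitline_dotall`, passes `re.DOTALL` so every character counts toward the width.

```python
import re

def splitline_dotall(line, size=4):
    """Like splitline with remainders, but '.' also matches newlines."""
    pattern = re.compile(r'.{%d}|.+$' % size, re.DOTALL)
    return pattern.findall(line)

# Without DOTALL the embedded newline is dropped from the result.
plain = re.findall(r'.{4}|.+$', "abcd\nefgh")
print(plain)                            # ['abcd', 'efgh']

# With DOTALL the newline occupies a position like any other character.
print(splitline_dotall("abcd\nefgh"))   # ['abcd', '\nefg', 'h']
```

Which behaviour is right depends on the data. For lines read from a file it is usually simpler to strip the line terminator first and then split.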
noah
Nov 22 '06 #8

P: n/a
Noah Rawlins wrote:

> I'm a nut for regular expressions and obfuscation...
>
> import re
> def splitline(line, size=4):
>     return re.findall(r'.{%d}' % size, line)
>
> >>> splitline("helloiamsuperman")
> ['hell', 'oiam', 'supe', 'rman']

there are laws against such use of regular expressions in certain
jurisdictions.

</F>

Nov 22 '06 #9

P: n/a
Fredrik Lundh wrote:
> Noah Rawlins wrote:
>
>> I'm a nut for regular expressions and obfuscation...
>>
>> import re
>> def splitline(line, size=4):
>>     return re.findall(r'.{%d}' % size, line)
>>
>> >>> splitline("helloiamsuperman")
>> ['hell', 'oiam', 'supe', 'rman']
>
> there are laws against such use of regular expressions in certain
> jurisdictions.

... and in particularly bad cases, you will be punished by Perl
not less than 5 years ...

Georg
Nov 22 '06 #10
