
An Odd Little Script

Hello-

I have a task which -- dare I say -- would be easy in <asbestos_undies>
Perl </asbestos_undies> but which I would rather do in Python (our primary
language at Novasys). I have a file with varying length records. All
but the first record, that is; it's always 107 bytes long. What I would
like to do is strip out all linefeeds from the file, read the character
in position 107 (the end of segment delimiter) and then replace all of
the end of segment characters with linefeeds, making a file where each
segment is on its own line. Currently, some vendors supply files with
linefeeds, others don't, and some split the file every 80 bytes. In
Perl I would operate on the file in place and be on my way. The files
can be quite large, so I'd rather not be making extra copies unless it's
absolutely essential/required.

I turn to the collective wisdom/trickery of the list to point me in the
right direction. How can I perform the above task while keeping my sanity?

Thanks!
--greg
--
Greg Lindstrom 501 975.4859
Computer Programmer gr************@novasyshealth.com
NovaSys Health
Little Rock, Arkansas

"We are the music makers, and we are the dreamers of dreams." W.W.
Jul 18 '05 #1
3 Replies


Greg Lindstrom wrote:
I have a file with varying length records. All
but the first record, that is; it's always 107 bytes long. What I would
like to do is strip out all linefeeds from the file, read the character
in position 107 (the end of segment delimiter) and then replace all of
the end of segment characters with linefeeds, making a file where each
segment is on its own line.


Hmmmm... here's one way of doing it:

import mmap
import sys

DELIMITER_OFFSET = 107

# "r+b" opens the file for in-place binary update
data_file = open(sys.argv[1], "r+b")
# a length of 0 maps the whole file
data = mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE)
# one-byte slices behave the same on Python 2 and Python 3
delimiter = data[DELIMITER_OFFSET:DELIMITER_OFFSET + 1]

for index in range(len(data)):
    if data[index:index + 1] == delimiter:
        data[index:index + 1] = b"\n"

data.flush()
data.close()
data_file.close()

There are doubtless more efficient ways, such as using mmap.mmap.find()
instead of iterating over every character, but that's an exercise for
the reader. And personally I would make extra copies ANYWAY--not doing
so is asking for trouble.
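
For what it's worth, the find() idea mentioned above might look roughly like the sketch below. The function name replace_delimiter_in_place and its return value are made up for illustration, and it assumes Python 3 and a single-byte delimiter. Like the loop above, it only overwrites delimiters in place; it does not strip linefeeds that are already in the file.

```python
import mmap


def replace_delimiter_in_place(path, offset=107):
    """Overwrite every occurrence of the byte at `offset` with a linefeed.

    Hypothetical helper: returns the number of bytes replaced.
    """
    with open(path, "r+b") as data_file:
        # A length of 0 maps the whole file (the file must not be empty).
        with mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
            delimiter = data[offset:offset + 1]
            if len(delimiter) != 1:
                raise ValueError("file is shorter than the delimiter offset")
            if delimiter == b"\n":
                return 0  # segments are already newline-terminated
            count = 0
            index = data.find(delimiter)
            while index != -1:
                # Same-length slice assignment, so the file size is unchanged.
                data[index:index + 1] = b"\n"
                count += 1
                index = data.find(delimiter, index + 1)
            data.flush()
    return count
```

Since find() does its scanning in C, this should touch Python-level code only once per delimiter rather than once per byte.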
--
Michael Hoffman
Jul 18 '05 #2


This should be fairly simple, but maybe not ;)
# this is not optimal but should be a start
f = open('yourrecord', 'rb')
r = f.read()
f.close()
# strip the linefeeds first so the offset is not thrown off by them
r = r.replace(b'\r', b'')
r = r.replace(b'\n', b'')
# get the end of segment character (zero-based offset 107)
eos = r[107:108]
r = r.replace(eos, b'\n')
f = open('yourrecord', 'wb')
f.write(r)
f.close()
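
Since the files can be quite large, here is a rough sketch of a chunked variant that never holds the whole file in memory. The names resegment, strip_newlines and CHUNK_SIZE are invented for this example, and it assumes Python 3.3 or later, a single-byte delimiter that is not itself a CR or LF, and an offset counted after the linefeeds have been stripped.

```python
import os
import tempfile

DELIMITER_OFFSET = 107  # zero-based, counted after linefeeds are stripped
CHUNK_SIZE = 1 << 20    # hypothetical chunk size (1 MiB)


def strip_newlines(chunk):
    """Drop carriage returns and linefeeds from a bytes chunk."""
    return chunk.replace(b"\r", b"").replace(b"\n", b"")


def resegment(path, offset=DELIMITER_OFFSET, chunk_size=CHUNK_SIZE):
    """Rewrite `path` so that each segment sits on its own line."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        # Collect enough linefeed-free bytes to see the delimiter.
        head = b""
        while len(head) <= offset:
            chunk = src.read(chunk_size)
            if not chunk:
                raise ValueError("file is shorter than the delimiter offset")
            head += strip_newlines(chunk)
        delimiter = head[offset:offset + 1]

        # Write to a temporary file in the same directory as the original.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as dst:
                dst.write(head.replace(delimiter, b"\n"))
                for chunk in iter(lambda: src.read(chunk_size), b""):
                    dst.write(strip_newlines(chunk).replace(delimiter, b"\n"))
        except BaseException:
            os.remove(tmp_path)
            raise
    # Swap the finished file into place only after everything was written.
    os.replace(tmp_path, path)
    return delimiter
```

This does need disk space for one extra copy while it runs, but the original is only replaced once the new file is complete, so a crash halfway through should leave the input untouched.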

hth,
M.E.Farmer

Jul 18 '05 #3

Michael Hoffman wrote:
Greg Lindstrom wrote:
I have a file with varying length records. All but the first record,
that is; it's always 107 bytes long. What I would like to do is strip
out all linefeeds from the file, read the character in position 107
(the end of segment delimiter) and then replace all of the end of
segment characters with linefeeds, making a file where each segment is
on its own line.

Hmmmm... here's one way of doing it:

import mmap
import sys

DELIMITER_OFFSET = 107


N.B. this is a zero-based 107. If you are using one-based coordinates,
then this is actually position 108.
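
A tiny illustration of the difference, using made-up sample data (107 filler bytes followed by a "~" delimiter, which is only an assumption about the layout):

```python
# Hypothetical sample: 107 filler bytes, then the delimiter, then one segment.
record = b"A" * 107 + b"~" + b"BBB~"

ZERO_BASED_OFFSET = 107

# Zero-based index 107 is the 108th byte of the data.
delimiter = record[ZERO_BASED_OFFSET:ZERO_BASED_OFFSET + 1]

# If "position 107" is meant one-based, subtract one before indexing.
one_based_107 = record[107 - 1:107]
```

With this sample, delimiter picks up the "~" while one_based_107 is still one of the filler bytes, so it is worth checking a real file to see which convention the spec uses.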
--
Michael Hoffman
Jul 18 '05 #4
