Parsing ascii file

Hello,

I have a file that contains the following data (example) and does NOT have
any line feeds:

11 22 33 44 55 66 77 88 99 00 aa bb cc
dd ....to 128th byte 11 22 33 44 55 66 77 88 99
00 aa bb cc dd .... and so on

Record 1 starts at 0 and finishes at 128, record 2 starts at 129 and
finishes at 256, and so on. There can be as many as 5000 records per file. I
would like to parse the file, retrieve the value of the field at bytes 64-65,
and perform an arithmetic operation on that field (sum them all up).

Can I do this with python?

If I were to use awk, it would look something like this:

cat <filename> | fold -w 128 | awk '{ SUM = SUM + substr($0,64,2) } END { print SUM }'

Regards
Dean
Jul 18 '05 #1

Is it an ASCII or a binary file? I'm not entirely sure from your description.
In the following I assume binary data, but it should be easy to modify the
value() function if those two bytes are ASCII digits.

import struct, sys

def fold(instream, width=80):
    # Yield consecutive fixed-width chunks until the stream is exhausted.
    while True:
        line = instream.read(width)
        if not line:
            break
        yield line

def value(line, start=64):  # may be an "off by one" bug
    # For ASCII digits, use instead: return int(line[start:start+2])
    # "h" unpacks a 2-byte signed integer in native byte order.
    return struct.unpack("h", line[start:start+2])[0]

if __name__ == "__main__":
    try:
        filename = sys.argv[1]
    except IndexError:
        instream = sys.stdin.buffer  # binary view of standard input
    else:
        instream = open(filename, "rb")  # binary mode, since struct needs bytes

    print(sum(map(value, fold(instream, 128))))
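If the two bytes turn out to be ASCII digits rather than binary data, the change to value() mentioned above might look roughly like the sketch below, written for Python 3. The names ascii_value, sum_field and make_record are made up for illustration, the sample data is invented, and start=63 is an assumption chosen to mirror awk's 1-based substr($0,64,2); adjust it if the field really sits one byte further along.

```python
import io

RECORD_WIDTH = 128  # record length taken from the original question


def fold(instream, width=RECORD_WIDTH):
    """Yield consecutive fixed-width chunks from a binary stream."""
    while True:
        chunk = instream.read(width)
        if not chunk:
            break
        yield chunk


def ascii_value(record, start=63, length=2):
    """Parse the field as ASCII decimal digits.

    start=63 is an assumption: it mirrors awk's 1-based substr($0, 64, 2).
    """
    return int(record[start:start + length].decode("ascii"))


def sum_field(instream):
    """Sum the two-digit field over every record in the stream."""
    return sum(ascii_value(record) for record in fold(instream))


def make_record(field):
    """Build a hypothetical 128-byte record with `field` at offsets 63-64."""
    return b"." * 63 + field + b"." * (RECORD_WIDTH - 65)


# Invented sample: two records carrying "12" and "30" in the field.
sample = io.BytesIO(make_record(b"12") + make_record(b"30"))
print(sum_field(sample))  # prints 42
```

Reading from an in-memory io.BytesIO keeps the example self-contained; a real file opened with open(filename, "rb") could be passed to sum_field() the same way.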

Peter

Jul 18 '05 #2

You can use stdin.read(128) to get consecutive records and slicing to extract
the fields. Something like:

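from sys import stdin

total = 0  # named "total" to avoid shadowing the built-in sum()
while True:
    record = stdin.read(128)
    if not record:
        break
    # Two characters, matching awk's 1-based substr($0,64,2)
    total += int(record[63:65])
print(total)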

Frankly, I'd stick with the Awk version unless it's a pedagogical exercise.
Actually, I'd go further and have a script that simply sums up all the numbers
in its input, and add 'cut' to the pipeline to extract the columns first.
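As a rough idea of what such a summing script could look like in Python 3, here is a minimal sketch. The function name sum_numbers and the file name sum.py are made up for illustration, and it assumes every whitespace-separated token in the input is a decimal integer.

```python
def sum_numbers(lines):
    """Sum every whitespace-separated integer in an iterable of text lines."""
    total = 0
    for line in lines:
        for token in line.split():
            total += int(token)  # raises ValueError on non-numeric input
    return total


# Invented sample standing in for what 'cut' would emit, one field per line.
print(sum_numbers(["12\n", "30\n"]))  # prints 42
```

In a pipeline the function would be called as sum_numbers(sys.stdin) (after import sys) rather than on a hard-coded list, for example: fold -w 128 <filename> | cut -c64-65 | python sum.py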

Eddie
Jul 18 '05 #3
