Bytes IT Community

Extract data from ASCII file

Ren
Suppose I have a file containing several lines similar to this:

:10000000E7280530AC00A530AD00AD0B0528AC0BE2

The data I want to extract are 8 hexadecimal strings, the first of
which is E728, like this:

:10000000 E728 0530 AC00 A530 AD00 AD0B 0528 AC0B E2

Also, the bytes in the string are reversed. The E728 needs to be 28E7,
0530 needs to be 3005 and so on.

I can do this in C++ and Pascal, but it seems like Python may be more
suited for the task.

How is this accomplished using Python?
Jul 18 '05 #1
11 Replies


With Python 2.3:
>>> def splitter( line ):
...     line = line[9:] # skip prefix
...     while line:
...         prefix, line = line[:4],line[4:]
...         yield prefix[2:]+prefix[:2]
...
>>> for number in splitter( ':10000000E7280530AC00A530AD00AD0B0528AC0BE2'):
...     print number
...
28E7
3005
00AC
30A5
00AD
0BAD
2805
0BAC
E2

If you want to convert the hexadecimal strings to actual integers, use
int( prefix, 16 ).
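
A minimal sketch of that conversion, for illustration only. It is written in Python 3 syntax rather than the Python 2.3 used above, and the name `values` is just an illustrative choice. Note that the trailing 'E2' is converted along with the eight data words:

```python
def splitter(line):
    """Yield byte-swapped 4-character groups, as in the session above."""
    line = line[9:]  # skip the ':10000000' prefix
    while line:
        prefix, line = line[:4], line[4:]
        yield prefix[2:] + prefix[:2]

sample = ":10000000E7280530AC00A530AD00AD0B0528AC0BE2"

# int(text, 16) parses a hexadecimal string into an integer.
values = [int(word, 16) for word in splitter(sample)]

print(["%04X" % v for v in values[:2]])
```

If only the eight data words are wanted, slicing off the last element (`values[:-1]`) is one way to drop the trailing value.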

HTH,
Mike


_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/

Jul 18 '05 #2

Ren wrote:
Suppose I have a file containing several lines similar to this:

:10000000E7280530AC00A530AD00AD0B0528AC0BE2


Say the file is called data.txt
Try this:
---------------------------------
def process(line):
    line=line[9:]
    result=[]
    for i in range(0,32,4):
        result.append( line[i+2:i+4] + line[i:i+2] )
    return result

for line in open("data.txt"):
    print process(line)
---------------------------------
For your single example data line, it prints
['28E7', '3005', '00AC', '30A5', '00AD', '0BAD', '2805', '0BAC']

It's a list containing the 8 extracted hexadecimal strings.
Instead of printing the list you can do whatever you want with it.
If you need more info, just ask.

--Irmen de Jong
Jul 18 '05 #3

Ren,
If you go here:

http://www.python.org/doc/current/tu...00000000000000

about half way down the page it talks about string slicing.
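
As a rough illustration of what slicing does with the sample line (Python 3 syntax; the variable names are only illustrative, not taken from the tutorial):

```python
line = ":10000000E7280530AC00A530AD00AD0B0528AC0BE2"

header = line[:9]       # characters 0..8: ':10000000'
payload = line[9:]      # everything from index 9 to the end
first = payload[0:4]    # the first 4-character group: 'E728'
second = payload[4:8]   # the next group: '0530'

# A slice never includes its end index, so [0:4] and [4:8] do not overlap.
swapped = first[2:] + first[:2]   # '28E7'

print(header, first, second, swapped)
```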

wes



Jul 18 '05 #4



The first response only works with python-2.3 (yield is a newly
reserved word).

The second response did not work for me and left off the last couple
values.

You might want to try this. It walks down the line four characters at a
time, swaps the two 2-character halves of each group and appends the
result to a list. It also takes a second, empty list argument in which
to store the first 8 digits (lists are mutable, so the caller sees the
change).

-------------------------------------------------------
from types import ListType

def process(line,key):
    """ Pass in a string type (line) and
    an empty list to store the key """
    if type(key) is ListType and key == []:
        # the 8 digits after the leading ':' are line[1:9]
        key.append(line[1:9])
    else:
        print "Key not ListType or not empty"
    result=[]
    line=line[9:]   # skip the ':' and the 8-digit key
    while line:
        # split each 4-character group into its two halves and swap them
        k2,k1 = line[:2],line[2:4]
        line=line[4:]
        result.append(k1+k2)
    return result
-------------------------------------------------------
Jul 18 '05 #5



1. Use FIELDWIDTHS in (GNU) Awk.

2. Use string slices in Python.

3. Use substring parameter expansion in the (Bash) shell.

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution for data management and processing.
Jul 18 '05 #6

Ren wrote:
How is this accomplished using Python?


Check the struct documentation.
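
One possible reading of that hint, as a sketch only (Python 3 syntax). It assumes the 32 characters after the 9-character prefix are always 16 data bytes, as in the sample line, and it pairs `struct.unpack` with `binascii.unhexlify`, which is a choice made here rather than something stated in the thread:

```python
import binascii
import struct

line = ":10000000E7280530AC00A530AD00AD0B0528AC0BE2"

# Turn the 32 hex characters of data into 16 raw bytes.
raw = binascii.unhexlify(line[9:41])

# '<8H' means eight little-endian unsigned 16-bit integers,
# so the byte pair E7 28 is read as 0x28E7.
words = struct.unpack("<8H", raw)

hex_words = ["%04X" % w for w in words]
print(hex_words)
```

This gives integers directly, with the byte swap done by the little-endian format code instead of by slicing.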

- Josiah
Jul 18 '05 #7

el*******@bah.com (eleyg) wrote:
:10000000E7280530AC00A530AD00AD0B0528AC0BE2

The data I want to extract are 8 hexadecimal strings, the first of
which is E728, like this:

:10000000 E728 0530 AC00 A530 AD00 AD0B 0528 AC0B E2

Also, the bytes in the string are reversed. The E728 needs to be 28E7,
0530 needs to be 3005 and so on.


The first response only works with python-2.3 (yield is a newly
reserved word).

The second response did not work for me and left off the last couple
values.


The third response uses typechecking and stores a value in an
unreachable place ...

Maybe feature-less code is better (tested very lightly):

def asBytes(line,offset):
    """ split a line into 2-char chunks, starting at offset"""
    res = []
    for i in range(offset,len(line),2):
        res.append(line[i:i+2])
    return res

def asWords(line,offset=0,swapbytes=0):
    """split a line into words that have maximally 4 chars,
    starting at offset, optionally swapping 2-char chunks"""
    res = []
    flip = 0
    for b in asBytes(line,offset):
        if flip:
            # second chunk of a word: emit the pair
            if swapbytes:
                res.append(b+prev)
            else:
                res.append(prev+b)
        else:
            # first chunk of a word: remember it
            prev = b
        flip = 1-flip
    if flip:
        # an odd chunk was left over at the end of the line
        res.append(b)
    return res

def test():
    line =":10000000E7280530AC00A530AD00AD0B0528AC0BE2"
    print asWords(line,offset=9,swapbytes=1)

if __name__=='__main__':
    test()

output is:

['28E7', '3005', '00AC', '30A5', '00AD', '0BAD', '2805', '0BAC', 'E2']

Anton
Jul 18 '05 #8

Ren
What is 'prefix' used for? I searched the docs and didn't come up with
anything that seemed appropriate.
"Mike C. Fletcher" <mc******@rogers.com> wrote in message news:<ma**************************************@pyt hon.org>...
With Python 2.3:
>>> def splitter( line ):
...     line = line[9:] # skip prefix
...     while line:
...         prefix, line = line[:4],line[4:]
...         yield prefix[2:]+prefix[:2]
...
>>> for number in splitter( ':10000000E7280530AC00A530AD00AD0B0528AC0BE2'):
...     print number
...

............snip...............

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/

Jul 18 '05 #9

Ren wrote:
What is 'prefix' used for? I searched the docs and didn't come up with
anything that seemed appropriate.


Umm... it's just a variable name :-)

--Irmen
Jul 18 '05 #10

Ren wrote:
What is 'prefix' used for? I searched the docs and didn't come up with
anything that seemed appropriate.

It's just the name (variable) I used to store the "prefix" of the rest
of the line. It could just as easily have been called "vlad", but using
simple, descriptive names for variables makes the code easier to read
(in most cases, this being the obvious counter-example). In Python when
you assign to something:

x, y = v, t

you are creating a (possibly new) bound name (if something of the same
name exists in a higher namespace it is shadowed by this bound name, so
even if there was a built-in function called "prefix" my assignment to
the name would have shadowed the name).
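
A small sketch of that shadowing behaviour (Python 3 syntax; `demo` is a made-up function name, and the built-in `len` stands in for the hypothetical built-in "prefix"):

```python
def demo():
    # Assigning to 'len' creates a local name that shadows the built-in
    # for the rest of this function only.
    len = "just a string now"
    return len

shadowed = demo()

# Outside demo() the name still refers to the built-in function.
still_builtin = len("abc")

print(shadowed, still_builtin)
```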

This line here says:

prefix, line = line[:4],line[4:]

that is, bind the name "prefix" to the result of slicing the line from
the starting index up to (but not including) index 4, and bind the name
"line" to the result of slicing from index 4 to the ending index. Under
the covers the right-hand side of the assignment creates a two-element
tuple, then that tuple is unpacked to assign its elements to the two
variables on the left-hand side.
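
Spelled out as two separate steps, the same assignment looks roughly like this (Python 3 syntax; the intermediate name `pair` is only there for illustration):

```python
line = "E7280530"

# Step 1: the right-hand side is evaluated first and builds a tuple,
# using the *old* value of line for both slices.
pair = (line[:4], line[4:])

# Step 2: the tuple is unpacked into the names on the left.
prefix, line = pair

print(prefix, line)

# The same mechanism is what makes the usual swap idiom work.
a, b = 1, 2
a, b = b, a
```

Because the whole right-hand side is evaluated before any name is rebound, it does not matter that "line" appears on both sides.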

Python is a fairly small language: if a linguistic construct works a
particular way in one context, it *normally* works that way in every
context, unless the programmer explicitly changes that. That is
generally *only* done by meta-programmers seeking to create
domain-specific functionality, and even then, as a matter of style, it's
kept to a minimum to avoid confusing people. In this particular case,
AFAIK there's no way to override variable assignment, though (evil ;) )
people have proposed adding such a hook on numerous occasions.

The later line is simply manipulating the (string) object now referred
to as "prefix":

result.append( prefix[2:]+prefix[:2] )

that is, take the result of slicing from index 2 to the end and add it
to the result of slicing from the start to index 2. This has the effect
of swapping the two 2-character hexadecimal encodings of the bytes in
each 4-character group.

Oh, and since someone took issue with my use of (new in Python 2.2)
yield (luddites :) ;) ), here's a non-generator version using the same
basic code pattern:
>>> def splitter( line ):
...     line = line[9:] # skip prefix
...     result = []
...     while line:
...         prefix, line = line[:4],line[4:]
...         result.append( prefix[2:]+prefix[:2] )
...     return result
...
>>> splitter( ':10000000E7280530AC00A530AD00AD0B0528AC0BE2')
['28E7', '3005', '00AC', '30A5', '00AD', '0BAD', '2805', '0BAC', 'E2']

Have fun :) ,
Mike

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/

Jul 18 '05 #11

"Mike C. Fletcher" <mc******@rogers.com> wrote:
Oh, and since someone took issue with my use of (new in Python 2.2)
yield (luddites :) ;) ), here's a non-generator version using the same
basic code pattern:
>>> def splitter( line ):
...     line = line[9:] # skip prefix
...     result = []
...     while line:
...         prefix, line = line[:4],line[4:]
...         result.append( prefix[2:]+prefix[:2] )
...     return result


The basic problem with this code pattern is that it makes a lot of
large slices of the line. With a small line there is no problem but it
looks like it doesn't scale well.
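
To make the scaling concern a bit more concrete, here is a rough sketch (Python 3 syntax) that counts how many characters each approach copies. It is a counting model, not a benchmark. It assumes each string slice copies its characters, which is how CPython strings behave, and the function names are made up for this illustration:

```python
def reslice_cost(line, step=4):
    """Characters copied when the rest of the line is re-sliced each pass."""
    copied = 0
    while line:
        chunk, line = line[:step], line[step:]
        copied += len(chunk) + len(line)   # both slices are new strings
    return copied

def index_cost(line, step=4):
    """Characters copied when only the current chunk is sliced out."""
    copied = 0
    for i in range(0, len(line), step):
        copied += len(line[i:i + step])
    return copied

short = "A" * 34      # same length as the data part of the sample line
longer = "A" * 3400   # a hypothetical, much longer line

print(reslice_cost(short), index_cost(short))
print(reslice_cost(longer), index_cost(longer))
```

The re-slicing count grows roughly with the square of the line length, while the index-based count grows linearly. For lines as short as the sample the difference hardly matters.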

After reconsidering all alternatives I finally favor a variant of
Irmen's code, but without slicing the whole line and -after all-
definitely *using* yield because it seems appropriate here.

def process(line,offset):
    for i in xrange(offset,len(line),4):
        yield line[i+2:i+4] + line[i:i+2]

def test():
    line = ":10000000E7280530AC00A530AD00AD0B0528AC0BE2"
    print '\n'.join(process(line,9))

if __name__=='__main__':
    test()

output is:

28E7
3005
00AC
30A5
00AD
0BAD
2805
0BAC
E2

Anton
Jul 18 '05 #12
