text file parsing (awk -> python)

Hi list,

I have an awk program that parses a text file which I would like to
rewrite in python. The text file has multi-line records separated by
empty lines and each single-line field has two subfields:

node 10
x -1
y 1

node 11
x -2
y 1

node 12
x -3
y 1

and this I would like to parse into a list of dictionaries like so:

mydict[0] = { 'node':10, 'x':-1, 'y':1 }
mydict[1] = { 'node':11, 'x':-2, 'y':1 }
mydict[2] = { 'node':12, 'x':-3, 'y':1 }

But the names of the fields (node, x, y) keep changing from file to
file, even their number is not fixed, sometimes it is (node, x, y, z).

What would be the simplest way to do this?
Nov 22 '06 #1
Daniel Nogradi wrote:
What would be the simplest way to do this?

data = """node 10
x -1
y 1

node 11
x -2
y 1

node 12
x -3
y 1
"""

# Stand-in for the built-in open() so the example runs without a real file.
def open(filename):
    from cStringIO import StringIO
    return StringIO(data)

# Per-field converters; fields not listed here stay strings.
converters = dict(
    x=int,
    y=int
)

def name_value(line):
    name, value = line.split(None, 1)
    return name, converters.get(name, str.rstrip)(value)

if __name__ == "__main__":
    from itertools import groupby
    records = []

    # groupby() yields alternating runs of blank and non-blank lines.
    for empty, record in groupby(open("records.txt"), key=str.isspace):
        if not empty:
            records.append(dict(name_value(line) for line in record))

    import pprint
    pprint.pprint(records)
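
The listing above is Python 2 (it relies on cStringIO). For readers on Python 3, the same groupby approach might look roughly like the sketch below. It assumes io.StringIO as the replacement for cStringIO, and the helper name parse is made up for illustration. As in the listing above, fields without a converter (here node) stay strings.

```python
from io import StringIO
from itertools import groupby

DATA = """node 10
x -1
y 1

node 11
x -2
y 1

node 12
x -3
y 1
"""

# Per-field converters, as in the listing above; unlisted fields stay strings.
CONVERTERS = {"x": int, "y": int}


def name_value(line):
    """Split 'name value' and convert the value if a converter is known."""
    name, value = line.split(None, 1)
    return name, CONVERTERS.get(name, str.rstrip)(value)


def parse(lines):
    """Hypothetical helper: one dict per run of consecutive non-blank lines."""
    return [
        dict(name_value(line) for line in group)
        for is_blank, group in groupby(lines, key=str.isspace)
        if not is_blank
    ]


# StringIO stands in for a real file object; both iterate line by line.
records = parse(StringIO(DATA))
print(records)
```

Passing an open file object instead of StringIO(DATA) should work the same way, since both yield one line at a time.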
Nov 22 '06 #2
Thanks very much, that's exactly what I had in mind.

Thanks again,
Daniel
Nov 22 '06 #3
Peter Otten, your solution is very nice, it uses groupby splitting on
empty lines, so it doesn't need to read the whole file into memory.

But Daniel Nogradi says:
But the names of the fields (node, x, y) keep changing from file to
file, even their number is not fixed, sometimes it is (node, x, y, z).

Your version with the converters dict fails to convert the numbers in
the node, z fields, etc. (generally using such a converters dict is an
elegant solution, it lets you define string, float, etc. fields):
converters = dict(
    x=int,
    y=int
)
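
One way to keep the converters dict while still handling field names that are not known in advance is to give it a fallback converter. This is only a rough sketch, assuming Python 3. The helper auto and the label field are hypothetical and do not appear in the thread.

```python
def auto(value):
    """Hypothetical fallback: try int, then float, else keep the stripped text."""
    value = value.strip()
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    return value


# Explicit entries override the fallback, e.g. to force a field to stay text.
converters = {"label": str.strip}


def name_value(line):
    """Split 'name value' and convert with the field's converter or the fallback."""
    name, value = line.split(None, 1)
    return name, converters.get(name, auto)(value)


print(name_value("node 10\n"))
print(name_value("z 2.5\n"))
print(name_value("label 007\n"))
```

With this arrangement a new numeric field such as z needs no change to the dict, while a field that should stay a string can still be listed explicitly.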

I have created a version with a RE, but it's probably too rigid,
it doesn't handle files with the z field, etc.:

data = """node 10
y 1
x -1

node 11
x -2
y 1
z 5

node 12
x -3
y 1
z 6"""

import re
unpack = re.compile(r"(\D+) \s+ ([-+]? \d+) \s+" * 3, re.VERBOSE)

result = []
for obj in unpack.finditer(data):
    block = obj.groups()
    d = dict((block[i], int(block[i+1])) for i in xrange(0, 6, 2))
    result.append(d)

print result

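
A regular-expression version does not have to fix the number of fields per record. The sketch below, assuming Python 3, first splits the text on blank lines and then matches one field per line. It is not the pattern used above, and the name FIELD is made up for illustration. Unlike the groupby approach, it needs the whole text in memory.

```python
import re

data = """node 10
y 1
x -1

node 11
x -2
y 1
z 5

node 12
x -3
y 1
z 6"""

# One "name integer" pair per line; MULTILINE makes ^ and $ match at line ends.
FIELD = re.compile(r"^(\S+)[ \t]+([-+]?\d+)[ \t]*$", re.MULTILINE)

# Split into records on blank lines, then collect every field in each record.
result = [
    {name: int(value) for name, value in FIELD.findall(block)}
    for block in re.split(r"\n\s*\n", data.strip())
]
print(result)
```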
So I have just modified and simplified your quite nice solution (I have
removed the pprint, but it's the same):

def open(filename):
    from cStringIO import StringIO
    return StringIO(data)

from itertools import groupby

records = []
for empty, record in groupby(open("records.txt"), key=str.isspace):
    if not empty:
        pairs = ([k, int(v)] for k,v in map(str.split, record))
        records.append(dict(pairs))

print records
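
For comparison, the same record splitting can also be written as a plain loop, which is closer in spirit to an awk script that flushes a record on each empty line. The sketch below assumes Python 3 and integer values in every field. The function name iter_records and the sample list are hypothetical.

```python
def iter_records(lines):
    """Yield one dict per blank-line-separated record, reading lazily."""
    record = {}
    for line in lines:
        fields = line.split()
        if not fields:
            # A blank line closes the current record, if there is one.
            if record:
                yield record
                record = {}
            continue
        # Assumes exactly two whitespace-separated subfields per line.
        name, value = fields
        record[name] = int(value)
    if record:
        # The last record may not be followed by a blank line.
        yield record


sample = ["node 10\n", "x -1\n", "y 1\n", "\n",
          "node 11\n", "x -2\n", "y 1\n", "z 5\n"]
print(list(iter_records(sample)))
```

Because it is a generator, it can be fed an open file object directly and only one record is held in memory at a time.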

Bye,
bearophile

Nov 22 '06 #4