Design mini-language for data input

This is an entry I just added to ASPN. It is a somewhat novel technique I
have employed quite successfully in my code. I repost it here for more
exposure and discussion.

http://aspn.activestate.com/ASPN/Coo.../Recipe/475158

wy
------------------------------------------------------------------------
Title: Design mini-language for data input
Description:

Many programs need a set of initial data. For ease of use and flexibility,
design a mini-language for your input data. Use Python's superb text
handling capability to parse and build the data structure from the input
text.

Source:

# this is an example to demonstrate the programming technique

DATA = """
# data source: http://www.mongabay.com/igapo/world_...ics_by_pop.htm
# Country / Capital / Area [sq. km] / 2002 Population Estimate
China / Beijing / 9,596,960 / 1,284,303,705
India / New Delhi / 3,287,590 / 1,045,845,226
United States / Washington DC / 9,629,091 / 280,562,489
Indonesia / Jakarta / 1,919,440 / 231,328,092
Russia / Moscow / 17,075,200 / 144,978,573
"""

def initData():
    """ parse and return a country list of (name, capital, area,
    population) """

    countries = []
    for line in DATA.splitlines():

        # filter out blank lines/comment lines
        line = line.strip()
        if not line or line.startswith('#'):
            continue

        # 4 fields separated by '/'; strip white space around each field
        parts = [part.strip() for part in line.split('/')]
        country, capital, area, population = parts

        # remove commas in numbers
        area = int(area.replace(',', ''))
        population = int(population.replace(',', ''))

        countries.append((country, capital, area, population))

    return countries

def findLargestCountry(countries):
    # your algorithm here; as a placeholder, pick the largest by area
    return max(countries, key=lambda record: record[2])

def main():
    countries = initData()
    print(findLargestCountry(countries))

if __name__ == '__main__':
    main()

Discussion:

Problem
-------

Many programs need a set of initial data. The simplest way is to construct
the Python data structure directly, as shown below. This is often not ideal.
Algorithms and data structures tend to change. The Python statements are
likely to differ textually from the data source, which might be text pulled
from web pages or elsewhere. This means a great deal of effort is often
needed to format and maintain the input as Python statements.

This is a sample program that initializes some geographical data.

# map of country -> (capital, area, population)
COUNTRIES = {}
COUNTRIES['China'] = ('Beijing', 9596960, 1284303705)
COUNTRIES['India'] = ('New Delhi', 3287590, 1045845226)
COUNTRIES['United States'] = ('Washington DC', 9629091, 280562489)
COUNTRIES['Indonesia'] = ('Jakarta', 1919440, 231328092)
COUNTRIES['Russia'] = ('Moscow', 17075200, 144978573)

Mini-language
-------------

A more flexible approach is to define a mini-language to describe the
data. This can be as simple as formatting the data into a multi-line string.

1. Define the data format in text. It should mirror the data source and
be designed for ease of human editing.

2. Define the data structure.

3. Write glue code to parse the input data and initialize the data
structure.

In the example above we use one line for each record. Each record has four
fields (country, capital, area and population) separated by slashes. One
immediate benefit is that we no longer need to type so many quotes
for every string literal. This concise data format is much easier to read
and edit than Python statements.

The parser simply breaks the input text into lines using splitlines() and
then loops through them line by line. It is useful to tolerate some extra
white space so that the format is more forgiving for a human editor. In this
case the numbers (area, population) from the data source contain commas.
Rather than being edited out manually, they are copied into the text as is.
Then they are parsed into integers using

area = int(area.replace(',',''))

A slash is chosen as the separator (rather than the more common comma)
because it does not otherwise appear in the data. A record is parsed into
fields using

line.split('/')

Don't forget to remove the extra white space around each field using strip().
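
As a small illustration of the split, strip and comma-removal steps described above, here is a minimal sketch. The helper name parse_record is made up for this example and is not part of the recipe; it only shows that a record typed with sloppy spacing still parses cleanly.

```python
def parse_record(line):
    """Split one 'a / b / c / d' record and normalise its fields.

    parse_record is a hypothetical helper used for illustration only.
    """
    # split on the separator, then trim white space around each field
    country, capital, area, population = [f.strip() for f in line.split('/')]
    # the numbers keep their commas in the text; drop them before int()
    area = int(area.replace(',', ''))
    population = int(population.replace(',', ''))
    return country, capital, area, population


# uneven spacing around the slashes is tolerated
record = parse_record("  India/New Delhi   /  3,287,590 /1,045,845,226 ")
print(record)
```

Note that a line with the wrong number of fields raises ValueError at the unpacking step, which is usually what you want for hand-edited data.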

Finally, it builds a data structure: a list of country records, each a tuple
of (country, capital, area, population). It is just as easy to turn them into
objects or any other data structure as desired.
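
One possible way to do that, sketched here with collections.namedtuple from the standard library. The Country type and the sample rows are assumptions made for this example; the rows simply have the same shape as the tuples described above.

```python
from collections import namedtuple

# Country is a hypothetical record type, not part of the original recipe
Country = namedtuple('Country', 'name capital area population')

# two records in the (country, capital, area, population) tuple shape
rows = [
    ('China', 'Beijing', 9596960, 1284303705),
    ('Russia', 'Moscow', 17075200, 144978573),
]

# turn each plain tuple into a record with named fields
countries = [Country(*row) for row in rows]

# or build a map of country -> (capital, area, population)
by_name = dict((c.name, (c.capital, c.area, c.population)) for c in countries)

print(countries[1].capital)
print(by_name['China'])
```

The parsing code does not change at all; only the last step that decides what to build from each parsed record.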

The mini-language technique can be refined to represent more complex, more
structured input. It makes transformation and maintenance of input data
much easier.
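
As a sketch of what "more structured" could look like, the example below adds section headers to the format. The '[Region]' header syntax and the parse_sections function are inventions for this illustration only, not something the recipe defines.

```python
# Hypothetical extension: a '[Region]' line groups the records that follow it.
DATA = """
# Country / Capital
[Asia]
China / Beijing
India / New Delhi

[North America]
United States / Washington DC
"""

def parse_sections(text):
    """Return {section: [(country, capital), ...]} for the format above."""
    sections = {}
    current = None
    for line in text.splitlines():
        # filter out blank lines/comment lines, as in the flat format
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # a header line starts a new group
        if line.startswith('[') and line.endswith(']'):
            current = line[1:-1].strip()
            sections[current] = []
            continue
        if current is None:
            raise ValueError('record before any section header: %r' % line)
        country, capital = [f.strip() for f in line.split('/')]
        sections[current].append((country, capital))
    return sections

result = parse_sections(DATA)
print(result['Asia'])
```

The input stays easy to edit by hand, and the extra structure costs only a few more lines in the parser.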
Mar 21 '06 #1
Hmm,
Do you know about JSON and YAML?
http://en.wikipedia.org/wiki/JSON
http://en.wikipedia.org/wiki/YAML

They have the advantage of being maintained by a group of people and
being available for a number of languages. (as well as NOT being XML
:-)

- Cheers, Paddy.
--
http://paddy3118.blogspot.com/

Mar 21 '06 #2
Yes. But they have different motivations.

The mini-language concept is to design an input format that is convenient
for a human editor and that is close to the semi-structured data source. I
think the benefits of easy editing and flexibility justify writing a little
parsing code.

JSON is mainly designed for data exchange between programs. You can
hand-edit JSON data (as well as XML or Python statements) but it is not the
most convenient.
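
To make the comparison concrete, here is a rough sketch that renders the same record both ways. It assumes Python's standard json module is available; the record itself is taken from the recipe's data.

```python
import json

record = ['India', 'New Delhi', 3287590, 1045845226]

# the same record as JSON: brackets, plus quotes around every string
as_json = json.dumps(record)

# and in the mini-language: plain fields separated by slashes
as_mini = ' / '.join(str(field) for field in record)

print(as_json)
print(as_mini)
```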

Just not having to enter two quotes for every string object is almost
liberating. These quotes are only artifacts of the structured data format.
The idea is to design a format convenient for humans and let code parse it
and build the data structure.

wy

Mar 21 '06 #3
P.S. Also it is a 'mini-language' because it is an ad-hoc design that is
good enough and can be easily implemented for a given problem. This is as
opposed to a general-purpose solution like XML, which is one translation
away from the original data format and carries too much baggage.

wy

Mar 21 '06 #4
