473,561 Members | 3,656 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Python plain-text database or library that supports joins?

Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other case use other
relational operators (projection, cross-product, etc) so I'm really
looking for something SQL-like in functionality. However, the data is
in flat-files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite have their data structures w no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
- http://aspn.activestate.com/ASPN/Coo.../Recipe/159974 is
heading in the right direction, as it shows how to perform relational
operations on lists and are index based rather than field-name based.
- http://aspn.activestate.com/ASPN/Coo.../Recipe/498130 and
http://furius.ca/pubcode/pub/conf/co...db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon

Jun 22 '07 #1
5 3270
On Jun 22, 1:18 pm, felciano <felci...@gmail .comwrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other case use other
relational operators (projection, cross-product, etc) so I'm really
looking for something SQL-like in functionality. However, the data is
in flat-files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite have their data structures w no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
-http://aspn.activestat e.com/ASPN/Cookbook/Python/Recipe/159974is
heading in the right direction, as it shows how to perform relational
operations on lists and are index based rather than field-name based.
-http://aspn.activestat e.com/ASPN/Cookbook/Python/Recipe/498130andhttp://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon
ramon,

i don't think that using flat text files as a database is common these
days. if you need relational database features what stops you from
using rdbms? if the only reason for that is some legacy system then
i'd still use in-memory sqlite database for all relational operations.
import, process, export back to text if you need to.

Jun 22 '07 #2
>
i don't think that using flat text files as a database is common these
days. if you need relational database features what stops you from
using rdbms? if the only reason for that is some legacy system then
i'd still use in-memory sqlite database for all relational operations.
import, process, export back to text if you need to.
These are often one-off operations, so those import + export steps are
non-trivial overhead. For example, most log files are structured, but
it seems like we still use scripts or command line tools to find data
in those files. I'm essentially doing the same thing, only with
operations across multiple files (e.g. merge records these two files
based on a common key, or append a column based on a look up value). I
may end up having to go to DB, but that seems like a heavyweight jump
for what are otherwise simple operations.

Maybe this is the wrong forum for the question. I prefer programming
in Python, but the use cases I'm looking is closer to shell scripting.
I'd be perfectly happy with a more powerful version of GNU textutils
that allowed for greater flexibility in text manipulation.

HTH,

Ramon

Jun 23 '07 #3
Not Python, but maybe relevant:
http://www.scriptaworks.com/cgi-bin/...NoSQL/HomePage

Alan Isaac
Jun 23 '07 #4
On Jun 22, 7:18 pm, felciano <felci...@gmail .comwrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.
Have you looked at itools?

http://www.ikaaro.org/itools#itools.csv

HTH,

Michele Simionato

Jun 23 '07 #5
On Friday 22 June 2007 09:18, felciano wrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.
Two pointers, but maybe not a complete solution:

http://search.cpan.org/dist/DBD-Sprite/
Perl library that uses CSV files and supports simple joins. Maybe a port of
this?

http://www.biostat.wisc.edu/~annis/c...ns/pseudb.html
Functional interface for CSV files inspired by Sprite, but does not support
joins. Possibly could be extended?

j

--
Joshua Kugler
Lead System Admin -- Senior Programmer
http://www.eeinternet.com
PGP Key: http://pgp.mit.edu/ Â*ID 0xDB26D7CE

--
Posted via a free Usenet account from http://www.teranews.com

Jun 26 '07 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
11142
by: ken | last post by:
I've been looking for a solution to a string to long conversion problem that I've run into >>> x = 'e10ea210' >>> print x e10ea210 >>> y=long(x) Traceback (most recent call last): File "<pyshell#2>", line 1, in ? y=long(x)
699
33512
by: mike420 | last post by:
I think everyone who used Python will agree that its syntax is the best thing going for it. It is very readable and easy for everyone to learn. But, Python does not a have very good macro capabilities, unfortunately. I'd like to know if it may be possible to add a powerful macro system to Python, while keeping its amazing syntax, and if it...
24
2493
by: huy | last post by:
Hi, Just wondering if there were any plans (or existing projects) similar to Java webstart for Python. I think this would be a boon for python. I would love to start my new project using Python but the deployment (and long term upgrading) factor is a major itch which pyfreeze and py2exe just doesn't scratch properly. Thanks
114
9722
by: Maurice LING | last post by:
This may be a dumb thing to ask, but besides the penalty for dynamic typing, is there any other real reasons that Python is slower than Java? maurice
4
2616
by: Edmond Rusjan | last post by:
Hi All, I'd like to use Python-2.3.4 on OSF1 V4.0, but have trouble installing. With a plain "./configure; make" build, I cannot import socket. If I uncomment the socketmodule in Modules/Setup, the build fails. Using gcc 3.4.2. Had no problem on OSF1 V5.1, Linux and Sun. Have not been able to find reference to a similar problem in...
267
10577
by: Xah Lee | last post by:
Python, Lambda, and Guido van Rossum Xah Lee, 2006-05-05 In this post, i'd like to deconstruct one of Guido's recent blog about lambda in Python. In Guido's blog written in 2006-02-10 at http://www.artima.com/weblogs/viewpost.jsp?thread=147358
3
5257
by: gisleyt | last post by:
I'm trying to compile a perfectly valid regex, but get the error message: r = re.compile(r'(*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*') Traceback (most recent call last): File "<stdin>", line 1, in ? File "/usr/lib/python2.3/sre.py", line 179, in compile return _compile(pattern, flags) File "/usr/lib/python2.3/sre.py", line...
14
3010
by: nicolasg | last post by:
Hi folks, I have accomplished to make a python program that make some image manipulation to bmp files. I now want to provide this program as a web service. A user can visit a site and through a web interface he should upload the file to the web server , the server then will do the image process with the python program I have wrote and when it...
21
1898
by: Raj | last post by:
Hi, We just executed a project with Python using TG. The feedback was to use more python like programming rather than C style code executed in Python. The feedback is from a Python purist and for some reasons we cannot solicity his help. So we'd like to do is to scrub through the codebase and identify places where the codebase needs...
6
1773
by: Hussein B | last post by:
Hi. Apache Ant is the de facto building tool for Java (whether JSE, JEE and JME) application. With Ant you can do what ever you want: compile, generate docs, generate code, packing, deploy, connecting to remote servers and every thing. Do we have such a tool for Python projects? Thank you.
0
7647
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
8088
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
0
7930
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
6210
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
1
5472
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
5187
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3617
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3600
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
2068
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.