Python plain-text database or library that supports joins?

felciano

Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other case use other
relational operators (projection, cross-product, etc) so I'm really
looking for something SQL-like in functionality. However, the data is
in flat-files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite have their data structures w no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
- http://aspn.activestate.com/ASPN/Coo.../Recipe/159974 is
heading in the right direction, as it shows how to perform relational
operations on lists and are index based rather than field-name based.
- http://aspn.activestate.com/ASPN/Coo.../Recipe/498130 and
http://furius.ca/pubcode/pub/conf/co...db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon

Jun 22 '07 #1

Subscribe Post Reply

3242

askel

On Jun 22, 1:18 pm, felciano <felci...@gmail.comwrote:

Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other case use other
relational operators (projection, cross-product, etc) so I'm really
looking for something SQL-like in functionality. However, the data is
in flat-files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite have their data structures w no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
-http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159974is
heading in the right direction, as it shows how to perform relational
operations on lists and are index based rather than field-name based.
-http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498130andhttp://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon

ramon,

i don't think that using flat text files as a database is common these
days. if you need relational database features what stops you from
using rdbms? if the only reason for that is some legacy system then
i'd still use in-memory sqlite database for all relational operations.
import, process, export back to text if you need to.

Jun 22 '07 #2

felciano

i don't think that using flat text files as a database is common these
days. if you need relational database features what stops you from
using rdbms? if the only reason for that is some legacy system then
i'd still use in-memory sqlite database for all relational operations.
import, process, export back to text if you need to.

These are often one-off operations, so those import + export steps are
non-trivial overhead. For example, most log files are structured, but
it seems like we still use scripts or command line tools to find data
in those files. I'm essentially doing the same thing, only with
operations across multiple files (e.g. merge records these two files
based on a common key, or append a column based on a look up value). I
may end up having to go to DB, but that seems like a heavyweight jump
for what are otherwise simple operations.

Maybe this is the wrong forum for the question. I prefer programming
in Python, but the use cases I'm looking is closer to shell scripting.
I'd be perfectly happy with a more powerful version of GNU textutils
that allowed for greater flexibility in text manipulation.

HTH,

Ramon

Jun 23 '07 #3

Alan Isaac

Not Python, but maybe relevant:
http://www.scriptaworks.com/cgi-bin/...NoSQL/HomePage

Alan Isaac

Jun 23 '07 #4

Michele Simionato

On Jun 22, 7:18 pm, felciano <felci...@gmail.comwrote:

Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Have you looked at itools?

http://www.ikaaro.org/itools#itools.csv

HTH,

Michele Simionato

Jun 23 '07 #5

Joshua J. Kugler

On Friday 22 June 2007 09:18, felciano wrote:

Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Two pointers, but maybe not a complete solution:

http://search.cpan.org/dist/DBD-Sprite/
Perl library that uses CSV files and supports simple joins. Maybe a port of
this?

http://www.biostat.wisc.edu/~annis/c...ns/pseudb.html
Functional interface for CSV files inspired by Sprite, but does not support
joins. Possibly could be extended?

j

--
Joshua Kugler
Lead System Admin -- Senior Programmer
http://www.eeinternet.com
PGP Key: http://pgp.mit.edu/ Â*ID 0xDB26D7CE

--
Posted via a free Usenet account from http://www.teranews.com

Jun 26 '07 #6

Similar topics

python 2.2 string conversion ?

by: ken | last post by:

I've been looking for a solution to a string to long conversion problem that I've run into >>> x = 'e10ea210' >>> print x e10ea210 >>> y=long(x) Traceback (most recent call last): File...

Python

699

Python syntax in Lisp and Scheme

by: mike420 | last post by:

I think everyone who used Python will agree that its syntax is the best thing going for it. It is very readable and easy for everyone to learn. But, Python does not a have very good macro...

Python

Python Webstart ?

by: huy | last post by:

Hi, Just wondering if there were any plans (or existing projects) similar to Java webstart for Python. I think this would be a boon for python. I would love to start my new project using Python...

Python

114

why python is slower than java?

by: Maurice LING | last post by:

This may be a dumb thing to ask, but besides the penalty for dynamic typing, is there any other real reasons that Python is slower than Java? maurice

Python

Python-2.3.4 on OSF1 V4.0?

by: Edmond Rusjan | last post by:

Hi All, I'd like to use Python-2.3.4 on OSF1 V4.0, but have trouble installing. With a plain "./configure; make" build, I cannot import socket. If I uncomment the socketmodule in Modules/Setup,...

Python

267

A critic of Guido's blog on Python's lambda

by: Xah Lee | last post by:

Python, Lambda, and Guido van Rossum Xah Lee, 2006-05-05 In this post, i'd like to deconstruct one of Guido's recent blog about lambda in Python. In Guido's blog written in 2006-02-10 at...

Python

regex in python

by: gisleyt | last post by:

I'm trying to compile a perfectly valid regex, but get the error message: r = re.compile(r'(*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*') Traceback (most recent call last): File...

Python

Python web service ...

by: nicolasg | last post by:

Hi folks, I have accomplished to make a python program that make some image manipulation to bmp files. I now want to provide this program as a web service. A user can visit a site and through a...

Python

How to write Smart Python programs?

by: Raj | last post by:

Hi, We just executed a project with Python using TG. The feedback was to use more python like programming rather than C style code executed in Python. The feedback is from a Python purist and...

Python

Build tool for Python

by: Hussein B | last post by:

Hi. Apache Ant is the de facto building tool for Java (whether JSE, JEE and JME) application. With Ant you can do what ever you want: compile, generate docs, generate code, packing, deploy,...

Python

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General