472,122 Members | 1,509 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,122 software developers and data experts.

Python plain-text database or library that supports joins?

Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other case use other
relational operators (projection, cross-product, etc) so I'm really
looking for something SQL-like in functionality. However, the data is
in flat-files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite have their data structures w no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
- http://aspn.activestate.com/ASPN/Coo.../Recipe/159974 is
heading in the right direction, as it shows how to perform relational
operations on lists and are index based rather than field-name based.
- http://aspn.activestate.com/ASPN/Coo.../Recipe/498130 and
http://furius.ca/pubcode/pub/conf/co...db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon

Jun 22 '07 #1
5 3120
On Jun 22, 1:18 pm, felciano <felci...@gmail.comwrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other case use other
relational operators (projection, cross-product, etc) so I'm really
looking for something SQL-like in functionality. However, the data is
in flat-files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite have their data structures w no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
-http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159974is
heading in the right direction, as it shows how to perform relational
operations on lists and are index based rather than field-name based.
-http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498130andhttp://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon
ramon,

i don't think that using flat text files as a database is common these
days. if you need relational database features what stops you from
using rdbms? if the only reason for that is some legacy system then
i'd still use in-memory sqlite database for all relational operations.
import, process, export back to text if you need to.

Jun 22 '07 #2
>
i don't think that using flat text files as a database is common these
days. if you need relational database features what stops you from
using rdbms? if the only reason for that is some legacy system then
i'd still use in-memory sqlite database for all relational operations.
import, process, export back to text if you need to.
These are often one-off operations, so those import + export steps are
non-trivial overhead. For example, most log files are structured, but
it seems like we still use scripts or command line tools to find data
in those files. I'm essentially doing the same thing, only with
operations across multiple files (e.g. merge records these two files
based on a common key, or append a column based on a look up value). I
may end up having to go to DB, but that seems like a heavyweight jump
for what are otherwise simple operations.

Maybe this is the wrong forum for the question. I prefer programming
in Python, but the use cases I'm looking is closer to shell scripting.
I'd be perfectly happy with a more powerful version of GNU textutils
that allowed for greater flexibility in text manipulation.

HTH,

Ramon

Jun 23 '07 #3
Not Python, but maybe relevant:
http://www.scriptaworks.com/cgi-bin/...NoSQL/HomePage

Alan Isaac
Jun 23 '07 #4
On Jun 22, 7:18 pm, felciano <felci...@gmail.comwrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.
Have you looked at itools?

http://www.ikaaro.org/itools#itools.csv

HTH,

Michele Simionato

Jun 23 '07 #5
On Friday 22 June 2007 09:18, felciano wrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.
Two pointers, but maybe not a complete solution:

http://search.cpan.org/dist/DBD-Sprite/
Perl library that uses CSV files and supports simple joins. Maybe a port of
this?

http://www.biostat.wisc.edu/~annis/c...ns/pseudb.html
Functional interface for CSV files inspired by Sprite, but does not support
joins. Possibly could be extended?

j

--
Joshua Kugler
Lead System Admin -- Senior Programmer
http://www.eeinternet.com
PGP Key: http://pgp.mit.edu/ Â*ID 0xDB26D7CE

--
Posted via a free Usenet account from http://www.teranews.com

Jun 26 '07 #6

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

4 posts views Thread by ken | last post: by
699 posts views Thread by mike420 | last post: by
24 posts views Thread by huy | last post: by
114 posts views Thread by Maurice LING | last post: by
4 posts views Thread by Edmond Rusjan | last post: by
267 posts views Thread by Xah Lee | last post: by
3 posts views Thread by gisleyt | last post: by
14 posts views Thread by nicolasg | last post: by
21 posts views Thread by Raj | last post: by
6 posts views Thread by Hussein B | last post: by
reply views Thread by leo001 | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.