473,394 Members | 2,063 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,394 software developers and data experts.

Python plain-text database or library that supports joins?

Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other case use other
relational operators (projection, cross-product, etc) so I'm really
looking for something SQL-like in functionality. However, the data is
in flat-files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite have their data structures w no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
- http://aspn.activestate.com/ASPN/Coo.../Recipe/159974 is
heading in the right direction, as it shows how to perform relational
operations on lists and are index based rather than field-name based.
- http://aspn.activestate.com/ASPN/Coo.../Recipe/498130 and
http://furius.ca/pubcode/pub/conf/co...db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon

Jun 22 '07 #1
5 3242
On Jun 22, 1:18 pm, felciano <felci...@gmail.comwrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other case use other
relational operators (projection, cross-product, etc) so I'm really
looking for something SQL-like in functionality. However, the data is
in flat-files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite have their data structures w no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
-http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159974is
heading in the right direction, as it shows how to perform relational
operations on lists and are index based rather than field-name based.
-http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498130andhttp://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon
ramon,

i don't think that using flat text files as a database is common these
days. if you need relational database features what stops you from
using rdbms? if the only reason for that is some legacy system then
i'd still use in-memory sqlite database for all relational operations.
import, process, export back to text if you need to.

Jun 22 '07 #2
>
i don't think that using flat text files as a database is common these
days. if you need relational database features what stops you from
using rdbms? if the only reason for that is some legacy system then
i'd still use in-memory sqlite database for all relational operations.
import, process, export back to text if you need to.
These are often one-off operations, so those import + export steps are
non-trivial overhead. For example, most log files are structured, but
it seems like we still use scripts or command line tools to find data
in those files. I'm essentially doing the same thing, only with
operations across multiple files (e.g. merge records these two files
based on a common key, or append a column based on a look up value). I
may end up having to go to DB, but that seems like a heavyweight jump
for what are otherwise simple operations.

Maybe this is the wrong forum for the question. I prefer programming
in Python, but the use cases I'm looking is closer to shell scripting.
I'd be perfectly happy with a more powerful version of GNU textutils
that allowed for greater flexibility in text manipulation.

HTH,

Ramon

Jun 23 '07 #3
Not Python, but maybe relevant:
http://www.scriptaworks.com/cgi-bin/...NoSQL/HomePage

Alan Isaac
Jun 23 '07 #4
On Jun 22, 7:18 pm, felciano <felci...@gmail.comwrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.
Have you looked at itools?

http://www.ikaaro.org/itools#itools.csv

HTH,

Michele Simionato

Jun 23 '07 #5
On Friday 22 June 2007 09:18, felciano wrote:
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.
Two pointers, but maybe not a complete solution:

http://search.cpan.org/dist/DBD-Sprite/
Perl library that uses CSV files and supports simple joins. Maybe a port of
this?

http://www.biostat.wisc.edu/~annis/c...ns/pseudb.html
Functional interface for CSV files inspired by Sprite, but does not support
joins. Possibly could be extended?

j

--
Joshua Kugler
Lead System Admin -- Senior Programmer
http://www.eeinternet.com
PGP Key: http://pgp.mit.edu/ Â*ID 0xDB26D7CE

--
Posted via a free Usenet account from http://www.teranews.com

Jun 26 '07 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: ken | last post by:
I've been looking for a solution to a string to long conversion problem that I've run into >>> x = 'e10ea210' >>> print x e10ea210 >>> y=long(x) Traceback (most recent call last): File...
699
by: mike420 | last post by:
I think everyone who used Python will agree that its syntax is the best thing going for it. It is very readable and easy for everyone to learn. But, Python does not a have very good macro...
24
by: huy | last post by:
Hi, Just wondering if there were any plans (or existing projects) similar to Java webstart for Python. I think this would be a boon for python. I would love to start my new project using Python...
114
by: Maurice LING | last post by:
This may be a dumb thing to ask, but besides the penalty for dynamic typing, is there any other real reasons that Python is slower than Java? maurice
4
by: Edmond Rusjan | last post by:
Hi All, I'd like to use Python-2.3.4 on OSF1 V4.0, but have trouble installing. With a plain "./configure; make" build, I cannot import socket. If I uncomment the socketmodule in Modules/Setup,...
267
by: Xah Lee | last post by:
Python, Lambda, and Guido van Rossum Xah Lee, 2006-05-05 In this post, i'd like to deconstruct one of Guido's recent blog about lambda in Python. In Guido's blog written in 2006-02-10 at...
3
by: gisleyt | last post by:
I'm trying to compile a perfectly valid regex, but get the error message: r = re.compile(r'(*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*') Traceback (most recent call last): File...
14
by: nicolasg | last post by:
Hi folks, I have accomplished to make a python program that make some image manipulation to bmp files. I now want to provide this program as a web service. A user can visit a site and through a...
21
by: Raj | last post by:
Hi, We just executed a project with Python using TG. The feedback was to use more python like programming rather than C style code executed in Python. The feedback is from a Python purist and...
6
by: Hussein B | last post by:
Hi. Apache Ant is the de facto building tool for Java (whether JSE, JEE and JME) application. With Ant you can do what ever you want: compile, generate docs, generate code, packing, deploy,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.