
(in memory) database

Hi there,

I need to extract data from text files (~4 GB). On this data some
operations are performed, like avg, max, min, group etc. The result is
formatted and written to some other text files (some KB).

I am currently thinking that database tools might be suitable for this. I
would just write the import from the text files and ... the tool does
the rest. The only problem I can imagine is that this would not be
fast enough. But I would give it a shot.
Unfortunately I have only some knowledge of SQLite, which is not an
option here.

Some additional requirements I can think of are:
- Python (I want to hone my programming skills too)
- Python-only (no C-lib) for simplicity (installation, portability).
Therefore SQLite is not an option.
- must be fast
- I like SQL (select a, b from ...); this would be nice (row[..] + ...
is a little hard to get used to)

So far I found PyDBLite, PyTables, Buzhug but they are difficult to
compare for a beginner.

Cheers,
Mark
Aug 31 '08 #1
mark wrote:
> I need to extract data from text files (~4 GB) on this data some
> operations are performed like avg, max, min, group etc. The result is
> formatted and written in some other text files (some KB).

you could probably do all that with data stream processing, but if you
haven't worked with such algorithms, just stuffing it all in a database
is probably less work for you (if not for your CPU).
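The stream-processing approach can be sketched in a few lines of pure Python. This is a minimal illustration, not code from the thread: it assumes each input line is a comma-separated `key,value` pair (the real file layout was never posted), and the helper name `aggregate_stream` is hypothetical. It makes a single pass and keeps only a running count, sum, minimum and maximum per group, so memory use depends on the number of distinct keys rather than on the 4 GB file size.

```python
import csv
import io


def aggregate_stream(lines, key_field=0, value_field=1, delimiter=","):
    """Compute count/avg/min/max per group in a single pass.

    Only one small running-totals record per distinct key is kept in memory,
    so the input can be far larger than RAM.
    """
    stats = {}  # key -> [count, total, minimum, maximum]
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # csv.reader yields [] for blank lines
        key = row[key_field]
        value = float(row[value_field])
        entry = stats.get(key)
        if entry is None:
            stats[key] = [1, value, value, value]
        else:
            entry[0] += 1
            entry[1] += value
            entry[2] = min(entry[2], value)
            entry[3] = max(entry[3], value)
    # Turn the running totals into the final per-group summary.
    return {
        key: {"count": count, "avg": total / count, "min": lo, "max": hi}
        for key, (count, total, lo, hi) in stats.items()
    }


# Usage with an in-memory sample; for a real file, pass an open file object,
# e.g. open("data.txt", newline="") (hypothetical filename).
sample = io.StringIO("a,1\nb,4\na,3\nb,2\n")
summary = aggregate_stream(sample)
for key in sorted(summary):
    print(key, summary[key])
```

The average is derived from sum/count at the end, which is why a running total is enough; medians or other order statistics would need a different technique.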
> Unfortunately I have only some knowledge of SQLite which is not an
> option here.

why is sqlite not an option? it's bundled with Python these days,
and should be available (or trivial to install) on all major deployment
platforms.
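For comparison, a sketch of the database route using the bundled `sqlite3` module. The `key,value` line format, the table name `data` and the helper `load_and_aggregate` are assumptions for illustration only. Passing a filename instead of `":memory:"` keeps the table on disk, which may be necessary for a 4 GB input that does not fit in RAM.

```python
import csv
import io
import sqlite3


def load_and_aggregate(lines, db_path=":memory:", delimiter=","):
    """Import key,value rows into SQLite and let SQL do the grouping."""
    conn = sqlite3.connect(db_path)  # ":memory:" keeps the whole table in RAM
    try:
        # Assumes a fresh database: CREATE TABLE fails if "data" already exists.
        conn.execute("CREATE TABLE data (key TEXT, value REAL)")
        # executemany() accepts any iterable, so rows are streamed, not listed.
        rows = ((row[0], float(row[1]))
                for row in csv.reader(lines, delimiter=delimiter) if row)
        conn.executemany("INSERT INTO data (key, value) VALUES (?, ?)", rows)
        conn.commit()
        cur = conn.execute(
            "SELECT key, COUNT(*), AVG(value), MIN(value), MAX(value) "
            "FROM data GROUP BY key ORDER BY key"
        )
        return cur.fetchall()
    finally:
        conn.close()


result = load_and_aggregate(io.StringIO("a,1\nb,4\na,3\nb,2\n"))
for row in result:
    print(row)
```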

</F>

Aug 31 '08 #2
In article <ma*************************************@python.org>,
Fredrik Lundh <fr*****@pythonware.com> wrote:
>mark wrote:
Aug 31 '08 #3
On 31 Aug, 16:45, cla...@lairds.us (Cameron Laird) wrote:
> Yes and no. My own experience with Debian packages is that with a
> standard
> apt-get install python2.5
> an attempt to
> import sqlite3
> results in
> ImportError: No module named _sqlite3

That's strange from the perspective of the Debian package information:

http://packages.debian.org/etch/python2.5
http://packages.debian.org/lenny/python2.5

Both have libsqlite3-0 as a dependency. On my Ubuntu system, the same
dependency applies.

> that is, <URL:https://bugzilla.novell.com/show_bug.cgi?id=228733>.

I'm not sure Novell can help with the matter, though. ;-)

> I recognize the error was resolved nearly two years ago,
> but I, for one, don't understand how to express the resolution in
> terms of Debian packages. Is there a way to install Python and have
> it manage SQLite3 correctly withOUT configuring recent sources "by
> hand"?

Which Debian version and which package repository? I imagine that
there may have been backports of Python 2.5 to Debian 3.1 (Sarge) and
earlier, but my own experience with sqlite prior to running Python 2.5
on Ubuntu involved use of the pysqlite2 module with Python 2.4
instead. Since Python 2.5 became the default on Ubuntu, I don't recall
having any problems with sqlite.

Paul
Aug 31 '08 #4
In article <b6**********************************@26g2000hsk.googlegroups.com>,
Paul Boddie <pa**@boddie.org.uk> wrote:
Thanks for pursuing this, Paul. You have me curious now.

Let's take a definite example: I have a convenient
Ubuntu 8.04.1
The content of /etc/apt/sources.list is
deb http://us.archive.ubuntu.com/ubuntu hardy main restricted
deb http://us.archive.ubuntu.com/ubuntu hardy-updates main restricted
deb http://us.archive.ubuntu.com/ubuntu hardy universe multiverse
deb http://security.ubuntu.com/ubuntu hardy-security main restricted
I do
apt-get update
apt-get upgrade
apt-get install python2.5
then
# python2.5
Python 2.5 (r25:51908, Dec 11 2006, 21:09:56)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.5/sqlite3/__init__.py", line 24, in <module>
from dbapi2 import *
File "/usr/local/lib/python2.5/sqlite3/dbapi2.py", line 27, in <module>
from _sqlite3 import *
ImportError: No module named _sqlite3

How do you interpret this?
Aug 31 '08 #5
On 31 Aug, 20:05, cla...@lairds.us (Cameron Laird) wrote:
> How do you interpret this?

What do you get if you run this command...?

dpkg -s python2.5

For me, I get something which mentions the following:

Package: python2.5

[...]

Depends: python2.5-minimal (= 2.5.1-0ubuntu1.2), mime-support,
libbz2-1.0, libc6 (>= 2.5-0ubuntu1), libdb4.4,
libncursesw5 (>= 5.4-5), libreadline5 (>= 5.2),
libsqlite3-0 (>= 3.3.13), libssl0.9.8 (>= 0.9.8c-1)

Note the presence of the libsqlite3-0 package. In addition, you should
have the sqlite3 extension module somewhere:

locate sqlite3.so

This should tell you where the sqlite libraries are as well as where
the extension module is. For me, I get something which includes the
following:

/usr/lib/python2.5/lib-dynload/_sqlite3.so
/usr/lib/libsqlite3.so.0

Passing one of these to "dpkg -S" should say which package provided
it.

The strange thing is that the Ubuntu package information for your
version does mention the sqlite dependency and include the extension
module in the list of files:

http://packages.ubuntu.com/hardy/python2.5

You can run the following command to see whether your python2.5
package really provides the extension module:

dpkg --listfiles python2.5

Even if the sqlite library is installed, if that package doesn't
provide the extension module, something must be wrong with it because
it should be there.
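A rough Python-side counterpart to the `locate` and `dpkg --listfiles` checks, written for a current Python 3 rather than the 2.5 interpreter in this thread: `importlib.util.find_spec` reports where a top-level module would be loaded from without executing it. The helper name `locate_module` is hypothetical.

```python
import importlib.util


def locate_module(name):
    """Return where a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable with the current sys.path
    return spec.origin  # a file path, or a marker such as "built-in"


# The pure-Python package and the compiled extension are separate files; if the
# first is found and the second is not, this interpreter was built without
# SQLite support (the situation in the traceback above).
for name in ("sqlite3", "_sqlite3"):
    print(name, "->", locate_module(name))
```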

Paul
Aug 31 '08 #6
In article <ee**********************************@z72g2000hsb.googlegroups.com>,
Paul Boddie <pa**@boddie.org.uk> wrote:
>On 31 Aug, 20:05, cla...@lairds.us (Cameron Laird) wrote:
>> How do you interpret this?

> What do you get if you run this command...?
>
> dpkg -s python2.5
>
> For me, I get something which mentions the following:
>
> Package: python2.5
>
> [...]
>
> Depends: python2.5-minimal (= 2.5.1-0ubuntu1.2), mime-support,
> libbz2-1.0, libc6 (>= 2.5-0ubuntu1), libdb4.4,
> libncursesw5 (>= 5.4-5), libreadline5 (>= 5.2),
> libsqlite3-0 (>= 3.3.13), libssl0.9.8 (>= 0.9.8c-1)
For me:
Depends: libbz2-1.0, libc6 (>= 2.4), libdb4.6, libncursesw5 (>= 5.6+20071006-3), libreadline5 (>= 5.2), libsqlite3-0 (>= 3.4.2), libssl0.9.8 (>= 0.9.8f-1), mime-support, python2.5-minimal (= 2.5.2-2ubuntu4.1)
>
> Note the presence of the libsqlite3-0 package. In addition, you should
> have the sqlite3 extension module somewhere:
>
> locate sqlite3.so
>
> This should tell you where the sqlite libraries are as well as where
> the extension module is. For me, I get something which includes the
> following:
>
> /usr/lib/python2.5/lib-dynload/_sqlite3.so
> /usr/lib/libsqlite3.so.0
/usr/lib/python2.5/lib-dynload/_sqlite3.so
/usr/lib/libsqlite3.so.0.8.6
/usr/lib/xulrunner-1.9.0.1/libsqlite3.so
/usr/lib/xulrunner-1.9.0.1/libsqlite3.so.0
/usr/lib/libsqlite3.so.0
/usr/lib/libsqlite3.so
>
> Passing one of these to "dpkg -S" should say which package provided
> it.
libsqlite3-dev: /usr/lib/libsqlite3.so
>
> The strange thing is that the Ubuntu package information for your
> version does mention the sqlite dependency and include the extension
> module in the list of files:
>
> http://packages.ubuntu.com/hardy/python2.5
>
> You can run the following command to see whether your python2.5
> package really provides the extension module:
>
> dpkg --listfiles python2.5
# dpkg --listfiles python2.5 | grep sqli
/usr/lib/python2.5/sqlite3
/usr/lib/python2.5/sqlite3/test
/usr/lib/python2.5/sqlite3/test/__init__.py
/usr/lib/python2.5/sqlite3/test/dbapi.py
/usr/lib/python2.5/sqlite3/test/factory.py
/usr/lib/python2.5/sqlite3/test/hooks.py
/usr/lib/python2.5/sqlite3/test/regression.py
/usr/lib/python2.5/sqlite3/test/transactions.py
/usr/lib/python2.5/sqlite3/test/types.py
/usr/lib/python2.5/sqlite3/test/userfunctions.py
/usr/lib/python2.5/sqlite3/__init__.py
/usr/lib/python2.5/sqlite3/dbapi2.py
/usr/lib/python2.5/lib-dynload/_sqlite3.so
>
> Even if the sqlite library is installed, if that package doesn't
> provide the extension module, something must be wrong with it because
> it should be there.
>
> Paul
I'm certainly perplexed, and welcome suggestions.
Aug 31 '08 #7
On 31 Aug, 21:29, cla...@lairds.us (Cameron Laird) wrote:
>
[Lots of output suggesting correct package configuration]
> I'm certainly perplexed, and welcome suggestions.

Maybe...

which python

I think Jean-Paul might be on to something with his response. Are we
referring to the system-packaged Python? There's always "python -v"
and/or "strace python" for full details of what might be happening
otherwise.
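In the same spirit as `which python`, a small Python 3 sketch (hypothetical helper name) that reports which interpreter is actually running. The `/usr/local` test mirrors the path-based reasoning in this thread and is only a heuristic: Python's own `configure` defaults to that prefix, while distribution packages normally install under `/usr`.

```python
import sys
import sysconfig


def describe_interpreter():
    """Report which Python is running and where its standard library lives."""
    base = sys.base_prefix  # install prefix, even inside a virtualenv
    return {
        "executable": sys.executable,
        "base_prefix": base,
        "stdlib": sysconfig.get_paths()["stdlib"],
        # Heuristic only: a source build installs under /usr/local by default,
        # whereas distribution packages normally use /usr.
        "looks_like_source_build": base.startswith("/usr/local"),
    }


for key, value in describe_interpreter().items():
    print("%-24s %s" % (key, value))
```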

Paul
Aug 31 '08 #8
> ....
> Yes and no. My own experience with Debian packages
> is that with a standard
>
> apt-get install python2.5
>
> an attempt to
> import sqlite3
>
> results in
> ImportError: No module named _sqlite3
> ....

No problems here with Debian Lenny ....

All packages via .... apt-get install xxxx ....

$ uname -a
Linux em1 2.6.25-2-686 #1 SMP Fri Jul 18 17:46:56 UTC 2008 i686 GNU/Linux

$ dpkg -l | grep sqlite
ii libhk-classes-sqlite3 0.8.3-4 SQLite 3 driver plugin for hk_classes
ii libsqlite3-0 3.5.9-3 SQLite 3 shared library
ii python-pysqlite2 2.4.1-1 Python interface to SQLite 3
ii sqlite3 3.5.9-3 A command line interface for SQLite 3

$ py
Python 2.5.2 (r252:60911, Aug 8 2008, 09:22:44)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3

--
Stanley C. Kitching
Human Being
Phoenix, Arizona

Sep 1 '08 #9
> ....
> Yes and no. My own experience with Debian packages
> is that with a standard
> apt-get install python2.5
> an attempt to
> import sqlite3
> results in
> ImportError: No module named _sqlite3
> ....

From Kubuntu 8.04 ....

$ uname -a
Linux em1 2.6.24-19-generic #1 SMP
Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux

$ dpkg -l | grep sqlite
ii libsqlite0 2.8.17-4build1 SQLite shared library
ii libsqlite3-0 3.4.2-2 SQLite 3 shared library

$ py25
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
It is now my estimation that the Force
is not currently with you .... :-)

--
Stanley C. Kitching
Human Being
Phoenix, Arizona
Sep 1 '08 #10
mark wrote:
> Hi there,
>
> I need to extract data from text files (~4 GB) on this data some
> operations are performed like avg, max, min, group etc. The result is
> formatted and written in some other text files (some KB).
>
> I currently think about database tools might be suitable for this. I
> would just write the import from the text files and ... the tool does
> the rest. The only problem I can imagine is that this would not be
> fast enough.

Is this an a priori assumption, or did you actually benchmark and find
that it would not fit your requirements?
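One way to answer that before picking a tool is a quick timing run on synthetic data. The sketch below is a stand-in built on assumptions: the `key,value` format, the `max_per_key` workload, the `make_sample` generator and the sample size are all made up. Because the sample sits in memory, it measures parsing and aggregation only, not disk I/O.

```python
import random
import timeit


def make_sample(n_rows, n_keys=100, seed=0):
    """Synthetic key,value lines (hypothetical format, for timing only)."""
    rng = random.Random(seed)
    return ["k%d,%f\n" % (rng.randrange(n_keys), rng.random())
            for _ in range(n_rows)]


def max_per_key(lines):
    """Stand-in workload: the maximum value for each key, in one pass."""
    result = {}
    for line in lines:
        key, value = line.rstrip("\n").split(",")
        number = float(value)
        if key not in result or number > result[key]:
            result[key] = number
    return result


lines = make_sample(100_000)
# Best of three single runs; the minimum is less sensitive to background noise.
seconds = min(timeit.repeat(lambda: max_per_key(lines), number=1, repeat=3))
print("%.0f rows/second" % (len(lines) / seconds))
```

Multiplying the measured rate by the line count of the real file gives only a ballpark figure, since reading 4 GB from disk adds its own cost.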
> But I would give it a shot.
> Unfortunately I have only some knowledge of SQLite which is not an
> option here.
>
> Some additional requirements I can think of are:
> - Python (I want to hone my programming skills too)
> - Python-only (no C-lib) for simplicity (installation, portability).
> Therefore SQLite is not an option
> - must be fast
These two requirements can conflict for some values of "fast".
> - I like SQL (select a, b from ...) this would be nice (row[..] + ...
> is a little hard getting used to)
>
> So far I found PyDBLite, PyTables, Buzhug but they are difficult to
> compare for a beginner.
I've never used any of them (I have sqlite, mysql and pgsql installed on
all my machines), so I can't help here.

Sep 1 '08 #11
On 2008-08-31 15:15, mark wrote:
> So far I found PyDBLite, PyTables, Buzhug but they are difficult to
> compare for a beginner.

You could use Gadfly for this since it is pure Python and provides
a standard Python DB-API interface:

http://gadfly.sourceforge.net/

(the C extensions are optional to speed up processing)

This is the SQL subset it supports:

http://gadfly.sourceforge.net/sql.html
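Because Gadfly follows the DB-API, query code written against that interface should move between modules with little change. The sketch below illustrates this with a hypothetical `group_stats` helper that uses only the standard `cursor()`, `execute()`, `fetchall()` and `close()` calls. `sqlite3` stands in for the actual driver so the example runs; Gadfly's connection setup is not shown, and whether its SQL subset accepts every clause used here should be checked against the SQL page linked above. Placeholder syntax differs between DB-API modules (see each module's `paramstyle`), which is why the helper passes no parameters.

```python
import sqlite3


def group_stats(conn, table, key_col, value_col):
    """Per-group COUNT/MIN/MAX through a generic DB-API 2.0 connection.

    Table and column names are pasted into the SQL text, so they must come
    from trusted code, never from user input.
    """
    sql = ("SELECT {k}, COUNT(*), MIN({v}), MAX({v}) FROM {t} "
           "GROUP BY {k} ORDER BY {k}").format(k=key_col, v=value_col, t=table)
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


# sqlite3 stands in for any DB-API module; only these setup lines are
# module-specific.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT, v REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", 1.0), ("a", 3.0), ("b", 2.0)])
print(group_stats(conn, "t", "k", "v"))
```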

Another option is SnakeSQL:

http://pythonweb.org/projects/snakesql/

but I've never used that one, so can't judge its quality.

--
Marc-Andre Lemburg
eGenix.com

Sep 1 '08 #12
> I don't understand why Cameron has a different version of Python which
> doesn't seem to have sqlite support enabled.

Agreed, but won't the package manager tell him if python-sqlite is
installed? That would be the next step since it appears that SQLite
itself is already installed. Since Ubuntu uses precompiled binaries,
Python should be configured for SQLite, which again leaves no
python-sqlite as the only possibility (yeah right). BTW Python is easy to
install manually.

Sep 2 '08 #13
Zentrader wrote:
>> I don't understand why Cameron has a different version of Python which
>> doesn't seem to have sqlite support enabled.
>
> Agreed, but won't the package manager tell him if python-sqlite is
> installed? That would be the next step since it appears that SQLite
> itself is already installed. Since Ubuntu uses precompiled binaries,
> Python should be configured for SQLite which again leaves no
> python-sqlite as the only possibility (yeah right). BTW Python is easy to
> install manually.

When you install Python manually from source you need the header files for
sqlite3 to get sqlite3 support. These are in the libsqlite3-dev package.

I think you can distinguish a manually installed python from the packaged
one by the .../local/... in its path, e. g., on my machine

$ which python2.5 # in the distribution
/usr/bin/python2.5
$ which python2.6
/usr/local/bin/python2.6 # installed from source

I have installed libsqlite3-dev so I can't reproduce Cameron's error, but
here's a similar one for bsddb:

$ python2.5
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:43)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> bsddb.__file__
'/usr/lib/python2.5/bsddb/__init__.pyc'

$ python2.6
Python 2.6b2+ (trunk:65902, Aug 20 2008, 08:38:26)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/bsddb/__init__.py", line 58, in <module>
import _bsddb
ImportError: No module named _bsddb
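The same pitfall applies to other compiled modules: a source build skips any optional C extension whose development headers are missing and typically mentions it only in the build output. A quick post-install check, sketched for Python 3, is simply to try importing them. The module list is illustrative and not exhaustive (`_bsddb` itself no longer exists in Python 3), and `missing_extensions` is a hypothetical helper.

```python
import importlib

# Compiled modules that depend on system -dev packages at build time
# (illustrative list for Python 3, not exhaustive).
OPTIONAL_EXTENSIONS = ("_sqlite3", "_ssl", "_bz2", "_lzma", "_ctypes", "readline")


def missing_extensions(names=OPTIONAL_EXTENSIONS):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


print("missing:", missing_extensions() or "none")
```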

Peter

PS: Yes, I'm using 2.6, but I don't think that's relevant for the problem.
Sep 2 '08 #14
On 2 Sep, 17:38, Zentrader <zentrad...@gmail.com> wrote:
>> I don't understand why Cameron has a different version of Python which
>> doesn't seem to have sqlite support enabled.
>
> Agreed, but won't the package manager tell him if python-sqlite is
> installed?

It shouldn't need to be installed: the python2.5 package includes the
sqlite3 module and the _sqlite3 extension module. He's running a more
modern version of Ubuntu than I am, but I don't think that they've
reintroduced the python-sqlite package in any form.

> That would be the next step since it appears that SQLite
> itself is already installed. Since Ubuntu uses precompiled binaries,
> Python should be configured for SQLite which again leaves no
> python-sqlite as the only possibility (yeah right). BTW Python is easy to
> install manually.

Indeed, which is why I think that there must be a manually installed
Python on his system, especially given that
/usr/local/lib/python2.5/sqlite3/__init__.py is one of the files
mentioned in the traceback.

Paul
Sep 2 '08 #15
