Bytes IT Community

bsddb3 database file, what are the __db.001, __db.002, __db.003files for?


I have just started to play around with the bsddb3 module, which interfaces
with the Berkeley DB database.

Beside the intended database file
databaseFile.bdb
I also see in the same directory the files
__db.001
__db.002
__db.003
where
__db.003 is ten times larger than databaseFile.bdb
and
__db.001 is the same size as databaseFile.bdb.

What are these files for, and why do they appear?

If I delete them, access to the database in databaseFile.bdb
still works as expected.

Any hints toward enlightenment?

Is there any _good_ documentation of the bsddb3 module around, besides the
documentation provided with the module itself? Documentation where it is not
necessary, e.g., to guess that the C integer value of zero (0) for success is
represented in Python by the value None returned by db.open()?

Claudio
Feb 23 '06 #1
5 Replies



Claudio Grondi wrote:
> Beside the intended database file
> databaseFile.bdb
> I also see in the same directory the files
> __db.001
> __db.002
> __db.003
> where
> __db.003 is ten times larger than databaseFile.bdb
> and
> __db.001 is the same size as databaseFile.bdb.
I can't tell you exactly what each is, but they are the files that the
shared environment (DBEnv) uses to coordinate multi-process access to
the database. In particular, the big one is likely the mmap'd cache
(which defaults to 5 MB, I believe).

You can safely delete them, but probably shouldn't while your program
is executing.
> Is there any _good_ documentation of the bsddb3 module around, besides the
> documentation provided with the module itself, where it is not necessary,
> e.g., to guess that the C integer value of zero (0) for success is
> represented in Python by the value None returned by db.open()?


This is the only documentation available, AFAIK:
http://pybsddb.sourceforge.net/bsddb3.html

For most of the important stuff it is necessary to dig into the bdb
docs themselves.

-Mike

Feb 23 '06 #2

Klaas wrote:
> Claudio Grondi wrote:
>> Beside the intended database file databaseFile.bdb I also see in the same
>> directory the files __db.001, __db.002, __db.003 [...]
>
> I can't tell you exactly what each is, but they are the files that the
> shared environment (DBEnv) uses to coordinate multi-process access to
> the database. In particular, the big one is likely the mmap'd cache
> (which defaults to 5 MB, I believe).
>
> You can safely delete them, but probably shouldn't while your program
> is executing.
>
>> Is there any _good_ documentation of the bsddb3 module around [...]
>
> This is the only documentation available, AFAIK:
> http://pybsddb.sourceforge.net/bsddb3.html
>
> For most of the important stuff it is necessary to dig into the bdb
> docs themselves.

Thank you for the reply.

Probably to avoid admitting that the documentation is weak, a positive way
of stating it was found by using the phrase:

"Berkeley DB was designed by programmers, for programmers."

So I have to try to get an excavator ;-) to speed up digging through the
docs and maybe even the source, right?

Are there online somewhere any useful simple examples of applications
using the Berkeley DB I could learn from?

I am especially interested in using the multimap feature activated via
db.set_flags(bsddb3.db.DB_DUPSORT). I fear that, as the database file grows
while mapping tokens to the files they occur in (I have approx. 10 million
files I want to build a search index for), I will hit some unexpected limit
and the project will fail, as happened to me once in the past when I tried
to use MySQL for a similar purpose (after the database file had grown over
2 GByte, MySQL just began to hang when trying to add more records).
I am on Windows using the NTFS file system, so I don't expect problems with
too large a file size. In the meantime I also already have working Python
code performing the basic database operations I will need to feed and
query the database.
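A DB_DUPSORT database is conceptually an inverted index: one key (a token) maps to many sorted, de-duplicated values (the files it occurs in). The following is a minimal plain-Python model of that behaviour, not the bsddb3 API itself; the class and method names are illustrative only:

```python
from collections import defaultdict

# Conceptual model of a DB_DUPSORT database: each key may carry many
# values; exact key/value duplicates are dropped and values come back
# sorted, as a BDB cursor over a DUPSORT key would return them.
class DupSortModel:
    def __init__(self):
        self._data = defaultdict(set)

    def put(self, key, value):
        # DB_DUPSORT silently ignores an exact duplicate key/value pair.
        self._data[key].add(value)

    def get_dups(self, key):
        # All values stored under one key, in sorted order.
        return sorted(self._data[key])

index = DupSortModel()
index.put(b"python", 17)     # token b"python" occurs in file 17
index.put(b"python", 3)
index.put(b"python", 17)     # exact duplicate pair, ignored
index.put(b"berkeley", 3)

print(index.get_dups(b"python"))    # [3, 17]
```

With real bsddb3 the same shape is reached by setting the flag before open and iterating duplicates with a cursor; the dict model above only illustrates the key-to-many-values semantics.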
Has someone used Berkeley DB for a similar purpose who can tell me that in
actual practice (not just in the theory stated in Berkeley DB's feature
list) I need not fear any problems?
It took me some days of continuously updating the MySQL database to see
that there is an unexpected, strange limit on the database file size. I
still have no idea what the actual cause of the MySQL problem was (I
suspect having only 256 MB RAM available at the time), as it is known that
MySQL databases larger than 2 GByte exist and are in daily use :-( .

These are the reasons why I would be glad to hear how to avoid running into
a similar problem again _before_ I start to torture my machine by filling
the Berkeley DB database with entries.

Claudio

-Mike

Feb 23 '06 #3

Claudio writes:
> I am on Windows using the NTFS file system, so I don't expect problems
> with too large a file size.
How large can files grow on NTFS? I know little about it.
> (I suppose the cause was in having only 256 MB RAM available at that
> time) as it is known that MySQL databases larger than 2 GByte exist and
> are in daily use :-( .


Do you have more ram now? I've used berkeley dbs up to around 5 gigs
in size and they performed fine. However, it is quite important that
the working set of the database (its internal index pages) can fit
into available ram. If they are swapping in and out, there will be
problems.

-Mike

Feb 27 '06 #4

Klaas wrote:
> Claudio writes:
>> I am on Windows using the NTFS file system, so I don't expect problems
>> with too large a file size.
>
> How large can files grow on NTFS? I know little about it.

No practical limit on current hard drives, i.e.:

Maximum file size:
  Theory: 16 exabytes minus 1 KB (2**64 bytes minus 1 KB)
  Implementation: 16 terabytes minus 64 KB (2**44 bytes minus 64 KB)
Maximum volume size:
  Theory: 2**64 clusters minus 1
  Implementation: 256 terabytes minus 64 KB (2**32 clusters minus 1)
Files per volume:
  4,294,967,295 (2**32 minus 1 file)
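The powers of two behind those NTFS figures are easy to sanity-check with plain arithmetic (nothing BDB-specific here):

```python
# Binary unit sizes.
TB = 2**40
EB = 2**60

# Theoretical maximum file size: 2**64 bytes (minus 1 KB) = 16 exabytes.
assert 2**64 == 16 * EB

# Implemented maximum file size: 2**44 bytes (minus 64 KB) = 16 terabytes.
assert 2**44 == 16 * TB

# Files per volume: 2**32 minus 1.
assert 2**32 - 1 == 4_294_967_295

print("NTFS limit arithmetic checks out")
```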
>> (I suppose the cause was in having only 256 MB RAM available at that
>> time) as it is known that MySQL databases larger than 2 GByte exist and
>> are in daily use :-( .
>
> Do you have more RAM now?

I now have 3 GByte RAM on my best machine, but Windows does not allow a
(32-bit) process to exceed 2 GByte of address space, so in practice a
little less than 2 GByte is the actual upper limit.

> I've used berkeley dbs up to around 5 gigs in size and they performed
> fine. However, it is quite important that the working set of the database
> (its internal index pages) can fit into available ram. If they are
> swapping in and out, there will be problems.

Thank you very much for your reply.

In my failed MySQL project the size of the indexes was approx. the same as
the size of the indexed data (1 GByte). In my current project I expect the
total size of the indexes to exceed by far the size of the data indexed,
but because Berkeley does not support multiple indexed columns (i.e. there
is only one key value column as index), if I access the database files one
after another (not simultaneously) it should work without RAM problems,
right?

Does the volume of data required to store the key values have an impact on
the size of the index pages, or does the size of the index pages depend
only on the number of records and the kind of index (btree, hash)?

In the latter case I would be free to use larger-sized data columns for
the key values without running into RAM-size problems for the index
itself; otherwise I would be forced to use key columns storing a kind of
hash to get their size down (and two dictionaries instead of one).

What is the upper limit of number of records in practice?

Theoretically, as given in the tutorial, Berkeley DB is capable of holding
up to billions of records with sizes of up to 4 GB each single record,
with tables up to a total storage size of 256 TB of data.
By the way: are billions in the given context multiples of 1,000,000,000
or of 1,000,000,000,000, i.e. in the US or the British sense?

I expect the number of records in my project to be on the order of tens of
millions (multiples of 10,000,000).

I would be glad to hear whether someone has already successfully run
Berkeley DB with this amount of records or more, and how much RAM and
which OS the machine used for it had (I am on Windows XP with 3 GByte RAM).

Claudio

-Mike

Feb 28 '06 #5

> In my current project I expect the total size of the indexes to exceed
> by far the size of the data indexed, but because Berkeley does not
> support multiple indexed columns (i.e. only one key value column as
> index) if I access the database files one after another (not
> simultaneously) it should work without problems with RAM, right?
You can maintain multiple secondary indices on a primary database. BDB
isn't a "relational" database, though, so speaking of columns confuses
the issue. But you can have one database with primary key -> value,
then multiple secondary key -> primary key databases (with bdb
transparently providing the secondary key -> value mapping if you
desire).
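The two-level lookup described above (secondary key -> primary key -> value) can be modelled with plain Python dicts. This is only a sketch of the data flow, not the bsddb3 API; in real bsddb3 the secondary database is kept in sync automatically once you call DB.associate() with a key-extraction callback, and the names below are illustrative:

```python
# Primary database: primary key -> value.
primary = {b"doc1": b"the quick fox", b"doc2": b"the lazy dog"}

# Key-extraction callback, as DB.associate() would use: given a primary
# record, return the secondary key (here: the record's first word).
def secondary_key(pkey, value):
    return value.split()[0]

# Secondary index: secondary key -> set of primary keys.  BDB maintains
# this automatically; here we build it by hand to show the structure.
secondary = {}
for pkey, value in primary.items():
    secondary.setdefault(secondary_key(pkey, value), set()).add(pkey)

# Lookup through the secondary index: secondary key -> primary keys ->
# values, i.e. the "secondary key -> value mapping" BDB can provide
# transparently.
hits = [primary[p] for p in sorted(secondary[b"the"])]
print(hits)   # [b'the quick fox', b'the lazy dog']
```

The point of the design is that the secondary database stores only keys, not a second copy of the values, so several indices over one primary database stay comparatively small.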
> Does the data volume required to store the key values have an impact on
> the size of the index pages, or does the size of the index pages depend
> only on the number of records and the kind of index (btree, hash)?
For btree, it is the size of the keys that matters. I presume the same
is true for the hashtable, but I'm not certain.
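A rough back-of-the-envelope sketch of why key size matters: with a fixed page size, larger keys mean fewer keys fit on each internal btree page (lower fanout), so the in-RAM working set of internal pages grows. The page size, per-key overhead, and the model itself are illustrative assumptions, not measured BDB behaviour:

```python
def internal_pages(n_records, key_size, page_size=4096, per_key_overhead=16):
    """Crude estimate of the number of internal btree pages."""
    # How many keys (plus per-entry overhead) fit on one internal page.
    fanout = page_size // (key_size + per_key_overhead)
    # Number of leaf pages needed to point at all records (ceil division).
    level = -(-n_records // fanout)
    pages = 0
    # Walk up the tree, summing internal pages per level up to the root.
    while level > 1:
        level = -(-level // fanout)
        pages += level
    return pages

n = 10_000_000   # tens of millions of records, as in the project above
for key_size in (16, 64, 256):
    print(key_size, internal_pages(n, key_size))
```

Running this shows the working set growing by well over an order of magnitude as keys go from 16 to 256 bytes, which is the practical argument for short keys (or hashed keys) when RAM is tight.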
> What is the upper limit of number of records in practice?


Depends on sizes of the keys and values, page size, cache size, and
physical limitations of your machine.

-Mike

Mar 1 '06 #6

This discussion thread is closed; replies have been disabled.