Database suggestion

Mikhail Kovalev

Hi.

I work with recursive array trees of 10-20 levels in PHP. So far I
have been using serialize() to store the arrays, generating files 5+
MB large, which take 10-15 seconds to unserialize and about tenfold of
RAM! I've been adviced to use MySQL (relational db's), but are there
any other options beside that and var_export/include? Something that
works in a similar way as MySQL when adding a new element without
loading the whole database itself...

Thanks!

Dec 5 '07 #1

Subscribe Reply

1651

Toby A Inkster

Mikhail Kovalev wrote:

I've been adviced to use MySQL (relational db's), but are there any
other options beside that and var_export/include? Something that works
in a similar way as MySQL when adding a new element without loading the
whole database itself...

SQLite, PostgreSQL, Oracle, Microsoft SQL Server, ...

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 18:31.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/

Dec 5 '07 #2

Mikhail Kovalev

I mean, I don't mind using MySQL in itself, it's just that I don't
want use relational db's.
I don't know if it is possible to predefine the structures I'm using
in MySQL, because the number of levels varies etc.
Here is an example recording a single sequence of 6 elements:

How do I put this in a db?

array (
0 =>
array (
0 =2,
1 =>
array (
1 =>
array (
0 =1,
1 =>
array (
),
),
2 =>
array (
0 =1,
1 =>
array (
3 =>
array (
0 =1,
1 =>
array (
4 =>
array (
0 =1,
1 =>
array (
3 =>
array (
0 =1,
1 =>
array (
1 =>
array (
0 =1,
1 =>
array (
),
),
),
),
),
),
),
),
),
),
),
),
1 =>
array (
0 =2,
1 =>
array (
),
),
2 =>
array (
0 =1,
1 =>
array (
3 =>
array (
0 =1,
1 =>
array (
4 =>
array (
0 =1,
1 =>
array (
3 =>
array (
0 =1,
1 =>
array (
1 =>
array (
0 =1,
1 =>
array (
),
),
),
),
),
),
),
),
),
),
3 =>
array (
0 =2,
1 =>
array (
4 =>
array (
0 =1,
1 =>
array (
3 =>
array (
0 =1,
1 =>
array (
1 =>
array (
0 =1,
1 =>
array (
),
),
),
),
),
),
1 =>
array (
0 =1,
1 =>
array (
),
),
),
),
4 =>
array (
0 =1,
1 =>
array (
3 =>
array (
0 =1,
1 =>
array (
1 =>
array (
0 =1,
1 =>
array (
),
),
),
),
),
),
)

On 5 Des, 12:42, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:

Mikhail Kovalev wrote:
I've been adviced to use MySQL (relational db's), but are there any
other options beside that and var_export/include? Something that works
in a similar way as MySQL when adding a new element without loading the
whole database itself...

SQLite, PostgreSQL, Oracle, Microsoft SQL Server, ...

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 18:31.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/

Dec 5 '07 #3

Jerry Stuckle

Mikhail Kovalev wrote:

Hi.

I work with recursive array trees of 10-20 levels in PHP. So far I
have been using serialize() to store the arrays, generating files 5+
MB large, which take 10-15 seconds to unserialize and about tenfold of
RAM! I've been adviced to use MySQL (relational db's), but are there
any other options beside that and var_export/include? Something that
works in a similar way as MySQL when adding a new element without
loading the whole database itself...

Thanks!

What do you mean by "without loading the whole database itself"?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Dec 5 '07 #4

Mikhail Kovalev

I unserialize an 8MB file, which contains the db array, and put the
new element there (then serialize the whole thing again and store)

On 5 Des, 13:28, Jerry Stuckle <jstuck...@attglobal.netwrote:

Mikhail Kovalev wrote:
Hi.

I work with recursive array trees of 10-20 levels in PHP. So far I
have been using serialize() to store the arrays, generating files 5+
MB large, which take 10-15 seconds to unserialize and about tenfold of
RAM! I've been adviced to use MySQL (relational db's), but are there
any other options beside that and var_export/include? Something that
works in a similar way as MySQL when adding a new element without
loading the whole database itself...

Thanks!

What do you mean by "without loading the whole database itself"?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstuck...@attglobal.net
==================

Dec 5 '07 #5

Mikhail Kovalev

On 5 Des, 14:02, The Natural Philosopher <a...@b.cwrote:

Mikhail Kovalev wrote:
I unserialize an 8MB file, which contains the db array, and put the
new element there (then serialize the whole thing again and store)

I am far from understanding what you are actually doing, but it sounds
like you are willing to trade disk usage for speed and memory usage.

Yep, you nailed it right on through, my friend

My question is, will I be able to access and store values by
addresses?
Such as here:
$database[33][NEXT][1221][NEXT][12][COUNT] += 1; for example

Each element contains a COUNT value and an array of childs (NEXT),
which can be non-empty, empty or non-existent (when i check if it
exists)
Looks like this at top level:

Array(
433 =Array(
COUNT =2121,
NEXT =Array( //
23 =Array( count&next etc ...)
433 =...
etc...
)
),
1012 =Array( count & next.....)
.....
)

Dec 5 '07 #6

Marijn

My question is, will I be able to access and store values by

addresses?
Such as here:
$database[33][NEXT][1221][NEXT][12][COUNT] += 1; for example

Each element contains a COUNT value and an array of childs (NEXT),
which can be non-empty, empty or non-existent (when i check if it
exists)

Maybe you should tell us what it is exactly that you are storing,
cause it seems to me that you are over complicating things. What is it
you are counting and why do you currently group it in multi-level
arrays?

Marijn

Dec 5 '07 #7

Mikhail Kovalev

On 5 Des, 14:11, "C. (http://symcbean.blogspot.com/)"
<colin.mckin...@gmail.comwrote:

On 5 Dec, 11:13, Mikhail Kovalev <mikhail_kova...@mail.ruwrote:

Hi.

I work with recursive array trees of 10-20 levels in PHP. So far I
have been using serialize() to store the arrays, generating files 5+
MB large, which take 10-15 seconds to unserialize and about tenfold of
RAM! I've been adviced to use MySQL (relational db's), but are there
any other options beside that and var_export/include? Something that
works in a similar way as MySQL when adding a new element without
loading the whole database itself...

Thanks!

Any relational DB can model a hierarchical data structure - at the
simplest level:

node_id unique identifier, not null
parent_node null
payload whatever

But getting the threaded structure is a bit tricky -

SELECT parent.node_id as Parent_node_id, parent.payload as
parent_payload,
child.node_id as child_node_id, child.payload as child_payload
FROM mytable parent, mytable child
WHERE child.parent_node=parent.node_id
AND parent.node_id=$start_here

.. i.e. by explicitly declaring aliases for each level in the tree you
can get the sub-nodes, but when you don't know how deep the tree is,
its not possible to return the whole structure in a single query using
standard SQL. Oracle allows you to compose queries returning
representations of hierarchical data (have a google for 'connect by' &
'start with') but thats rather non-standard.

I don't know how deep the tree is, but I don't need to return the
whole structure in a single query. Alas, I don't know the nature of
MySQL

At this moment everything I do with my structure happens by calling
addresses
See my reply to The Natural Philosopher.
Here's an example with one node at top level and it's childs:

112 (10)
225 (3)
930 (1)
11 (5)

$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();

An address is an array, f ex array(112, 225, 930)

I then use the following mechanism:

// retrieves a node with its two values as an array(COUNT, NEXT)
// NEXT contain further childs, but i dont use them
// array_1 is the whole structure, and array_2 is the address
function function_7($array_1, $array_2) { return eval('return
$array_1'.function_8($array_2).';'); }

// array(112, 225, 930) becomes "[112][NEXT][225][NEXT][930]"
function function_8($array_1) { return implode('[NEXT]',
array_map('function_9', $array_1)); }

function function_9($integer_1) { return '['.$integer_1.']'; }

// saves a value array_3 to an address array_2 in structure array_1
function function_10($array_1, $array_2, $array_3)
{
eval('$array_1'.function_8($array_2).' = $array_3;');
return $array_1;
}

Dec 5 '07 #8

Mikhail Kovalev

On 5 Des, 14:52, Marijn <marijn.huizendv...@gmail.comwrote:

My question is, will I be able to access and store values by
addresses?
Such as here:
$database[33][NEXT][1221][NEXT][12][COUNT] += 1; for example

Each element contains a COUNT value and an array of childs (NEXT),
which can be non-empty, empty or non-existent (when i check if it
exists)

Maybe you should tell us what it is exactly that you are storing,
cause it seems to me that you are over complicating things. What is it
you are counting and why do you currently group it in multi-level
arrays?

Marijn

I'm storing some statistics on Markov chains of characters in words

Dec 5 '07 #9

Captain Paralytic

On 5 Dec, 13:56, Mikhail Kovalev <mikhail_kova...@mail.ruwrote:

Alas, I don't know the nature of
MySQL

This has nothing to do with MySQL per se and everything to do with
understanding data storage models in general.

Here is some good reading on the subject.
http://del.icio.us/Captain_Paralytic/hierarchical

Dec 5 '07 #10

The Natural Philosopher

Mikhail Kovalev wrote:

On 5 Des, 14:02, The Natural Philosopher <a...@b.cwrote:
>Mikhail Kovalev wrote:
>>I unserialize an 8MB file, which contains the db array, and put the
new element there (then serialize the whole thing again and store)
I am far from understanding what you are actually doing, but it sounds
like you are willing to trade disk usage for speed and memory usage.

Yep, you nailed it right on through, my friend

My question is, will I be able to access and store values by
addresses?

essentially yes.

T

Such as here:
$database[33][NEXT][1221][NEXT][12][COUNT] += 1; for example

That would translate into a very large Mysql conditional statement, :
you might find it easier to split that into a series of separate
recursive calls into the SQL data base.. I have implemented something
like that..I have a data base of products that have a parent category,
and categories of categories..and so on.

I simply run a recursive subroutine that executes a single SQL statement
to walk all the tree branches.

All a database is, is an array of structures..called a table. Plus a set
of routines for accessing them.

Each structure is a row in the table, and can have a unique ID. This is
indexed by a hash by default.

Elements within the structure are the fields of the record. These can
also have indices associated with them.
To retrieve or update an individual record whose ID is known is
blindingly fast, but it sounds like you would need to search through
fields first to identify the one you want. That probably would mean
several recursive calls to the database and some attention to indexing
to get the best performance.

Dec 5 '07 #11

The Natural Philosopher

Captain Paralytic wrote:

On 5 Dec, 13:56, Mikhail Kovalev <mikhail_kova...@mail.ruwrote:
>Alas, I don't know the nature of
MySQL
This has nothing to do with MySQL per se and everything to do with
understanding data storage models in general.

Indeed,. but a background explanation of SQL does not come amiss.

Sounds like they OP is a scientist/mathematician, more than a programmer.

So likely to know about different things.

Here is some good reading on the subject.
http://del.icio.us/Captain_Paralytic/hierarchical

Dec 5 '07 #12

Toby A Inkster

Mikhail Kovalev wrote:

$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();

An address is an array, f ex array(112, 225, 930)

If $array[112][NEXT][11] exists, does that imply that
$array[113][NEXT][11] cannot exist?

i.e. Can 11 have only one "parent" or multiple?

If it can only have one parent, then I'd suggest using the structure
suggested by "C". If '11' can have more than one parent, then a classic
tree structure is out of the window, so I'd suggest a simple string key,
like this:

CREATE TABLE nodes
(
node_address varchar(1024) NOT NULL PRIMARY KEY,
node_count integer
);

To add counts to the database, you'd use:

INSERT INTO nodes VALUES ('112', 10);
INSERT INTO nodes VALUES ('112/225', 3);
INSERT INTO nodes VALUES ('112/225/930', 1);
INSERT INTO nodes VALUES ('112/11', 5);

To retrieve the count for key array(112, 225, 930), you'd use:

SELECT node_count
FROM nodes
WHERE node_address='112/225/930';

To find a list of descendant nodes of '112', you'd use:

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%';

Or to find just child nodes (i.e. no grandchildren, etc):

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%'
AND NOT node_address LIKE '112/%/%';

By the way -- do yourself a favour and stop naming your functions things
like function_9() and function_10(), and variables like $array_3. If you
have to come back to this code after 2 or 3 months away from it, you will
have forgotten what they all do. Give them memorable names.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 22:57.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/

Dec 5 '07 #13

Norman Peelman

Mikhail Kovalev wrote:

On 5 Des, 14:11, "C. (http://symcbean.blogspot.com/)"
<colin.mckin...@gmail.comwrote:
>On 5 Dec, 11:13, Mikhail Kovalev <mikhail_kova...@mail.ruwrote:

>>Hi.
I work with recursive array trees of 10-20 levels in PHP. So far I
have been using serialize() to store the arrays, generating files 5+
MB large, which take 10-15 seconds to unserialize and about tenfold of
RAM! I've been adviced to use MySQL (relational db's), but are there
any other options beside that and var_export/include? Something that
works in a similar way as MySQL when adding a new element without
loading the whole database itself...
Thanks!
Any relational DB can model a hierarchical data structure - at the
simplest level:

node_id unique identifier, not null
parent_node null
payload whatever

But getting the threaded structure is a bit tricky -

SELECT parent.node_id as Parent_node_id, parent.payload as
parent_payload,
child.node_id as child_node_id, child.payload as child_payload
FROM mytable parent, mytable child
WHERE child.parent_node=parent.node_id
AND parent.node_id=$start_here

.. i.e. by explicitly declaring aliases for each level in the tree you
can get the sub-nodes, but when you don't know how deep the tree is,
its not possible to return the whole structure in a single query using
standard SQL. Oracle allows you to compose queries returning
representations of hierarchical data (have a google for 'connect by' &
'start with') but thats rather non-standard.

I don't know how deep the tree is, but I don't need to return the
whole structure in a single query. Alas, I don't know the nature of
MySQL

At this moment everything I do with my structure happens by calling
addresses
See my reply to The Natural Philosopher.
Here's an example with one node at top level and it's childs:

112 (10)
225 (3)
930 (1)
11 (5)

$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();

An address is an array, f ex array(112, 225, 930)

I then use the following mechanism:

// retrieves a node with its two values as an array(COUNT, NEXT)
// NEXT contain further childs, but i dont use them
// array_1 is the whole structure, and array_2 is the address
function function_7($array_1, $array_2) { return eval('retur
$array_1'.function_8($array_2).';'); }

// array(112, 225, 930) becomes "[112][NEXT][225][NEXT][930]"
function function_8($array_1) { return implode('[NEXT]',
array_map('function_9', $array_1)); }

function function_9($integer_1) { return '['.$integer_1.']'; }

// saves a value array_3 to an address array_2 in structure array_1
function function_10($array_1, $array_2, $array_3)
{
eval('$array_1'.function_8($array_2).' = $array_3;');
return $array_1;
}

Tried to look up on Markov chains... are any of your addresses
duplicated? ie: can you have an address like 112, 225, 112 ?

Norm

Dec 5 '07 #14

Norman Peelman

Toby A Inkster wrote:

Mikhail Kovalev wrote:

>$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();

An address is an array, f ex array(112, 225, 930)

If $array[112][NEXT][11] exists, does that imply that
$array[113][NEXT][11] cannot exist?

i.e. Can 11 have only one "parent" or multiple?

If it can only have one parent, then I'd suggest using the structure
suggested by "C". If '11' can have more than one parent, then a classic
tree structure is out of the window, so I'd suggest a simple string key,
like this:

CREATE TABLE nodes
(
node_address varchar(1024) NOT NULL PRIMARY KEY,
node_count integer
);

To add counts to the database, you'd use:

INSERT INTO nodes VALUES ('112', 10);
INSERT INTO nodes VALUES ('112/225', 3);
INSERT INTO nodes VALUES ('112/225/930', 1);
INSERT INTO nodes VALUES ('112/11', 5);

To retrieve the count for key array(112, 225, 930), you'd use:

SELECT node_count
FROM nodes
WHERE node_address='112/225/930';

To find a list of descendant nodes of '112', you'd use:

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%';

Or to find just child nodes (i.e. no grandchildren, etc):

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%'
AND NOT node_address LIKE '112/%/%';

By the way -- do yourself a favour and stop naming your functions things
like function_9() and function_10(), and variables like $array_3. If you
have to come back to this code after 2 or 3 months away from it, you will
have forgotten what they all do. Give them memorable names.

INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
$node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1

....will INSERT new entries and UPDATE existing entries in one swoop.

Norm

Dec 5 '07 #15

Mikhail Kovalev

On 5 Des, 18:26, Norman Peelman <npeel...@cfl.rr.comwrote:

Mikhail Kovalev wrote:
On 5 Des, 14:11, "C. (http://symcbean.blogspot.com/)"
<colin.mckin...@gmail.comwrote:
On 5 Dec, 11:13, Mikhail Kovalev <mikhail_kova...@mail.ruwrote:

>Hi.
I work with recursive array trees of 10-20 levels in PHP. So far I
have been using serialize() to store the arrays, generating files 5+
MB large, which take 10-15 seconds to unserialize and about tenfold of
RAM! I've been adviced to use MySQL (relational db's), but are there
any other options beside that and var_export/include? Something that
works in a similar way as MySQL when adding a new element without
loading the whole database itself...
Thanks!
Any relational DB can model a hierarchical data structure - at the
simplest level:

node_id unique identifier, not null
parent_node null
payload whatever

But getting the threaded structure is a bit tricky -

SELECT parent.node_id as Parent_node_id, parent.payload as
parent_payload,
child.node_id as child_node_id, child.payload as child_payload
FROM mytable parent, mytable child
WHERE child.parent_node=parent.node_id
AND parent.node_id=$start_here

.. i.e. by explicitly declaring aliases for each level in the tree you
can get the sub-nodes, but when you don't know how deep the tree is,
its not possible to return the whole structure in a single query using
standard SQL. Oracle allows you to compose queries returning
representations of hierarchical data (have a google for 'connect by' &
'start with') but thats rather non-standard.

I don't know how deep the tree is, but I don't need to return the
whole structure in a single query. Alas, I don't know the nature of
MySQL

At this moment everything I do with my structure happens by calling
addresses
See my reply to The Natural Philosopher.
Here's an example with one node at top level and it's childs:

112 (10)
225 (3)
930 (1)
11 (5)

$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();

An address is an array, f ex array(112, 225, 930)

I then use the following mechanism:

// retrieves a node with its two values as an array(COUNT, NEXT)
// NEXT contain further childs, but i dont use them
// array_1 is the whole structure, and array_2 is the address
function function_7($array_1, $array_2) { return eval('retur
$array_1'.function_8($array_2).';'); }

// array(112, 225, 930) becomes "[112][NEXT][225][NEXT][930]"
function function_8($array_1) { return implode('[NEXT]',
array_map('function_9', $array_1)); }

function function_9($integer_1) { return '['.$integer_1.']'; }

// saves a value array_3 to an address array_2 in structure array_1
function function_10($array_1, $array_2, $array_3)
{
eval('$array_1'.function_8($array_2).' = $array_3;');
return $array_1;
}

Tried to look up on Markov chains... are any of your addresses
duplicated? ie: can you have an address like 112, 225, 112 ?

Norm

Yes. Any id can be child or parent of any other id or itself and any
way around.
I'm only interested in how many times a given sequence occurs.
For example, if I was to record this sequence (112, 225, 930) (the
example above doesn't show it), it would look like this:

Split by length from 1 to the size of the sequence itself (3)
112
225
930
112, 225
225, 930
112, 225, 930

I use those as addresses to add count values to make this structure.

112 (1)
225 (1)
930 (1)
225 (1)
930 (1)
930 (1)

If I now add another sequence 700, 112, 112, 225, the table changes
to:

112 (3)
225 (2)
930 (1)
112 (1)
225 (1)
225 (2)
930 (1)
930 (1)
700 (1)
112 (1)
112 (1)
225 (1)

So from this I know that (112) occurs 3 times (in both sequences that
have been added), (112, 225) occurs 2 times, and (700, 112, 112)
occurs 1 times etc.

I have another file which translates id's to other variables.

Dec 5 '07 #16

Mikhail Kovalev

On 5 Des, 17:26, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:

Mikhail Kovalev wrote:
$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();

An address is an array, f ex array(112, 225, 930)

If $array[112][NEXT][11] exists, does that imply that
$array[113][NEXT][11] cannot exist?

i.e. Can 11 have only one "parent" or multiple?

If it can only have one parent, then I'd suggest using the structure
suggested by "C". If '11' can have more than one parent, then a classic
tree structure is out of the window, so I'd suggest a simple string key,
like this:

CREATE TABLE nodes
(
node_address varchar(1024) NOT NULL PRIMARY KEY,
node_count integer
);

To add counts to the database, you'd use:

INSERT INTO nodes VALUES ('112', 10);
INSERT INTO nodes VALUES ('112/225', 3);
INSERT INTO nodes VALUES ('112/225/930', 1);
INSERT INTO nodes VALUES ('112/11', 5);

To retrieve the count for key array(112, 225, 930), you'd use:

SELECT node_count
FROM nodes
WHERE node_address='112/225/930';

To find a list of descendant nodes of '112', you'd use:

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%';

Or to find just child nodes (i.e. no grandchildren, etc):

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%'
AND NOT node_address LIKE '112/%/%';

By the way -- do yourself a favour and stop naming your functions things
like function_9() and function_10(), and variables like $array_3. If you
have to come back to this code after 2 or 3 months away from it, you will
have forgotten what they all do. Give them memorable names.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 22:57.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/

Testing it! Forever grateful for the examples!=)

Also, see my reply to Norman below.

Dec 5 '07 #17

Mikhail Kovalev

On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.comwrote:

Toby A Inkster wrote:
Mikhail Kovalev wrote:

$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();

An address is an array, f ex array(112, 225, 930)

If $array[112][NEXT][11] exists, does that imply that
$array[113][NEXT][11] cannot exist?

i.e. Can 11 have only one "parent" or multiple?

If it can only have one parent, then I'd suggest using the structure
suggested by "C". If '11' can have more than one parent, then a classic
tree structure is out of the window, so I'd suggest a simple string key,
like this:

CREATE TABLE nodes
(
node_address varchar(1024) NOT NULL PRIMARY KEY,
node_count integer
);

To add counts to the database, you'd use:

INSERT INTO nodes VALUES ('112', 10);
INSERT INTO nodes VALUES ('112/225', 3);
INSERT INTO nodes VALUES ('112/225/930', 1);
INSERT INTO nodes VALUES ('112/11', 5);

To retrieve the count for key array(112, 225, 930), you'd use:

SELECT node_count
FROM nodes
WHERE node_address='112/225/930';

To find a list of descendant nodes of '112', you'd use:

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%';

Or to find just child nodes (i.e. no grandchildren, etc):

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%'
AND NOT node_address LIKE '112/%/%';

By the way -- do yourself a favour and stop naming your functions things
like function_9() and function_10(), and variables like $array_3. If you
have to come back to this code after 2 or 3 months away from it, you will
have forgotten what they all do. Give them memorable names.

INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
$node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1

...will INSERT new entries and UPDATE existing entries in one swoop.

Norm

Ok, suppose I'm joining to structures which have been created
separately,
from before i have ('112/225/930', 3)
and i want to add ('112/225/930', 2), which also happens to be present
in the second table, only with a different count,
to make ('112/225/930', 5)

From there I want to make it the general case so that when updating an
entry with count 1 I am actually adding ('112/225/930', 1) to the
existing one, if it exists:

INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
<new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
<new_count>

Is this correct? (Do I have to use <in VALUES, I'm following an
example which does it?)

Dec 5 '07 #18

Good Man

Mikhail Kovalev <mi*************@mail.ruwrote in news:59d586fd-60f0-4a33-
8c***************@o6g2000hsd.googlegroups.com:

Hi.

I work with recursive array trees of 10-20 levels in PHP. So far I
have been using serialize() to store the arrays, generating files 5+
MB large, which take 10-15 seconds to unserialize and about tenfold of
RAM! I've been adviced to use MySQL (relational db's), but are there
any other options beside that and var_export/include? Something that
works in a similar way as MySQL when adding a new element without
loading the whole database itself...

Thanks!

Databases are truly helpful in the case.

You might find this article well worth reading:

http://www.sitepoint.com/article/hie...-data-database

Dec 5 '07 #19

Norman Peelman

Mikhail Kovalev wrote:

On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.comwrote:

>INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
$node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1

...will INSERT new entries and UPDATE existing entries in one swoop.

Norm

Ok, suppose I'm joining to structures which have been created
separately,
from before i have ('112/225/930', 3)
and i want to add ('112/225/930', 2), which also happens to be present
in the second table, only with a different count,
to make ('112/225/930', 5)

From there I want to make it the general case so that when updating an
entry with count 1 I am actually adding ('112/225/930', 1) to the
existing one, if it exists:

INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
<new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
<new_count>

Is this correct? (Do I have to use <in VALUES, I'm following an
example which does it?)

No replace everything '<..>' with your own variable name, that will do it.

$new_count = 5
....ON DUPLICATE KEY UPDATE node_count = $new_count

Not sure what you mean by 'second table'. How many do you have? Though
this was one table...

Norm

Dec 5 '07 #20

Norman Peelman

Mikhail Kovalev wrote:

On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.comwrote:
>Toby A Inkster wrote:
>>Mikhail Kovalev wrote:
$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();
An address is an array, f ex array(112, 225, 930)
If $array[112][NEXT][11] exists, does that imply that
$array[113][NEXT][11] cannot exist?
i.e. Can 11 have only one "parent" or multiple?
If it can only have one parent, then I'd suggest using the structure
suggested by "C". If '11' can have more than one parent, then a classic
tree structure is out of the window, so I'd suggest a simple string key,
like this:
CREATE TABLE nodes
(
node_address varchar(1024) NOT NULL PRIMARY KEY,
node_count integer
);
To add counts to the database, you'd use:
INSERT INTO nodes VALUES ('112', 10);
INSERT INTO nodes VALUES ('112/225', 3);
INSERT INTO nodes VALUES ('112/225/930', 1);
INSERT INTO nodes VALUES ('112/11', 5);
To retrieve the count for key array(112, 225, 930), you'd use:
SELECT node_count
FROM nodes
WHERE node_address='112/225/930';
To find a list of descendant nodes of '112', you'd use:
SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%';
Or to find just child nodes (i.e. no grandchildren, etc):
SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%'
AND NOT node_address LIKE '112/%/%';
By the way -- do yourself a favour and stop naming your functions things
like function_9() and function_10(), and variables like $array_3. If you
have to come back to this code after 2 or 3 months away from it, you will
have forgotten what they all do. Give them memorable names.
INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
$node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1

...will INSERT new entries and UPDATE existing entries in one swoop.

Norm

Ok, suppose I'm joining to structures which have been created
separately,
from before i have ('112/225/930', 3)
and i want to add ('112/225/930', 2), which also happens to be present
in the second table, only with a different count,
to make ('112/225/930', 5)

From there I want to make it the general case so that when updating an
entry with count 1 I am actually adding ('112/225/930', 1) to the
existing one, if it exists:

INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
<new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
<new_count>

Is this correct? (Do I have to use <in VALUES, I'm following an
example which does it?)

Ooops, in my other reply that should have read:

$your_variable = 2
"...ON DUPLICATE KEY UPDATE new_count = new_count + $your_variable"

....to add an arbitrary amount to new_count. 3 + 2 = 5 for your example.
Norm

Dec 5 '07 #21

Mikhail Kovalev

On 5 Des, 20:46, Norman Peelman <npeel...@cfl.rr.comwrote:

Mikhail Kovalev wrote:
On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.comwrote:

<snipped>

INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
$node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1

...will INSERT new entries and UPDATE existing entries in one swoop.

Norm

Ok, suppose I'm joining to structures which have been created
separately,
from before i have ('112/225/930', 3)
and i want to add ('112/225/930', 2), which also happens to be present
in the second table, only with a different count,
to make ('112/225/930', 5)

From there I want to make it the general case so that when updating an
entry with count 1 I am actually adding ('112/225/930', 1) to the
existing one, if it exists:

INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
<new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
<new_count>

Is this correct? (Do I have to use <in VALUES, I'm following an
example which does it?)

No replace everything '<..>' with your own variable name, that will do it.

$new_count = 5
...ON DUPLICATE KEY UPDATE node_count = $new_count

Not sure what you mean by 'second table'. How many do you have? Though
this was one table...

Norm

I have different databases for different types of sequences. Each
sequence is some tousands elements in length and is recorded by chunks
of small series each from 1 to 20 elements long. Sometimes I join
different databases together and analyze each one separately and then
in relation to the joint database.

I have been thinking, if a sequence is for instance 5.000 elements
long, chunks 1-20, and there are no duplicates (absolutely unlikely,
but still) I will end up having a database with around 105.000
entries...
My current solution is probably better because it saves so much
space(?)

An entry like this in the current solution:
112 (310)
225 (20)
930 (6)
700 (1)
7 (1)
812 (1)

Equals this in the new system I am about to implement:

('112', 310)
('112/225', 20)
('112/225/930', 6)
('112/225/930/700', 1)
('112/225/930/700/7', 1)
('112/225/930/812', 1)

I don't know how a flat MySQL database of 100.000-200.000 entries like
these will perform. I'm now also considering a filesystem as the
database, as C has suggested.

Dec 5 '07 #22

Norman Peelman

Mikhail Kovalev wrote:

On 5 Des, 20:46, Norman Peelman <npeel...@cfl.rr.comwrote:
>Mikhail Kovalev wrote:
>>On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.comwrote:
<snipped>

>>>INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
$node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
...will INSERT new entries and UPDATE existing entries in one swoop.
Norm
Ok, suppose I'm joining to structures which have been created
separately,
from before i have ('112/225/930', 3)
and i want to add ('112/225/930', 2), which also happens to be present
in the second table, only with a different count,
to make ('112/225/930', 5)
From there I want to make it the general case so that when updating an
entry with count 1 I am actually adding ('112/225/930', 1) to the
existing one, if it exists:
INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
<new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
<new_count>
Is this correct? (Do I have to use <in VALUES, I'm following an
example which does it?)
No replace everything '<..>' with your own variable name, that will do it.

$new_count = 5
...ON DUPLICATE KEY UPDATE node_count = $new_count

Not sure what you mean by 'second table'. How many do you have? Though
this was one table...

Norm

I have different databases for different types of sequences. Each
sequence is some tousands elements in length and is recorded by chunks
of small series each from 1 to 20 elements long. Sometimes I join
different databases together and analyze each one separately and then
in relation to the joint database.

I have been thinking, if a sequence is for instance 5.000 elements
long, chunks 1-20, and there are no duplicates (absolutely unlikely,
but still) I will end up having a database with around 105.000
entries...
My current solution is probably better because it saves so much
space(?)

An entry like this in the current solution:
112 (310)
225 (20)
930 (6)
700 (1)
7 (1)
812 (1)

Equals this in the new system I am about to implement:

('112', 310)
('112/225', 20)
('112/225/930', 6)
('112/225/930/700', 1)
('112/225/930/700/7', 1)
('112/225/930/812', 1)

I don't know how a flat MySQL database of 100.000-200.000 entries like
these will perform. I'm now also considering a filesystem as the
database, as C has suggested.

I would imagine it would perform quite well as each one of the
addresses is a key (index) into the database. MySQL won't have to
examine all the entries that start with 112 to find where 930 or 700 or
whatever is. It knows exactly where 112/225/930/812/???/???/??? is. The
INSERT UPDATE will take less than a second on average most likely.

All this being said, you can do the same thing with your current array
in memory style using the same syntax:

$your_array['112/225/930'] = 6
$your_array['112/225/930/700'] = 1

the only difference is you still have to load it into memory (an array)
to use it. If you go with MySQL and still need an array in memory at
some point, a simple:

$dataset = array();
$result = mysql_query("SELECT * FROM nodes");
while($row = mysql_fetch_assoc($result))
{
$dataset[$row['node_address']] = $row['node_count'];
}

will get you your entire dataset into an in memory array in which you
would access just like sql:

$dataset['112'] = 310
$dataset['112/225'] = 20
$dataset['112/225/930'] = 6
$dataset['112/225/930/700'] = 1
$dataset['112/225/930/700/7'] = 1

Norm

Dec 5 '07 #23

asdf

"Mikhail Kovalev" <mi*************@mail.ruwrote in message
news:82**********************************@d27g2000 prf.googlegroups.com...

>I mean, I don't mind using MySQL in itself, it's just that I don't
want use relational db's.
I don't know if it is possible to predefine the structures I'm using
in MySQL, because the number of levels varies etc.
Here is an example recording a single sequence of 6 elements:

How do I put this in a db?

array (
0 =>
array (
0 =2,
1 =>
array (
1 =>
array (
0 =1,
1 =>
array (
),
),
2 =>
array (
0 =1,
1 =>
array (

[snip]

Check this out: http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html

Google this:
http://www.google.com.au/search?hl=e...G=Search&meta=

Good ol' Joe Celko. Pity I lost my 'SQL for Smarties' book :((

Dec 5 '07 #24

Rik Wasmus

On Wed, 05 Dec 2007 23:39:27 +0100, asdf <as**@asdf.comwrote:

Good ol' Joe Celko. Pity I lost my 'SQL for Smarties' book :((

Joe Celko's Trees and Hierarchies in SQL for Smarties, a must read for
anyone serious about the issue indeed. Not the easiest book to plough
through, but very rewarding :)
--
Rik Wasmus

Dec 5 '07 #25

Toby A Inkster

Mikhail Kovalev wrote:

I have different databases for different types of sequences.

Aha. I don't think you mentioned that earlier.

CREATE TABLE data_sets
(
data_set integer NOT NULL PRIMARY KEY,
data_set_name varchar(20)
);
CREATE TABLE nodes
(
data_set integer NOT NULL
REFERENCES data_sets ON DELETE CASCADE,
node_address varchar(1024) NOT NULL,
node_count integer,
PRIMARY KEY (data_set_id, node_address)
);

An example of inserting some data:

INSERT INTO data_sets VALUES (1, 'Example Data Set 1');
INSERT INTO nodes VALUES (1, '112', 3);
INSERT INTO nodes VALUES (1, '118', 4);
INSERT INTO nodes VALUES (1, '112/11', 5);
INSERT INTO data_sets VALUES (2, 'Example Data Set 2');
INSERT INTO nodes VALUES (2, '112', 2);
INSERT INTO nodes VALUES (2, '4/115', 3);

An example of selecting data from Data Set 1:

SELECT node_count
FROM nodes
WHERE data_set=1
AND node_address='112';
(returns 3)

Some examples of selecting the sum of data from multiple sets:

SELECT sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
AND node_address='112'
GROUP BY node_address;
(returns 5)

SELECT node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;
(returns list of all nodes in set 1 or set 2, plus sum of counts
for both data sets)

Let's create a new data set 3 and populate it with the data from the
previous query. That is, data set 3 is the sum of data sets 1 and 2:

INSERT INTO data_sets VALUES (3, 'Combined Set');
INSERT INTO nodes
SELECT 3 AS data_set, node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;

SQL is a really easy way of manipulating data sets.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/

Dec 6 '07 #26

Mikhail Kovalev

On 6 Des, 10:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:

Mikhail Kovalev wrote:
I have different databases for different types of sequences.

Aha. I don't think you mentioned that earlier.

CREATE TABLE data_sets
(
data_set integer NOT NULL PRIMARY KEY,
data_set_name varchar(20)
);
CREATE TABLE nodes
(
data_set integer NOT NULL
REFERENCES data_sets ON DELETE CASCADE,
node_address varchar(1024) NOT NULL,
node_count integer,
PRIMARY KEY (data_set_id, node_address)
);

An example of inserting some data:

INSERT INTO data_sets VALUES (1, 'Example Data Set 1');
INSERT INTO nodes VALUES (1, '112', 3);
INSERT INTO nodes VALUES (1, '118', 4);
INSERT INTO nodes VALUES (1, '112/11', 5);
INSERT INTO data_sets VALUES (2, 'Example Data Set 2');
INSERT INTO nodes VALUES (2, '112', 2);
INSERT INTO nodes VALUES (2, '4/115', 3);

An example of selecting data from Data Set 1:

SELECT node_count
FROM nodes
WHERE data_set=1
AND node_address='112';
(returns 3)

Some examples of selecting the sum of data from multiple sets:

SELECT sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
AND node_address='112'
GROUP BY node_address;
(returns 5)

SELECT node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;
(returns list of all nodes in set 1 or set 2, plus sum of counts
for both data sets)

Let's create a new data set 3 and populate it with the data from the
previous query. That is, data set 3 is the sum of data sets 1 and 2:

INSERT INTO data_sets VALUES (3, 'Combined Set');
INSERT INTO nodes
SELECT 3 AS data_set, node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;

SQL is a really easy way of manipulating data sets.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/

Dec 6 '07 #27

Mikhail Kovalev

On 6 Des, 10:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:

Mikhail Kovalev wrote:
I have different databases for different types of sequences.

Aha. I don't think you mentioned that earlier.

CREATE TABLE data_sets
(
data_set integer NOT NULL PRIMARY KEY,
data_set_name varchar(20)
);
CREATE TABLE nodes
(
data_set integer NOT NULL
REFERENCES data_sets ON DELETE CASCADE,
node_address varchar(1024) NOT NULL,
node_count integer,
PRIMARY KEY (data_set_id, node_address)
);

An example of inserting some data:

INSERT INTO data_sets VALUES (1, 'Example Data Set 1');
INSERT INTO nodes VALUES (1, '112', 3);
INSERT INTO nodes VALUES (1, '118', 4);
INSERT INTO nodes VALUES (1, '112/11', 5);
INSERT INTO data_sets VALUES (2, 'Example Data Set 2');
INSERT INTO nodes VALUES (2, '112', 2);
INSERT INTO nodes VALUES (2, '4/115', 3);

An example of selecting data from Data Set 1:

SELECT node_count
FROM nodes
WHERE data_set=1
AND node_address='112';
(returns 3)

Some examples of selecting the sum of data from multiple sets:

SELECT sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
AND node_address='112'
GROUP BY node_address;
(returns 5)

SELECT node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;
(returns list of all nodes in set 1 or set 2, plus sum of counts
for both data sets)

Let's create a new data set 3 and populate it with the data from the
previous query. That is, data set 3 is the sum of data sets 1 and 2:

INSERT INTO data_sets VALUES (3, 'Combined Set');
INSERT INTO nodes
SELECT 3 AS data_set, node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;

SQL is a really easy way of manipulating data sets.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/

This and the wildcard search will save me 50% of programming.
Seriously, I don't understand why I didn't looked into it earlier.

Thanks for the input!

And sorry about last empty message, sent it by mistake.

Dec 6 '07 #28

Database suggestion

Similar topics