
optimizing memory utilization

Hello all,

I'm hoping for some guidance here... I am a C/C++ "expert", but a
complete Python virgin. I'm trying to create a program that loads the
entire FreeDB database (excluding the CDDBID itself) and uses this
"database" for other subsequent processing. The problem is, I'm running
out of memory on a Linux RH8 box with 512MB. The FreeDB database that I'm
trying to load presently consists of two "CSV" files. The first file
contains a "list" of albums with the artist name, an arbitrary sequential
album ID, and the CDDBID (an ASCII-hex representation of a 32-bit value).
The second file contains a list of all of the tracks on each of the
albums, cross-referenced via the album ID. When I load this into memory, I
create a Python list where each entry in the list is itself a list
representing the data for a given album. The album data list consists of
a small handful of text items like the album title, author, genre, and
year, as well as a list which itself contains a list for each of the
tracks on the album.

[[<Alb1ID#>, '<Alb1Artist>', '<Alb1Title>', '<Alb1Genre>', '<Alb1Year>',
[["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN", N]]],
[<Alb2ID#>, '<Alb2Artist>', '<Alb2Title>', '<Alb2Genre>', '<Alb2Year>',
[["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN", N]]],
...
[<AlbNID#>, '<AlbNArtist>', '<AlbNTitle>', '<AlbNGenre>', '<AlbNYear>',
[["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN", N]]]]

So the problem I'm having is that I want to load it all in memory (the two
files total about 250MB of raw data) but just loading the first 50,000
lines of tracks (about 25MB of raw data) consumes 75MB of RAM. If the
approximation is fairly accurate, I'd need >750MB of available RAM just to
load my in-memory database.

The bottom line is: is there a more memory-efficient way to load all this
arbitrary-field-length, arbitrary-count data into RAM? I can already see
that creating a separate list for the group of tracks on an album is
probably wasteful versus just appending them to the album list, but I
doubt that will yield the desired level of memory savings.

Any data structure suggestions for this application? BTW, the later
accesses to this database would not benefit in any way from being
presorted, so no need to warn me in advance about concepts like
presorting the albums list to facilitate faster look-up later...

Thanks,
Wes
Jul 18 '05 #1

anon> [[<Alb1ID#>, '<Alb1Artist>', '<Alb1Title>', '<Alb1Genre>','<Alb1Year>',
anon> [["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN",N]],
anon> [<Alb2ID#>, '<Alb2Artist>', '<Alb2Title>', '<Alb2Genre>','<Alb2Year>',
anon> [["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN",N]],
anon> ...
anon> [<AlbNID#>, '<AlbNArtist>', '<AlbNTitle>', '<AlbNGenre>','<AlbNYear>',
anon> [["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN",N]]]]

anon> So the problem I'm having is that I want to load it all in memory
anon> (the two files total about 250MB of raw data) but just loading the
anon> first 50,000 lines of tracks (about 25MB of raw data) consumes
anon> 75MB of RAM. If the approximation is fairly accurate, I'd need
anon> >750MB of available RAM just to load my in-memory database.

anon> The bottom line is, is there a more memory efficient way to load
anon> all this arbitrary field length and count type data into RAM?

Sure, assuming you know what your keys are, store them in a db file. Let's
assume you want to search by artist. Do your csv thing, but store the
records in a shelve keyed by the AlbNArtist field:

import shelve
import csv

reader = csv.reader(open("file1.csv", "rb"))
db = shelve.open("file1.db")

for row in reader:
    # accumulate every row that shares the same artist (column 1) under one key
    stuff = db.get(row[1], [])
    stuff.append(row)
    db[row[1]] = stuff
db.close()

I'm not sure I've interpreted your sample csv quite right, but I think
you'll get the idea. You can of course have multiple db files, each keyed
by a different field (or part of a field).

Obviously, using a db file will be slower than an in-memory dictionary, but
if memory is a bottleneck, this will likely help. You can also avoid
initializing the db file on subsequent program runs if the csv file is older
than the db file, probably resulting in faster startup.
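
For example, that startup shortcut could look something like this (a minimal sketch; the file names are just the placeholders used above):

import os

def needs_rebuild(csv_path="file1.csv", db_path="file1.db"):
    # rebuild only if the shelve file is missing or older than the csv it was built from
    if not os.path.exists(db_path):
        return True
    return os.path.getmtime(csv_path) > os.path.getmtime(db_path)

if needs_rebuild():
    pass  # run the csv -> shelve loop shown above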

Skip
Jul 18 '05 #2

"Anon" <an**@ymous.com> wrote in message
news:pa****************************@ymous.com...
Any data structures suggestions for this application? BTW, the later
accesses to this database would not benefit in any way from being
presorted,


You keep saying it yourself: Use a Database ;-).

Databases "knows" about stuff in memory, as well as any searching and
sorting you might dream up later.

Python supports several, the simplest is probably PySQLite:
http://sourceforge.net/projects/pysqlite/
Jul 18 '05 #3
"Frithiof Andreas Jensen" <frithiof.jensen@die_spammer_die.ericsson.com> wrote in message news:<ci**********@newstree.wise.edt.ericsson.se>. ..
"Anon" <an**@ymous.com> wrote in message
news:pa****************************@ymous.com...
Any data structures suggestions for this application? BTW, the later
accesses to this database would not benefit in any way from being
presorted,


You keep saying it yourself: Use a Database ;-).

Databases "knows" about stuff in memory, as well as any searching and
sorting you might dream up later.

Python supports several, the simplest is probably PySQLite:
http://sourceforge.net/projects/pysqlite/


Thanks for the specific suggestion, but it's an approach I'd hoped to
avoid. I'd rather get 2Gig of RAM if I was confident I could make it
work that way... The problem with a file-based database (versus an
in-memory data-structure-based database as I was meaning) is
performance.

Maybe someone has a better suggestion if I give a little more info: I
want to iterate through about 10,000 strings (each ~256 characters or
less) looking for occurrences (within those 10,000 strings) of any one
of about 500,000 smaller (~30-50 characters average) strings. Then I
generate an XML doc that records which of the 500,000 strings were
found within each of the 10,000 strings.

I guess I am hoping to optimize memory usage in order to avoid
thrashing my drives to death (due to using a file based database). I
can't believe that python requires memory 200% "overhead" to store my
data as lists of lists. I gotta believe I've chosen the wrong
data-type or have mis-applied it somehow??

BTW, my illustration wasn't my input data, it was literally the output
of "print Albums" (with only a couple albums worth of dummy data
loaded). Albums is a list containing a list for each album, which in
turn contains another list for each track on the album, which in-turn
contains a couple of items of data about the given track.

Bottom-line question here: is a 'list' the most efficient data type
to use for storing "arrays" of "arrays" of "arrays" of strings,
given that I'm happy to iterate and don't need any sort of lookup or
search functionality?

Thanks again,
Wes
Jul 18 '05 #4
On 14 Sep 2004 14:55:56 -0700, wc******@sandc.com (Wes Crucius) wrote:
"Frithiof Andreas Jensen" <frithiof.jensen@die_spammer_die.ericsson.com> wrote in message news:<ci**********@newstree.wise.edt.ericsson.se>. ..
"Anon" <an**@ymous.com> wrote in message
news:pa****************************@ymous.com...
> Any data structures suggestions for this application? BTW, the later
> accesses to this database would not benefit in any way from being
> presorted,
You keep saying it yourself: Use a Database ;-).

Databases "knows" about stuff in memory, as well as any searching and
sorting you might dream up later.

Python supports several, the simplest is probably PySQLite:
http://sourceforge.net/projects/pysqlite/


Thanks for the specific suggestion, but it's an approach I'd hoped to
avoid. I'd rather get 2Gig of RAM if I was confident I could make it
work that way... The problem with a file-based database (versus an
in-memory data-structure-based database as I was meaning) is
performance.

Maybe someone has a better suggestion if I give little more info: I
want to iterate through about 10,000 strings (each ~256 characters or
less) looking for occurances (within those 10,000 strings) of any one
of about 500,000 smaller (~30-50 characters average) strings. Then I
generate an XML doc that records which of the 500,000 strings were
found within each of the 10,000 strings.


Need example of raw input and desired output ;-)

Patterns starting at any character? What if you have overlapping matches? I.e.,
if you had a string 'abbcde123' and were looking for 'bb' and 'bc',
would it count as two matches? Would longer matches have priority?
E.g., does matching 'bbcd' override 'bb' and 'cd' as separate matches?

Anyway, let's see what a brute-force simple approach does. E.g., just put the
500,000 small strings in a small tree of dictionaries based on, say, string length, then
the string (you could tier it again with some leading characters and the rest, but I'd
try the easy thing first). Anyway, the idea (practically untested) is something like:

This finds repeating and overlapping patterns, since it was easy. That may not be what
you want, but you didn't say ;-)

----< s500kin10k.py >---------------------------------------
def main(path500k, path10k):
    root = {}
    for line in file(path500k):
        line = line.rstrip()
        if not line: continue
        root.setdefault(len(line), {})[line] = None
    lengths = root.keys()
    lengths.sort()
    print lengths
    print root
    # and then walk through your 10k strings of ~256 characters

    for nl, line in enumerate(file(path10k)):
        line = line.rstrip()
        hdr = 'Line %s: %r\n' % (nl, line)
        nc = len(line)
        for length in lengths:
            if length > nc: break
            tier1 = root[length]
            for nstep in xrange(nc+1-length): # walk a window of length chars
                if line[nstep:nstep+length] in tier1:
                    print '%s %r' % (hdr, line[nstep:nstep+length])
                    hdr = ''

if __name__ == '__main__':
    import sys
    if len(sys.argv)!=3: raise SystemExit, """
    Usage: s500kin10k.py path500k path10k
        where path500k has 500k lines of one string each, and
        path10k has 10k lines to search for the 500k in any position.
    """
    main(*sys.argv[1:])
------------------------------------------------------------

I guess I am hoping to optimize memory usage in order to avoid
thrashing my drives to death (due to using a file based database). I
can't believe that python requires memory 200% "overhead" to store my
data as lists of lists. I gotta believe I've chosen the wrong
data-type or have mis-applied it somehow??

BTW, my illustration wasn't my input data, it was literally the output
of "print Albums" (with only a couple albums worth of dummy data
loaded). Albums is a list containing a list for each album, which in
turn contains another list for each track on the album, which in-turn
contains a couple of items of data about the given track.

Bottom line question here: Is a 'list' the most efficient data type
to use for storing "arrays" of "arrays" of "arrays" and strings",
given that I'm happy to iterate and don't need any sort of lookup or
search functionality??

I suspect that you won't be that happy iterating through just arrays,
unless they are arrays of pattern-matching tables like (IIRC) flex generates
to do parsing. Basically that means doing tricky parallel pattern matching by
keeping track of which patterns are alive as you go from character to
character. But 500k patterns are a lot of patterns...


See if the above runs you out of memory. Or time ;-)
I'd try it on shorter files first. You didn't say what you wanted
your final output to look like. Small examples of input => output communicate well ;-)

Regards,
Bengt Richter
Jul 18 '05 #5
On Wed, 15 Sep 2004 02:20:38 +0000, Bengt Richter wrote:

Need example of raw input and desired output ;-)

The input is:
- one "csv" file containing a list of albums with their artist and an AlbumID
- one "csv" file containing a list of track names and associated AlbumID
- one big file containing fully-qualified paths to mp3 files

The desired output is:
an XML file with the "proposed" artist, album, and track name for each
file. The file name would be the top-level element for a given XML node,
and then there would be element groupings of "proposed" artist, album, and
track name for each file.

The last step is to edit the XML file and "select" the desired fields from
the list of candidates for each field, and then use the XML as a work order
for tagging (and possibly renaming) all of the identified files.

Patterns starting at any character? What if you have overlapping
matches? I.e., if you had a string 'abbcde123', and were looking for
'bb' and 'bc' would it count as two matches? Would longer matches have
priority? E.g., matching 'bbcd' overrides 'bb and 'cd' as separate
matches?
Yes, yes, and yes...

Anyway, let's see what a brute force simple approach does. E.g, just put
the 500,000 small strings in a small tree of dictionaries based on say
string length, then the string (you could tier it again with some
leading characters and the rest, but I'd try the easy thing first. But
anyway, the idea (practically untested) is something like:
I had planned to sort the list based upon string length (and probably
apply some minimum length requirement too), but the tree of dictionaries
surely seems a bit more elegant.

This finds repeating and overlapping patterns, since it was easy. That
may not be what you want, but you didn't say ;-)

----< s500kin10k.py >--------------------------------------- [...]
------------------------------------------------------------

Gee, thanks. That's a nice example!
I suspect that you won't be that happy iterating through just arrays,
unless they are arrays of patternmatching tables like IIRC flex
generates to do parsing. Basically doing tricky parallel pattern
matching by keeping track of which patterns are alive as you go from
character to character. But 500k patterns are a lot of patterns....
Since I only want to do it once in a rare while, execution time isn't a huge
deal... I could readily write this code in C and make my data fit within
a reasonable amount of RAM, but I'm trying to use the project as an
opportunity to learn Python.

See if the above runs you out of memory. Or time ;-) I'd try it on
shorter files first. You didn't say what you wanted your final output to
look like. Small examples of input => output communicate well ;-)


Well, basically, I'm trying to use the FreeDB database as a list of valid
artists, albums, and tracks to locate these strings within the file names
of mp3 files and then use those results to apply ID tags to the mp3 files.
I'm assuming that I'll have multiple matches for any given file, so my plan
was to output to an XML file, hand-'clean' that, and then use the XML as
the input for the actual tagging operation.

Thanks,
Wes
Jul 18 '05 #6
[Anon]:

Well, basically, I'm trying to use the FreeDB database as a list
of valid artists, albums, and tracks to locate these strings
within the file names of mp3 files and then use those result to
apply ID tags to the mp3 files.


Have you considered using MusicBrainz instead? It computes a checksum
of the music itself, not of the file's bits and bytes.

http://www.musicbrainz.org/
--
Kjetil T.
Jul 18 '05 #7

"Wes Crucius" <wc******@sandc.com> wrote in message
news:1d**************************@posting.google.com...
Thanks for the specific suggestion, but it's an approach I'd hoped to
avoid. I'd rather get 2Gig of RAM if I was confident I could make it
work that way... The problem with a file-based database (versus an
in-memory data-structure-based database as I was meaning) is
performance.


I am not convinced ;-)

The people designing databases have run into exactly that kind of problem,
and they have spent a long time studying algorithms and efficient
data structures, and writing conference papers on how to cleverly map and
search through said data structures when they may not fit in memory - to the
point where I think one will have a hard time doing better by hand.

PySQLite will try to keep everything in memory, so if you have enough, it
will only hit the disk on opening the database and on committing a change.
Commit can take a long time, though.
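
To make that concrete, here is a minimal sketch of the idea (hedged: it uses the sqlite3 module API, the modern bundled descendant of PySQLite, and the CSV column layouts are just assumptions):

import csv
import sqlite3

conn = sqlite3.connect(":memory:")   # keep the whole database in RAM; pass a filename to spill to disk
conn.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, artist TEXT, title TEXT, genre TEXT, year TEXT)")
conn.execute("CREATE TABLE tracks (album_id INTEGER, num INTEGER, name TEXT)")

# assumed layouts: albums.csv = id,artist,title,genre,year ; tracks.csv = album_id,num,name
for row in csv.reader(open("albums.csv", "rb")):
    conn.execute("INSERT INTO albums VALUES (?, ?, ?, ?, ?)", row)
for row in csv.reader(open("tracks.csv", "rb")):
    conn.execute("INSERT INTO tracks VALUES (?, ?, ?)", row)
conn.commit()

# iterate everything without ever building one giant nested list
for artist, title, name in conn.execute(
        "SELECT a.artist, a.title, t.name FROM albums a JOIN tracks t ON t.album_id = a.id"):
    pass  # do the matching work here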
Jul 18 '05 #8
can't believe that python requires memory 200% "overhead" to store my
data as lists of lists.


And it almost certainly doesn't.

On one hand, there can be a sizeable fixed overhead when running
your program even when loading a single data line,
so don't extrapolate from small datasets.

But what I believe happens to you is that you keep references
alive and you end up with data duplication.

For example, if you do:

lines = file('whatever').readlines()

and then you say, for example:

stripped_lines = [ line.strip() for line in lines ]

you've just made your program use twice the memory
at that point in the code.

You may also inadvertently keep your 'lines' variable
in the scope of the full program, in which case
you'll effectively end up with a lot more memory consumption
overall.

Check and plug all these holes.
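
A minimal sketch of the fix, in the same spirit: process each line as it is read, so the raw 'lines' list never exists:

stripped_lines = []
for line in file('whatever'):      # iterate the file directly; no full readlines() copy
    stripped_lines.append(line.strip())
# or, if the raw list really is needed for a while, release it explicitly when done:
# del lines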

Istvan.
Jul 18 '05 #9
On Tue, Sep 14, 2004 at 04:39:49AM +0000, Anon wrote:
[...]

[[<Alb1ID#>, '<Alb1Artist>', '<Alb1Title>', '<Alb1Genre>','<Alb1Year>',
[["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN",N]],
[<Alb2ID#>, '<Alb2Artist>', '<Alb2Title>', '<Alb2Genre>','<Alb2Year>',
[["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN",N]],
...
[<AlbNID#>, '<AlbNArtist>', '<AlbNTitle>', '<AlbNGenre>','<AlbNYear>',
[["Track1", 1], ["Track2", 2], ["Track3", 3], ..., ["TrackN",N]]]]


silly question: have you looked into using tuples instead of lists for
the inner objects? They're supposed to be more memory efficient,
although I have found that that isn't necessarily the case.
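
For instance, a hedged sketch of that change (the row data here is made up, standing in for what the csv readers would yield):

# hypothetical rows standing in for what the csv readers would yield
albums_csv = [(1, 'Artist', 'Title', 'Rock', '1999')]
tracks_csv = [(1, 1, 'Track1'), (1, 2, 'Track2')]

albums = []
for alb_id, artist, title, genre, year in albums_csv:
    # a tuple stores its items inline with less per-object overhead than a list,
    # and it can never over-allocate spare capacity
    tracks = tuple((name, num) for (aid, num, name) in tracks_csv if aid == alb_id)
    albums.append((alb_id, artist, title, genre, year, tracks))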

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
You're a card which will have to be dealt with.


Jul 18 '05 #10
On Wed, 15 Sep 2004 03:44:36 GMT, "Anon" <an**@ymous.com> wrote:
On Wed, 15 Sep 2004 02:20:38 +0000, Bengt Richter wrote:
Need example of raw input and desired output ;-)

The input is:
- one "csv" file containing a list of albums with their artist and an AlbumID
- one "csv" file containing a list of track names and associated AlbumID
- one big file containing fully-qualified paths to mp3 files

Is the latter the 10k file to be searched for matches to patterns from the former two?
If so, you could certainly do better than walking through mp3 paths sliding
a pattern window one character at a time, since windows containing path delimiters
are not likely to match anything. IOW, you probably want to split on the delimiters
and also do some text normalization before trying to match.

How about some actual raw data? ;-) I.e., a few lines from the raw csv files and
the mp3 path file. Actual samples will probably raise questions about how the search
patterns are formed and how the searched data may have to be modified to match. E.g.,
removing punctuation, collapsing multiple embedded spaces, case sensitivity, spelling, etc.

The desired output is:
an XML file with the "proposed" artist, album, and track name for each
file. The file name would be the top-level element for a given XML node
and then there would be element groupings of "proposed" artist, album, and
track name for each file.

XML lets you do stuff like <foo fie="fie" fo="fum" /> or <foo><fie>fie</fie><fo>fum</fo></foo>
and variations. You can generate a lot of fluff if you don't watch out. Compare the
economy of Python source vs XML representing Python source ;-) Do you have
a utility that is going to accept the XML as input, and so is defining the format?

The last step is to edit the XML file and "select" the desired fields from
list of candidates for each fields, and then use the XML as a work order
for tagging (and possibly renaming) all of the identified files.

Patterns starting at any character? What if you have overlapping
matches? I.e., if you had a string 'abbcde123', and were looking for
'bb' and 'bc' would it count as two matches? Would longer matches have
priority? E.g., matching 'bbcd' overrides 'bb and 'cd' as separate
matches?
Yes, yes, and yes...

You'll need a change to make long patterns come out first, but no big deal.
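
Something like this would do it (a hedged, untested variant of the earlier sketch, reusing its root/lengths structures): check the longest lengths first and remember which character spans have already been claimed, so shorter patterns inside an already-matched span are skipped:

def find_longest_first(line, root, lengths):
    # lengths checked longest-first so a long match claims its span before shorter ones
    claimed = [False] * len(line)
    matches = []
    for length in sorted(lengths, reverse=True):
        if length > len(line):
            continue
        tier1 = root[length]
        for i in xrange(len(line) + 1 - length):
            window = line[i:i+length]
            if window in tier1 and True not in claimed[i:i+length]:
                matches.append(window)
                for j in xrange(i, i + length):
                    claimed[j] = True
    return matches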
Anyway, let's see what a brute force simple approach does. E.g, just put
the 500,000 small strings in a small tree of dictionaries based on say
string length, then the string (you could tier it again with some
leading characters and the rest, but I'd try the easy thing first. But
anyway, the idea (practically untested) is something like:
I had planned to sort the list based upon string length (and probably
apply some minimum length requirement too), but the tree of dictionaries
surely seems a bit more elegant.

Before that, though, are you going to modify the strings for consistent
white space and punctuation? Or check spellings? (On the other end too, of course.)
This finds repeating and overlapping patterns, since it was easy. That
may not be what you want, but you didn't say ;-)

----< s500kin10k.py >--------------------------------------- def [...]

Gee, thanks. That's a nice example!

You're welcome.
I suspect that you won't be that happy iterating through just arrays,
unless they are arrays of patternmatching tables like IIRC flex
generates to do parsing. Basically doing tricky parallel pattern
matching by keeping track of which patterns are alive as you go from
character to character. But 500k patterns are a lot of patterns....


Since I only want to do it once in rare while, execution time isn't a huge
deal... I could readily write this code in C and make my data fit within
a reasonable amount of RAM, but I'm trying to use the project as an
opportunity to lean Python.

Sounds like a good way to go. If you are good in C, you can also learn how
to make the best of both worlds by writing your own extension modules in C
that you can import in python.

Don't forget to look into available modules that may help. E.g., the array
module can make arrays of numbers very efficient. mmap can let you look at
a file as if it were a character array, etc. Also re for regular expressions,
which you may need at some point.
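
For instance, a tiny sketch of the mmap idea (the file name is just a placeholder): the whole file gets scanned like one big string without first reading it all into a Python string object:

import mmap

f = open("mp3_paths.txt", "rb")
data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)   # map the whole file read-only
print data.find("Track1")    # substring search directly against the mapped file; -1 if absent
data.close()
f.close()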

See if the above runs you out of memory. Or time ;-) I'd try it on
shorter files first. You didn't say what you wanted your final output to
look like. Small examples of input => output communicate well ;-)


Well, basically, I'm trying to use the FreeDB database as a list of valid
artists, albums, and tracks to locate these strings within the file names
of mp3 files and then use those result to apply ID tags to the mp3 files.
I'm assuming that I'll have multiple matches for any given file so my plan
was to output to an XML file, hand 'clean' that, and then use the XML as
the input for the actual tagging operation.

Again, "locate these strings" is easy to say, but each word hides a lot of
meaning. Messing with a small batch of _representative_ actual data will tell you a lot ;-)

Good luck.

Regards,
Bengt Richter
Jul 18 '05 #11
On Tue, 14 Sep 2004 14:55:56 -0700, Wes Crucius wrote:
Maybe someone has a better suggestion if I give little more info: I
want to iterate through about 10,000 strings (each ~256 characters or
less) looking for occurances (within those 10,000 strings) of any one
of about 500,000 smaller (~30-50 characters average) strings. Then I
generate an XML doc that records which of the 500,000 strings were
found within each of the 10,000 strings.


There may be a good reason for this, but it should be asked: Why not
stream one set of strings or the other in, instead of loading them up
front?

The other obvious answer isn't cute, but it will probably be faster to
program and still run faster than a process that runs into swap: break the
problem up. Either check some % of the 10,000 in one run and
start over, or check some % of the 500,000 and start over with a new
chunk.

There's nothing wrong with such chunking; when you start asking for
results from datasets that challenge your machine, there are many
techniques that seem wasteful but really aren't. One of my personal
canonical examples is something called "iterative deepening depth-first
search" for large graphs that may have cycles (which can trap a
depth-first search). On the first run, you search just the first level. On the
second, you search down two levels, and so on. By stopping at a certain
depth you ensure that you cover all nodes down to that depth without being
trapped. It sounds horridly wasteful compared to trying to keep track of
the visited nodes, but it turns out that it may only double (or less) the
execution time while saving vast swathes of memory. It still boggles my
mind.
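
A minimal sketch of that chunking idea (hedged; the names are placeholders and the inner match is just the naive substring test):

def in_chunks(seq, size):
    # yield successive slices of 'size' items from seq
    for start in xrange(0, len(seq), size):
        yield seq[start:start+size]

def search_in_chunks(patterns, filenames, size=50000):
    # only one chunk of the 500k patterns is held and scanned at a time
    hits = {}
    for chunk in in_chunks(patterns, size):
        for name in filenames:
            for p in chunk:
                if p in name:
                    hits.setdefault(name, []).append(p)
    return hits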
Jul 18 '05 #12
On Wed, 15 Sep 2004 14:59:45 +0000, John Lenton wrote:
On Tue, Sep 14, 2004 at 04:39:49AM +0000, Anon wrote:
[...]


silly question: have you looked into using tuples instead of lists for
the inner objects? They're supposed to be more memory efficient,
although I have found that that isn't necessarily the case.


That's exactly the kind of feedback that I was looking for when I asked
the question. However, the suggestion that MusicBrainz already does what I
want, possibly in a more accurate way, looks like it might be an even better
suggestion, leaving me to concentrate my efforts on some other
as-yet-unsolved problem instead!

Thanks!
Wes
Jul 18 '05 #13
