473,378 Members | 1,391 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,378 software developers and data experts.

Best data structure?

I am using vb.net and need to keep in memory a large data structure, so I am
looking for the best option. And after several test I am pretty confused. So
I will be grateful if anyone can help me.

My basic need is:

Keep an "array" (or other structure) of pointers to "arrays" (or other
structure) of bytes. Example:

arrayPointer(1,000,000) each one point to arraByte1(), arrayByte2(),
arrayByte3().., arrayByte1000000().

So it is possible to redim each arrayByte() to the dimension I need.

I supposed that a configuration like the above (suppose each arrayByte hold
1 Byte) needs:

Pointers: 4 bytes x 1 Million = 4 Mb

Arrays: 1 byte per array x 1 Million arrays = 1 Mb

I have trying different option and I obtained a very high memory
comsumption, is there any way to keep memory low and achieve what I need?,

I am using "GC.GetTotalMemory(True)" to find out the memory usage is that
right?

Array of structures

Public Structure stbyte

Public byt() As Byte

End Structure

Memory = GC.GetTotalMemory(True)

Dim intPo(1000000) As stbyte ' 1 Million structures

For intI = 1 To 1000000

ReDim intPo(intI - 1).byt(0) ' Redim to 1 byte each

Next

Memory = GC.GetTotalMemory(True) - Memory

Memory = Memory / 1000000

I obtained Memory= 20 Mbytes

Jagged Array of Bytes

Memory = GC.GetTotalMemory(True)

Dim bytes(1000000)() As Byte ' 1 Million Jagged array

For intI = 1 To 1000000

ReDim bytes(intI - 1)(1) ' Each vector to 1 byte

Next

Memory = GC.GetTotalMemory(True) - Memory

Memory = Memory / 1000000

I obtained Memory= 20 Mbytes

Rectangular Array of Bytes

Memory = GC.GetTotalMemory(True)

Dim bytes(1000000, 0) As Byte ' 1 Million array of 1 byte each

Memory = GC.GetTotalMemory(True) - Memory

Memory = Memory / 1000000

I obtained Memory = 1 Mbytes (but in this case I can not redim each row).

Array of BitsArrays

Memory = GC.GetTotalMemory(True)

Dim bitArray(1000000) As BitArray ' 1 Million bitarrays

For intI = 1 To 1000000

bitArray(intI - 1) = New BitArray(8) ' Each bitarray 8 bits

Next

Memory = GC.GetTotalMemory(True) - Memory

Memory = Memory / 1000000

I obtained Memory = 40 Mbytes



Feb 20 '06 #1
6 2600
James,

The first question you have to ask yourself forever with arrays is. "Are all
elements of the array used". If not than in what amount not.

Maybe is by instance the HashTable than a better choise.

However in my opinion is there not to give a general answer based on your
question as it is.

Cor
Feb 20 '06 #2
Probably the question is not well define. What I need:

I need the less consuming memory data structure that allow me to:

1) Keep a list of "Pointers" or "references" to data structures (for this a
Hashtable is fine).
2) Each data structure can be resize dynamically, each one with different
sizes.

An example could be:
An array of pointers, each pointer points to an array of bytes. That is: 1
array of M pointers, and M arrays of bytes (each one can hod different
number of bytes).

Thanks,

james

"Cor Ligthert [MVP]" <no************@planet.nl> escribió en el mensaje
news:%2****************@tk2msftngp13.phx.gbl...
James,

The first question you have to ask yourself forever with arrays is. "Are
all elements of the array used". If not than in what amount not.

Maybe is by instance the HashTable than a better choise.

However in my opinion is there not to give a general answer based on your
question as it is.

Cor


Feb 20 '06 #3
The big picture may help (is this for things like spreadsheet cells ? What
is the nature of those informations ?).

For example a "sparse array" structure is a combination of elementar data
structures that allows to store only filled in cells. (i.e. you can have a
200x200 cells array but non used cells won't take storage).

Also as a side note that there is AFAIK not a single solution for this (for
example basically it uses a list of rows/columns and each list holds a list
of the cells on the row/column) but as long as the external interface is
exposed as an array (cells(x,y)) you can actually change the actual
implementation with no or minimal impact on the main code...

--
Patrice

"James" <in**@pricetech.es> a écrit dans le message de
news:u4**************@TK2MSFTNGP09.phx.gbl...
Probably the question is not well define. What I need:

I need the less consuming memory data structure that allow me to:

1) Keep a list of "Pointers" or "references" to data structures (for this a Hashtable is fine).
2) Each data structure can be resize dynamically, each one with different
sizes.

An example could be:
An array of pointers, each pointer points to an array of bytes. That is: 1
array of M pointers, and M arrays of bytes (each one can hod different
number of bytes).

Thanks,

james

"Cor Ligthert [MVP]" <no************@planet.nl> escribió en el mensaje
news:%2****************@tk2msftngp13.phx.gbl...
James,

The first question you have to ask yourself forever with arrays is. "Are
all elements of the array used". If not than in what amount not.

Maybe is by instance the HashTable than a better choise.

However in my opinion is there not to give a general answer based on your question as it is.

Cor


Feb 20 '06 #4
Ok! These are my needs:

I have a list of words (more than a million different words) I keep this
words in a hashtable for easy access. For each of the words I need to keep
in compress (binary form) the documents ID where each word belong (it can be
hundreds of documents for each word). Sometimes I need to add more documents
ID to some of the words. That is:

Words DocumentID
Paris -> doc123, doc234, doc345, doc345,...
London->doc2, doc34, doc234,...
....
So I need to keep in memory (that is what i am looking for low memory usage)
the Words (actully I used a hashtable) and some structure to hold the
document ID's for each word. Sometimes I need to add new documents to some
of the words (normally after the last document for that word), so the data
structure that hold the documents must be able to increase it lenght.

My first thought was to use a jagged array but it consume a lot of overhead
memory:
Dim bytes(1000000)() As Byte ' 1 Million Jagged array
For intI = 1 To 1000000
ReDim bytes(intI - 1)(1) ' Each vector to 1 byte
Next
This consume: 20 Mbytes of data?? when I was just supposing it will consume
1 Mbyte

Any help??


"Patrice" <a@bc.c> escribió en el mensaje
news:%2****************@TK2MSFTNGP12.phx.gbl...
The big picture may help (is this for things like spreadsheet cells ? What
is the nature of those informations ?).

For example a "sparse array" structure is a combination of elementar data
structures that allows to store only filled in cells. (i.e. you can have a
200x200 cells array but non used cells won't take storage).

Also as a side note that there is AFAIK not a single solution for this
(for
example basically it uses a list of rows/columns and each list holds a
list
of the cells on the row/column) but as long as the external interface is
exposed as an array (cells(x,y)) you can actually change the actual
implementation with no or minimal impact on the main code...

--
Patrice

"James" <in**@pricetech.es> a écrit dans le message de
news:u4**************@TK2MSFTNGP09.phx.gbl...
Probably the question is not well define. What I need:

I need the less consuming memory data structure that allow me to:

1) Keep a list of "Pointers" or "references" to data structures (for this

a
Hashtable is fine).
2) Each data structure can be resize dynamically, each one with different
sizes.

An example could be:
An array of pointers, each pointer points to an array of bytes. That is:
1
array of M pointers, and M arrays of bytes (each one can hod different
number of bytes).

Thanks,

james

"Cor Ligthert [MVP]" <no************@planet.nl> escribió en el mensaje
news:%2****************@tk2msftngp13.phx.gbl...
> James,
>
> The first question you have to ask yourself forever with arrays is.
> "Are
> all elements of the array used". If not than in what amount not.
>
> Maybe is by instance the HashTable than a better choise.
>
> However in my opinion is there not to give a general answer based on your > question as it is.
>
> Cor
>
>




Feb 20 '06 #5
Have you considered using a class object that has a "Word" property and a
Collection of Document ID"s. You would have a class instance for each word.
--
Dennis in Houston
"James" wrote:
Ok! These are my needs:

I have a list of words (more than a million different words) I keep this
words in a hashtable for easy access. For each of the words I need to keep
in compress (binary form) the documents ID where each word belong (it can be
hundreds of documents for each word). Sometimes I need to add more documents
ID to some of the words. That is:

Words DocumentID
Paris -> doc123, doc234, doc345, doc345,...
London->doc2, doc34, doc234,...
....
So I need to keep in memory (that is what i am looking for low memory usage)
the Words (actully I used a hashtable) and some structure to hold the
document ID's for each word. Sometimes I need to add new documents to some
of the words (normally after the last document for that word), so the data
structure that hold the documents must be able to increase it lenght.

My first thought was to use a jagged array but it consume a lot of overhead
memory:
Dim bytes(1000000)() As Byte ' 1 Million Jagged array
For intI = 1 To 1000000
ReDim bytes(intI - 1)(1) ' Each vector to 1 byte
Next
This consume: 20 Mbytes of data?? when I was just supposing it will consume
1 Mbyte

Any help??


"Patrice" <a@bc.c> escribió en el mensaje
news:%2****************@TK2MSFTNGP12.phx.gbl...
The big picture may help (is this for things like spreadsheet cells ? What
is the nature of those informations ?).

For example a "sparse array" structure is a combination of elementar data
structures that allows to store only filled in cells. (i.e. you can have a
200x200 cells array but non used cells won't take storage).

Also as a side note that there is AFAIK not a single solution for this
(for
example basically it uses a list of rows/columns and each list holds a
list
of the cells on the row/column) but as long as the external interface is
exposed as an array (cells(x,y)) you can actually change the actual
implementation with no or minimal impact on the main code...

--
Patrice

"James" <in**@pricetech.es> a écrit dans le message de
news:u4**************@TK2MSFTNGP09.phx.gbl...
Probably the question is not well define. What I need:

I need the less consuming memory data structure that allow me to:

1) Keep a list of "Pointers" or "references" to data structures (for this

a
Hashtable is fine).
2) Each data structure can be resize dynamically, each one with different
sizes.

An example could be:
An array of pointers, each pointer points to an array of bytes. That is:
1
array of M pointers, and M arrays of bytes (each one can hod different
number of bytes).

Thanks,

james

"Cor Ligthert [MVP]" <no************@planet.nl> escribió en el mensaje
news:%2****************@tk2msftngp13.phx.gbl...
> James,
>
> The first question you have to ask yourself forever with arrays is.
> "Are
> all elements of the array used". If not than in what amount not.
>
> Maybe is by instance the HashTable than a better choise.
>
> However in my opinion is there not to give a general answer based on

your
> question as it is.
>
> Cor
>
>



Feb 21 '06 #6
Dennis,

That might not be a bad idea. That class could encapsulate the
compression/decompression of the word and associated document names for
more efficient storage. Sure, there'll be a performance hit, but with
1,000,000 word associations the OP is going to have to get creative
with the data structure. It'll be an excerise in balancing space vs.
performance.

That's one thing you didn't tell us James. I think we have a pretty
good handle on your memory requirements, but what about performance.
Is that important? What is the typical scenario for accessing the data
structure?

Brian

Dennis wrote:
Have you considered using a class object that has a "Word" property and a
Collection of Document ID"s. You would have a class instance for each word.
--
Dennis in Houston


Feb 21 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Vagif Abilov | last post by:
We decided to adopt .NET coding guidelines posted by Brad Abrams from Microsoft: http://blogs.msdn.com/brada/archive/2005/01/26/361369.aspx Here is what Brad (and AFAIK Microsoft) suggests...
9
by: Brian | last post by:
Hello, I have a text file I'm attempting to parse. There are about 50 fixed width fields in each line / row. For example (shortened for brevity): W1234Somebody East 101110001111010101...
0
by: naijacoder naijacoder | last post by:
Here goes the scenario.. The intranet is using windows Auth integrated wih Active Directory. Which is fine and all the appropriate Roles/Groups are set. When a user clicks on a button on a page...
1
by: GeRmIc | last post by:
Hi, I have to pass an array containing a structure from a C# library to a C DLL. Which is the best way to do it? Hashtable or ArrayList or anyother better way. Also, how do i marshal and then...
4
by: Dave | last post by:
(My apologies for posting this on two forums. I have just found out the other one was the incorrect location) I am writing a VB.NET 2003 web application to operate on my company's intranet. It...
5
by: Edward | last post by:
I've been running into issue after issue trying to position text with pure CSS (ie 3-pixel bug, margin bugs, inexplainable side-effects, etc.). I noted that I can just put an HTML table anywhere I...
5
by: Trapulo | last post by:
Hello, I need to send to a webservice a parameter that is a string containing an XML doc. In this xml, a node value came from a byte array (it's an RSA signature). What is the best way to convert...
13
by: G | last post by:
Hello, Looking for opinions on a fairly simple task, new to ASP.net (C#) and want to make sure I do this as efficiently as possible. I have a web based form, and I need to run some SQL before...
6
by: mirandacascade | last post by:
Assume the following: 1) multi-user environment 2) when user opens app, want to run some code that retrieves some information specific to the user...retrieving this information is somewhat i/o...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.