Bytes | Software Development & Data Engineering Community

Data Structures Question...

Hi friends,
I am trying to choose the best possible data structure for the problem
I am going to describe now.

I have, let's say, tens of thousands of numbers in file1 and tens of
thousands of numbers in another file, file2. The contents of file1 and
file2 (only numbers) can be entirely different.

Now the program should be able to read the files and give me a
difference file. Of course, this is very easy to implement if I go
with primitive programming using "ifs" and "whiles".

This is what I need to do. First I need to rearrange the data in file1
into a very compact format so that my program uses as little RAM as
possible. Sets are one way of doing that.

For example, if file1 contains 2,3,1,7,9,10,11,12,4,6,22 (in reality
file1 may contain several thousand numbers), then I can rearrange them
into these sets:

{1-4} // Numbers 1 to 4
{9-12} // Numbers 9 to 12
{6-7} // Numbers 6 to 7
{22} // Single number 22.

And similarly I need to rearrange the contents of file2 in this format.

Now my question is: what is the best way to store this
data structure? Linked lists, trees, something else, or simply the STL?
I would also be glad if somebody could give me an idea of the best way
to compare the sets of file1 and file2 and produce the difference set,
in number format, in a result file "file3".
Thanks,
Sai
Jul 22 '05 #1
Saikrishna wrote:
Now my question is: what is the best way to store this
data structure? Linked lists, trees, something else, or simply the STL?
I would also be glad if somebody could give me an idea of the best way
to compare the sets of file1 and file2 and produce the difference set,
in number format, in a result file "file3".


Your "sets" are also termed "runs" in a typical Merge Sort
algorithm.

This is possibly an algorithm issue, not a language issue.
My suggestion is to sort both files into a third, then
remove duplicates. Perhaps the folks in news:comp.programming
can offer better advice. Follow-ups set.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Jul 22 '05 #2
There are three DIFFerent methods to attack the problem.

The first method is to look at the implementation of the diff utility
for Linux/Unix.
The second is to look at the implementation of sequence alignment from
bioinformatics. There they determine the best alignments of
gene sequences, and it may have something to do with your problem.
The laziest method is to ask in some programming/algorithms group,
since algorithms and data structures are

O F F - T O P I C

here. That means that people here are not always qualified enough to
reply. (No offence meant.)

--
Best regards,
Alex.

PS. To email me, remove "loeschedies" from the email address given.
Jul 22 '05 #4


Thomas Matthews wrote:
Saikrishna wrote:
I am trying to choose the best possible data structure for the problem
I am going to describe now.

For example, if file1 contains 2,3,1,7,9,10,11,12,4,6,22 (in reality
file1 may contain several thousand numbers), then I can rearrange them
into these sets:

{1-4} // Numbers 1 to 4
{9-12} // Numbers 9 to 12
{6-7} // Numbers 6 to 7
{22} // Single number 22.

And similarly I need to rearrange the contents of file2 in this format.

Now my question is: what is the best way to store this
data structure? Linked lists, trees, something else, or simply the STL?


Have a look at the data structure called "discrete interval encoding
tree" or "diet"; I think that's what you're after.

See e.g. http://www.nist.gov/dads/HTML/discretintrv.html

Michael
--
Feel the stare of my burning hamster and stop smoking!
Jul 22 '05 #6


