
Memory Map in C++

151 100+
I am working on file indexing and have succeeded in indexing and retrieving the records. But if the file is very big, with millions of records, it takes a long time to read the index file. Is there any way to keep the index in memory so that when I run my program I can use that map?

Manish
Apr 8 '09 #1
weaknessforcats
9,208 Expert Mod 8TB
If there are millions of records, why are you not using SQL?

For very large files, indexes may reside partly in memory and partly on disc, using a thing called a segment index.

It may be that you are redeveloping the database software industry instead of using already written code. I mean, there's the Access JET Engine, Oracle, MS SQL Server, MySQL, etc...
Apr 8 '09 #2
Man4ish
151 100+
@weaknessforcats
As per my project guidelines I am not supposed to use MySQL.
How does a segment index work? Is there any tutorial with sample code, so that as a novice in this area I can use it properly?

Manish
Apr 9 '09 #3
weaknessforcats
9,208 Expert Mod 8TB
OK, but you are in for a lot of work.

First, you need to segment your keys by ranges. Let's say, by the first character of the key. You will build 128 index files (one per 7-bit ASCII character).

Second, you will read your millions of records and locate all keys plus the offset of each record from the front of the file. You will build a record consisting of the key and the offset. Records with keys starting with letter 1 go in file 1. Records with keys starting with letter 2 go in file 2, and so on.

You now have your millions of records broken into 128 segments.

Third, you now create a 2-3-4 tree for each of these 128 files. 2-3-4 trees can be found in any textbook. In this tree each node holds the key, the offsets in the tree for the children, and the offset for the key in the millions of records.

Fourth, you write the segment index tree to a disc file and manage it there rather than in memory. That way you read the millions of records once and create the index file once. Since a 2-3-4 tree stays balanced as keys are inserted, it never needs a separate rebalancing pass.

Fifth, you can now delete the 128 segments, keeping only the 128 segment index files.

Sixth, to access a record, you ask the user for the key. From the first character of the key you open the correct segment index file.

Seventh, using your own code, you access the segment index file as a 2-3-4 tree until you have a complete match on the key. That node will have the offset into the millions of records. You then read that file to get your data.

Eighth, you may wish to break your millions of records into segment files to avoid file size limitations.

Ninth, I would package all of this code carefully. This would be a complete database access system that you should be able to use for the rest of your life.
I have written these things and my experience says this is about a 1000-hour job.

Tenth, add protections for multithreading and you are good to go.
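
To make the build and lookup steps above more concrete, here is a minimal sketch and not a finished design. It deliberately swaps the on-disc 2-3-4 tree for something simpler: each segment index file is a sorted array of fixed-size entries that is binary-searched on disc. The record layout (one text record per line, key before the first comma), the 31-character key limit, and all names (IndexEntry, segmentPath, buildIndex, findRecord) are assumptions made for illustration. The build keeps every key and offset in memory while bucketing, which a real implementation for very large files would avoid.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

constexpr int kSegments = 128;       // one segment per 7-bit ASCII first character
constexpr std::size_t kKeyLen = 32;  // assumed: keys are at most 31 characters

// Fixed-size index entry: zero-padded key plus the record's byte offset.
struct IndexEntry {
    char key[kKeyLen];
    std::uint64_t offset;
};

// Hypothetical naming scheme for the 128 segment index files.
std::string segmentPath(const std::string& dir, int seg) {
    return dir + "/seg" + std::to_string(seg) + ".idx";
}

int segmentOf(const std::string& key) {
    return static_cast<unsigned char>(key[0]) & 0x7F;
}

IndexEntry makeEntry(const std::string& key, std::uint64_t offset) {
    IndexEntry e{};  // zero-initialised, so the key is always NUL-terminated
    std::memcpy(e.key, key.data(), std::min(key.size(), kKeyLen - 1));
    e.offset = offset;
    return e;
}

// Steps one and two: read the data file once, bucket (key, offset) pairs by
// the first character of the key, and write each bucket sorted by key.
void buildIndex(const std::string& dataPath, const std::string& dir) {
    std::array<std::vector<IndexEntry>, kSegments> buckets;
    std::ifstream data(dataPath, std::ios::binary);
    std::string line;
    std::uint64_t offset = 0;  // byte offset of the current line
    while (std::getline(data, line)) {
        const std::string key = line.substr(0, line.find(','));
        if (!key.empty()) buckets[segmentOf(key)].push_back(makeEntry(key, offset));
        offset += line.size() + 1;  // +1 for the '\n' that getline consumed
    }
    for (int s = 0; s < kSegments; ++s) {
        auto& b = buckets[s];
        std::sort(b.begin(), b.end(), [](const IndexEntry& a, const IndexEntry& c) {
            return std::strncmp(a.key, c.key, kKeyLen) < 0;
        });
        std::ofstream out(segmentPath(dir, s), std::ios::binary);
        if (!b.empty())
            out.write(reinterpret_cast<const char*>(b.data()),
                      static_cast<std::streamsize>(b.size() * sizeof(IndexEntry)));
    }
}

// Steps six and seven: pick the segment index file from the first character,
// binary-search it on disc, then seek into the data file at the stored offset.
bool findRecord(const std::string& dataPath, const std::string& dir,
                const std::string& key, std::string& record) {
    if (key.empty()) return false;
    std::ifstream idx(segmentPath(dir, segmentOf(key)), std::ios::binary | std::ios::ate);
    if (!idx) return false;
    const std::streamoff bytes = idx.tellg();  // opened at end, so this is the file size
    assert(bytes >= 0);
    const IndexEntry want = makeEntry(key, 0);
    std::uint64_t lo = 0;
    std::uint64_t hi = static_cast<std::uint64_t>(bytes) / sizeof(IndexEntry);
    while (lo < hi) {
        const std::uint64_t mid = lo + (hi - lo) / 2;
        IndexEntry e;
        idx.seekg(static_cast<std::streamoff>(mid * sizeof(IndexEntry)));
        if (!idx.read(reinterpret_cast<char*>(&e), sizeof e)) return false;
        const int c = std::strncmp(want.key, e.key, kKeyLen);
        if (c == 0) {
            std::ifstream data(dataPath, std::ios::binary);
            data.seekg(static_cast<std::streamoff>(e.offset));
            return static_cast<bool>(std::getline(data, record));
        }
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return false;
}
```

A lookup touches only one small index file plus a single seek into the data file, which is the point of segmenting. Unlike a 2-3-4 tree, a sorted array is awkward to update in place, so this variant suits an index that is built once and then only read.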

Truly, a SQL solution is your best bet. Any project you work on should not require you to handle millions of records. If it's a business project you would use SQL. If it's a class project, it needs to be much, much simpler.
Apr 9 '09 #4
Man4ish
151 100+
@weaknessforcats
Thank you very much for your detailed reply. I am very happy to see this answer and really thankful to you. I will proceed the way you explained.

Manish
Apr 10 '09 #5
JosAH
11,448 Expert 8TB
I moved this thread from the C/C++ Insights part because it doesn't belong there.

kind regards,

Jos
Apr 10 '09 #6
