I am working on file indexing and have been successful in indexing and retrieving the records. But if the file is very big, with millions of records, it takes a long time to read the index file. Is there any way the index can be stored in memory, so that when I run my program I can use that map?
Manish
If there are millions of records, why are you not using SQL?
For very large files, indexes may reside partly in memory and partly on disc using a thing called a segment index.
It may be that you are reinventing database software instead of using already written code. I mean, there's the Access JET engine, Oracle, MS SQL Server, MySQL, etc...
@weaknessforcats
As per my project guidelines i am not supposed to use mysql.
How does a segment index work? Is there any tutorial with sample code, so that being a novice in this area I can use it properly?
Manish
OK, but you are in for a lot of work.
First, you need to segment your keys by ranges. Let's say, by the first character of the key. You will build 128 index files.
Second, you will read your millions of records and locate all keys plus the offset from the front of the file. You will build a record consisting of the key and the offset. Records with keys starting with character 1 go in file 1, records with keys starting with character 2 go in file 2, and so on.
You now have your millions of records broken into 128 segments.
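The second step above can be sketched like this. The layout is hypothetical (one record per line, with the key being the text before the first comma), and the pass runs over in-memory strings rather than real files, but the bookkeeping is the same: each key gets a segment number from its first character and a byte offset into the big file.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One index entry per key: which of the 128 segments it belongs to,
// and the byte offset of its record in the big data file.
struct IndexEntry { int segment; long offset; };

std::map<std::string, IndexEntry> buildIndex(const std::vector<std::string>& records) {
    std::map<std::string, IndexEntry> index;
    long offset = 0;                                  // offset of the current record
    for (const std::string& rec : records) {
        std::string key = rec.substr(0, rec.find(','));
        int seg = key.empty() ? 0 : (key[0] & 0x7F);  // first char picks 1 of 128 segments
        index[key] = { seg, offset };
        offset += (long)rec.size() + 1;               // +1 for the newline separator
    }
    return index;
}
```

In real code each entry would be appended to its segment's file instead of collected in one map; the map here just makes the result easy to inspect.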
Third, you now create a 2-3-4 tree for each of these 128 files. 2-3-4 trees (balanced multiway search trees) can be found in any textbook. In this tree you have the key, the offsets in the tree for the children, plus the offset for the key in the file of millions of records.
Fourth, you write the segment index tree to a disc file and manage it there rather than in memory. That way you read the millions of records once and create the index file once. Since it is a 2-3-4 tree, it never needs rebalancing.
Fifth, you can now delete the 128 segments keeping only the 128 segment index files.
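Step four can be sketched with fixed-size node records: because every node occupies the same number of bytes, a child "pointer" is just the byte offset of the child's record, usable with fseek(). Field sizes and names here are illustrative, and dumping a raw struct like this is not portable across compilers, but it shows the idea:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// One 2-3-4 node as a fixed-size record on disc (sizes hypothetical).
struct DiskNode {
    int  nkeys;           // 1, 2, or 3 keys in this node
    char keys[3][32];     // up to 3 keys, 31 chars + NUL each
    long recOffset[3];    // offset of each key's record in the data file
    long child[4];        // byte offsets of children in the index file (-1 = none)
};

// Appends a node at the current position and returns its offset,
// which the parent stores as a child "pointer".
long writeNode(FILE* f, const DiskNode& n) {
    long off = ftell(f);
    fwrite(&n, sizeof(DiskNode), 1, f);
    return off;
}

// Seeks to a node's offset and reads it back.
bool readNode(FILE* f, long off, DiskNode* out) {
    return fseek(f, off, SEEK_SET) == 0 &&
           fread(out, sizeof(DiskNode), 1, f) == 1;
}
```

A production version would write the fields individually in a defined byte order instead of fwrite-ing the whole struct, so the index file stays readable after a compiler or platform change.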
Sixth, to access a record, you ask the user for the key. From the first character of the key you open the correct segment index file.
Seventh, using your own code, you traverse the segment index file as a 2-3-4 tree until you have a complete match on the key. That node will have the offset into the millions of records. You then read that file to get your data.
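The lookup in step seven might look like this sketch. To keep it self-contained, the nodes live in a vector and the child "offsets" are vector indices standing in for the file offsets a real implementation would fseek() to; names and the node layout are illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// In-memory stand-in for an on-disc 2-3-4 node.
struct Node234 {
    int nkeys;                    // 1, 2, or 3 keys in this node
    std::string keys[3];
    long recOffset[3];            // offset of each key's record in the big file
    int child[4];                 // indices of up to 4 children (-1 = none)
};

// Walks from the root, branching on key comparisons at each node.
// Returns the record offset for key, or -1 if the key is absent.
long search(const std::vector<Node234>& tree, int root, const std::string& key) {
    int cur = root;
    while (cur != -1) {
        const Node234& n = tree[cur];
        int i = 0;
        while (i < n.nkeys && key > n.keys[i]) ++i;   // find the branch to take
        if (i < n.nkeys && key == n.keys[i]) return n.recOffset[i];
        cur = n.child[i];                             // descend to child i
    }
    return -1;                                        // key not in the index
}
```

In the real version, the loop body becomes "seek to the child's offset, read one fixed-size node", so a lookup costs one disc read per tree level.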
Eighth, you may wish to break your millions of records into segment files to avoid file size limitations.
Ninth, I would package all of this code carefully. This would be a complete database access system that you should be able to use for the rest of your life.
I have written these things and my experience says this is about a 1000-hour job.
Tenth, add protections for multithreading and you are good to go.
Truly, a SQL solution is your best bet. Any project you work on should not require you to handle millions of records. If it's a business project you would use SQL. If it's a class project, it needs to be much, much simpler.
@weaknessforcats
Thank you very much for your detailed reply. I am very happy to see this answer and really thankful to you. I will proceed the way you explained.
Manish
I moved this thread from the C/C++ Insights part because it doesn't belong there.
kind regards,
Jos