473,387 Members | 1,512 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

Reading a file randomly to find the value in given range

151 100+
I have one pblm for reading a file randomly for searching the value with in given range.
e.g.
offset allele id
19 G/T 2066803
20 C/T 2066804
13 A/G 2066805
12 A/G 2066927

In the above file i have to search all the records which are in range (10,15). If i read whole file sequentially it will consume lot of time since data file is around 1GB. So I am planning to read it randomly for searching the offset with in the given range. I know that by using seekg() and tellg(), we can find the exact value randomly but how it is possible to find within range?
Thanks in advance.

Manish
Mar 26 '09 #1
7 2253
donbock
2,426 Expert 2GB
I am skeptical about performing a random search. There is no predictability in a random algorithm. Two successive searches for the same key will have markedly different response times.

Search efficiency is significantly improved if the file is sorted by the same key that you intend to search by. In your example, what is the search key? Do you need to support searching by different keys at different times?

My age is showing, but I can't think of a better reference for sorting and searching algorithms than the classic "The Art of Computer Programming . Vol 3: Sorting and Searching" by Donald Knuth.
Mar 26 '09 #2
Man4ish
151 100+
@donbock
My Data is sorted the first column (id) is key.
offset allele id

12 A/G 2066927
13 A/G 2066805
19 G/T 2066803
20 C/T 2066804

Then How all the records within range (10 ,15) can be retrieved?

Regards
Manish
Mar 26 '09 #3
donbock
2,426 Expert 2GB
Is your file amenable to binary search? The typical criteria is whether or not the records are fixed-size.

Suppose each record is 16 bytes; your 1GB file contains 64M records. Binary search of this many records takes at most 26 comparisons.

If your ranges are generally pretty small then you could use a brute force strategy: search for the smallest limit of the range; if you don't find it then repeat the search for the next value in the range until you either find a match or exhaust the range. If you find a match, then read each successive record until you find a value outside the range.

You can improve on brute force if you retain and take advantage of the search trajectory of the previous [unmatched] value.
Mar 26 '09 #4
Man4ish
151 100+
@donbock
My Key may range from 1 to billiion. so records will be like
432423 A/T 4789321
4314441 G/T 4787654
47893217 C/T 7487489

other fields will be of fixed range.My range of searching may vary from 1 to billion also.Then still can we use the brute force algorithm.
Mar 26 '09 #5
gpraghuram
1,275 Expert 1GB
I think You posted a similar thread for retreiving record based on a key?
Is it related to that or this is new?

Raghu
Mar 26 '09 #6
Man4ish
151 100+
@gpraghuram
It is similar but i don't want to read the whole file i just want to access the data randomly so that i can reduce the time. I have used the method as suggested. It it is working fine for small file upto 1 MB but when i am trying the same method on 1GB file it is taking 4s which is to much i want it should be in .5 seconds.Please suggest if there is any other method or same method with better performance related to it.

Regards
Manish
Mar 26 '09 #7
weaknessforcats
9,208 Expert Mod 8TB
Random accesses are not going to work.

You say your file is sorted by key and that you have a lot of keys?

I sadio earlier that you need to create an index of this file. That requires reading the entire file to build he index. I also said you probably need a multimap<> of pair<> objects as your index structure.

Once the index is built you can reuse it without re-indexing.

All large databases are not read from beginning to end to locate items wirthin a range. They all use indexes.

Here is where you should get a book on database design.

OR....

Consider using SQL rather than work with the file yourself. Access is available as is MySQL, SQL Server, Oracle, etc... Plus there are database SDKs that let you use alreqady written database access code.

Unless this is a home work assignment, no one writes this code from scratch any more.
Mar 26 '09 #8

Sign in to post your reply or Sign up for a free account.

Similar topics

5
by: jester.dev | last post by:
Hello, I'm learning Python from Python Bible, and having some problems with this code below. When I run it, I get nothing. It should open the file poem.txt (which exists in the current...
108
by: Bryan Olson | last post by:
The Python slice type has one method 'indices', and reportedly: This method takes a single integer argument /length/ and computes information about the extended slice that the slice object would...
0
by: Microsoft | last post by:
I have a porgram that opens an excel workbook and modifies some data, but almost randomly a box will pop up in the workbook telling me the file is now available and to hit ok or cancel. This...
18
by: dfetrow410 | last post by:
Anyone have some code that will do this? Dave
67
by: PC Datasheet | last post by:
Transaction data is given with date ranges: Beginning End 4/1/06 4/4/06 4/7/06 4/11/06 4/14/06 4/17/06 4/18/06 4/21/06 426/06 ...
1
by: laredotornado | last post by:
Hi, I'm using PHP 4.4.4 on Apache 2 on Fedora Core 5. PHP was installed using Apache's apxs and the php library was installed to /usr/local/php. However, when I set my "error_reporting"...
10
by: Tyler | last post by:
Hello All: After trying to find an open source alternative to Matlab (or IDL), I am currently getting acquainted with Python and, in particular SciPy, NumPy, and Matplotlib. While I await the...
3
by: Bharathi | last post by:
Hi, I got strucked with reading date value from excel file using C#.NET. For Jan-2000 the value I am getting is 36526.0. For all other dates also I am getting some double value like this. ...
0
by: ASPnewb1 | last post by:
Hello, I'm trying to execute a find command on my Excel obj to find if a value exists in only one column, not the whole worksheet, what I had before that works (if the whole worksheet is...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.