Bytes | Software Development & Data Engineering Community

reading tabulated files

Hi everyone, I'm doing a project that requires comparing entries in a
file (the entries are separated by \r). I need to compare the first entry to
the second, then to the third, etc. The same thing needs to be done with
the second entry (compared to the third, fourth, etc.). I want to do it
straight from the file, since it is a large amount of data. How do I
"remember" the place I last read in the file? Or perhaps, how can I
remember the locations of all entries?
Thanks in advance!!!

Sep 18 '05 #1
davario wrote:
> How do I "remember" the place I last read in the file? Or perhaps, how can I
> remember the locations of all entries?


tellg and seekg are the istream member functions for manipulating the read
position in a file. tellg tells you where you are in the file, and seekg
moves the read position to a new place.

But really this sounds horrendous: if the amount of data is so big that
you can't load it into memory, then this is going to take days to
execute. If the amount of data is small enough to load into memory, you
should load it.

But the real problem is the algorithm. Suppose you have 10,000 data
items: then you are going to have to do approximately 50,000,000 comparisons.
Suppose you have 100,000 data items: then that rises to approximately
5,000,000,000 comparisons. The count grows with the square of the number of items.
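A quick check of those figures, assuming each item is compared once with every later item, as described in the original question:

```latex
\[
C(n) = (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2}
\]
\[
C(10\,000) = \frac{10\,000 \times 9\,999}{2} = 49\,995\,000 \approx 5 \times 10^{7},
\qquad
C(100\,000) = \frac{100\,000 \times 99\,999}{2} = 4\,999\,950\,000 \approx 5 \times 10^{9}
\]
```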

Since I don't know what you are comparing, or why, it's hard to suggest
improvements, but you might consider sorting the data before you start
doing comparisons.
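To illustrate why sorting first can help, here is a sketch assuming that "comparing" means checking entries for exact equality (the thread doesn't say). `find_duplicates_by_sorting` is a hypothetical name. After sorting, equal entries sit next to each other, so one pass over neighbours finds them: roughly n log n comparisons for the sort plus n - 1 for the pass, instead of n(n-1)/2.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper. Assumes "comparing" means testing for exact equality.
// Returns pairs of original indices of neighbouring equal entries in sorted order.
std::vector<std::pair<std::size_t, std::size_t>>
find_duplicates_by_sorting(const std::vector<std::string>& entries) {
    // Sort indices rather than the strings themselves, so original positions survive.
    std::vector<std::size_t> order(entries.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    // stable_sort keeps equal entries in their original relative order.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return entries[a] < entries[b]; });

    // Equal entries are now adjacent, so a single linear pass finds them.
    std::vector<std::pair<std::size_t, std::size_t>> dups;
    for (std::size_t k = 1; k < order.size(); ++k) {
        if (entries[order[k - 1]] == entries[order[k]])
            dups.emplace_back(order[k - 1], order[k]);
    }
    return dups;
}
```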

john
Sep 18 '05 #2
I am sorting DNA sequences, and I have around 4000 sequences to
compare.
They are each quite big, and when I tried to load them all into memory
to use in MATLAB it took ages and didn't work too well, so I thought it
might be better to load them two at a time.

Sep 18 '05 #3

If your comparison involves only checking for equality/inequality, you may
precalculate hash values for the sequences and compare those. You can easily hold
4000 hash values in memory, and it will work in a snap.

If the hash values differ, the sequences are different; if they are equal,
the sequences _might_ be equal, so then and only then do you compare the
sequences themselves. The hash function you use is almost irrelevant (a weak
one only means more of these full comparisons, never a wrong answer); it can
probably be very simple, like the sum of all bytes in the sequence.

cheers,
Marcin
Sep 18 '05 #4
davario wrote:
> I am sorting DNA sequences, and I have around 4000 sequences to compare.


You are sorting using the process you described in your first email??? I
would say that the reason it took ages when you loaded them all into
memory was that you were using the wrong algorithm. Loading them two at a
time is not going to make things any better (in fact it will be worse).

This is the wrong way to do it. There are vastly more efficient ways to
sort data. If you really want to do this without reading the data into
memory, then you should look at an algorithm called merge sort. If you
can read the data into memory, then use an algorithm called quick sort.
This is the preferable option, but both of these will be hugely more
efficient than what you are proposing (they need on the order of n log n
comparisons, rather than the n(n-1)/2 of comparing every pair).
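A rough sketch of an external merge sort for the '\r'-separated entries, assuming plain lexicographic order is what's wanted. The name `external_merge_sort` and its `run_size` parameter are hypothetical. To keep the example self-contained, the sorted runs are held in `std::stringstream` objects; a real version would write each run to a temporary file so that at most `run_size` entries are in memory at once.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <istream>
#include <memory>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical external merge sort. run_size should be > 0.
// Runs live in stringstreams here only to keep the sketch self-contained.
void external_merge_sort(std::istream& in, std::ostream& out, std::size_t run_size) {
    std::vector<std::unique_ptr<std::stringstream>> runs;
    std::vector<std::string> buffer;
    std::string entry;

    // Phase 1: read up to run_size entries, sort them in memory, write a sorted run.
    auto flush_run = [&] {
        if (buffer.empty()) return;
        std::sort(buffer.begin(), buffer.end());
        auto run = std::make_unique<std::stringstream>();
        for (const auto& e : buffer) *run << e << '\r';
        runs.push_back(std::move(run));
        buffer.clear();
    };
    while (std::getline(in, entry, '\r')) {
        buffer.push_back(entry);
        if (buffer.size() == run_size) flush_run();
    }
    flush_run();

    // Phase 2: k-way merge. The min-heap holds the next (entry, run index)
    // from each run, with the smallest entry on top.
    using Item = std::pair<std::string, std::size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (std::getline(*runs[r], entry, '\r')) heap.emplace(entry, r);

    bool first = true;
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        if (!first) out << '\r';
        out << top.first;
        first = false;
        // Refill the heap from the run the smallest entry came from.
        if (std::getline(*runs[top.second], entry, '\r')) heap.emplace(entry, top.second);
    }
}
```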

There is a great deal of literature on sorting techniques, so a little
research should turn up something very quickly. C++ even has a sort in
its standard library (std::sort, usually implemented as a quicksort variant),
so you won't even have to code the algorithm.

John
Sep 18 '05 #5


Similar topics

by: Olivier Maurice
Hi all, I suppose some of you know the program Redmon (type redmon in google, first result). This neat little tool allows to hook up any functionality to a printer by putting the file printed...

by: aaronfude
Hi, This is not really a C/C++ question. Suppose I have a function double f(double x) that takes a long time to compute. Is there a tool available that would take this function and would...

by: Lionel B
Greetings, I need to read (unformatted text) from stdin up to EOF into a char buffer; of course I cannot allocate my buffer until I know how much text is available, and I do not know how much...

by: Magnus
allrite folks, got some questions here... 1) LAY-OUT OF REPORTS How is it possible to fundamentaly change the lay-out/form of a report in access? I dont really know it that "difficult", but...

by: Rajorshi Biswas
Hi folks, Suppose I have a large (1 GB) text file which I want to read in reverse. The number of characters I want to read at a time is insignificant. I'm confused as to how best to do it. Upon...

by: nnimod
Hi. I'm having trouble reading some unicode files. Basically, I have to parse certain files. Some of those files are being input in Japanese, Chinese etc. The easiest way, I figured, to distinguish...

by: jccorreu
I've got to read info from multiple files that will be given to me. I know the format and what the data is. The thing is each time we run the program we may be using a differnt number of files,...

by: chengsi
Hi all, This is my first post, so please excuse if i am posting a silly question. I have an MS Access database and have created a tabulated form (where there are multiple lines, one for each...

by: blazedaces
Ok, so you know my problem, java is running out of memory reading with SAX, the event-based xml parser intended more-so than DOM for extremely large files. I'll try to explain what I've been doing...

by: Charles Arthur
How do i turn on java script on a villaon, callus and itel keypad mobile phone

by: ryjfgjl
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

by: emmanuelkatto
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

by: BarryA
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

by: Sonnysonu
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

by: Hystou
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

by: marktang
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

by: Hystou
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

by: Hystou
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
