moving data in a file without using system memory

Let's assume I have the following file:

2938929384902491233.....
923949919199191919112....

The file contains integers only, and they are huge. For example, the
first row may contain an integer that is 50MB long and the second one
30MB. Now to my problem: is it possible to swap these rows
without using system memory (preferably on Unix/Linux)? Is there any
function in C to do this?

Thanks for help,
John

Jan 9 '06 #1
On 2006-01-09, ulyses <ul****@autograf.pl> wrote:
> Let's assume I have the following file:
>
> 2938929384902491233.....
> 923949919199191919112....
>
> The file contains integers only, and they are huge. For example, the
> first row may contain an integer that is 50MB long and the second one
> 30MB. Now to my problem: is it possible to swap these rows
> without using system memory (preferably on Unix/Linux)? Is there any
> function in C to do this?

I assume you want to make the swap without holding a whole line in a
char * buffer, although you will still need to use some memory.
One way you could do this is to use two files: f1 and f2.
Read the first line character by character and put it in the
f1 file using fgetc and fputc.
Read the second line in the same way and put it in f2.
Create f3 or rewrite the initial file, write the content of f2 (also
with fgetc and fputc) to f3, add the EOL and then add the content
of f1 to f3. Your lines should now be swapped. You have to pay
attention to what EOL means both when reading and writing. Also,
given that the lines hold integers, you could validate each character
with isdigit() (pay attention to negative numbers, which start with '-').
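
The two-temporary-file idea above could be sketched roughly as follows. This is only an illustration under some assumptions: the function names are made up for this example, tmpfile() stands in for f1 and f2, lines end in a single '\n', and digit validation with isdigit() is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Copy one line from `in` to `out`, one character at a time, dropping the
   trailing newline. Returns 0 on success, -1 on an I/O error. */
static int copy_line(FILE *in, FILE *out)
{
    int c;
    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF && c != '\n')
        if (fputc(c, out) == EOF)
            return -1;
    return ferror(in) ? -1 : 0;
}

/* Copy the whole of `in` (from its start) to `out`, character by character. */
static int copy_all(FILE *in, FILE *out)
{
    int c;
    rewind(in);                       /* also legal after writing to `in` */
    while ((c = fgetc(in)) != EOF)
        if (fputc(c, out) == EOF)
            return -1;
    return ferror(in) ? -1 : 0;
}

/* Read the next two lines of `src` and write them to `dst` in swapped
   order. Working memory is a handful of ints, whatever the line length. */
int swap_two_lines_tmp(FILE *src, FILE *dst)
{
    FILE *f1 = tmpfile();             /* plays the role of "f1" */
    FILE *f2 = tmpfile();             /* plays the role of "f2" */
    int rc = -1;

    assert(src != NULL && dst != NULL);
    if (f1 != NULL && f2 != NULL &&
        copy_line(src, f1) == 0 && copy_line(src, f2) == 0 &&
        copy_all(f2, dst) == 0 && fputc('\n', dst) != EOF &&
        copy_all(f1, dst) == 0 && fputc('\n', dst) != EOF)
        rc = 0;

    if (f1 != NULL) fclose(f1);
    if (f2 != NULL) fclose(f2);
    return rc;
}
```

Because the source is only read forward, this variant would also work when the input is a pipe. The price is that every byte is copied twice.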

--
Ioan - Ciprian Tandau
tandau _at_ freeshell _dot_ org (hope it's not too late)
(... and that it still works...)
Jan 9 '06 #2
> 2938929384902491233.....
> 923949919199191919112....
>
> The file contains integers only, and they are huge. For example, the
> first row may contain an integer that is 50MB long and the second one
> 30MB. Now to my problem: is it possible to swap these rows
> without using system memory (preferably on Unix/Linux)? Is there any
> function in C to do this?


First you have to scan the first 50MB to locate the end-of-line, unless
you happen to know it. That doesn't take a whole lot of memory.

Swapping the lines using a second file is pretty trivial. Just start
copying the second line to the destination. When you're done, rewind the
source and copy the first.
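
As a rough sketch of the second-file approach just described (the function names are invented for this example, and it assumes a seekable source whose lines end in '\n'):

```c
#include <assert.h>
#include <stdio.h>

/* Copy bytes from `src` to `dst` up to the next newline or EOF. The newline
   is consumed but not written. Returns 0 on success, -1 on an I/O error. */
static int copy_until_eol(FILE *src, FILE *dst)
{
    int c;
    while ((c = fgetc(src)) != EOF && c != '\n')
        if (fputc(c, dst) == EOF)
            return -1;
    return ferror(src) ? -1 : 0;
}

/* Write the first two lines of `src` to `dst` in swapped order. */
int swap_two_lines_seek(FILE *src, FILE *dst)
{
    int c;

    assert(src != NULL && dst != NULL);
    rewind(src);

    /* Scan past line one to locate the end-of-line; nothing is stored. */
    while ((c = fgetc(src)) != EOF && c != '\n')
        ;
    if (c == EOF)
        return -1;                    /* there is no second line */

    /* The stream now sits at the start of line two: copy it first. */
    if (copy_until_eol(src, dst) != 0 || fputc('\n', dst) == EOF)
        return -1;

    /* Rewind the source and copy line one after it. */
    rewind(src);
    if (copy_until_eol(src, dst) != 0 || fputc('\n', dst) == EOF)
        return -1;

    return 0;
}
```

The first line is read twice (once to find its end, once to copy it), but each byte is written only once and no temporary files are needed beyond the destination.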

If you can't actually afford a second file, the problem gets more
interesting. You want to slide the second 30MB forward in the file, and
the first 50MB back. Although it's tricky, this too can be done
incrementally, using very little memory. Look up peristaltic in-place
out-of-core permutation.
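
I can't reproduce the specific algorithm named above, so as a stand-in here is the classic three-reversal rotation, which also exchanges two adjacent regions of a file in place with constant memory. Treat it as a sketch: the names are made up, the file is assumed to be opened in "r+b" mode, and moving one byte per seek would be far too slow for 80MB in practice (a real version would reverse in blocks).

```c
#include <assert.h>
#include <stdio.h>

/* Reverse the bytes of the half-open range [lo, hi) of `f` in place,
   holding only two bytes in memory at any time. */
static int reverse_range(FILE *f, long lo, long hi)
{
    while (lo < hi - 1) {
        int a, b;
        /* A seek separates every read from every write, as the C
           standard requires for update streams. */
        if (fseek(f, lo, SEEK_SET) != 0 || (a = fgetc(f)) == EOF) return -1;
        if (fseek(f, hi - 1, SEEK_SET) != 0 || (b = fgetc(f)) == EOF) return -1;
        if (fseek(f, lo, SEEK_SET) != 0 || fputc(b, f) == EOF) return -1;
        if (fseek(f, hi - 1, SEEK_SET) != 0 || fputc(a, f) == EOF) return -1;
        lo++;
        hi--;
    }
    return 0;
}

/* Exchange the adjacent ranges [start, mid) and [mid, end) of `f`:
   reverse each part, then reverse the whole. */
int rotate_in_place(FILE *f, long start, long mid, long end)
{
    assert(f != NULL && start <= mid && mid <= end);
    if (reverse_range(f, start, mid) != 0) return -1;
    if (reverse_range(f, mid, end) != 0) return -1;
    if (reverse_range(f, start, end) != 0) return -1;
    return fflush(f) == EOF ? -1 : 0;
}
```

For the file in the question, `mid` would be the offset just past the first newline and `end` the offset just past the second, so that each line travels together with its own newline.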

--
mac the naïf
Jan 9 '06 #3
Nelu <pl****@do.not.spam.me> writes:
> On 2006-01-09, ulyses <ul****@autograf.pl> wrote:
> > Let's assume I have the following file:
> >
> > 2938929384902491233.....
> > 923949919199191919112....
> >
> > The file contains integers only, and they are huge. For example, the
> > first row may contain an integer that is 50MB long and the second one
> > 30MB. Now to my problem: is it possible to swap these rows
> > without using system memory (preferably on Unix/Linux)? Is there any
> > function in C to do this?
>
> I assume you want to make the swap without holding a whole line in a
> char * buffer, although you will still need to use some memory.
> One way you could do this is to use two files: f1 and f2.
> Read the first line character by character and put it in the
> f1 file using fgetc and fputc.
> Read the second line in the same way and put it in f2.
> Create f3 or rewrite the initial file, write the content of f2 (also
> with fgetc and fputc) to f3, add the EOL and then add the content
> of f1 to f3. Your lines should now be swapped. You have to pay
> attention to what EOL means both when reading and writing. Also,
> given that the lines hold integers, you could validate each character
> with isdigit() (pay attention to negative numbers, which start with '-').


This wouldn't prevent the use of system memory, since any I/O will be
cached by default.

To avoid using system memory you'd have to use the raw device driver on
Linux. Perhaps there is something similar on other OSes. Anyway, it's
off topic on comp.lang.c...

Perhaps the OP could explain what he wants exactly, since it's rather
silly to buy gigabytes of RAM and then not use it...
--
__Pascal Bourguignon__ http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.
Jan 9 '06 #4
In article <87************@thalassa.informatimago.com>,
Pascal Bourguignon <sp**@mouse-potato.com> wrote:
> Nelu <pl****@do.not.spam.me> writes:
> > On 2006-01-09, ulyses <ul****@autograf.pl> wrote:
> > > The file contains integers only, and they are huge. For example, the
> > > first row may contain an integer that is 50MB long and the second one
> > > 30MB. Now to my problem: is it possible to swap these rows
> > > without using system memory (preferably on Unix/Linux)?
> >
> > I assume you want to make the swap without holding a whole line in a
> > char * buffer, although you will still need to use some memory.
> > One way you could do this is to use two files: f1 and f2.
>
> This wouldn't prevent the use of system memory, since any I/O will be
> cached by default.
>
> Perhaps the OP could explain what he wants exactly, since it's rather
> silly to buy gigabytes of RAM and then not use it...


My speculation would be that the OP does not want to read the entire
file into memory, as that might be a strain on the system resources.

The OP gave, by the way, no indication that gigabytes of RAM are available
nor that the file contents would be able to fit within the address space
available.
If the OP has numerous rows to exchange, then a "create a new file
each time" algorithm could get rather slow. What might be practical,
though, is to create a couple of I/O wrapper routines that keep
track of the files and of which "logically current" portions of the
files are physically somewhere else (probably easiest in this
circumstance if the I/O wrappers operate at the "record" level).
Then, at the end, create as many temporary files as there were files
logically written to, and for each logically-written file do a linear
run pulling the data out of the original data files in the appropriate
order. When the temporary files are fully generated, either rename
them into existence where the original data files were, or else
[e.g., for permissions or NFS reasons] copy the data out of the
temporary files into the original files, truncate the original files
to the new length (e.g. with ftruncate()), and throw away the
temporary files.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
Jan 9 '06 #5
"ulyses" <ul****@autograf.pl> writes:
> Let's assume I have the following file:
>
> 2938929384902491233.....
> 923949919199191919112....
>
> The file contains integers only, and they are huge. For example, the
> first row may contain an integer that is 50MB long and the second one
> 30MB. Now to my problem: is it possible to swap these rows
> without using system memory (preferably on Unix/Linux)? Is there any
> function in C to do this?


Do you mean that your file is a text file where each line consists of
a sequence of decimal digits representing a large integer?

If so, for purposes of your problem, you can just treat them as
sequences of characters (which happen to be decimal digits); the fact
that they represent large integers is irrelevant.

Can the file contain arbitrarily many lines?

Here's one possible approach. Read the input file, keeping track of
where each new line starts. Use ftell() or fgetpos() and build a list
or array of indices.

Then traverse your index in reverse order. For each index, use
fseek() or fsetpos() to jump to that location in the file; read up to
the end of the current line and write to your output file.
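
A minimal sketch of that index-and-reverse approach might look like this. The function name is made up for the example, the index is kept in a growable array of long offsets (one per line, so memory grows with the number of lines and not with their length), and the input is assumed to be seekable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write the lines of `in` to `out` in reverse order.
   Returns 0 on success, -1 on an I/O or allocation error. */
int reverse_lines(FILE *in, FILE *out)
{
    long *starts = NULL;              /* offset of the first byte of each line */
    size_t n = 0, cap = 0;
    int c, at_line_start = 1, rc = -1;

    assert(in != NULL && out != NULL);
    rewind(in);

    /* Pass 1: note where every line starts, using ftell(). */
    for (;;) {
        long pos = at_line_start ? ftell(in) : 0L;
        if ((c = fgetc(in)) == EOF)
            break;
        if (at_line_start) {
            if (pos < 0) goto done;
            if (n == cap) {
                size_t ncap = cap ? cap * 2 : 16;
                long *tmp = realloc(starts, ncap * sizeof *tmp);
                if (tmp == NULL) goto done;
                starts = tmp;
                cap = ncap;
            }
            starts[n++] = pos;
        }
        at_line_start = (c == '\n');
    }
    if (ferror(in)) goto done;

    /* Pass 2: walk the index backwards, seeking to each line and copying
       it character by character up to its end. */
    while (n > 0) {
        if (fseek(in, starts[--n], SEEK_SET) != 0) goto done;
        while ((c = fgetc(in)) != EOF && c != '\n')
            if (fputc(c, out) == EOF) goto done;
        if (fputc('\n', out) == EOF) goto done;
    }
    rc = 0;

done:
    free(starts);
    return rc;
}
```

With only two lines in the file this performs exactly the swap asked for. fgetpos()/fsetpos() could replace ftell()/fseek() if offsets may not fit in a long.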

(Note that some systems may provide a command to do this -- and 50MB
isn't all that much memory these days. Since you mentioned
Unix/Linux, <OT>"man tac" or "info tac"</OT>.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jan 9 '06 #6

"ulyses" <ul****@autograf.pl> wrote in message
news:11*********************@g44g2000cwa.googlegroups.com...
> Let's assume I have the following file:
>
> 2938929384902491233.....
> 923949919199191919112....
>
> The file contains integers only, and they are huge. For example, the
> first row may contain an integer that is 50MB long and the second one
> 30MB. Now to my problem: is it possible to swap these rows
> without using system memory (preferably on Unix/Linux)? Is there any
> function in C to do this?


It's not clear what you mean by "without using system memory". Why
wouldn't you want to use memory if this results in faster operation?

For example, you could move 16KB at a time, and rig it to use very
little system memory. But then the head of the disk would have to jump back
and forth about 5,000 times (80MB / 16KB). It would almost certainly make
more sense to use more memory.

Are you concerned about consumption of system cache?

It's a bit tricky, but there are exchange algorithms. You can tweak them
to minimize user memory consumption and then let the system figure out how
much cache is appropriate. You can hint to the cache to help with that.
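
As one possible reading of "hint to the cache": on POSIX systems, posix_fadvise() lets a program tell the kernel how it intends to access a file range. The sketch below copies a range in 16KB chunks and brackets the copy with such hints. The function name and chunk size are choices made for this example, the hints are purely advisory, and whether they change cache behaviour at all depends on the kernel and file system.

```c
#define _XOPEN_SOURCE 600   /* exposes posix_fadvise() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

enum { CHUNK = 16 * 1024 };

/* Copy `len` bytes starting at offset `from` of descriptor `in` to the
   current position of descriptor `out`, using one 16KB user buffer. */
int copy_range_hinted(int in, int out, off_t from, off_t len)
{
    char buf[CHUNK];
    off_t left = len;

    assert(in >= 0 && out >= 0 && from >= 0 && len >= 0);
    if (len == 0)
        return 0;

    /* Advisory: we are about to read this range sequentially. */
    (void)posix_fadvise(in, from, len, POSIX_FADV_SEQUENTIAL);

    if (lseek(in, from, SEEK_SET) == (off_t)-1)
        return -1;

    while (left > 0) {
        size_t want = left < CHUNK ? (size_t)left : (size_t)CHUNK;
        ssize_t got = read(in, buf, want);
        if (got <= 0)
            return -1;                /* error or unexpected end of file */
        for (ssize_t off = 0; off < got; ) {
            ssize_t put = write(out, buf + off, (size_t)(got - off));
            if (put <= 0)
                return -1;
            off += put;
        }
        left -= got;
    }

    /* Advisory: cached pages for this range are not needed any more. */
    (void)posix_fadvise(in, from, len, POSIX_FADV_DONTNEED);
    return 0;
}
```

Calling it once for the second line and once for the first, with a fresh output descriptor, gives the swapped file while user memory stays at 16KB.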

DS

Jan 11 '06 #7
David Schwartz wrote:
> "ulyses" <ul****@autograf.pl> wrote in message
> news:11*********************@g44g2000cwa.googlegroups.com...
> > Let's assume I have the following file:
> >
> > 2938929384902491233.....
> > 923949919199191919112....
> >
> > The file contains integers only, and they are huge. For example, the
> > first row may contain an integer that is 50MB long and the second one
> > 30MB. Now to my problem: is it possible to swap these rows
> > without using system memory (preferably on Unix/Linux)? Is there any
> > function in C to do this?
>
> It's not clear what you mean by "without using system memory". Why
> wouldn't you want to use memory if this results in faster operation?


I thought about some kind of low-level function
that would modify pointers rather than the real data, the same way we do
with pointers to data in memory, e.g.:
int *a, *b, *temp;
...
//the swap
temp = a;
a = b;
b = temp;

Here the data is swapped without moving it in memory. I asked whether
there is functionality that would let me do something like that
with rows in a file: swap them without moving them on disk.

Thanks again for help,
John

Jan 12 '06 #8
In article <11**********************@g44g2000cwa.googlegroups.com>,
"ulyses" <ul****@autograf.pl> wrote:
> David Schwartz wrote:
> > "ulyses" <ul****@autograf.pl> wrote in message
> > news:11*********************@g44g2000cwa.googlegroups.com...
> > > Let's assume I have the following file:
> > >
> > > 2938929384902491233.....
> > > 923949919199191919112....
> > >
> > > The file contains integers only, and they are huge. For example, the
> > > first row may contain an integer that is 50MB long and the second one
> > > 30MB. Now to my problem: is it possible to swap these rows
> > > without using system memory (preferably on Unix/Linux)? Is there any
> > > function in C to do this?
> >
> > It's not clear what you mean by "without using system memory". Why
> > wouldn't you want to use memory if this results in faster operation?
>
> I thought about some kind of low-level function
> that would modify pointers rather than the real data, the same way we do
> with pointers to data in memory, e.g.:
> int *a, *b, *temp;
> ...
> //the swap
> temp = a;
> a = b;
> b = temp;
>
> Here the data is swapped without moving it in memory. I asked whether
> there is functionality that would let me do something like that
> with rows in a file: swap them without moving them on disk.


No, because the information in files is not organized by lines, it's
just sequences of bytes organized by disk blocks. If you wanted to
rearrange the blocks it would theoretically be possible to update the
block pointers in the inode. However, there's no API for this, so you
would have to do it by writing a new ioctl or device driver, or by
accessing the disk device directly (and you'd need to unmount the
filesystem first, to avoid conflicts with the kernel's in-memory copies
of inodes).

But to rearrange lines you have to read the file into memory, search for
the newline characters, then write the lines back out to the file.

--
Barry Margolin, ba****@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
Jan 13 '06 #9
Barry Margolin <ba****@alum.mit.edu> writes:
> In article <11**********************@g44g2000cwa.googlegroups.com>,
> "ulyses" <ul****@autograf.pl> wrote:
> > David Schwartz wrote:
> > > "ulyses" <ul****@autograf.pl> wrote in message
> > > news:11*********************@g44g2000cwa.googlegroups.com...
> > > > Let's assume I have the following file:
> > > >
> > > > 2938929384902491233.....
> > > > 923949919199191919112....
> > > >
> > > > The file contains integers only, and they are huge. For example, the
> > > > first row may contain an integer that is 50MB long and the second one
> > > > 30MB. Now to my problem: is it possible to swap these rows
> > > > without using system memory (preferably on Unix/Linux)? Is there any
> > > > function in C to do this?
> > >
> > > It's not clear what you mean by "without using system memory". Why
> > > wouldn't you want to use memory if this results in faster operation?
> >
> > I thought about some kind of low-level function
> > that would modify pointers rather than the real data, the same way we do
> > with pointers to data in memory, e.g.:
> > int *a, *b, *temp;
> > ...
> > //the swap
> > temp = a;
> > a = b;
> > b = temp;
> >
> > Here the data is swapped without moving it in memory. I asked whether
> > there is functionality that would let me do something like that
> > with rows in a file: swap them without moving them on disk.
>
> No, because the information in files is not organized by lines, it's
> just sequences of bytes organized by disk blocks. If you wanted to
> rearrange the blocks it would theoretically be possible to update the
> block pointers in the inode. However, there's no API for this, so you
> would have to do it by writing a new ioctl or device driver, or by
> accessing the disk device directly (and you'd need to unmount the
> filesystem first, to avoid conflicts with the kernel's in-memory copies
> of inodes).
>
> But to rearrange lines you have to read the file into memory, search for
> the newline characters, then write the lines back out to the file.


If you want to rearrange the lines so you can read the rearranged file
with ordinary stdio calls, you'll need to physically re-write the file
(unless you can manage to do some nasty low-level file system stuff).
But if you want to be able to access the lines in some specified order
other than their physical order in the file, you can just create a
separate index. Do one pass over the file, creating an index of the
position of the start of each line (using ftell() or fgetpos()). Once
you have the index, you can use fseek() or fsetpos() to jump directly
to the beginning of any line you want.

Reading the file in reverse order is likely to be less efficient than
if you had physically reversed the file; the performance tradeoff
depends on how often you read it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jan 13 '06 #10
"ulyses" <ul****@autograf.pl> writes:
> David Schwartz wrote:
> > "ulyses" <ul****@autograf.pl> wrote in message
> > news:11*********************@g44g2000cwa.googlegroups.com...
> > > Let's assume I have the following file:
> > >
> > > 2938929384902491233.....
> > > 923949919199191919112....
> > >
> > > The file contains integers only, and they are huge. For example, the
> > > first row may contain an integer that is 50MB long and the second one
> > > 30MB. Now to my problem: is it possible to swap these rows
> > > without using system memory (preferably on Unix/Linux)? Is there any
> > > function in C to do this?
> >
> > It's not clear what you mean by "without using system memory". Why
> > wouldn't you want to use memory if this results in faster operation?
>
> I thought about some kind of low-level function
> that would modify pointers rather than the real data, the same way we do
> with pointers to data in memory, e.g.:
> int *a, *b, *temp;
> ...
> //the swap
> temp = a;
> a = b;
> b = temp;
>
> Here the data is swapped without moving it in memory. I asked whether
> there is functionality that would let me do something like that
> with rows in a file: swap them without moving them on disk.


You can build an index yourself.

Read the data file byte by byte, and note the offset of all newline
characters. Save the list of offsets to an index file.

Later you can read the index file, and when you want to read the nth
number, you seek in the data file to the nth offset.

If you want to swap two numbers, you just swap the two offsets in the
index, which involves reading and writing at most two blocks. You can
also easily "delete" or "insert" numbers. To "delete" a number you
just remove its offset in the index (or put it in a "free list" if you
want to be able to reuse the space). To "insert" a number, you just
note the file size as the offset to the new number which you merely
append to the end of the file, and insert the offset in the index.
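
Here is a minimal sketch of such an index file, under a few assumptions of my own: the index is a flat binary file of long values, each entry is the offset where a line starts (rather than the offset of the newline itself), and all function names are invented for the example. Deleting and inserting entries are left out.

```c
#include <assert.h>
#include <stdio.h>

/* Scan `data` once and write the start offset of every line to `idx`.
   Returns the number of lines indexed, or -1 on error. */
long build_index(FILE *data, FILE *idx)
{
    long count = 0;
    int c, at_start = 1;

    assert(data != NULL && idx != NULL);
    rewind(data);
    rewind(idx);
    for (;;) {
        long pos = at_start ? ftell(data) : 0L;
        if ((c = fgetc(data)) == EOF)
            break;
        if (at_start) {
            if (pos < 0 || fwrite(&pos, sizeof pos, 1, idx) != 1)
                return -1;
            count++;
        }
        at_start = (c == '\n');
    }
    return (ferror(data) || fflush(idx) == EOF) ? -1 : count;
}

/* Read or write index entry number i. */
static int get_entry(FILE *idx, long i, long *off)
{
    return (fseek(idx, i * (long)sizeof *off, SEEK_SET) == 0 &&
            fread(off, sizeof *off, 1, idx) == 1) ? 0 : -1;
}

static int put_entry(FILE *idx, long i, long off)
{
    return (fseek(idx, i * (long)sizeof off, SEEK_SET) == 0 &&
            fwrite(&off, sizeof off, 1, idx) == 1) ? 0 : -1;
}

/* Swap logical lines i and j. Only two index entries are rewritten; the
   data file itself is never touched. */
int swap_lines(FILE *idx, long i, long j)
{
    long a, b;

    assert(idx != NULL && i >= 0 && j >= 0);
    if (get_entry(idx, i, &a) != 0 || get_entry(idx, j, &b) != 0)
        return -1;
    if (put_entry(idx, i, b) != 0 || put_entry(idx, j, a) != 0)
        return -1;
    return fflush(idx) == EOF ? -1 : 0;
}

/* Copy logical line n (as seen through the index) to `out`. */
int emit_line(FILE *data, FILE *idx, long n, FILE *out)
{
    long off;
    int c;

    assert(data != NULL && idx != NULL && out != NULL && n >= 0);
    if (get_entry(idx, n, &off) != 0 || fseek(data, off, SEEK_SET) != 0)
        return -1;
    while ((c = fgetc(data)) != EOF && c != '\n')
        if (fputc(c, out) == EOF)
            return -1;
    return fputc('\n', out) == EOF ? -1 : 0;
}
```

Swapping the 50MB and 30MB rows then costs two small reads and two small writes on the index, which is about as close to the pointer swap from the follow-up question as ordinary file I/O gets. The index format is not portable between machines with a different long size or byte order.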
--
__Pascal Bourguignon__ http://www.informatimago.com/

"Debugging? Klingons do not debug! Our software does not coddle the
weak."
Jan 13 '06 #11
In article <87************@thalassa.informatimago.com>,
Pascal Bourguignon <sp**@mouse-potato.com> wrote:
> You can build an index yourself.
>
> Read the data file byte by byte, and note the offset of all newline
> characters. Save the list of offsets to an index file.


Even better would be if you could get the program that creates the file
in the first place to create an index at the same time.

--
Barry Margolin, ba****@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
Jan 14 '06 #12
