Sliding window

Let me rephrase my question, which hasn't been answered properly yet. I want
to read a file into n-bit chunks of data _extremely fast_, process the
chunks, and finally write them back into a different file. The problem
is that people usually suggest algorithms that are slow.

I'm thinking that a possible algorithm should first read the file
into a buffer (in memory) that is large w.r.t. the length of an
individual chunk of bits. Secondly, one should try to avoid slow
operations such as "for". Maybe operations like memmove or memcpy
would do the trick, but how?

So, basically, one could think of the algorithm as a window
that goes through the file sequentially, showing only one n-bit
chunk at a time. There isn't a trivial solution, because computers
process bytes instead of bits, and the algorithm has to be state-of-the-art.

Any suggestions? Thank you.
Jun 27 '08 #1

"filia&sofia" <in*********@hotmail.comwrote in message
news:2a**********************************@q10g2000 prf.googlegroups.com...
> Let me rephrase my question, which hasn't been answered properly yet. I want
> to read a file into n-bit chunks of data _extremely fast_, process the
> chunks, and finally write them back into a different file. The problem
> is that people usually suggest algorithms that are slow.
>
> I'm thinking that a possible algorithm should first read the file
> into a buffer (in memory) that is large w.r.t. the length of an
> individual chunk of bits. Secondly, one should try to avoid slow
> operations such as "for". Maybe operations like memmove or memcpy
> would do the trick, but how?
>
> So, basically, one could think of the algorithm as a window
> that goes through the file sequentially, showing only one n-bit
> chunk at a time. There isn't a trivial solution, because computers
> process bytes instead of bits, and the algorithm has to be state-of-the-art.
>
> Any suggestions? Thank you.
How big are typical chunks? A typical file?

When read into memory, do you want them byte-aligned? (If not, just reading
the entire file into memory will also read all the chunks into memory; then
you could process them in-situ, perhaps using a special pointer format that
includes bitfield info)
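
Such a pointer format might be no more than a byte pointer plus a bit offset;
a rough sketch (the names are made up, nothing standard about them):

    #include <stddef.h>

    /* Hypothetical "bit pointer": a byte address plus a bit offset. */
    struct bitptr {
        const unsigned char *byte;   /* current byte in the buffer */
        unsigned bit;                /* bits already consumed in *byte, 0..7 */
    };

    /* Advance the pointer by n bits. */
    static void bitptr_advance(struct bitptr *p, size_t n)
    {
        p->byte += (p->bit + n) / 8;
        p->bit   = (unsigned)((p->bit + n) % 8);
    }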

memmove and similar functions are all byte-aligned. If the chunks are big,
you can use memmove etc on the main portion, but I guess you may want the
chunk bit-shifted (so it starts on the beginning of the first byte, but may
still have unused bits in the last byte)
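
A rough sketch of that kind of bit-shifting copy (illustrative only;
copy_shifted is a made-up name, shift is 0..7, and note that it reads
src[len], so src must have at least len + 1 readable bytes):

    #include <stddef.h>

    /* Copy len bytes that start at bit offset 'shift' within src, so the
     * chunk ends up starting on a byte boundary in dst. */
    static void copy_shifted(unsigned char *dst, const unsigned char *src,
                             size_t len, unsigned shift)
    {
        size_t i;
        for (i = 0; i < len; i++)
            dst[i] = (unsigned char)((src[i] << shift) |
                                     (src[i + 1] >> (8 - shift)));
    }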

Will the alignment on output be the same as in the input?

I don't have any actual answers, but sometimes providing more details such
as these can elicit better help.

--
Bart

Jun 27 '08 #2
"filia&sofia" <in*********@hotmail.comwrites:
> Let me rephrase my question, which hasn't been answered properly yet. I want
> to read a file into n-bit chunks of data _extremely fast_, process the
> chunks, and finally write them back into a different file. The problem
> is that people usually suggest algorithms that are slow.
>
> I'm thinking that a possible algorithm should first read the file
> into a buffer (in memory) that is large w.r.t. the length of an
> individual chunk of bits. Secondly, one should try to avoid slow
> operations such as "for". Maybe operations like memmove or memcpy
> would do the trick, but how?
>
> So, basically, one could think of the algorithm as a window
> that goes through the file sequentially, showing only one n-bit
> chunk at a time. There isn't a trivial solution, because computers
> process bytes instead of bits, and the algorithm has to be state-of-the-art.
>
> Any suggestions? Thank you.
Yes. Buy a good book on programming and programming C in particular.

This is generally considered the bible:

http://www.amazon.com/Programming-La.../dp/0131103628
Jun 27 '08 #3
Nice to have help.
> How big are typical chunks? A typical file?
Actually, there is no typical chunk or typical file. The lengths of
the chunks are assigned dynamically, and a typical file is any file,
e.g. a game setup file or a DVD movie.
> When read into memory, do you want them byte-aligned? (If not, just reading
> the entire file into memory will also read all the chunks into memory; then
> you could process them in-situ, perhaps using a special pointer format that
> includes bitfield info)

> memmove and similar functions are all byte-aligned. If the chunks are big,
> you can use memmove etc on the main portion, but I guess you may want the
> chunk bit-shifted (so it starts on the beginning of the first byte, but may
> still have unused bits in the last byte)
I want only the bits that exist in the file originally. The length of
each chunk should be ceil(n/8) bytes. Also, the bits that do not hold
any information are located at the most-significant-bit end. In
short, the information read (in bits) is represented in bytes.

For example, suppose the file contains "0110 0100 1111 1101 0011 0101
0101 1100 0000" and n = 9. Then the 1st chunk is (without padding)
"0110 0100 1", the 2nd is "111 1101 00", the 3rd is "11 0101 010" and
the 4th is "1 1100 0000". Since we need two bytes to represent nine
bits, each chunk is padded with zeroes at the MSB end, e.g. the 1st
becomes "0000000 0110 0100 1" (i.e. the two bytes 0x00 0xC9).
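
A rough, unoptimised sketch of that extraction (get_bits is just an
illustrative name; a faster version would fetch whole bytes or words and
shift instead of looping bit by bit):

    #include <stddef.h>
    #include <stdint.h>

    /* Return the next n bits (n <= 64) of buf, most-significant bit first,
     * starting at absolute bit position *pos; *pos is advanced by n.
     * The result is right-aligned, i.e. zero-padded at the MSB end,
     * exactly as in the n = 9 example above (first chunk -> 0x00C9). */
    static uint64_t get_bits(const unsigned char *buf, size_t *pos, unsigned n)
    {
        uint64_t v = 0;
        unsigned i;

        for (i = 0; i < n; i++, (*pos)++)
            v = (v << 1) | ((buf[*pos / 8] >> (7 - *pos % 8)) & 1u);
        return v;
    }
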
> Will the alignment on output be the same as in the input?
The output file is the same as the input file: the algorithm has to remove
the padded zeroes and join the chunks in a way that preserves the information
in overlapping bytes. At this point, the algorithm should reproduce the
input file exactly, in an extremely fast manner.

Hopefully, this additional information helps.
Jun 27 '08 #4
On 13 April, 01:25, Richard <de...@gmail.com> wrote:
> "filia&sofia" <in_tyran...@hotmail.com> writes:
>> Let me rephrase my question, which hasn't been answered properly yet. I want
>> to read a file into n-bit chunks of data _extremely fast_, process the
>> chunks, and finally write them back into a different file. The problem
>> is that people usually suggest algorithms that are slow.
>> I'm thinking that a possible algorithm should first read the file
>> into a buffer (in memory) that is large w.r.t. the length of an
>> individual chunk of bits. Secondly, one should try to avoid slow
>> operations such as "for". Maybe operations like memmove or memcpy
>> would do the trick, but how?
>> So, basically, one could think of the algorithm as a window
>> that goes through the file sequentially, showing only one n-bit
>> chunk at a time. There isn't a trivial solution, because computers
>> process bytes instead of bits, and the algorithm has to be state-of-the-art.
>> Any suggestions? Thank you.

> Yes. Buy a good book on programming and programming C in particular.
>
> This is generally considered the bible:
>
> http://www.amazon.com/Programming-La...l-Software/dp/...
I've already read this book. It's a good book for learning the basics, but
it doesn't consider the complexity of algorithms or optimization of code
at all. Thanks for the effort, though.
Jun 27 '08 #5

"filia&sofia" <in*********@hotmail.comwrote in message
news:6f**********************************@b9g2000p rh.googlegroups.com...
> I want only the bits that exist in the file originally. The length of
> each chunk should be ceil(n/8) bytes. Also, the bits that do not hold
> any information are located at the most-significant-bit end. In
> short, the information read (in bits) is represented in bytes.
>
> For example, suppose the file contains "0110 0100 1111 1101 0011 0101
> 0101 1100 0000" and n = 9. Then the 1st chunk is (without padding)
> "0110 0100 1", the 2nd is "111 1101 00", the 3rd is "11 0101 010" and
> the 4th is "1 1100 0000". Since we need two bytes to represent nine
> bits, each chunk is padded with zeroes at the MSB end, e.g. the 1st
> becomes "0000000 0110 0100 1" (i.e. the two bytes 0x00 0xC9).
>> Will the alignment on output be the same as in the input?

> The output file is the same as the input file: the algorithm has to remove
> the padded zeroes and join the chunks in a way that preserves the information
> in overlapping bytes. At this point, the algorithm should reproduce the
> input file exactly, in an extremely fast manner.
>
> Hopefully, this additional information helps.
Actually, it's made things more confusing! I've looked at your earlier post
and from that it seemed you need some sort of combined move+bitshift.

I assume your N in that post (buffer byte size) can be bigger than a machine
word? And the M bits at a time you're taking from that buffer can also be
bigger than a machine word?

Best to post some C code here; assuming it works as you want, you
might then be able to get help with optimising it.

--
Bart
Jun 27 '08 #6
On Sat, 12 Apr 2008 14:51:02 -0700, filia&sofia wrote:
> Any suggestions? Thank you.
Google "Ring Buffer"
Jun 27 '08 #7
Ben Bacarisse wrote:
"filia&sofia" <in*********@hotmail.comwrites:
>Nice to have help.

I though you got a pretty good answer last time. At least say why
is was unsuitable or what is wrong with what you've already tried. It
may be that you are aiming for something that is impossible.
[...]
He says that he wants to avoid using "for" because it's
a "slow operation." Go figure.

--
Eric Sosman
es*****@ieee-dot-org.invalid
Jun 27 '08 #8
In article <at******************************@comcast.com>,
Eric Sosman <es*****@ieee-dot-org.invalid> wrote:
> Ben Bacarisse wrote:
>> "filia&sofia" <in*********@hotmail.com> writes:
>>> Nice to have help.
>> I thought you got a pretty good answer last time. At least say why
>> it was unsuitable or what is wrong with what you've already tried. It
>> may be that you are aiming for something that is impossible.
>> [...]
> He says that he wants to avoid using "for" because it's
> a "slow operation." Go figure.
Probably needs something non-portable like "scatter-gather" asynch I/O
that DMA's into buffers that the O/S "flips" directly into user space
instead of -copying- them into user space.

But what do I know? That sort of thing has been in use in HPC
for over a decade, and the OP requires that the approach be
"state of the art". I dunno what that means. Directly programming
SATA controllers and reserving transfer slots on PCI-X? Is it
"state of the art" to use RAID-6 to stripe the data across multiple discs
to increase the I/O rate? Using the fastest drives available is an idea
as old as computing, so that isn't "state of the art" either, so I
guess there's no point in the OP thinking about that...
--
"Allegories are in the realm of thoughts, what ruins are in
the realm of things." -- Walter Benjamin
Jun 27 '08 #9
> He says that he wants to avoid using "for" because it's
> a "slow operation." Go figure.
Nested k-level for-loops have time complexity of O(n^k).

O(n) algorithms' time complexity increases linearly.

Jun 27 '08 #10

"filia&sofia" <in*********@hotmail.comwrote in message
news:33**********************************@n14g2000 pri.googlegroups.com...
>> He says that he wants to avoid using "for" because it's
>> a "slow operation." Go figure.
> Nested k-level for-loops have time complexity of O(n^k).
> O(n) algorithms' time complexity increases linearly.
I don't fully understand the O() notation, but I have a feeling the above is
nonsense.

Anyway, a for-loop isn't that slow; there's a small overhead per iteration,
which may be insignificant depending on what's in the loop. And there are
ways of minimising that overhead.

But I doubt what you need will involve deeply nested for-loops.

Post some code here (using any method including for-loops) and we'll see if
it needs to be improved.

--
Bart

Jun 27 '08 #11
I found some relevant information on the encode.ru forums:

"about i/o - there are few std ideas that you may use

1) don't use fgetc/fputc. make your own buffer instead:

static inline void buffered_putc(int c)
{
    *curptr++ = (unsigned char)c;
    if (curptr >= bufend)         /* buffer full */
        flush_buffer();           /* write it out and reset curptr */
}

2) if you really need fast i/o - use 3 threads - for read, compress
and write. read/write data in large chunks. use circular buffers.
preread a large enough amount of data (for example, for quad, which
compresses data in 16 MB blocks, you should use 4*16 MB preread buffers
and at least 2*16 MB output buffers)"

Also, the comment about looking up "ring buffer" was good.
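
Ignoring the threading, the "read/write data in large chunks" part of that
advice boils down to something like this sketch (the block size, the file
names and process_block are all placeholders):

    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK_SIZE (16 * 1024 * 1024)   /* 16 MB, as in the quoted advice */

    /* Placeholder for whatever per-chunk processing is wanted. */
    static void process_block(unsigned char *buf, size_t len)
    {
        (void)buf; (void)len;
    }

    int main(void)
    {
        unsigned char *buf = malloc(BLOCK_SIZE);
        FILE *in  = fopen("input.bin",  "rb");   /* placeholder file names */
        FILE *out = fopen("output.bin", "wb");
        size_t n;

        if (buf == NULL || in == NULL || out == NULL)
            return EXIT_FAILURE;

        while ((n = fread(buf, 1, BLOCK_SIZE, in)) > 0) {
            process_block(buf, n);
            if (fwrite(buf, 1, n, out) != n)
                break;                           /* write error */
        }

        fclose(out);
        fclose(in);
        free(buf);
        return EXIT_SUCCESS;
    }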
Jun 27 '08 #12
In article <33**********************************@n14g2000pri.googlegroups.com>,
filia&sofia <in*********@hotmail.com> wrote:
>> He says that he wants to avoid using "for" because it's
>> a "slow operation." Go figure.
> Nested k-level for-loops have time complexity of O(n^k).
Only if the limit on each is O(n). If the limit on all except one is
O(1) then the overall result is O(n).

For example,

    for (i = 0; i < n; i++)
        for (j = 0; j < BUFSIZ; j++)
            /* some code here */

The inner code is executed O(n) times, not O(n^2).

> O(n) algorithms' time complexity increases linearly.
You haven't given us enough information about the processing for
us to know whether an O(n) algorithm is even possible in these
circumstances. It looked to me, based upon what you wrote, that the
minimum was -probably- O(n log n), but it was hard to tell.

--
"This quitting thing, it's a hard habit to break once you start."
-- Walter Matthau
Jun 27 '08 #13
Bartc wrote:
) "filia&sofia" <in*********@hotmail.com> wrote in message
) news:33**********************************@n14g2000pri.googlegroups.com...
)>> He says that he wants to avoid using "for" because it's
)>> a "slow operation." Go figure.
)> Nested k-level for-loops have time complexity of O(n^k).
)> O(n) algorithms' time complexity increases linearly.
)
) I don't fully understand the O() notation, but I have a feeling the above is
) nonsense.

The first line (nested k-level for loops ...) is utter nonsense.

The second line is reasonably sensible if you implicitly read 'with the
size of the input', but even then it's not 100% accurate.
However, it is totally unrelated to the current topic under discussion.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Jun 27 '08 #14
Let's put it this way, I'm trying to read n-bit numbers from a file.
I'm looking for "fast" ways to do this.

P.S. What's fast? I consider data compression programs such as THOR
and QuickLZ fast in compressing data.
Jun 27 '08 #15
"filia&sofia" <in*********@hotmail.comwrites:
> Let's put it this way, I'm trying to read n-bit numbers from a file.
> I'm looking for "fast" ways to do this.
>
> P.S. What's fast? I consider data compression programs such as THOR
> and QuickLZ fast in compressing data.
Since when did compression come into it?

You are providing no information. "Fast" is a relative thing. If you
have ideas about "fast" then you must have ideas about what isn't fast.

"Run fast" is something just about every program should aim to do - not
thinking about that has led to the incredible bloatware which infests
platforms today and ensures that PCs which are 1000 times faster than 20
years ago open a Word Processor in the same time as opposed to in a
blink of an eye.

Most C programmers use C because they have an eye on efficiency in terms
of CPU and memory usage IMO.
Jun 27 '08 #16
> Since when did compression come into it?
It didn't.
> You are providing no information. "Fast" is a relative thing. If you
> have ideas about "fast" then you must have ideas about what isn't fast.
It seems that everyone here needs strict conditions to be able to say
anything. "Fast" is a relative thing. "Compression" was an analogy.
"Run fast" is something just about every program should aim to do - not
thinking about that has led to the incredible bloatware which infests
platforms today and ensures that PCs which are 1000 times faster than 20
years ago open a Word Processor in the same time as opposed to in a
blink of an eye.
And this is relevant?
> Most C programmers use C because they have an eye on efficiency in terms
> of CPU and memory usage IMO.
Perhaps.
Jun 27 '08 #17
"filia&sofia" <in*********@hotmail.comwrites:
>> Since when did compression come into it?

> It didn't.
>> You are providing no information. "Fast" is a relative thing. If you
>> have ideas about "fast" then you must have ideas about what isn't fast.

> It seems that everyone here needs strict conditions to be able to say
> anything. "Fast" is a relative thing. "Compression" was an analogy.
Compression was nothing to do with it. Are you seriously trying to
explain your use of "fast" by mentioning that compression may or
may not be fast depending on the algorithm?
>
>> "Run fast" is something just about every program should aim to do - not
>> thinking about that has led to the incredible bloatware which infests
>> platforms today and ensures that PCs which are 1000 times faster than 20
>> years ago open a word processor in the same time as before, rather than in
>> the blink of an eye.

> And this is relevant?
Yes.
>
>> Most C programmers use C because they have an eye on efficiency in terms
>> of CPU and memory usage IMO.

> Perhaps.
And still you do nothing to provide information needed to answer your
question other than your pie in the sky "it must be fast".

I was trying to say that this isn't specific enough, because "fast"
should be an aim for most programs IMO. And "fast" can be specified in
many ways - fast to start up, fast to read a file, total run time, using
less system or cache memory, etc.

It seems to me you're just looking for someone to write your program. I
hope someone does for you, but until then why not post your attempt and
then other people can comment on structure and choice of algorithms to
hopefully optimise it yet further.

Jun 27 '08 #18
filia&sofia wrote:
>> He says that he wants to avoid using "for" because it's
>> a "slow operation." Go figure.

> Nested k-level for-loops have time complexity of O(n^k).

If each loop iterates O(n) times.

> O(n) algorithms' time complexity increases linearly.
Yes. Also: The Sun rises in the East, Socrates is
mortal, and things are more the way they are now than
they've ever been before.

None of which has anything at all to do with the
contention that "for" is slow. Go figure.

--
Eric Sosman
es*****@ieee-dot-org.invalid
Jun 27 '08 #19
If Eric Sosman gave the "standard" way of doing the trick, I didn't
know it was a standard way of doing it - I wanted more possible ways
to be able to discuss them later. Generally, I'm interested in how
you would write a program that reads n-bit numbers from a file (but I
do not want anyone to write the program for me; I'm going to do it myself as
soon as I understand how I should do it). I know that there isn't
enough information available, but I thought that it would be a good
thing to give room for ideas. Is Eric's suggestion the only one?
Jun 27 '08 #20
filia&sofia wrote:
> If Eric Sosman gave the "standard" way of doing the trick, I didn't
> know it was a standard way of doing it - I wanted more possible ways
> to be able to discuss them later. Generally, I'm interested in how
> you would write a program that reads n-bit numbers from a file (but I
> do not want anyone to write the program for me; I'm going to do it myself as
> soon as I understand how I should do it). I know that there isn't
> enough information available, but I thought that it would be a good
> thing to give room for ideas. Is Eric's suggestion the only one?
The issue -- as I see it, anyhow -- is that you have
described your problem with many words but with little
specificity. For example, you talk about "n-bit chunks"
but decline to hint about the likely values of n, although
you've been told that solutions for n=3 and for n=333 are
unlikely to show even a family resemblance. Similarly,
solutions where n is a compile-time constant will differ
from those where n is a constant determined at run-time,
which will in turn differ from solutions for n varying.

And then there's the question: What do you want to do
with these n-bit chunks once you get them? Some kinds of
operations will be easier if you separate each chunk from
its neighbors, while others might be performed in situ. But
all you've revealed is that you want to "process them." Not
very informative, not likely to elicit responses with a lot
of detail.

Finally, these notions about the slowness of "for" are
suggestive of inattention on your part. We know (you've told
us) that I/O is part of the scenario, and nearly all I/O
devices are slower than RAM, which is in turn slower than a
CPU. A disk operation under extremely favorable circumstances
(e.g., fibre channel to big NVRAM cache, no physical disk
movement at all) will take something like two milliseconds,
or four million ticks of a 2GHz CPU's clock ... If you have
reason to believe that the disparity between computation and
I/O is less than the usual six or seven decimal orders of
magnitude, you'd better explain them or people won't take
you seriously.

--
Eric Sosman
es*****@ieee-dot-org.invalid
Jun 27 '08 #21
filia&sofia wrote:
> I found some relevant information on the encode.ru forums:
>
> "about i/o - there are a few std ideas that you may use
>
> 1) don't use fgetc/fputc. make your own buffer instead:
>
> static inline void buffered_putc(int c)
> {
>     *curptr++ = (unsigned char)c;
>     if (curptr >= bufend)         /* buffer full */
>         flush_buffer();           /* write it out and reset curptr */
> }
Nonsense. Just use getc and putc. They are designed to be macro
implementable (but make sure the stream argument in the call can safely
be evaluated more than once, i.e. has no side effects) and thus use the
built-in stream buffers. You are very likely to find that the fastest
file copy routine is:

int filecpy(FILE *f1, FILE *f2) {
    int ch;

    while (EOF != (ch = getc(f1))) putc(ch, f2);
    return ch;
}

If it isn't the fastest, it may be that your system fails to
provide putc/getc macros.
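
For completeness, a hypothetical caller might look like this, assuming the
filecpy routine above is defined earlier in the same file (the file names
are made up):

    #include <stdio.h>

    int main(void)
    {
        FILE *in  = fopen("input.bin",  "rb");
        FILE *out = fopen("output.bin", "wb");

        if (in == NULL || out == NULL)
            return 1;
        filecpy(in, out);     /* copy everything from in to out */
        fclose(out);
        fclose(in);
        return 0;
    }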

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
Jun 27 '08 #22
In article <48***************@yahoo.com>,
CBFalconer <cb********@maineline.net> wrote:
> Nonsense. Just use getc and putc. They are designed to be macro
> implementable (but make sure the stream argument in the call can safely
> be evaluated more than once, i.e. has no side effects) and thus use the
> built-in stream buffers. You are very likely to find that the fastest
> file copy routine is:
>
> int filecpy(FILE *f1, FILE *f2) {
>     int ch;
>
>     while (EOF != (ch = getc(f1))) putc(ch, f2);
>     return ch;
> }
>
> If it isn't the fastest, it may be that your system fails to
> provide putc/getc macros.
Unfortunately, as discussed here recently, some widely-used systems
do not make getc and putc efficient, in a (misguided) attempt to
provide thread safety. Using fread() and fwrite() is likely
to be much faster on these systems, and at least as good on others.
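
A block-based counterpart of the filecpy routine quoted above might look
like this sketch (the 64 KB buffer size is an arbitrary choice):

    #include <stdio.h>

    /* Same job as filecpy, but moving larger blocks per library call. */
    int filecpy_blocks(FILE *f1, FILE *f2)
    {
        static unsigned char buf[64 * 1024];
        size_t n;

        while ((n = fread(buf, 1, sizeof buf, f1)) > 0)
            if (fwrite(buf, 1, n, f2) != n)
                return EOF;              /* write error */
        return 0;
    }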

-- Richard
--
:wq
Jun 27 '08 #23
