473,326 Members | 2,095 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,326 software developers and data experts.

B

Hi all,

I am writing an image processing algorithm that goes across an image
first row-wise and then column-wise. For illustration purposes,
imagine that for every pixels, the output is computed as the sum of
the input and its previous neighbourg. In the horizontal pass (i.e.
row-wise), the previous neighbourg is defined as the pixel on the
left. This is followed by a vertical pass, whereby the previous
neighbourg is defined as the pixel on top.

I quickly encountered a problem with the column-wise path, as
processing pixel in a vertical fashion was _very_ slow (0.31 vs. 0.009
for the horizontal). I since learned that it was due to cache misses.
Indeed, the image is stored row by row, so vertically-neighbouring
pixels are actually far away in memory, which entails all kind of
speed problems.

I was suggested to try a block processing approach, whereby the image
is stored by blocks rather than by rows. A block is typically 32x32,
so in memory, the first line of the block is followed by the second
line of the block, followed by the third, .... until we move on to the
next block, and the pattern is repeated. This has the advantage of
ensuring locality, i.e. both horizontal and vertical neighboring
pixels will be close, hence there should be less cache misses. Both
horizontal and vertical passes should be approximately as fast.

This is all well and good, but things are not that perfect in
practice. Accessing pixels is now significantly slower, even for
horizontal pass. For an image structured in blocks, an horizontal pass
is 0.030 sec. whereas with the original memory structure (row after
row, lets call it linear), the horizontal pass was only 0.009, 3 times
faster! This cannot be due to cache misses, as locality is pretty good
in both cases. The code to access a pixel is more costly in the case
of block processing. Compare the following, which is copied from my
implementation:

// linear access to pixel
// img: pointer to start of array in memory
#define PIXEL(img, width, row, col) ((img) + ((row) * width) +
(col))

// block access to pixel
#define PXL(img, width, col, row) \
((img) + (((row) & (~BLOCK_Y_MASK)) * (width)) + (((col) >>
BLOCK_X_SHIFT) * BLOCK_SIZE) \
+ (((row) & BLOCK_Y_MASK) << BLOCK_X_SHIFT) + ((col) &
BLOCK_X_MASK))

with enum
{
BLOCK_X_SHIFT = 5,
BLOCK_Y_SHIFT = 5,
BLOCK_X_MASK = (1 << BLOCK_X_SHIFT) - 1,
BLOCK_Y_MASK = (1 << BLOCK_Y_SHIFT) - 1,
BLOCK_WIDTH = (1 << BLOCK_X_SHIFT),
BLOCK_HEIGHT = (1 << BLOCK_Y_SHIFT),
BLOCK_SIZE = BLOCK_WIDTH * BLOCK_HEIGHT
};

Accessing a pixel is thus obviously much slower, and I was not able to
speed things up despite all my efforst. Does someone have any ideas or
suggestions that may help, or know of any good references discussing
the problem? Thanks in advance

Angus

Apr 5 '07 #1
6 1256
ve********@googlemail.com wrote On 04/05/07 11:58,:
Hi all,

I am writing an image processing algorithm that goes across an image
first row-wise and then column-wise. For illustration purposes,
imagine that for every pixels, the output is computed as the sum of
the input and its previous neighbourg. In the horizontal pass (i.e.
row-wise), the previous neighbourg is defined as the pixel on the
left. This is followed by a vertical pass, whereby the previous
neighbourg is defined as the pixel on top.

I quickly encountered a problem with the column-wise path, as
processing pixel in a vertical fashion was _very_ slow (0.31 vs. 0.009
for the horizontal). I since learned that it was due to cache misses.
Indeed, the image is stored row by row, so vertically-neighbouring
pixels are actually far away in memory, which entails all kind of
speed problems.
[...]
This isn't really a C question -- most questions about
speed are not C questions -- but I'll risk the rath of the
wrighteous and try a partial answer anyhow.

The biggest payoff is likely to come from combining the
two passes into one, perhaps by interleaving. For example,
you might make a horizontal pass on rows 0 and 1 and then
the first step of a vertical pass (just rows 0 and 1), then
horizontal on row 2 and a vertical step for rows 1 and 2,
and so on. This may increase the amount of work you are
able to do per cache fill.

You might also get a benefit from "padding" the image
so the rows are not separated by exact multiples of the cache
line size. A 640x480 image, for example, has a row length
that is likely to be a multiple of the cache line size, and
this could lead to cache thrashing. Embedding the 640x480
area in a 642x480 area (and ignoring the extra two columns)
might gain speed.

For further ideas, see if you can find a newsgroup with
"graphics" or "image" in its name.

--
Er*********@sun.com
Apr 5 '07 #2
ve********@googlemail.com wrote:
Hi all,
Please don't multi-post on Usenet. Post to the group for language you
are using.

--
Ian Collins.
Apr 5 '07 #3

<ve********@googlemail.comwrote in message
news:11**********************@n59g2000hsh.googlegr oups.com...
[image processing ]
I quickly encountered a problem with the column-wise path, as
processing pixel in a vertical fashion was _very_ slow (0.31 vs. 0.009
for the horizontal). I since learned that it was due to cache misses.
Indeed, the image is stored row by row, so vertically-neighbouring
pixels are actually far away in memory, which entails all kind of
speed problems.

I was suggested to try a block processing approach, whereby the image
is stored by blocks rather than by rows.
Firstly it surprising that scanning the image is the bottleneck in your
program. You ought to be able to scan a large area of arbitrary memory very
quickly indeed, even with sub-optimal cache usage.
You are largely wasting your time trying to speed things up by reorganising
images into blocks. Yes you can improve read times, but at the cost of extra
processing to manage the blocks, and then you add extra complexity to the
code. Finally you probably have to reformat anyway for the next stage of
processing. At the end you have a complex and bug-ridden program that runs
only slightly faster, if at all.

One thing that is not a waste of time is to try padding to sizes that are an
exact power of two, and change the multiply into a shift, or not an exact
power of two, to help the cache along. Then profile to see which is faster.

Another thing you can try is

int block[32][32]

getblock(img, block, x, y)
processblock(block);
writeblock(img, block, 32, 32);

storing the portion of the image you are working on to a local area,
processing it, and writing out the results. This will only work if
processblock() is quite intensive.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Apr 6 '07 #4
Eric Sosman wrote:
>
.... snip ...
>
This isn't really a C question -- most questions about
speed are not C questions -- but I'll risk the rath of the
wrighteous and try a partial answer anyhow.
rath? I am roth. :-)

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

--
Posted via a free Usenet account from http://www.teranews.com

Apr 7 '07 #5
On 5 Apr, 21:16, Eric Sosman <Eric.Sos...@sun.comwrote:
The biggest payoff is likely to come from combining the
two passes into one, perhaps by interleaving. For example,
you might make a horizontal pass on rows 0 and 1 and then
the first step of a vertical pass (just rows 0 and 1), then
horizontal on row 2 and a vertical step for rows 1 and 2,
and so on. This may increase the amount of work you are
able to do per cache fill.
That would require an extra buffer to store the result of the vertical
passs, and I can't do that.
You might also get a benefit from "padding" the image
so the rows are not separated by exact multiples of the cache
line size. A 640x480 image, for example, has a row length
that is likely to be a multiple of the cache line size, and
this could lead to cache thrashing. Embedding the 640x480
area in a 642x480 area (and ignoring the extra two columns)
might gain speed.
I do not understand, I thought having row length a multiple of the
cache size was a good thing??

Apr 10 '07 #6
vectorizor wrote:
On 5 Apr, 21:16, Eric Sosman <Eric.Sos...@sun.comwrote:
> The biggest payoff is likely to come from combining the
two passes into one, perhaps by interleaving. For example,
you might make a horizontal pass on rows 0 and 1 and then
the first step of a vertical pass (just rows 0 and 1), then
horizontal on row 2 and a vertical step for rows 1 and 2,
and so on. This may increase the amount of work you are
able to do per cache fill.

That would require an extra buffer to store the result of the vertical
passs, and I can't do that.
You need space for *one* image row. The idea is to do
the vertical "pass" as a series of row-by-row steps that
interleave between the horizontal passes.
> You might also get a benefit from "padding" the image
so the rows are not separated by exact multiples of the cache
line size. A 640x480 image, for example, has a row length
that is likely to be a multiple of the cache line size, and
this could lead to cache thrashing. Embedding the 640x480
area in a 642x480 area (and ignoring the extra two columns)
might gain speed.

I do not understand, I thought having row length a multiple of the
cache size was a good thing??
Might be, might not. My thought was that to calculate the
end value for a specific pixel you need its original value and
those of the three pixels just above it, just to its left, and
diagonally up and left. If the row length is a multiple of the
cache line size, the two vertically-adjacent pixels might compete
for the same cache location and thus cause thrashing. If the
length is one greater than a line size multiple, the up-and-left
pixel might compete with the central pixel. Hence my suggestion
of two padding slots.

But I may be talking through my hat, making suppositions
about "caches" and "line sizes" and "memory latencies" and
"speed" and other such matters that are not part of the C
language. I repeat my earlier suggestion (which I now think
should probably have been the entirety of my response):
For further ideas, see if you can find a newsgroup with
"graphics" or "image" in its name.
--
Eric Sosman
es*****@acm-dot-org.invalid

Apr 10 '07 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: William C. White | last post by:
Does anyone know of a way to use PHP /w Authorize.net AIM without using cURL? Our website is hosted on a shared drive and the webhost company doesn't installed additional software (such as cURL)...
2
by: Albert Ahtenberg | last post by:
Hello, I don't know if it is only me but I was sure that header("Location:url") redirects the browser instantly to URL, or at least stops the execution of the code. But appearantely it continues...
3
by: James | last post by:
Hi, I have a form with 2 fields. 'A' 'B' The user completes one of the fields and the form is submitted. On the results page I want to run a query, but this will change subject to which...
0
by: Ollivier Robert | last post by:
Hello, I'm trying to link PHP with Oracle 9.2.0/OCI8 with gcc 3.2.3 on a Solaris9 system. The link succeeds but everytime I try to run php, I get a SEGV from inside the libcnltsh.so library. ...
1
by: Richard Galli | last post by:
I want viewers to compare state laws on a single subject. Imagine a three-column table with a drop-down box on the top. A viewer selects a state from the list, and that state's text fills the...
4
by: Albert Ahtenberg | last post by:
Hello, I have two questions. 1. When the user presses the back button and returns to a form he filled the form is reseted. How do I leave there the values he inserted? 2. When the...
1
by: inderjit S Gabrie | last post by:
Hi all Here is the scenerio ...is it possibly to do this... i am getting valid course dates output on to a web which i have designed ....all is okay so far , look at the following web url ...
2
by: Jack | last post by:
Hi All, What is the PHP equivilent of Oracle bind variables in a SQL statement, e.g. select x from y where z=:parameter Which in asp/jsp would be followed by some statements to bind a value...
3
by: Sandwick | last post by:
I am trying to change the size of a drawing so they are all 3x3. the script below is what i was trying to use to cut it in half ... I get errors. I can display the normal picture but not the...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.