Hello all,
I am currently implementing a fairly simple algorithm. It scans a
grayscale image, and computes a pixel's new value as a function of its
original value. Two passes are made, first horizontally and second
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal, although the code is _exactly_ the
same in both cases?!
The code itself is very simple. There are 2 loops to scan through rows
and colums (horizontal pass), or columns and rows (vertical pass)
depending on the pass. The processing part is simple and is contained
in a single line. 'pixel' is a pointer to the current pixel. The new
value of the current pixel is first updated, and the pointer is then
incremented to the next pixel, which is either in the next column or in
the next row depending on the pass. I should add that the image is
stored as a large 1-D vector, i.e. the values of each rows are
contiguous in memory. Here is a simplified version of the code:
############### #####
// HORIZONTAL
// loop for every row
....
// then for every column
for(col=firstCo l+1 ; col<=lastCol-1 ; ++col)
{
*pixel = (*pixel) * 2 / 3;
pixel++;
}
// VERTICAL
// loop for every column
....
// then for every row
for(row=firstRo w+1 ; row<=lastRow-1 ; ++row)
{
*pixel = (*pixel) * 2 / 3;
pixel+=imgWidth ;
}
############### ###
For this small amount of code, timings are as follow:
- horizontal = 0.035 sec.
- vertical =* *0.135 sec.
Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidt h;"), the timings then
becomes equal at 0.081. This simple instruction is responsible for the
massive loss of time. And I have no idea why.
My only guess relates to memory management issues. Since the image is
stored row-wise, the current and next values are physically next in
memory during the horizontal pass. On the other hand, for the vertical
pass, the next value is stored in the next row, and the distance
between them becomes 'image_width'. My guess is that the next pixel
value in such a case is not close enough to be stored in the processor
cache or register. The processor has to fetch it from memory, hence the
massive loss in speed. This is however just a guess.
I would really appreciate if anybody could enlighten me on this topic.
Thanks in advance,
Enrique 10 5742
Enrique Cruiz wrote:
Hello all,
I am currently implementing a fairly simple algorithm. It scans a
grayscale image, and computes a pixel's new value as a function of its
original value. Two passes are made, first horizontally and second
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal, although the code is _exactly_ the
same in both cases?!
The code itself is very simple. There are 2 loops to scan through rows
and colums (horizontal pass), or columns and rows (vertical pass)
depending on the pass. The processing part is simple and is contained
in a single line. 'pixel' is a pointer to the current pixel. The new
value of the current pixel is first updated, and the pointer is then
incremented to the next pixel, which is either in the next column or in
the next row depending on the pass. I should add that the image is
stored as a large 1-D vector, i.e. the values of each rows are
contiguous in memory. Here is a simplified version of the code:
############### #####
// HORIZONTAL
// loop for every row
...
// then for every column
for(col=firstCo l+1 ; col<=lastCol-1 ; ++col)
{
*pixel = (*pixel) * 2 / 3;
pixel++;
}
// VERTICAL
// loop for every column
...
// then for every row
for(row=firstRo w+1 ; row<=lastRow-1 ; ++row)
{
*pixel = (*pixel) * 2 / 3;
pixel+=imgWidth ;
}
############### ###
For this small amount of code, timings are as follow:
- horizontal = 0.035 sec.
- vertical = 0.135 sec.
Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidt h;"), the timings then
becomes equal at 0.081. This simple instruction is responsible for the
massive loss of time. And I have no idea why.
My only guess relates to memory management issues. Since the image is
stored row-wise, the current and next values are physically next in
memory during the horizontal pass. On the other hand, for the vertical
pass, the next value is stored in the next row, and the distance
between them becomes 'image_width'. My guess is that the next pixel
value in such a case is not close enough to be stored in the processor
cache or register. The processor has to fetch it from memory, hence the
massive loss in speed. This is however just a guess.
I would really appreciate if anybody could enlighten me on this topic.
Thanks in advance,
Without a complete, minimal, compilable example, our guess is as good
as yours, which is probably a good guess. Use the time program to
observe time spent by your code and time spent in system code. That'll
give an indication if your guess is right.
In anycase, the C standard says nothing about efficiency of the code
generated. If the small delay is too much for your application, then
you'll have to profile your code to find out the area where it is
stalling and look at what to do about it.
On 25 Jan, 11:00, Enrique Cruiz <jni6l03mdo6n.. .@jetable.orgwr ote:
I am currently implementing a fairly simple algorithm. It scans a
grayscale image, and computes a pixel's new value as a function of its
original value. Two passes are made, first horizontally and second
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal, although the code is _exactly_ the
same in both cases?!
[Snip]
This is not really a question about the C language. A general forum
such as comp.programmin g may be more appropriate than a C language
forum.
My only guess relates to memory management issues.
<Off-topic>
You are almost certainly right, and there's probably not a lot you can
do to change it with this processing style.
In your horizontal pass you can process a cache's worth of data at a
time before pulling the next cachefull. In the worst case, your
vertical pass could have a cache miss for each pixel accessed.
</Off-topic>
Enrique Cruiz wrote:
Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidt h;"), the timings then
becomes equal at 0.081. This simple instruction is responsible for the
massive loss of time. And I have no idea why.
I'm pretty sure it's because the processor for which you are compiling
has an INC-command which increments a given number quite fast. Your
"pixel++" is compiled into such an INC, whereas "pixel+=imgWidt h" can
only be translated into an ADD-command which takes much longer.
I don't think you can do much C-specific here. Maybe, if imgWidth is a
power of two, there is a chance of using some strange &s and |s, but I
wonder...
Regards
Steffen
Steffen Buehler wrote:
Enrique Cruiz wrote:
Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidt h;"), the timings then
becomes equal at 0.081. This simple instruction is responsible for the
massive loss of time. And I have no idea why.
I'm pretty sure it's because the processor for which you are compiling
has an INC-command which increments a given number quite fast. Your
"pixel++" is compiled into such an INC, whereas "pixel+=imgWidt h" can
only be translated into an ADD-command which takes much longer.
<snip>
<OT>
With regard to modern x86s, as far as I can tell, the ADD instruction
is faster than the INC instruction. The OP didn't specify the size of
the image being processed, and the cache sizes of the CPU, so one can't
tell if the slowdown is due to cache misses.
</OT>
On 2007-01-25 11:55:10 +0000, "santosh" <sa*********@gm ail.comsaid:
>
<OT>
With regard to modern x86s, as far as I can tell, the ADD instruction
is faster than the INC instruction. The OP didn't specify the size of
the image being processed, and the cache sizes of the CPU, so one can't
tell if the slowdown is due to cache misses.
</OT>
10 MB pixels image, 3650x2730
I have no idea how to find out about the sizes of the CPU cache?!
Enrique
In article <11************ *********@a75g2 000cwd.googlegr oups.com ma**********@po box.com writes:
On 25 Jan, 11:00, Enrique Cruiz <jni6l03mdo6n.. .@jetable.orgwr ote:
I am currently implementing a fairly simple algorithm. It scans a
grayscale image, and computes a pixel's new value as a function of its
original value. Two passes are made, first horizontally and second
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal, although the code is _exactly_ the
same in both cases?!
....
You are almost certainly right, and there's probably not a lot you can
do to change it with this processing style.
In your horizontal pass you can process a cache's worth of data at a
time before pulling the next cachefull. In the worst case, your
vertical pass could have a cache miss for each pixel accessed.
Indeed. Cache misses can give a huge loss of cycles.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Enrique Cruiz wrote:
On 2007-01-25 11:55:10 +0000, "santosh" <sa*********@gm ail.comsaid:
<OT>
With regard to modern x86s, as far as I can tell, the ADD instruction
is faster than the INC instruction. The OP didn't specify the size of
the image being processed, and the cache sizes of the CPU, so one can't
tell if the slowdown is due to cache misses.
</OT>
10 MB pixels image, 3650x2730
I have no idea how to find out about the sizes of the CPU cache?!
Under Linux 'dmesg | grep cpu' should do the trick. Under Windows
msinfo.exe should show the sizes.
Anyway, a 10 Mb file will definitely cause hundreds of cache misses,
when you read non-sequentially. I don't there's anything much you can
do about it. Generally the delay is acceptable.
Enrique Cruiz wrote:
>
I am currently implementing a fairly simple algorithm. It scans a
grayscale image, and computes a pixel's new value as a function of its
original value. Two passes are made, first horizontally and second
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal, although the code is _exactly_ the
same in both cases?!
The code itself is very simple. There are 2 loops to scan through rows
and colums (horizontal pass), or columns and rows (vertical pass)
depending on the pass. The processing part is simple and is contained
in a single line. 'pixel' is a pointer to the current pixel. The new
value of the current pixel is first updated, and the pointer is then
incremented to the next pixel, which is either in the next column or in
the next row depending on the pass. I should add that the image is
stored as a large 1-D vector, i.e. the values of each rows are
contiguous in memory. Here is a simplified version of the code:
############### #####
// HORIZONTAL
// loop for every row
...
// then for every column
for(col=firstCo l+1 ; col<=lastCol-1 ; ++col)
{
*pixel = (*pixel) * 2 / 3;
pixel++;
}
// VERTICAL
// loop for every column
...
// then for every row
for(row=firstRo w+1 ; row<=lastRow-1 ; ++row)
{
*pixel = (*pixel) * 2 / 3;
pixel+=imgWidth ;
}
############### ###
For this small amount of code, timings are as follow:
- horizontal = 0.035 sec.
- vertical = 0.135 sec.
Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidt h;"), the timings then
becomes equal at 0.081. This simple instruction is responsible for the
massive loss of time. And I have no idea why.
My only guess relates to memory management issues. Since the image is
stored row-wise, the current and next values are physically next in
memory during the horizontal pass. On the other hand, for the vertical
pass, the next value is stored in the next row, and the distance
between them becomes 'image_width'. My guess is that the next pixel
value in such a case is not close enough to be stored in the processor
cache or register. The processor has to fetch it from memory, hence the
massive loss in speed. This is however just a guess.
Your guess is almost certainly correct. But, for the code you
show, there is no reason to even have the second scan. The
operation could simply be *pixel *= 4.0 / 9;
Assuming the operations are really more complex, and depend on
adjacent pixels, you could maintain cache coherency by keeping
things local. No need for the two sweeps.
/* init pixel pointer */
for (row = firstRow + 1; row < lastRow; ++row) {
/* adjust pixel pointer, maybe pixel += 2 */
for (col = firstCol + 1; col < lastCol; ++col, ++pixel) {
/* look through adjacencies */
}
}
--
<http://www.cs.auckland .ac.nz/~pgut001/pubs/vista_cost.txt>
"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
On Jan 25, 6:00 am, Enrique Cruiz <jni6l03mdo6n.. .@jetable.orgwr ote:
Hello all,
I am currently implementing a fairly simple algorithm. It scans a
grayscale image, and computes a pixel's new value as a function of its
original value. Two passes are made, first horizontally and second
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal, although the code is _exactly_ the
same in both cases?!
The code itself is very simple. There are 2 loops to scan through rows
and colums (horizontal pass), or columns and rows (vertical pass)
depending on the pass. The processing part is simple and is contained
in a single line. 'pixel' is a pointer to the current pixel. The new
value of the current pixel is first updated, and the pointer is then
incremented to the next pixel, which is either in the next column or in
the next row depending on the pass. I should add that the image is
stored as a large 1-D vector, i.e. the values of each rows are
contiguous in memory. Here is a simplified version of the code:
############### #####
// HORIZONTAL
// loop for every row
...
// then for every column
for(col=firstCo l+1 ; col<=lastCol-1 ; ++col)
{
*pixel = (*pixel) * 2 / 3;
pixel++;
}// VERTICAL
// loop for every column
...
// then for every row
for(row=firstRo w+1 ; row<=lastRow-1 ; ++row)
{
*pixel = (*pixel) * 2 / 3;
pixel+=imgWidth ;}############# #####
For this small amount of code, timings are as follow:
- horizontal = 0.035 sec.
- vertical = 0.135 sec.
Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidt h;"), the timings then
becomes equal at 0.081. This simple instruction is responsible for the
massive loss of time. And I have no idea why.
My only guess relates to memory management issues. Since the image is
stored row-wise, the current and next values are physically next in
memory during the horizontal pass. On the other hand, for the vertical
pass, the next value is stored in the next row, and the distance
between them becomes 'image_width'. My guess is that the next pixel
value in such a case is not close enough to be stored in the processor
cache or register. The processor has to fetch it from memory, hence the
massive loss in speed. This is however just a guess.
I would really appreciate if anybody could enlighten me on this topic.
Thanks in advance,
This is quite standard, and your guess is quite correct. The problem
is
that when you process by rows the elements are close together
and when you process by columns the elements are far apart.
This has little to do with C (if you use Fortran you will probably
find that processng by columns is faster than processing by rows).
You have several options
Live with things as they are (a drop in speed of three times
is not that severe. Suppose your matrix
was so large that a column operation caused a disc read
for every element. Can you say "several orders of magnitude")
Change to an algorithm that can be done with
only row processing (not always possible).
Do the row processing, transpose the matrix,
do the column processing, transpose the matrix
again. The two transposes may take less time
than the processing time difference.
Store the image in "tiles", so that close
by pixels in any direction are usually close
in memory. This is the most general solution
and also the hardest. Note that the
overhead of the tiling sheme will probably
make row processing slower. Hopefully, you
will more that make up for this in column processing.
Process in a different order. Process a few rows
and then do the needed column processing. Process
a pixel at a time, doing the needed row and column operations.
The change in memory access pattern may help.
-William Hughes This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: lawrence |
last post by:
I had some code that worked fine for several weeks, and then yesterday
it stopped working. I'm not sure what I did. Nor can I make out why it
isn't working. I'm running a query that should return 3 items from the
database. I then get a count of the return, which correctly tells me
that I've got 3 items. I then go into a for loop which I expect to
loop 3 times and print out the 3 items. Here is where things get
strange: it loops 3 times and...
|
by: Eva |
last post by:
Hi
i am trying to insert a new row into one of my datatabels that i have in my dataset when a button is clicked. here is my code
Dim ClientInsRow As DataRow = dtClient.NewRo
ClientInsRow("Surname") = txtSurname.Tex
ClientInsRow("Forename") = txtForename.Tex
ClientInsRow("OrgName") = txtOrganisation.Tex
ClientInsRow("Address") = txtAddress.Tex
|
by: AdamG |
last post by:
I am trying hard for days now to add and delete rows to a table. I
could really use some help...
Each row contains two buttons (images) - 'add row' and 'delete row'.
When the user clicks add row within a specific cell/row, the index of
that row should be passed to a function that creates a new row using
that index (the new row should be added directly below the row where
the user clicked. The new row should contain all of the cells and...
|
by: Micha? |
last post by:
Hello this is my problem:
<script language="javascript">
function Show()
{
RowNumber="???"; // I trying this.rowIndex
CellNumber="???"; //
TableID="???"; // :-(
alert(RowNumber);
|
by: Dave Elliott |
last post by:
After inserting a new data row to a DataTable that is bound to a
datagrid, I am unable to change data in a row that is after the newly
added row without getting bizarre results.
I have added the full code for the test below. Create a project drop
in the code and run.
Any help would be appreciated.
Cheers,
| |
by: NDady via DotNetMonster.com |
last post by:
Hi,
I have a datagrid populated from a dataset. On every row in the grid I have
a delete button. When the user presses on the delete button I remove the
row from the dataset and rebind the datagrid.
The problem is that after a couple of delete the index in the dataset does
not match the index in the grid and the wrong record i deleted from the
dataset. How can I solve this Problem? I am using the following procedure:
Private Sub...
|
by: michael sorens |
last post by:
I tried to do a simple operation on a Windows Form in VS2005 inside a
key_down handler:
if (e.Control && e.Shift && e.KeyCode == Keys.V)
{
int selectedRowIndex = dataGridView.SelectedCells.RowIndex;
dataGridView.Rows.AddCopy(selectedRowIndex);
}
So when the user presses Ctrl-Shft-V, a copy of the first row of a user's
|
by: RoomfulExpress |
last post by:
Here's the problem that I'm having- I'm trying to pull in 2 fields from my database and place them in the title of the HTML. I'm connecting to the db and selecting everything exactly the same as I am doing below, and it works fine. For some reason it's not pulling in the fields. Any ideas?
Here's the link to the actual page I'm working on.
http://www.roomfulexpress.com/newsite/php/familyprofile.php?FAMILY_CD=558167959
Please see below...
|
by: btreddy |
last post by:
Hii experts,
I've been trying for this but i didn succeeded.The problem is
I've a datagird which is having 2 cols displaying name and emial id .wht i want is when i select a paricular row from the gridview i want to capture the emial id value of tht paricular row.
I've added this code in the even onrowdatatbound.
protected void GridViewParticipants_OnRowDataBound(object sender, GridViewRowEventArgs e)
/
.
.
if (e.Row.RowType ==...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
| |
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |