473,387 Members | 1,790 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

clustering in C

Can someone help me with C programming here.
If I have
x is a 2D array of 0's and 1's
ex: x = array([1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0])
what I want is a boundingbox over the region where we find clusters of
1's.So for instance in the above list first 3 roes and colums have 1's
so the area of that box is 3x3
so my final array should have an array of clusters of
1's like
coords = [ (x1,y1,x2,y2), ....]
where x1,y1 -> top left point
x2,y2 -> bottom right point of that rectangle.
Hope I am clear with my question.

[0, 0, 0, 0, 0,
+-----+
0,|1, 1,|0, 0,
| |
0,|0, 1,|0, 0,
+-----+
0, 0, 0, 0, 0]

something like above. The resultant array should have the area of the
each such box. SOmething like a flood fill algorithm.
I would like the algorithm to return all the rectangles .Not just the
minimum ones.

Nov 14 '05 #1
10 2901
qu*****@gmail.com wrote:
Can someone help me with C programming here.

<snip>

Hi there "querypk",

google for Hoshen-Kopelman algorithm...

Jan

Nov 14 '05 #2

<qu*****@gmail.com> wrote
Can someone help me with C programming here.
If I have
x is a 2D array of 0's and 1's
ex: x = array([1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
[1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0])
what I want is a boundingbox over the region where we find clusters of
1's.So for instance in the above list first 3 roes and colums have 1's
so the area of that box is 3x3
so my final array should have an array of clusters of
1's like
coords = [ (x1,y1,x2,y2), ....]
where x1,y1 -> top left point
x2,y2 -> bottom right point of that rectangle.
Hope I am clear with my question.

[0, 0, 0, 0, 0,
+-----+
0,|1, 1,|0, 0,
| |
0,|0, 1,|0, 0,
+-----+
0, 0, 0, 0, 0]

something like above. The resultant array should have the area of the
each such box. SOmething like a flood fill algorithm.
I would like the algorithm to return all the rectangles .Not just the
minimum ones.

First thing to do is to allocate a temporary array of integers. Set the ones
to one and the zeroes to zero.

Then iterate over the array. When you hit a one, do a flood fill, filling
with 2, 3, 4 etc, like this

void floodfill(int * array, int x, int y, int width, int height, int fill)
{
if(x < 0 || x >= width)
return;
if(y < 0 || y >= height)
return;
if(array[y * width + x] == fill)
return;
if(array[y * width + x] == 0)
return;
array[y * width + x] = fill;
floodfill(array, x + 1, y, width, height, fill);
floodfill(array, x - 1, y, width, height, fill);
floodfill(array, x, y + 1, width, height, fill);
floodfill(array, x, y -1, width, height, fill);
}

The resulting array should be a list of zeroes, and clusters from 2 up. Also
keep a count of how many clusters you have found.

Now allocate an array of rectangle structures, and an array of flags. Set
all the falgs to zero.

Iterate over the array a second time. When you hit a non-zero, check the
flag. If it is clear, you have the top point of the cluster. If the array is
relatively small and time isn't critical, it is probably easiest to iterate
from the top left going down columns to find the left point, to iterate from
the bottom up to find the bottom point, and from the right going up columns
to find the right point.
(If you have a huge array with small clusters you can search the cluster
using the floodfill again, and store maximum and minimum points).

Then set the flag to mark the fact that you have found the rectnagle for
that cluster, and continue until you reach the end of the array.
Nov 14 '05 #3
hi jburgy,
that kind of helped I tied to modify the code so that instead of
clustering 1's it would cluster 0's just to better understand the
algorithm..But I guess I am making some mistake and it does'nt do the
revers of what it does now. Can you help here. basically I am trying to
understand. what should I change in the C code to make it work for 0
clustering instead of 1's..

int hoshen_kopelman(int **matrix, int m, int n) {

uf_initialize(m * n / 2);

/* scan the matrix */

for (int i=0; i<m; i++)
for (int j=0; j<n; j++)
if (!matrix[i][j]) { // if occupied ...

int up = (i==0 ? 0 : matrix[i-1][j]); // look up
printf("%d\n",up);
int left = (j==0 ? 0 : matrix[i][j-1]); // look left

switch (!!up + !!left) {

case 0:
matrix[i][j] = uf_make_set(); // a new cluster
break;

case 1: // part of an existing cluster
matrix[i][j] = max(up,left); // whichever is nonzero is
labelled
break;

case 2: // this site binds two clusters
matrix[i][j] = uf_union(up, left);
break;
}

}

/* apply the relabeling to the matrix */

/* This is a little bit sneaky.. we create a mapping from the
canonical labels
determined by union/find into a new set of canonical labels, which
are
guaranteed to be sequential. */

int *new_labels = calloc(sizeof(int), n_labels); // allocate array,
initialized to zero

for (int i=0; i<m; i++)
for (int j=0; j<n; j++)
if (matrix[i][j]) {
int x = uf_find(matrix[i][j]);
if (new_labels[x] == 3) {
new_labels[0]++;
new_labels[x] = new_labels[0];
}
matrix[i][j] = new_labels[x];
}

int total_clusters = new_labels[0];

free(new_labels);
uf_done();

return total_clusters;
}
I canges the if(matrix[i][j]) to if(!matrix[i][j]) but it did not
update the other clusters.

http://splorg.org:8080/people/tobin/...nkopelman/hk.c

Nov 14 '05 #4
su*******@gmail.com wrote:
hi jburgy,
that kind of helped I tied to modify the code so that instead of
clustering 1's it would cluster 0's just to better understand the
algorithm..But I guess I am making some mistake and it does'nt do the
revers of what it does now. Can you help here. basically I am trying to
understand. what should I change in the C code to make it work for 0
clustering instead of 1's..

<snip>

Hi superprad,

Did you honestly believe you could change a single statement in Tobin's
code and it would magically do what you want? If programming were that
simple, we probably wouldn't discuss it so much... If you looked
carefully, Tobin overwrites the 1's in 'matrix' with the appropriate
cluster label. That's a common trick in implementations of the
Hoshen-Kopelman algorithm. Unfortunately for you, it also means that
you can't simply switch 0 and 1. What you could do is preprocess your
matrix thusly:

for (i = 0; i < m; i++)
for (j = 0; j < n; j++)
matrix[i][j] = 1 - matrix[i][j];

before you feed it into the subroutine. Otherwise, you could take a
simpler implementation (sorry Tobin, but I find yours unnecessarily
complicated), sit down, have a beer, and only get up when you
completely understand what it does. Then, and only then, can you start
hacking it for your need. I'll look around home for a simpler
implementation.

Good luck,

Jan

Nov 14 '05 #5
Hi Jan,
I did preprocess my matrix replacing 1's with 0's. But the mistake i
did was I also modified the code for that '!' condition and some more
changes. That was the reason probably my resultant matrix was not
giving what I expected.But if the kept the code un modified and pre
processed the matrix alone it worked. Thanks for that. Yes as you said
I have to take more time to understand what exactly is going on.
Definitely a simpler implementation would really help.Thanks again for
your valuable time and advice.

Nov 14 '05 #6
One more question. If I pre process the matrix then what happens is we
are replacing 1's with 0's and otherway. So does Hoshen-Kopelman
algorithm always replaces the 1's ie as i read the occupied cell. What
if you want to replace un occupied cells as i discussed the previous
post. And what is the necessity for 'union and find'. Cant we just
scan through the matrix and look for the 0's or 1's and compare the
neighbors and replace the neighbors with the same label as the parent.

Nov 14 '05 #7
su*******@gmail.com wrote:
One more question. If I pre process the matrix then what happens is we
are replacing 1's with 0's and otherway. So does Hoshen-Kopelman
algorithm always replaces the 1's ie as i read the occupied cell. What
if you want to replace un occupied cells as i discussed the previous
post. And what is the necessity for 'union and find'. Cant we just
scan through the matrix and look for the 0's or 1's and compare the
neighbors and replace the neighbors with the same label as the parent.


I reread you original question. Why did you want to modify Tobin's code
in the first place? It already labels clusters of 1's exactly like you
demand. It doesn't return a list of cluster coordinates because that's
not a well-defined question (consider the following situation):

000010000
000111000
001110000
000111100
000001000

what are the "corners" of this cluster of ones? In case I wasn't clear
enough in my first post, here's what the Hoshen-Kopelman algorithm does
(and if it's not what you're after, please accept my apologies)

110000000 110000000
100001110 100002220
000001110 -> 000002220
110000100 330000200
011000000 330000000

That's why it's called cluster labeling.

Yours,

Jan

Nov 14 '05 #8
I understand what you say. Ok let me explain clearly. I think
Hoshen-Kopelman algorithm is what I might need but ok consider this...
_ _ _ _ _ _ _
|2222| 1|3333 | 000010000
|2221| 1|1333 | <------ 000111000
|2211| 1|3333 | 001110000
- - - - - --------
111111111 000111100
_ _ _ _ _ _
|44444|1|555| 000001000
---------- ------
Considering that we are looking for un occupied cells (clustering 0's
rather than 1's)...
this is one cluster . and corners of cluster are in this case
{1,1,4,4}(topleft and bottom right corners of this rectangle)something
like a bounding box...
so that the resultant array will have
{ {1,1,4,4},{-,-,-,-} .........includes all rectangles. If there is a
cluster inside another cluster even that is considered as a seperate
rectangle...

Nov 14 '05 #9
su*******@gmail.com wrote:
I understand what you say. Ok let me explain clearly. I think
Hoshen-Kopelman algorithm is what I might need but ok consider this...


Dude, you need to work on your ascii art! Alright, this is what I
understood: you define the bounding box of an irregularly shaped
cluster as the smallest rectangle which inscribes it entirely, don't
you. For example

111111111 111111111
111111111 1+-----+1
111 11111 1|1 111|1
11 1 111 is bounded 1| 1 1|1
11 11 as follows 1| |1
1111 111 1|11 1|1
111111111 1+-----+1
111111111 111111111

Is that correct? Assuming it is, here's what you need to do:

1. make a backup copy of your matrix 'coz we're gonna overwrite it
2. swap its 0's and 1's
3. label the clusters of 1's (which now represent /empty/ cells)
4. allocate as many quadruples (for bbox coordinates) as you have
clusters, initialize them as follows (xmin=ymin=0, xmax=ncols,
ymax=nrows)
5. do a one-pass scan of the labeled matrix, updating the bbox
coordinates of the corresponding cluster each time you encounter one

That should be pretty simple. My initial guess was still right, step 3
does the "hard" work, 1 and 2 some pre-processing and 4 and 5 some
post-processing. I've gotta ask now: why exactly are you doing this?

Jan

P.S. By now, we are completely off-topic on comp.lang.c (I'm surprised
we haven't been flamed yet), you better follow-up on comp.programming.

Nov 14 '05 #10
hi thanks for the help. Just to let you know that I got it working
thanks for that. I get the Idea on HK,

Nov 14 '05 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Nico de Groot | last post by:
I have a 2 node Microsoft 2000 cluster with a shared storage device. I want to create automatic failover for MS SQL 2000 server. I can do that wit the following options: 1. Active/Pasive (one...
1
by: kumar | last post by:
Dear Friends, I wanted to configure Failover cluster for SQL Server 2000 on Windows 2000 advanced servers. I have only 2 no.s of windows 2000 advanced server m/cs. I dont have any shared...
3
by: Shabam | last post by:
When a web application becomes overloaded with traffic, one can offload it by load balancing and clustering the front end web servers. What happens when the back-end MSSQL database becomes...
1
by: willie | last post by:
Hi all: I have a clustering SQL Server on Node1 and Node2, the Node1 has named Instance1 and Node2 has named Instance2, no default instance. We tested it that everthing is OK, then we decide to...
2
by: CSN | last post by:
Just wondering - is there something similar to this (clustering) for PostgreSQL? If so, how does it compare? http://www.mysql.com/press/release_2003_30.html ...
3
by: datapro01 | last post by:
Running DB2 version 8.1.1 on AIX 5.1.1 The table (employee) is being reorged and has a clustering index (empid). Is there any different between these two commands? db2 reorg table employee...
11
by: chmmr | last post by:
Hi, I am currently in the process of gathering info/experiences for an incoming Linux DB2 clustering phase we actually know nothing about (since we are doing it for the first time ever), so I...
3
by: dejavue82 | last post by:
Hi, Does anybody know of a software package that allows for several servers, running asp.net 2.0 to be clustered, regardless of where they are located (ie. without a hardware load balancer)....
5
by: Lakesider | last post by:
Hi NG, I have a question about data: I have travel-times from A to B like this from | to | sec. A B 17 A B 18 A B 30 A B 32
3
by: Manish | last post by:
I think this question has been asked number of times. However, I am looking for some specific information. Perhaps some of you can help close the gap. Or perhaps you can point me towards right...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.