How to use the elbow method to evaluate K-Medoids clustering

I am using this code as my reference:


import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()

k = range(1, 11)

# Fit one KMeans model per candidate number of clusters
clusters = [KMeans(n_clusters=c, init='k-means++').fit(iris.data) for c in k]
centr_lst = [cc.cluster_centers_ for cc in clusters]

# Distance from every sample to every centroid, then to its nearest centroid
k_distance = [cdist(iris.data, cent, 'euclidean') for cent in centr_lst]
clust_indx = [np.argmin(kd, axis=1) for kd in k_distance]
distances = [np.min(kd, axis=1) for kd in k_distance]
avg_within = [np.sum(dist) / iris.data.shape[0] for dist in distances]

# Within-cluster, total, and between-cluster sums of squares
with_in_sum_square = np.array([np.sum(dist ** 2) for dist in distances])
to_sum_square = np.sum(pdist(iris.data) ** 2) / iris.data.shape[0]
bet_sum_square = to_sum_square - with_in_sum_square

kidx = 2  # position of the marked elbow in k (k = 3)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(k, avg_within, 'g*-')
ax.plot(k[kidx], avg_within[kidx], marker='o', markersize=12,
        markeredgewidth=2, markeredgecolor='r', markerfacecolor='None')
plt.grid(True)
plt.xlabel('Number of clusters')
# avg_within is the mean distance to the nearest centroid, not a sum of squares
plt.ylabel('Average within-cluster distance')
plt.title('Elbow for KMeans clustering (IRIS Data)')
plt.show()

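A side note on the `to_sum_square` line in the reference code, since it matters when moving to medoids: it relies on the identity that the sum of squared pairwise distances (each pair counted once) divided by the number of samples equals the total sum of squares around the mean. The sketch below checks that identity with NumPy only. The random matrix is a synthetic stand-in for `iris.data`, and the variable names are my own, not from the reference code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # synthetic stand-in for iris.data (assumption)
n = X.shape[0]

# Total sum of squares measured from the overall mean.
tss_from_mean = ((X - X.mean(axis=0)) ** 2).sum()

# Same quantity from squared pairwise distances, each unordered pair once,
# divided by n (this mirrors np.sum(pdist(X) ** 2) / n in the reference code).
diff = X[:, None, :] - X[None, :, :]
sq_dists = (diff ** 2).sum(axis=-1)
tss_from_pairs = sq_dists[np.triu_indices(n, k=1)].sum() / n

print(np.isclose(tss_from_mean, tss_from_pairs))
```

The pairwise form never needs a mean, which is why distance-based quantities like this carry over to K-Medoids, where cluster centers are actual samples.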
I want to replace K-Means with K-Medoids.
This is my K-Medoids code:
from copy import deepcopy

import matplotlib.pyplot as plt
import numpy as np

def _get_init_centers(n_clusters, n_samples):
    '''return distinct random sample indices as initial centers'''
    init_ids = []
    while len(init_ids) < n_clusters:
        idx = np.random.randint(0, n_samples)
        if idx not in init_ids:
            init_ids.append(idx)
    return init_ids

def _get_distance(data1, data2):
    '''example distance function (Euclidean)'''
    return np.sqrt(np.sum((data1 - data2)**2))

def _get_cost(X, centers_id, dist_func):
    '''return members, cost of each cluster, total cost, and distance matrix'''
    dist_mat = np.zeros((len(X), len(centers_id)))
    # compute the distance from every sample to every center
    for j in range(len(centers_id)):
        center = X[centers_id[j], :]
        for i in range(len(X)):
            if i == centers_id[j]:
                dist_mat[i, j] = 0.
            else:
                dist_mat[i, j] = dist_func(X[i, :], center)
    # assign each sample to its nearest center
    mask = np.argmin(dist_mat, axis=1)
    members = np.zeros(len(X))
    costs = np.zeros(len(centers_id))
    for i in range(len(centers_id)):
        mem_id = np.where(mask == i)
        members[mem_id] = i
        costs[i] = np.sum(dist_mat[mem_id, i])
    return members, costs, np.sum(costs), dist_mat

def _kmedoids_run(X, n_clusters, dist_func, max_iter=3, tol=0.000001, verbose=True):
    '''run algorithm return centers, members, and etc.'''
    # Get initial centers
    n_samples, n_features = X.shape
    init_ids = _get_init_centers(n_clusters, n_samples)
    if verbose:
        print('Initial centers are ', init_ids)
    centers = init_ids
    members, costs, tot_cost, dist_mat = _get_cost(X, init_ids, dist_func)
    cc, SWAPED = 0, True
    while True:
        SWAPED = False
        for i in range(n_samples):
            if i not in centers:
                for j in range(len(centers)):
                    # try replacing center j with sample i
                    centers_ = deepcopy(centers)
                    centers_[j] = i
                    members_, costs_, tot_cost_, dist_mat_ = _get_cost(X, centers_, dist_func)
                    # accept the swap only if it lowers the total cost by more than tol
                    # (the original test, tot_cost_-tot_cost < tol, also accepted ties)
                    if tot_cost - tot_cost_ > tol:
                        members, costs, tot_cost, dist_mat = members_, costs_, tot_cost_, dist_mat_
                        centers = centers_
                        SWAPED = True
                        if verbose:
                            print('Change centers to ', centers)
        if cc > max_iter:
            if verbose:
                print('End Searching by reaching maximum iteration', max_iter)
            break
        if not SWAPED:
            if verbose:
                print('End Searching by no swaps')
            break
        cc += 1
    return centers, members, costs, tot_cost, dist_mat

class KMedoids(object):
    '''
    Main API of KMedoids Clustering

    Parameters
    --------
        n_clusters: number of clusters
        dist_func : distance function
        max_iter: maximum number of iterations
        tol: tolerance

    Attributes
    --------
        labels_    :  cluster labels for each data item
        centers_   :  cluster centers id
        costs_     :  array of costs for each cluster
        tot_cost_  :  sum of costs_ (total distance to the nearest center)
        n_iter_    :  number of iterations for the best trial (not set by this code)

    Methods
    -------
        fit(X): fit the model
            - X: 2-D numpy array, size = (n_sample, n_features)

        predict(X): predict cluster id given a test dataset.
    '''
    def __init__(self, n_clusters, dist_func=_get_distance, max_iter=3, tol=0.000001):
        self.n_clusters = n_clusters
        self.dist_func = dist_func
        self.max_iter = max_iter
        self.tol = tol

    def fit(self, X, plotit=True, verbose=True):
        centers, members, costs, tot_cost, dist_mat = _kmedoids_run(
            X, self.n_clusters, self.dist_func, max_iter=self.max_iter, tol=self.tol, verbose=verbose)
        # store the results so the documented attributes exist after fitting
        self.centers_ = centers
        self.labels_ = members
        self.costs_ = costs
        self.tot_cost_ = tot_cost
        if plotit:
            # scatter plot of the first two features, one color per cluster
            fig = plt.figure()
            ax = fig.add_subplot(111)
            for i in range(len(centers)):
                X_c = X[members == i, :]
                ax.scatter(X_c[:, 0], X_c[:, 1], label=i+1, alpha=0.5, s=30)
                ax.scatter(X[centers[i], 0], X[centers[i], 1], alpha=1., s=250, marker='*')
            #ax.legend(bbox_to_anchor=(1, 1), fontsize="small", loc=2, borderaxespad=0.)
            colormap = plt.cm.gist_ncar  # nipy_spectral, Set1,Paired
            colorst = [colormap(i) for i in np.linspace(0, 0.9, len(ax.collections))]
            for t, j1 in enumerate(ax.collections):
                j1.set_color(colorst[t])
        return self

    def predict(self, X):
        raise NotImplementedError()

Could you help me with how to do this?
Thanks.
Oct 24 '16 #1
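
One possible approach, offered as a sketch and not a definitive answer: with medoids, the natural elbow quantity is the total distance from each sample to its nearest medoid, which the posted `_kmedoids_run` already returns as `tot_cost`. The self-contained example below shows the idea with NumPy only. Note the assumptions: `pairwise_euclidean` and `kmedoids_cost` are hypothetical helper names, the update step is the simple alternating (assign, then re-pick each cluster's medoid) variant and not the swap search from the post, the data are synthetic blobs instead of iris, and the five restarts per k are an arbitrary choice to reduce the effect of a bad random start.

```python
import numpy as np

def pairwise_euclidean(X):
    """Return the (n, n) matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kmedoids_cost(D, n_clusters, max_iter=100, seed=0):
    """Alternating k-medoids on a precomputed distance matrix D.

    Returns (medoid_indices, labels, total_cost), where total_cost is the
    sum of distances from every sample to its nearest medoid.
    """
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=n_clusters, replace=False)
    for _ in range(max_iter):
        # Assignment step: nearest medoid for every sample.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(n_clusters):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    total_cost = D[np.arange(D.shape[0]), medoids[labels]].sum()
    return medoids, labels, total_cost

# Synthetic data: three well-separated 2-D blobs (an assumption, not iris).
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(size=(30, 2)) for c in blob_centers])
D = pairwise_euclidean(X)

# Elbow curve: best average distance to the nearest medoid for each k.
ks = list(range(1, 7))
avg_cost = [
    min(kmedoids_cost(D, k, seed=s)[2] for s in range(5)) / len(X)
    for k in ks
]
for k, cost in zip(ks, avg_cost):
    print(k, round(cost, 3))
```

To reuse the posted code instead, the same curve could be built by collecting `_kmedoids_run(X, k, _get_distance, verbose=False)[3]` for each k, dividing by the number of samples, and plotting the result against k exactly as `avg_within` is plotted in the K-Means reference. The elbow is then read off as the k after which the curve stops dropping sharply.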