How to use the elbow method for clustering evaluation with K-Medoids

I use this code as my reference:


    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.spatial.distance import cdist, pdist
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    iris = load_iris()

    # candidate numbers of clusters
    k = range(1, 11)

    # fit one KMeans model per candidate k and collect the centroids
    clusters = [KMeans(n_clusters=c, init='k-means++').fit(iris.data) for c in k]
    centr_lst = [cc.cluster_centers_ for cc in clusters]

    # distance from every sample to every centroid, then to its nearest centroid
    k_distance = [cdist(iris.data, cent, 'euclidean') for cent in centr_lst]
    clust_indx = [np.argmin(kd, axis=1) for kd in k_distance]
    distances = [np.min(kd, axis=1) for kd in k_distance]
    avg_within = [np.sum(dist) / iris.data.shape[0] for dist in distances]

    # within-cluster, total, and between-cluster sums of squares
    # (an array, so the subtraction below is element-wise)
    with_in_sum_square = np.array([np.sum(dist ** 2) for dist in distances])
    to_sum_square = np.sum(pdist(iris.data) ** 2) / iris.data.shape[0]
    bet_sum_square = to_sum_square - with_in_sum_square

    # index into k of the elbow point to highlight (k[2] == 3)
    kidx = 2

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(k, avg_within, 'g*-')
    ax.plot(k[kidx], avg_within[kidx], marker='o', markersize=12,
            markeredgewidth=2, markeredgecolor='r', markerfacecolor='None')
    plt.grid(True)
    plt.xlabel('Number of clusters')
    # the plotted quantity is the mean distance to the nearest centroid
    plt.ylabel('Average within-cluster distance to centroid')
    plt.title('Elbow for KMeans clustering (IRIS Data)')
    plt.show()

I want to replace K-Means with K-Medoids.
This is my K-Medoids code:
    import numpy as np
    import matplotlib.pyplot as plt
    from copy import deepcopy

    def _get_init_centers(n_clusters, n_samples):
        '''return random, distinct sample indices as initial centers'''
        init_ids = []
        while len(init_ids) < n_clusters:
            idx = np.random.randint(0, n_samples)
            if idx not in init_ids:
                init_ids.append(idx)
        return init_ids

    def _get_distance(data1, data2):
        '''example distance function (Euclidean)'''
        return np.sqrt(np.sum((data1 - data2)**2))

    def _get_cost(X, centers_id, dist_func):
        '''return members, cost of each cluster, total cost, and distance matrix'''
        dist_mat = np.zeros((len(X), len(centers_id)))
        # compute the distance from every sample to every center
        for j in range(len(centers_id)):
            center = X[centers_id[j], :]
            for i in range(len(X)):
                if i == centers_id[j]:
                    dist_mat[i, j] = 0.
                else:
                    dist_mat[i, j] = dist_func(X[i, :], center)
        # assign every sample to its nearest center
        mask = np.argmin(dist_mat, axis=1)
        members = np.zeros(len(X))
        costs = np.zeros(len(centers_id))
        for i in range(len(centers_id)):
            mem_id = np.where(mask == i)[0]
            members[mem_id] = i
            costs[i] = np.sum(dist_mat[mem_id, i])
        return members, costs, np.sum(costs), dist_mat

    def _kmedoids_run(X, n_clusters, dist_func, max_iter=3, tol=0.000001, verbose=True):
        '''run the algorithm; return centers, members, costs, total cost, distance matrix'''
        # Get initial centers
        n_samples, n_features = X.shape
        init_ids = _get_init_centers(n_clusters, n_samples)
        if verbose:
            print('Initial centers are ', init_ids)
        centers = init_ids
        members, costs, tot_cost, dist_mat = _get_cost(X, init_ids, dist_func)
        cc, SWAPED = 0, True
        while True:
            SWAPED = False
            for i in range(n_samples):
                if i not in centers:
                    for j in range(len(centers)):
                        # try replacing center j with sample i
                        centers_ = deepcopy(centers)
                        centers_[j] = i
                        members_, costs_, tot_cost_, dist_mat_ = _get_cost(X, centers_, dist_func)
                        # accept the swap only if it lowers the total cost by more than tol;
                        # accepting equal-cost swaps (e.g. duplicate rows) would never settle
                        if tot_cost - tot_cost_ > tol:
                            members, costs, tot_cost, dist_mat = members_, costs_, tot_cost_, dist_mat_
                            centers = centers_
                            SWAPED = True
                            if verbose:
                                print('Change centers to ', centers)
            if cc > max_iter:
                if verbose:
                    print('End Searching by reaching maximum iteration', max_iter)
                break
            if not SWAPED:
                if verbose:
                    print('End Searching by no swaps')
                break
            cc += 1
        return centers, members, costs, tot_cost, dist_mat

    class KMedoids(object):
        '''
        Main API of KMedoids Clustering

        Parameters
        --------
            n_clusters: number of clusters
            dist_func : distance function
            max_iter: maximum number of iterations
            tol: tolerance

        Attributes (set by fit)
        --------
            labels_    :  cluster labels for each data item
            centers_   :  cluster centers id
            costs_     :  array of costs for each cluster
            tot_cost_  :  total cost (sum of costs_)

        Methods
        -------
            fit(X): fit the model
                - X: 2-D numpy array, size = (n_sample, n_features)

            predict(X): predict cluster id given a test dataset.
        '''
        def __init__(self, n_clusters, dist_func=_get_distance, max_iter=3, tol=0.000001):
            self.n_clusters = n_clusters
            self.dist_func = dist_func
            self.max_iter = max_iter
            self.tol = tol

        def fit(self, X, plotit=True, verbose=True):
            centers, members, costs, tot_cost, dist_mat = _kmedoids_run(
                X, self.n_clusters, self.dist_func, max_iter=self.max_iter, tol=self.tol, verbose=verbose)
            # keep the results on the instance; the original discarded them
            self.centers_ = centers
            self.labels_ = members
            self.costs_ = costs
            self.tot_cost_ = tot_cost
            if plotit:
                # scatter plot of the first two features, one color per collection
                fig = plt.figure()
                ax = fig.add_subplot(111)
                for i in range(len(centers)):
                    X_c = X[members == i, :]
                    ax.scatter(X_c[:, 0], X_c[:, 1], label=i+1, alpha=0.5, s=30)
                    ax.scatter(X[centers[i], 0], X[centers[i], 1], alpha=1., s=250, marker='*')
                #ax.legend(bbox_to_anchor=(1, 1), fontsize="small", loc=2, borderaxespad=0.)
                colormap = plt.cm.gist_ncar  # nipy_spectral, Set1, Paired
                colorst = [colormap(i) for i in np.linspace(0, 0.9, len(ax.collections))]
                for t, j1 in enumerate(ax.collections):
                    j1.set_color(colorst[t])
            return self

        def predict(self, X):
            raise NotImplementedError()

Could you help me with how to do it?
Thanks.
Oct 24 '16 #1
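
One possible way to get an elbow curve for K-Medoids is to run the clustering once per candidate k, record the total cost (the sum of distances from each point to its nearest medoid), and plot that against k. The sketch below is a minimal, self-contained illustration and not the posted implementation: `pairwise_euclidean` and `kmedoids_cost` are hypothetical helpers, the clustering is a simple alternating (assign, then update medoid) scheme rather than the swap search in `_kmedoids_run`, and the data is a synthetic stand-in for `iris.data`.

```python
import numpy as np

def pairwise_euclidean(X):
    """Full matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

def kmedoids_cost(D, k, rng, n_init=5, max_iter=100):
    """Hypothetical helper: best result over n_init random starts of a
    simple alternating k-medoids on a precomputed distance matrix D.

    Returns (medoid indices, total cost), where the total cost is the sum
    of distances from each point to its nearest medoid.
    """
    n = D.shape[0]
    best_medoids, best_cost = None, np.inf
    for _ in range(n_init):
        medoids = rng.choice(n, size=k, replace=False)
        for _ in range(max_iter):
            # assignment step: nearest medoid for every point
            labels = np.argmin(D[:, medoids], axis=1)
            new_medoids = medoids.copy()
            for c in range(k):
                members = np.where(labels == c)[0]
                if members.size > 0:
                    # update step: member with the smallest summed
                    # distance to the other members of its cluster
                    within = D[np.ix_(members, members)].sum(axis=0)
                    new_medoids[c] = members[np.argmin(within)]
            if np.array_equal(new_medoids, medoids):
                break
            medoids = new_medoids
        cost = D[:, medoids].min(axis=1).sum()
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

rng = np.random.default_rng(0)
# synthetic stand-in data: three blobs in 2-D (swap in iris.data to match the question)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0.0, 3.0, 6.0)])
D = pairwise_euclidean(X)

ks = list(range(1, 8))
total_costs = [kmedoids_cost(D, k, rng)[1] for k in ks]
# same quantity as avg_within in the K-Means reference, with medoids instead of centroids
avg_within = [c / X.shape[0] for c in total_costs]

for k, a in zip(ks, avg_within):
    print(k, round(a, 3))
```

To draw the elbow, plot `ks` against `avg_within` with the same `ax.plot(k, avg_within, 'g*-')` call used in the K-Means reference. With the posted code, the `tot_cost` value returned by `_kmedoids_run` should play the same role as `total_costs` here. Note that the `bet_sum_square = to_sum_square - with_in_sum_square` step relies on a decomposition that holds for means with squared Euclidean distance, so it does not carry over to medoids in general; the sketch only tracks the within-cluster cost.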