
The research path of clustering

Hello, I'm happy to join this platform. I'm a graduate student, and my research interest is machine learning. I'm working on subspace clustering and related problems. This is my first article on Bytes.

Unsupervised Learning

Machine learning (ML) is at the core of artificial intelligence, and its main task is to identify and distinguish between things. ML is commonly divided into two categories: supervised learning and unsupervised learning. The main task of supervised learning is classification, i.e., using a large amount of labeled data to learn how to label new data. The main task of unsupervised learning is clustering, i.e., partitioning data into groups without manual labeling.

It should be clearly recognized that unsupervised learning is more difficult than supervised learning, and that far fewer researchers work on it. As a result, progress in unsupervised learning has been relatively slow. Nevertheless, scholars have explored the field for decades and produced many results, such as the k-means algorithm. In recent years especially, as the importance of unsupervised learning has been recognized, more scholars have devoted themselves to this field and achieved breakthroughs.

Clustering is one of the most important problems in unsupervised learning. It is used in many real-world applications, such as image segmentation, bioinformatics, and financial fraud detection. Clustering groups unlabeled data and thereby reveals the natural structure of the data. It is typically applied for three purposes:
1. finding the latent structure of data
2. grouping data naturally
3. compressing data
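To make the second and third purposes concrete, here is a minimal sketch of k-means in its common Lloyd's-iteration form (not the Hartigan-Wong variant listed later in this article). The toy points, the helper names `sq_dist` and `kmeans`, and the farthest-first initialization are assumptions made for illustration only. The sketch groups unlabeled points and then "compresses" the data by replacing each point with the center of its group.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Lloyd's iteration: alternate assignment and centroid update."""
    # Assumption: deterministic farthest-first initialization, chosen only
    # so that this example is reproducible. It is not part of k-means itself.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up 2-D data with two visually obvious groups and no labels.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)

# "Compression": every point is represented by its cluster center, so six
# points are summarized by two centers plus one small label per point.
compressed = [centroids[lab] for lab in labels]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On real data the result depends on the initialization and on the choice of k, so this should be read as an illustration of the idea, not as a recommended implementation.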

Thousands of clustering algorithms have been published. They can be divided into partition-based, hierarchy-based, and density-based algorithms, among others.
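As an illustration of the density-based family, the following is a simplified sketch in the spirit of DBSCAN (the Ester et al. paper listed later in this article). The function name `dbscan`, the parameter names `eps` and `min_pts`, and the toy data are assumptions for this example. The brute-force neighbor search is only suitable for small inputs. Unlike a partition-based method, the number of clusters is not given in advance, and isolated points are reported as noise.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""

    def neighbors(i):
        # Indices of all points within distance eps of point i (including i).
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2
        ]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point, expand through it
                queue.extend(nbrs)
        cluster += 1
    return labels


# Made-up data: two dense groups plus one isolated point.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
        (4.5, 15.0)]
density_labels = dbscan(data, eps=0.5, min_pts=3)
print(density_labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The two dense groups are found without specifying how many clusters to expect, and the isolated point is labeled -1 (noise). The result is sensitive to `eps` and `min_pts`, which have to be chosen for the data at hand.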

Research on clustering can be divided into three areas:
1. technology-centered research
2. data-centered research
3. clustering-derived-centered research


Some key research findings in the field of clustering

Hartigan, J. A., and M. A. Wong. A K-Means Clustering Algorithm. Applied Statistics, 1979, 28.1.

Luxburg, U. von. A Tutorial on Spectral Clustering. Statistics and Computing, 2007, 17.4:395-416.

Comaniciu, D., and P. Meer. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002, 24.5:603-619.

Zhang, T., R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. ACM SIGMOD Record, 1996, 25.2.

Frey, B. J., and D. Dueck. Clustering by passing messages between data points. Science, 2007.

Ester, M., H. P. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, 226–231.

Koga, H., T. Ishibashi, and T. Watanabe. Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing. Knowledge and Information Systems, 2007, 12.1:25-53.

Elhamifar, E., and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory, and Applications. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35.11:2765-2781.

Rodriguez, A., and A. Laio. Clustering by fast search and find of density peaks. Science, 2014, 344.6191:1492.
Oct 12 '21 #1