The research path of clustering

Hello, I'm happy to join this platform. I'm a graduate student, and my research interest is machine learning. I'm working on subspace clustering and related topics. This is my first article on Bytes.

Unsupervised Learning

Machine learning (ML) is at the core of artificial intelligence; its main task is to identify and distinguish between things. ML is usually divided into two categories: supervised learning and unsupervised learning. A main task of supervised learning is classification, i.e., using a large amount of labeled data to learn how to assign new data to known classes. A main task of unsupervised learning is clustering, i.e., dividing data into groups (clusters) without labels or manual intervention.

It should be clearly recognized that unsupervised learning is more difficult than supervised learning, and that far fewer researchers work on unsupervised learning than on supervised learning. As a result, progress in unsupervised learning has been relatively slow. Nevertheless, scholars have explored the field for decades and produced many results, such as the k-means algorithm. Especially in recent years, as the importance of unsupervised learning has become widely recognized, more scholars have devoted themselves to this field and have achieved breakthroughs.
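The k-means algorithm mentioned above alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it, repeating until no assignment changes. Below is a minimal pure-Python sketch of this Lloyd-style iteration. The function name `kmeans`, its parameters, and the simple random initialization are my own illustrative choices (not k-means++), and this is not the Hartigan and Wong variant listed in the references.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means on a list of equal-length coordinate tuples.

    Returns (centroids, labels). Initial centroids are k distinct data
    points chosen at random (a simple, hypothetical initialization).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels


# Usage: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

The cluster ids themselves are arbitrary (they depend on the random initial centroids); what matters is that points in the same group share an id. Like any k-means, the result depends on initialization and can land in a local optimum on harder data.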

Clustering is one of the most important problems in unsupervised learning. It is used in many real-world applications, such as image segmentation, bioinformatics, and financial fraud detection. Clustering groups unlabeled data and thereby reveals the natural structure of the data. Clustering is typically applied for the following three purposes:
1. finding the latent structure of data
2. grouping data naturally
3. compressing data

Thousands of clustering algorithms have been published. They can be grouped into partition-based (division-based) algorithms, hierarchy-based algorithms, density-based algorithms, and others.
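To contrast with the partition-based k-means, here is a minimal sketch of a density-based method in the spirit of DBSCAN (Ester et al., 1996). A point with at least `min_pts` neighbours within radius `eps` (counting itself) is a core point; clusters grow from core points, and points reachable from no core point are labeled noise. The names `dbscan`, `eps`, `min_pts`, and `NOISE` are illustrative assumptions, and the brute-force O(n^2) neighbour search is chosen for clarity, not speed.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one cluster label per point."""

    def neighbours(i):
        # All points within distance eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is also a core point
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels


# Usage: two dense squares plus one isolated outlier.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
print(dbscan(data, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-means, this approach does not need the number of clusters in advance and can mark outliers explicitly, but its result depends on the choice of `eps` and `min_pts`.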

Research on clustering can be divided into three areas:
1. technology-centered research
2. data-centered research
3. clustering-derived-centered research

Some key research findings in the field of clustering

Hartigan, J. A., and M. A. Wong. A K-Means Clustering Algorithm. Applied Statistics, 1979, 28(1).

von Luxburg, U. A Tutorial on Spectral Clustering. Statistics and Computing, 2007, 17(4):395-416.

Comaniciu, D., and P. Meer. Mean Shift: A Robust Approach toward Feature Space Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(5):603-619.

Zhang, T., R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. ACM SIGMOD Record, 1996, 25(2).

Frey, B. J., and D. Dueck. Clustering by Passing Messages between Data Points. Science, 2007.

Ester, M., H. P. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, 226-231.

Koga, H., T. Ishibashi, and T. Watanabe. Fast Agglomerative Hierarchical Clustering Algorithm Using Locality-Sensitive Hashing. Knowledge and Information Systems, 2007, 12(1):25-53.

Elhamifar, E., and R. Vidal. Sparse Subspace Clustering: Algorithm, Theory, and Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(11):2765-2781.

Rodriguez, A., and A. Laio. Clustering by Fast Search and Find of Density Peaks. Science, 2014, 344(6191):1492-1496.
