Comparative Analysis of Clustering Approaches for Big Data Analysis

Satish S. Banait; Shrish S Sane

doi: https://doi.org/10.14445/22312803/IJCTT-V70I3P105

Article type :

Review article

Author :

Satish S. Banait, Shrish S Sane

Volume :

Issue :

Abstract :

This paper presents a comparative study of the most popular big data clustering techniques. Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts by researchers in many disciplines, which reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. The k-means clustering algorithm falls under the category of centroid-based clustering. Hierarchical clustering is a cluster analysis technique that seeks to build a hierarchy of clusters. Agglomerative clustering is a form of hierarchical clustering that uses the bottom-up approach. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a clustering algorithm that groups together points that lie close to one another, based on a distance measure (Euclidean distance) and a minimum number of points. MapReduce is a programming paradigm in which huge datasets can be processed quickly by working on them in parallel on distributed clusters. This paper compares k-means, hierarchical agglomerative clustering, DBSCAN, and k-means with MapReduce for clustering big data.
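To make the combination of k-means and map-reduce mentioned in the abstract more concrete, the following is a minimal single-process sketch in Python. It is not the authors' implementation and does not use any Hadoop or Spark API. The function names `map_assign`, `reduce_update`, and `kmeans`, the sample points, and the starting centroids are all hypothetical, chosen only for illustration. The map step assigns each point to its nearest centroid by Euclidean distance, and the reduce step averages the points grouped under each centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def map_assign(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_update(pairs, centroids):
    """Reduce step: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # Assumption: a centroid with no assigned points is left unchanged.
    new_centroids = list(centroids)
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        updated = reduce_update(map_assign(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

In a distributed setting, each mapper would typically run `map_assign` on its own partition of the data and the reducers would combine the partial results per centroid; here both steps run in one process to keep the sketch self-contained.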

Keywords :

Big Data, Clustering Strategies, Density-Based Spatial Clustering, Hierarchical Agglomerative Clustering, K-Means.

Doi :

https://doi.org/10.14445/22312803/IJCTT-V70I3P105