Clustering Techniques in Machine Learning

Singh, Vatsal; Bharti, Monika [Guided by]

Please use this identifier to cite or link to this item: http://www.ir.juit.ac.in:8080/jspui/jspui/handle/123456789/10306

Title:	Clustering Techniques in Machine Learning
Authors:	Singh, Vatsal Bharti, Monika [Guided by]
Keywords:	Data Clustering Machine Learning Genetic Algorithm
Issue Date:	2023
Publisher:	Jaypee University of Information Technology, Solan, H.P.
Abstract:	In today’s era data generated by scientific applications and corporate environment has grown rapidly not only in size but also in variety. This data collected is of huge amount and there is a difficulty in collecting and analyzing such big data. Data mining is the technique in which useful information and hidden relationship among data is extracted, but the traditional data mining approaches cannot be directly used for big data due to their inherent complexity. Data Clustering is one of the most important issues in data mining and machine learning. Clustering is a task of discovering homogenous groups of the studied objects. Recently, many researchers have a significant interest in developing clustering algorithms. The most problem in clustering is that we do not have prior information knowledge about the given dataset. Moreover, the choice of input parameters such as the number of clusters, number of nearest neighbors and other factors in these algorithms make the clustering more challengeable topic. Thus any incorrect choice of these parameters yields bad clustering results. Furthermore, these algorithms suffer from unsatisfactory accuracy when the dataset contains clusters with different complex shapes, densities, sizes, noise, and outliers. In this project, we propose a new approach for unsupervised clustering task. Our approach consists of three phases of operations. In the first phase we use the Genetic algorithm for finding first initial cluster centroid. In genetic algorithm we use a crossover and mutation of the dataset. The second phase, takes these initial cluster centroid produced by genetic algorithm for finding clusters using K-means clustering. From the second phase we obtain a set of clusters of the given dataset. Hence, the third phase considers these clusters for evaluation of cluster based on Davies Bouldin Index. This new algorithm is named as Genetic K-means Algorithm (GKA). We present experiments that provide the strength of our new proposed algorithm in discovering clusters with different non-convex shapes, sizes, densities, noise, outliers and higher accuracy. These experiments show the superiority of our proposed algorithm when comparing with K-means algorithm.
URI:	http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/10306
Appears in Collections:	B.Tech. Project Reports

Files in This Item:

File	Description	Size	Format
Clustering Techniques in Machine Learning.pdf		9.08 MB	Adobe PDF	View/Open

Show full item record