Analysis of Imbalanced Data

Singh, Chandan Partap; Singh, Hari [Guided by]

Please use this identifier to cite or link to this item: http://www.ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6701

Title:	Analysis of Imbalanced Data
Authors:	Singh, Chandan Partap; Singh, Hari [Guided by]
Keywords:	Imbalanced data; Algorithmic ensemble
Issue Date:	2019
Publisher:	Jaypee University of Information Technology, Solan, H.P.
Abstract:	As data grows in large-scale organizations, it has become crucial to upgrade the data mining techniques used for decision making. In scenarios such as medical diagnosis and fraud detection, imbalanced data is not unusual. A dataset can be considered imbalanced if the number of instances in one class is significantly higher than in the other (around 10:1). For example, a dataset may contain 100 instances of Class A and only 2 instances of Class B. Many classification algorithms exist, but most of them are biased towards the majority class. In many applications this bias is acceptable, but in areas such as medical diagnosis and fraud detection the minority class is ignored and wrong outcomes are deduced. Resampling is a very popular method to tackle this issue. It involves generating synthetic instances of the minority class (known as over-sampling) or removing instances of the majority class (known as under-sampling). Some modern algorithms, such as ensemble classifiers like Random Forest, are also explained. In this report, I demonstrate the existing algorithms and propose a new model whose performance (measured by F1 score) is better than that of the traditional models.
URI:	http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6701
Appears in Collections:	B.Tech. Project Reports
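
The abstract's 100:2 example can be sketched in a few lines of Python. This is a minimal illustration and is not taken from the report: the helper names (`f1_score`, `random_oversample`, `random_undersample`) are hypothetical, and random duplication stands in for the synthetic-instance generation the abstract mentions. It shows why plain accuracy can look good while the minority-class F1 score is zero, and how simple over- and under-sampling change the class counts.

```python
import random


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_oversample(rows, minority, rng):
    """Duplicate random minority rows until both classes have equal size.

    Assumption: plain duplication, a simpler stand-in for synthetic generation.
    """
    minority_rows = [r for r in rows if r[1] == minority]
    majority_count = len(rows) - len(minority_rows)
    needed = majority_count - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(needed)]
    return rows + extra


def random_undersample(rows, minority, rng):
    """Keep a random subset of majority rows the same size as the minority."""
    minority_rows = [r for r in rows if r[1] == minority]
    majority_rows = [r for r in rows if r[1] != minority]
    kept = rng.sample(majority_rows, len(minority_rows))
    return kept + minority_rows


# The abstract's example: 100 instances of Class A, 2 of Class B.
# Each row is (feature, label); the feature is a placeholder id.
rows = [(i, "A") for i in range(100)] + [(100 + i, "B") for i in range(2)]
y_true = [label for _, label in rows]

# A classifier fully biased towards the majority class predicts "A" everywhere.
y_pred = ["A"] * len(rows)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_f1 = f1_score(y_true, y_pred, positive="B")

rng = random.Random(0)  # fixed seed so the sketch is reproducible
over = random_oversample(rows, "B", rng)
under = random_undersample(rows, "B", rng)

print(f"accuracy={accuracy:.3f}, minority F1={minority_f1:.3f}")
print(f"over-sampled size={len(over)}, under-sampled size={len(under)}")
```

Here the always-"A" classifier is correct on 100 of 102 instances yet never identifies Class B, which is why a per-class F1 score is the more informative measure on imbalanced data.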

Files in This Item:

File	Description	Size	Format
Analysis of Imbalanced Data.pdf		1.73 MB	Adobe PDF
