Please use this identifier to cite or link to this item: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6701
Title: Analysis of Imbalanced Data
Authors: Singh, Chandan Partap
Singh, Hari [Guided by]
Keywords: Imbalanced data
Algorithmic ensemble
Issue Date: 2019
Publisher: Jaypee University of Information Technology, Solan, H.P.
Abstract: As data volumes grow in large-scale organizations, it has become crucial to improve data mining techniques for decision making. In domains such as medical diagnosis and fraud detection, imbalanced data is common rather than exceptional. A dataset is considered imbalanced if the number of instances in one class is significantly higher than in the other (around 10:1). For example, a dataset might contain 100 instances of Class A and only 2 instances of Class B. Although many classification algorithms exist, most of them are biased towards the majority class. In many applications this bias is acceptable, but in areas such as medical diagnosis and fraud detection it causes the minority class to be ignored, which leads to wrong conclusions. Resampling is a popular way to tackle this issue: it either generates synthetic minority-class instances (over-sampling) or removes majority-class instances (under-sampling). Modern algorithms, such as ensemble classifiers like Random Forest, are also explained. In this report, I demonstrate the existing algorithms and propose a new model whose performance, measured by F1 score, is better than that of the traditional models.
URI: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6701
Appears in Collections: B.Tech. Project Reports
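The abstract's main points can be illustrated with a small sketch: why accuracy hides a classifier's neglect of the minority class while F1 exposes it, and what the two resampling strategies do to class counts. This is not code from the report. It is plain Python with no third-party libraries, it reuses the abstract's 100-vs-2 example as hypothetical toy labels, and the helper names (`imbalance_ratio`, `f1_score`, `random_oversample`, `random_undersample`) are hypothetical. Over-sampling here simply duplicates existing minority rows; it does not generate synthetic instances.

```python
import random
from collections import Counter

# Hypothetical toy labels mirroring the abstract's example: 100 of class "A", 2 of class "B".
labels = ["A"] * 100 + ["B"] * 2
X = list(range(len(labels)))  # hypothetical 1-D feature: just the row index

def imbalance_ratio(y):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())

def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall (0.0 if no true positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        pool = [x for x, c in zip(X, y) if c == cls]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(cls)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class equal in size to the minority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for cls in counts:
        pool = [x for x, c in zip(X, y) if c == cls]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(cls)
    return X_out, y_out

# A trivial "classifier" that always predicts the majority class.
always_a = ["A"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_a)) / len(labels)
print(f"imbalance ratio: {imbalance_ratio(labels):.0f}:1")     # 50:1
print(f"accuracy: {accuracy:.3f}")                              # 0.980
print(f"F1 (class B): {f1_score(labels, always_a, 'B'):.1f}")  # 0.0

_, y_over = random_oversample(X, labels)
_, y_under = random_undersample(X, labels)
print(Counter(y_over))   # Counter({'A': 100, 'B': 100})
print(Counter(y_under))  # Counter({'A': 2, 'B': 2})
```

The majority-only predictor reaches roughly 98% accuracy but an F1 of 0 for Class B. This gap is the reason to judge models on imbalanced data by F1 rather than by plain accuracy. Random duplication and random removal are the simplest forms of over- and under-sampling. Methods that generate synthetic minority instances replace the `rng.choice(pool)` step with an interpolation between nearby minority rows.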

Files in This Item:
Analysis of Imbalanced Data.pdf (1.73 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.