Undersampling is a popular method for solving imbalanced classification problems. However, it may remove too many majority samples, leading to the loss of informative ones. In this article, the hashing-based undersampling ensemble (HUE) is proposed to address this problem by constructing diversified training subspaces for undersampling. Samples in the majority class are divided into many subspaces by a hashing method. Each subspace corresponds to a training subset consisting of most of the samples from that subspace and a few samples from surrounding subspaces. These training subsets, together with all minority-class samples, are used to train an ensemble of classification and regression tree (CART) classifiers. The proposed method is evaluated on 25 UCI datasets against state-of-the-art methods. Experimental results show that HUE outperforms the other methods and yields good results on highly imbalanced datasets.
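The core idea above can be illustrated with a toy sketch. This is not the repo's implementation: random-hyperplane hashing stands in for ITQ, a nearest-centroid rule stands in for the CART base classifier, and each training subset uses only its own subspace's majority samples (the paper also borrows a few from surrounding subspaces).

```python
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced 2-D data: 200 majority (-1) vs 20 minority (+1) samples
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(3.0, 0.5, size=(20, 2))

# Hash majority samples into 2**n_bits subspaces via signs of random projections
n_bits = 3
W = rng.normal(size=(2, n_bits))
codes = (X_maj @ W > 0).astype(int)
buckets = codes @ (2 ** np.arange(n_bits))  # integer bucket id per sample

# Each non-empty bucket yields one training subset: its majority samples plus
# all minority samples; here each subset just trains a nearest-centroid model.
models = []
for b in np.unique(buckets):
    sub_maj = X_maj[buckets == b]
    models.append((sub_maj.mean(axis=0), X_min.mean(axis=0)))

def predict(x):
    """Majority vote over the ensemble: +1 (minority) or -1 (majority)."""
    votes = [1 if np.linalg.norm(x - c_min) < np.linalg.norm(x - c_maj) else -1
             for c_maj, c_min in models]
    return 1 if sum(votes) > 0 else -1

print(predict(np.array([3.0, 3.0])))  # near the minority cluster -> 1
print(predict(np.array([0.0, 0.0])))  # near the majority cluster -> -1
```

Because every base model sees all minority samples but only one hash bucket of the majority class, the ensemble stays balanced without discarding majority information globally.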
You can access the paper via the following links: IEEE, PDF
$ pip install -r ./requirements.txt
You can read the run.py
file, adapt it as needed, and then run the following command:
$ python ./run.py
Path | Description |
---|---|
data/ | Selected datasets used to evaluate the implementation |
ploting.py | Generates circular sample data, applies ITQ, and plots the results |
ensemble.py | Implementation of HashBasedUndersamplingEnsemble |
utils.py | Utility functions, e.g. for preparing data and evaluation |
run.py | Runs the implemented method on the datasets |
Datasets |
---|
abalone |
car |
flare-F |
glass |
ILPD |
letter |
seeds |
Skin |
wine |
yeast5 |
- Support labels other than 1 and -1
- Improve the code
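One possible approach to the first TODO item, as a sketch: remap arbitrary binary labels to {-1, +1} before training (`to_pm1` is a hypothetical helper, not part of the repo; mapping the minority class to +1 is an assumption).

```python
import numpy as np

def to_pm1(y):
    """Map an arbitrary binary label array to {-1, +1} (minority class -> +1)."""
    classes, counts = np.unique(y, return_counts=True)
    if classes.size != 2:
        raise ValueError("expected exactly two classes")
    minority = classes[np.argmin(counts)]  # least frequent label -> +1
    return np.where(y == minority, 1, -1)

print(to_pm1(np.array(["cat", "dog", "dog", "dog"])))  # [ 1 -1 -1 -1]
```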