Feature selection to increase the random forest method performance on high dimensional data


Bibliographic Details
Main Authors: Prasetiyowati, Maria Irmina (Author), Maulidevi, Nur Ulfa (Author), Surendro, Kridanto (Author)
Format: EJournal Article
Published: Universitas Ahmad Dahlan, 2020-11-06.
Subjects:
Online Access: Get Fulltext
LEADER 03034 am a22002893u 4500
001 IJAIN_471_ijain_v6i3_p303-312
042 |a dc 
100 1 0 |a Prasetiyowati, Maria Irmina  |e author 
100 1 0 |e contributor 
700 1 0 |a Maulidevi, Nur Ulfa  |e author 
700 1 0 |a Surendro, Kridanto  |e author 
245 0 0 |a Feature selection to increase the random forest method performance on high dimensional data 
260 |b Universitas Ahmad Dahlan,   |c 2020-11-06. 
500 |a https://ijain.org/index.php/IJAIN/article/view/471 
520 |a Random Forest is a supervised classification method based on bagging (bootstrap aggregating, Breiman) and random selection of features. Because features are assigned to the Random Forest at random, the selected features are not necessarily informative, so feature selection is needed before the Random Forest is applied. The purpose of this feature selection is to choose an optimal subset of features that contains valuable information, in the hope of speeding up the Random Forest method, particularly on high-dimensional datasets such as the Parkinson, CNAE-9, and Urban Land Cover datasets. Feature selection is performed with the Correlation-based Feature Selection method using BestFirst search. Tests were carried out 30 times using 10-fold cross-validation and a 70% training / 30% testing split of the dataset. The experiments on the Parkinson dataset were 0.27 and 0.28 seconds faster than the Random Forest method without feature selection. Likewise, the trials on the Urban Land Cover dataset were 0.04 and 0.03 seconds faster, while for the CNAE-9 dataset the time differences were 2.23 and 2.81 seconds faster than the Random Forest method without feature selection. These experiments showed that the Random Forest runs faster when feature selection is performed first. Accuracy also increased in the first two experiments, while only the CNAE-9 experiment yielded lower accuracy. The benefit of this research is that performing feature selection first, using the Correlation-based Feature Selection method, can increase both the speed and the accuracy of the Random Forest method on high-dimensional data.
540 |a Copyright (c) 2020 Maria Irmina Prasetiyowati, Nur Ulfa Maulidevi, Kridanto Surendro 
540 |a https://creativecommons.org/licenses/by-sa/4.0 
546 |a eng 
690 |a Random forest; Feature selection; BestFirst method; High dimensional data; CNAE-9 dataset 
655 7 |a info:eu-repo/semantics/article  |2 local 
655 7 |a info:eu-repo/semantics/publishedVersion  |2 local 
655 7 |2 local 
786 0 |n International Journal of Advances in Intelligent Informatics; Vol 6, No 3 (2020): November 2020; 303-312 
786 0 |n 2548-3161 
786 0 |n 2442-6571 
787 0 |n https://ijain.org/index.php/IJAIN/article/view/471/ijain_v6i3_p303-312 
856 4 1 |u https://ijain.org/index.php/IJAIN/article/view/471/ijain_v6i3_p303-312  |z Get Fulltext
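The abstract (field 520) describes a pipeline in which features are selected first and a Random Forest is then trained and timed against the unfiltered baseline. Below is a minimal sketch of that comparison, assuming scikit-learn is available; since scikit-learn does not ship WEKA's Correlation-based Feature Selection with BestFirst search, a simple univariate filter (SelectKBest with f_classif) stands in for it here, and the synthetic dataset, the value k=20, and any timings printed are purely illustrative, not results from the paper.

# Sketch: compare Random Forest with and without a feature-selection step,
# using 10-fold cross-validation and a 70%/30% train-test split as in the abstract.
# SelectKBest(f_classif) is a stand-in for WEKA's CFS + BestFirst (an assumption).
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic high-dimensional data standing in for Parkinson / CNAE-9 / Urban Land Cover.
X, y = make_classification(n_samples=600, n_features=500, n_informative=30,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)          # 70% training, 30% testing

models = {
    "RF without feature selection": RandomForestClassifier(random_state=0),
    "feature selection + RF": make_pipeline(
        SelectKBest(f_classif, k=20),               # illustrative k, not from the paper
        RandomForestClassifier(random_state=0)),
}

for name, model in models.items():
    start = time.perf_counter()
    cv_scores = cross_val_score(model, X_train, y_train, cv=10)   # 10-fold CV
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    print(f"{name}: cv accuracy={cv_scores.mean():.3f}, "
          f"test accuracy={model.score(X_test, y_test):.3f}, time={elapsed:.2f}s")

Running the sketch prints cross-validation accuracy, held-out accuracy, and elapsed time for both variants, mirroring the speed/accuracy comparison reported in the abstract, although the absolute numbers will differ because the data and the filter are stand-ins.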