Analyzing impact of number of features on efficiency of hybrid model of lexicon and stack based ensemble classifier for twitter sentiment analysis using WEKA tool

Twitter is used by millions of people across the world, so the data collected from Twitter can be highly valuable for research and helpful in decision support. Here in this paper 'Twitter US Airline data' from Kaggle data repository is used for sentiment classification of customers' r...

Bibliographic Details
Main Authors:	Rani, Sangeeta (Author), Singh Gill, Nasib (Author), Gulia, Preeti (Author)
Format:	EJournal Article
Published:	Institute of Advanced Engineering and Science, 2021-05-01.
Subjects:	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion
Online Access:	Get fulltext


LEADER	02929 am a22003133u 4500
001	ijeecs24322_15002
042			\|a dc
100	1	0	\|a Rani, Sangeeta \|e author
100	1	0	\|e contributor
700	1	0	\|a Singh Gill, Nasib \|e author
700	1	0	\|a Gulia, Preeti \|e author
245	0	0	\|a Analyzing impact of number of features on efficiency of hybrid model of lexicon and stack based ensemble classifier for twitter sentiment analysis using WEKA tool
260			\|b Institute of Advanced Engineering and Science, \|c 2021-05-01.
500			\|a https://ijeecs.iaescore.com/index.php/IJEECS/article/view/24322
520			\|a Twitter is used by millions of people across the world, so data collected from Twitter can be highly valuable for research and decision support. In this paper, the 'Twitter US Airline data' set from the Kaggle data repository is used for sentiment classification of customers' reviews. The research implements various machine learning classifiers, stack-based ensemble classifiers, and hybrids of a lexicon classifier with other classifiers. Eleven classification models are implemented for feature sets of different sizes. All 11 models are then re-implemented with the sentiment score of the lexicon-based classifier added as one of the features in the feature set. Results are analyzed by varying the number of input features used in classification. Four feature sets with 301, 501, 701, and 1301 features are used to analyze the variation in the findings. Chi-square and information gain techniques are used for feature selection. The results show that increasing the number of features increases accuracy up to 701 features; beyond that, accuracy is stable or decreases as the feature set grows. Also, the cost of adding the lexicon classifier's sentiment score to the input feature set is nominal, yet the results improve consistently. WEKA and R Studio are used for analysis and implementation. Accuracy and Kappa are used to represent and compare the efficiency of the models.
540			\|a Copyright (c) 2021 Institute of Advanced Engineering and Science
540			\|a http://creativecommons.org/licenses/by-nc/4.0
546			\|a eng
690			\|a Computer Science, Artificial Intelligence, Machine Learning
690			\|a Ensemble Classifier; IBK; SMO; Lexicon based classifier; Meta Stacking; REPTree; Voting Ensemble
655	7		\|a info:eu-repo/semantics/article \|2 local
655	7		\|a info:eu-repo/semantics/publishedVersion \|2 local
655	7		\|2 local
786	0		\|n Indonesian Journal of Electrical Engineering and Computer Science; Vol 22, No 2: May 2021; 1041-1051
786	0		\|n 2502-4760
786	0		\|n 2502-4752
786	0		\|n 10.11591/ijeecs.v22.i2
787	0		\|n https://ijeecs.iaescore.com/index.php/IJEECS/article/view/24322/15002
856	4	1	\|u https://ijeecs.iaescore.com/index.php/IJEECS/article/view/24322/15002 \|z Get fulltext
