Analyzing impact of number of features on efficiency of hybrid model of lexicon and stack based ensemble classifier for twitter sentiment analysis using WEKA tool

Bibliographic Details
Main Authors: Rani, Sangeeta (Author), Singh Gill, Nasib (Author), Gulia, Preeti (Author)
Format: EJournal Article
Published: Institute of Advanced Engineering and Science, 2021-05-01.
LEADER 02929 am a22003133u 4500
001 ijeecs24322_15002
042 |a dc 
100 1 0 |a Rani, Sangeeta  |e author 
100 1 0 |e contributor 
700 1 0 |a Singh Gill, Nasib  |e author 
700 1 0 |a Gulia, Preeti  |e author 
245 0 0 |a Analyzing impact of number of features on efficiency of hybrid model of lexicon and stack based ensemble classifier for twitter sentiment analysis using WEKA tool 
260 |b Institute of Advanced Engineering and Science,   |c 2021-05-01. 
500 |a https://ijeecs.iaescore.com/index.php/IJEECS/article/view/24322 
520 |a Twitter is used by millions of people across the world, so data collected from Twitter can be highly valuable for research and helpful in decision support. This paper uses the 'Twitter US Airline data' from the Kaggle data repository for sentiment classification of customers' reviews. The research implements various machine learning classifiers, stack-based ensemble classifiers, and hybrids of a lexicon classifier with other classifiers. Eleven classification models are implemented for feature sets of different sizes. All 11 models are then re-implemented with the sentiment score of the lexicon-based classifier added as one of the features in the feature set. Results are analyzed by varying the number of input feature variables used in classification. Four feature sets with 301, 501, 701, and 1301 features are used to analyze the variations in the final findings. Chi-Square and Information Gain techniques are used for feature selection. The results show that increasing the number of features increases accuracy up to 701 features; beyond that, accuracy is stable or decreases as the feature set grows. Also, the cost of adding the lexicon classifier's sentiment score to the input feature set is nominal, but the results improve consistently. WEKA and R Studio tools are used for analysis and implementation. Accuracy and Kappa are used for representing and comparing the efficiency of models. 
540 |a Copyright (c) 2021 Institute of Advanced Engineering and Science 
540 |a http://creativecommons.org/licenses/by-nc/4.0 
546 |a eng 
690 |a Computer Science, Artificial Intelligence, Machine Learning 
690 |a Ensemble Classifier; IBK; SMO; Lexicon based classifier; Meta Stacking; REPTree; Voting Ensemble 
655 7 |a info:eu-repo/semantics/article  |2 local 
655 7 |a info:eu-repo/semantics/publishedVersion  |2 local 
655 7 |2 local 
786 0 |n Indonesian Journal of Electrical Engineering and Computer Science; Vol 22, No 2: May 2021; 1041-1051 
786 0 |n 2502-4760 
786 0 |n 2502-4752 
786 0 |n 10.11591/ijeecs.v22.i2 
787 0 |n https://ijeecs.iaescore.com/index.php/IJEECS/article/view/24322/15002 
856 4 1 |u https://ijeecs.iaescore.com/index.php/IJEECS/article/view/24322/15002  |z Get fulltext
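
The abstract in field 520 names Accuracy and Kappa as the measures used to compare models. As a rough illustration only, the sketch below computes both from a list of actual and predicted sentiment labels, assuming Kappa means Cohen's kappa. The function name, the label strings, and the toy data are hypothetical and are not taken from the paper, which reports using WEKA and R Studio rather than Python.

```python
from collections import Counter


def accuracy_and_kappa(actual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(actual)

    # Observed agreement: the fraction of items labeled identically (accuracy).
    observed = sum(a == p for a, p in zip(actual, predicted)) / n

    # Chance agreement: for each label, the product of its marginal
    # frequencies in the actual and predicted lists, summed over labels.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)

    if expected == 1.0:
        # Both lists contain a single identical label; kappa is undefined (0/0).
        return observed, float("nan")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy data, invented for this example.
actual = ["neg", "neg", "neg", "pos", "pos", "neu"]
predicted = ["neg", "neg", "pos", "pos", "neu", "neu"]
acc, kappa = accuracy_and_kappa(actual, predicted)
```

Kappa discounts the agreement expected by chance, so on a class-imbalanced data set it can be noticeably lower than accuracy for the same predictions.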