Analyzing impact of number of features on efficiency of hybrid model of lexicon and stack based ensemble classifier for twitter sentiment analysis using WEKA tool

Bibliographic Details
Main Authors: Rani, Sangeeta (Author), Singh Gill, Nasib (Author), Gulia, Preeti (Author)
Format: EJournal Article
Published: Institute of Advanced Engineering and Science, 2021-05-01.
Description
Summary: Twitter is used by millions of people worldwide, so data collected from it can be highly valuable for research and decision support. This paper uses the 'Twitter US Airline data' from the Kaggle data repository for sentiment classification of customer reviews. The research implements various machine learning classifiers, stack-based ensemble classifiers, and hybrids of a lexicon classifier with other classifiers. Eleven classification models are implemented for feature sets of different sizes. All 11 models are then re-implemented with the sentiment score of the lexicon-based classifier added as one of the features. Results are analyzed by varying the number of input features used in classification: four feature sets with 301, 501, 701, and 1301 features are used to analyze the variation in the findings. Chi-square and information gain techniques are used for feature selection. The results show that accuracy increases with the number of features up to 701 features; beyond that, accuracy is stable or decreases as the feature set grows. The cost of adding the lexicon classifier's sentiment score to the input feature set is nominal, yet it improves the results consistently. WEKA and R Studio are used for analysis and implementation. Accuracy and Kappa are used to represent and compare the efficiency of the models.
Item Description: https://ijeecs.iaescore.com/index.php/IJEECS/article/view/24322