General Information
    • ISSN: 1793-8201 (Print), 2972-4511 (Online)
    • Abbreviated Title: Int. J. Comput. Theory Eng.
    • Frequency: Quarterly
    • DOI: 10.7763/IJCTE
    • Editor-in-Chief: Prof. Mehmet Sahinoglu
    • Associate Editor-in-Chief: Assoc. Prof. Alberto Arteta, Assoc. Prof. Engin Maşazade
    • Managing Editor: Ms. Cecilia Xie
    • Abstracting/Indexing: Scopus (Since 2022), INSPEC (IET), CNKI, Google Scholar, EBSCO, etc.
    • Average Days from Submission to Acceptance: 192 days
    • APC: 800 USD
    • E-mail: editor@ijcte.org

IJCTE 2023 Vol.15(3): 101-110
DOI: 10.7763/IJCTE.2023.V15.1338

Predicting and Mitigating the Effect of Skewness on Credibility Assessment of Social Media Content Using Machine Learning: A Twitter Case Study

Shifaa Basharat, Saduf Afzal, Alwi M Bamhdi, Shozab Khurshid*, and Manzoor Chachoo

Manuscript received June 27, 2022; revised August 12, 2022; accepted March 9, 2023.

Abstract—Many strategies have been put forward to assess the credibility of online social media content. However, none of them addresses the accuracy paradox, which mostly occurs in highly skewed datasets, a case that commonly arises in real-life situations. This paper explores the use of various machine learning models, including Gaussian Naïve Bayes, Latent Dirichlet Allocation (LDA), Linear Regression, Logistic Regression, and Support Vector Machine (SVM), for identifying the credibility of tweets. It also proposes a new algorithm that integrates the generative properties of Gaussian Naïve Bayes with the discriminative properties of logistic regression, and evaluates its performance in terms of accuracy and predictive power in determining tweet credibility. The machine learning models used in this study are implemented on Twitter datasets extracted from various real-world events and compared on their accuracy and predictive power in determining the credibility of tweets, in order to identify accuracy paradox cases. The proposed algorithm is then used for the credibility inference of tweets, and the reduction in the number of accuracy paradox cases is monitored. An extensive experimental study evaluates the performance of the proposed model on Twitter datasets with varied degrees of skewness. The proposed model achieved accuracy and predictive power of 97% and 94% for a balanced dataset, and 99% and 93% for an imbalanced dataset with 99% skewness.
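
The accuracy paradox mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method or data: the label counts are hypothetical, the "majority" predictor is a deliberately trivial baseline, and balanced accuracy is used here only as an assumed stand-in for a skew-aware measure, since the abstract does not define "predictive power".

```python
# Minimal illustration of the accuracy paradox on a skewed dataset.
# All data are hypothetical; this is not the paper's algorithm or metric.

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label):
    """Fraction of samples of class `label` that were predicted as `label`."""
    actual = sum(1 for t in y_true if t == label)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / actual if actual else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; insensitive to class proportions."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in labels) / len(labels)

# Hypothetical 99%-skewed set: 990 credible tweets (0), 10 non-credible (1).
y_true = [0] * 990 + [1] * 10

# A trivial baseline that always predicts the majority class.
y_majority = [0] * len(y_true)

acc = accuracy(y_true, y_majority)            # high despite learning nothing
rec_minority = recall(y_true, y_majority, 1)  # no non-credible tweet is caught
bal_acc = balanced_accuracy(y_true, y_majority)

print(f"accuracy={acc:.2f} minority_recall={rec_minority:.2f} "
      f"balanced_accuracy={bal_acc:.2f}")
```

The baseline scores high on plain accuracy while detecting none of the minority-class tweets, which is why accuracy alone can mislead on highly skewed data.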

Index Terms—Gaussian Gradient Descent (GGD), Gaussian Naïve Bayes (GNB), intra-skewness, inter-skewness, predictive index

Shifaa Basharat, Saduf Afzal, Shozab Khurshid, and Manzoor Chachoo are with the Department of Computer Science, University of Kashmir, India. Alwi Bamhdi is with the Computing College in AlQufudah, Umm Al-Qura University, Saudi Arabia.
*Correspondence: shozabkhurshid@gmail.com (S.K.)


Cite: Shifaa Basharat, Saduf Afzal, Alwi M Bamhdi, Shozab Khurshid, and Manzoor Chachoo, "Predicting and Mitigating the Effect of Skewness on Credibility Assessment of Social Media Content Using Machine Learning: A Twitter Case Study," International Journal of Computer Theory and Engineering, vol. 15, no. 3, pp. 101-110, 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


Copyright © 2008-2024. International Association of Computer Science and Information Technology. All rights reserved.