Manuscript received September 11, 2024; revised October 18, 2024; accepted December 4, 2024; published March 6, 2025
Abstract—Building upon our previous work on extracting and analyzing Stack Overflow data to uncover trends in programming languages, community contributions, and talent availability, this research investigates the impact of numeric attributes on tag recommendation. Utilizing the Stack Overflow Data Warehouse System developed in our prior study, we conduct a comprehensive analysis of multiple Machine Learning (ML) algorithms to evaluate their effectiveness in recommending tags based on an integration of specific numeric attributes with feature extraction techniques. The methodology involves extracting relevant data, preprocessing it, and applying Term Frequency-Inverse Document Frequency (TF-IDF) as a feature extraction technique alongside diverse ML algorithms, including Support Vector Machines (SVM), Gradient Boosting, Random Forest, and Decision Tree, to assess their performance. Our results indicate that this combination improves evaluation metrics, including F1 Score, Recall, and Precision, with a particularly significant influence on the Precision of tag recommendations, providing insights into the optimization of tagging systems on Q&A platforms. Future research will focus on integrating advanced models and refining data preprocessing techniques to further enhance tag prediction accuracy. This study extends the application of the Stack Overflow Data Warehouse System and contributes to the improvement of tag recommendation mechanisms in online technical communities.
Keywords—Stack Overflow Data Warehouse, tag recommendation, numeric attributes, TF-IDF, Machine Learning (ML), Support Vector Machines (SVM), gradient boosting
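The integration described in the abstract, TF-IDF text features combined with numeric attributes, can be illustrated with a minimal sketch. This is not the authors' pipeline. The abstract does not name the numeric attributes, so the ones used here (score, view count, answer count) are assumptions, as are the sample posts, the whitespace tokenization, the smoothed IDF formula, and the min-max scaling. The sketch only builds the combined feature matrix; a classifier such as an SVM or Random Forest would be trained on it afterwards.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption; other IDF variants exist).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def min_max_scale(rows):
    """Scale each numeric column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def combine(text_vectors, numeric_rows):
    """Append the scaled numeric attributes to each TF-IDF vector."""
    scaled = min_max_scale(numeric_rows)
    return [text + nums for text, nums in zip(text_vectors, scaled)]


# Hypothetical posts: (question title, (score, view_count, answer_count)).
posts = [
    ("how to merge two dataframes in pandas", (12, 3400, 2)),
    ("segmentation fault when freeing pointer in c", (3, 150, 1)),
    ("pandas groupby with multiple columns", (25, 9800, 4)),
]

tokenized = [title.split() for title, _ in posts]
numeric = [attrs for _, attrs in posts]

vocab, text_vectors = tfidf_vectors(tokenized)
features = combine(text_vectors, numeric)
```

Each row of `features` holds one TF-IDF weight per vocabulary term followed by the three scaled numeric attributes. Scaling matters here because raw view counts would otherwise dwarf the TF-IDF weights in margin-based models such as SVM.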
Cite: Seyede Sanaz Jedari Jafari and Yousef Emdadi, "Enhancing Tag Recommendation Precision on Stack Overflow Data Warehouse: An Integrated Approach Combining Numeric Attributes, Feature Extraction Techniques, and Multiple Machine Learning Algorithms," International Journal of Computer Theory and Engineering, vol. 17, no. 1, pp. 28-35, 2025.
Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).