DEEP LEARNING-BASED CLASSIFICATION OF TOXIC ONLINE COMMENTS USING LONG SHORT-TERM MEMORY (LSTM) FOR SENTIMENT ANALYSIS
DOI:
https://doi.org/10.35631/JISTM.937026

Keywords:
Comment Classification, Deep Learning, Sentiment Analysis, LSTM, Toxic Comments

Abstract
Toxic online comments have become a growing problem: they spread hate and negativity and create hostile environments that discourage constructive dialogue in online communities. They can cause psychological distress for individuals, reduce user participation, and harm the reputation of platforms. This study therefore aims to identify different types of toxic comments and determine whether their sentiment is positive, negative, or neutral. A review of related articles revealed key types of toxicity, including obscenity, threats, severe toxicity, identity hate, and insults. A dataset of approximately 159,000 comments was collected from an open-source website, specifically Wikipedia's talk page edits, and thoroughly cleaned through pre-processing. Sentiment analysis was performed using the VADER lexicon to determine the sentiment polarity of these comments. In addition, two deep learning approaches, a plain LSTM and an LSTM with GloVe word embeddings, were tested to compare their performance. The data was split into an 80:20 ratio for training and testing, and different hyperparameters were evaluated: batch sizes of 32, 64, and 128, and epochs of 5, 10, and 15. The best results were achieved by the LSTM with GloVe word embeddings, which yielded an accuracy of 0.904 with a batch size of 64 and 5 epochs, and the highest precision recorded was 0.89. While these findings are promising, there is room for improvement, including comparisons with other deep learning methods and alternative word embedding techniques.
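To illustrate the lexicon-based polarity labelling step described above, the sketch below mimics how a VADER-style scorer maps a comment to positive, negative, or neutral. The word scores in the toy lexicon are invented for this example and are not VADER's actual values; the ±0.05 decision thresholds follow the convention commonly used with VADER's compound score, but this is a simplified stand-in, not the VADER implementation itself.

```python
# Toy lexicon-based sentiment scorer: a simplified, self-contained
# stand-in for the VADER lexicon used in the study.
# NOTE: the word scores below are illustrative assumptions.
TOY_LEXICON = {
    "hate": -2.7, "stupid": -2.4, "threat": -2.0,
    "thanks": 1.9, "great": 3.1, "helpful": 1.6,
}

def polarity(comment: str) -> str:
    """Classify a comment as 'positive', 'negative', or 'neutral'."""
    words = comment.lower().split()
    # Sum the lexicon scores of known words; unknown words score 0.
    score = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    # Normalize by comment length so long neutral comments stay near zero.
    compound = score / max(len(words), 1)
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```

For example, `polarity("thanks for the great edit")` yields `"positive"`, while a comment containing only unscored words falls through to `"neutral"`. The real VADER scorer additionally handles negation, intensifiers, punctuation, and capitalization, which this sketch omits.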