Profanity and Sentiment Detection in Filipino Social Media Comments Using Transformer-Based NLP Models

Authors

  • Marc Laureta, College of Informatics and Computing Studies, New Era University, Quezon City, 1107, Philippines
  • Wendell Alfred Feria, College of Informatics and Computing Studies, New Era University, Quezon City, 1107, Philippines
  • Patrick Carl Limbag, College of Informatics and Computing Studies, New Era University, Quezon City, 1107, Philippines
  • Bienmarc Montecillo, College of Informatics and Computing Studies, New Era University, Quezon City, 1107, Philippines

Keywords:

Profanity Detection, NLP, Transformer Models, Filipino Language, Social Media Moderation, Sentiment Analysis

Abstract


Filipino is considered a low-resource language: annotated datasets and linguistic tools are scarce, which makes it difficult to process. Code-switching, regional variation, and the evolving slang of online conversations complicate the task further. To address these issues, the study used a developmental research design and applied three transformer-based models: BERT, DistilBERT, and XLNet. A total of 13,565 Reddit comments were collected through web scraping and through the Reddit API using PRAW (the Python Reddit API Wrapper). The dataset was annotated, cleaned, and augmented during preprocessing. The models were trained and evaluated on their ability to classify comments into four profanity categories: Non-Profane, Mild, Moderate, and High. BERT achieved the highest accuracy at 99.53%, followed by XLNet and DistilBERT. A web application and a Reddit bot were built to demonstrate real-time detection, filtering, and severity-based masking of profane content. Sentiment analysis was also performed to assess the emotional tone and intent behind each comment. The results highlight the system’s effectiveness in improving online content moderation through accurate, context-aware detection of profanity and sentiment in Filipino social media posts. They further suggest that treating profanity detection and sentiment analysis as separate but complementary tasks yields better performance and interpretability.
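
The severity-based masking mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the authors implemented masking, so everything below is an assumption made for illustration: the function names `mask_word` and `mask_comment`, the masking policy per severity level, and the idea of passing in a set of flagged terms are all hypothetical. Only the four category names come from the abstract. In a real system the severity label would come from a trained classifier rather than being supplied by hand.

```python
import re
from typing import Set

# The four severity categories named in the abstract.
LABELS = ("Non-Profane", "Mild", "Moderate", "High")


def mask_word(word: str, severity: str) -> str:
    """Mask one word. The policy per level is an illustrative assumption."""
    if severity not in LABELS:
        raise ValueError(f"unknown severity: {severity}")
    if severity == "Non-Profane" or not word:
        return word
    if severity == "Mild":
        # Keep the first and last characters; hide the middle.
        if len(word) <= 2:
            return "*" * len(word)
        return word[0] + "*" * (len(word) - 2) + word[-1]
    if severity == "Moderate":
        # Keep only the first character.
        return word[0] + "*" * (len(word) - 1)
    # "High": hide the whole word.
    return "*" * len(word)


def mask_comment(comment: str, flagged: Set[str], severity: str) -> str:
    """Mask every token of `comment` whose lowercase form is in `flagged`.

    `flagged` is a hypothetical set of lowercase terms to hide; `severity`
    stands in for the label a classifier would assign to the comment.
    """
    def replace(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token.lower() in flagged:
            return mask_word(token, severity)
        return token

    return re.sub(r"\w+", replace, comment)
```

For example, `mask_comment("That is a Badword indeed", {"badword"}, "Moderate")` leaves the surrounding words untouched and keeps only the first letter of the flagged token. Separating the per-word policy from the comment-level pass keeps the masking rules easy to change without touching tokenization.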

Published

2025-06-30

How to Cite

Laureta, M., Feria, W. A., Limbag, P. C., & Montecillo, B. (2025). Profanity and Sentiment Detection in Filipino Social Media Comments Using Transformer-Based NLP Models. Isabela State University Linker: Journal of Education, Social Sciences and Allied Health, 2(1), 102–117. Retrieved from https://www.isujournals.ph/index.php/jessah/article/view/216