Hope Speech Detection Using Social Media Discourse (Posi-Vox-2024): A Transfer Learning Approach

Muhammad Ahmad; Sardar Usman; Humaira Farid; Iqra Ameer; Muhammad Muzzamil; Hmaza Ameer; Grigori Sidorov; Ildar Batyrshin

doi:10.17323/jle.2024.22443

Muhammad Ahmad, Instituto Politecnico Nacional (CIC-IPN), Mexico City, Mexico, https://orcid.org/0009-0003-8799-8212
Sardar Usman, Institute of Arts and Culture, Lahore, Pakistan
Humaira Farid, Independent Researcher, California, USA
Iqra Ameer, Pennsylvania State University at Abington, PA, USA
Muhammad Muzzamil, Islamia University of Bahawalpur, Pakistan
Hmaza Ameer, Islamia University of Bahawalpur, Pakistan
Grigori Sidorov, Instituto Politecnico Nacional (CIC-IPN), Mexico City, Mexico
Ildar Batyrshin, Instituto Politecnico Nacional (CIC-IPN), Mexico City, Mexico, https://orcid.org/0000-0003-0241-7902

Keywords: Hope Speech, BERT, Machine learning, Twitter Analysis, Social Media, Transfer learning, NLP

Abstract

Background: Hope is characterized as an optimistic expectation or anticipation of favorable outcomes. Despite the extensive use of social media, research on hope speech detection has focused primarily on monolingual techniques, and the Urdu and Arabic languages have not been addressed.

Purpose: This study addresses joint multilingual hope speech detection in the Urdu, English, and Arabic languages using a transfer learning paradigm. We developed a new multilingual dataset named Posi-Vox-2024 and employed a joint multilingual technique to design a universal classifier for the multilingual dataset. We explored a fine-tuned BERT model, which demonstrated strong performance in capturing semantic and contextual information.

Method: The framework includes (1) preprocessing, (2) data representation using BERT, (3) fine-tuning, and (4) classification of hope speech into binary (‘hope’ and ‘not hope’) and multi-class (realistic, unrealistic, and generalized hope) categories.
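A minimal sketch of stage (1) and of the two-level label scheme from stage (4) is given below. The abstract does not list the preprocessing steps, so the cleaning rules (Unicode normalization, removal of URLs and user mentions, dropping the hashtag marker) and the names `preprocess`, `to_binary`, and `HOPE_TYPES` are assumptions made for illustration, not the authors' implementation. BERT encoding and fine-tuning (stages 2 and 3) are left out because they depend on a model library.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical label names, taken from the categories named in the abstract.
HOPE_TYPES = ("realistic hope", "unrealistic hope", "generalized hope")

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")  # \w is Unicode-aware, so non-Latin handles match too
WHITESPACE_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Assumed cleaning: normalize Unicode, drop URLs and mentions, squeeze spaces."""
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the marker
    return WHITESPACE_RE.sub(" ", text).strip()


def to_binary(hope_type: Optional[str]) -> str:
    """Collapse a fine-grained hope type (or None for no hope) into the binary task."""
    if hope_type is None:
        return "not hope"
    if hope_type not in HOPE_TYPES:
        raise ValueError(f"unknown hope type: {hope_type!r}")
    return "hope"


if __name__ == "__main__":
    sample = "@user I hope things get better!  https://t.co/abc #hope"
    print(preprocess(sample))
    print(to_binary("realistic hope"), "|", to_binary(None))
```

Because the same cleaning function is applied to every post regardless of language, a single classifier can be trained on the pooled Urdu, English, and Arabic text, which is the joint multilingual setup the abstract describes.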

Results: Our proposed model (BERT) set the benchmark performance on our dataset, achieving 0.78 accuracy in binary classification and 0.66 in multi-class classification, with performance improvements of 0.04 and 0.08, respectively, over the Logistic Regression baseline (0.75 in the binary task and 0.61 in the multi-class task).
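The abstract does not state how the improvement figures are computed. One reading that reproduces the reported 0.04 and 0.08 is a relative gain over the baseline, (model - baseline) / baseline, whereas the plain differences in accuracy are 0.03 and 0.05. The short check below works through both; the function names are hypothetical and the interpretation is an assumption, not something the source confirms.

```python
def absolute_gain(model: float, baseline: float) -> float:
    """Plain difference between model accuracy and baseline accuracy."""
    return model - baseline


def relative_gain(model: float, baseline: float) -> float:
    """Gain expressed as a fraction of the baseline accuracy."""
    return (model - baseline) / baseline


# Accuracies reported in the abstract: (BERT, Logistic Regression baseline).
reported = {"binary": (0.78, 0.75), "multi-class": (0.66, 0.61)}

if __name__ == "__main__":
    for task, (bert, logreg) in reported.items():
        print(
            task,
            "absolute:", round(absolute_gain(bert, logreg), 2),
            "relative:", round(relative_gain(bert, logreg), 2),
        )
```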

Conclusion: Our findings can be applied to improve automated systems for detecting and promoting supportive content in English, Arabic, and Urdu on social media platforms, fostering positive online discourse. This work sets new benchmarks for multilingual hope speech detection, advancing existing knowledge and enabling future research in underrepresented languages.
