DistilmBERT and XLM-RoBERTa for Multilingual Toxic Comment Classification
We study how pre-trained language models can be applied to toxic comment classification and compare how different pre-trained models perform on this task. We introduce an ensemble approach that combines two pre-trained models, DistilmBERT and xlm-roberta-large-xnli, to perform the classification. We trained the ensemble on an English dataset, tested it on Wikipedia talk page comments in several languages, and achieved an accuracy of over 93%.
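The abstract does not specify how the two models are combined, so the following is only a minimal sketch of one plausible ensemble: it assumes the Hugging Face checkpoints distilbert-base-multilingual-cased (DistilmBERT, with a binary head that would first be fine-tuned on the English toxic-comment data) and joeddav/xlm-roberta-large-xnli (used zero-shot via its NLI head), and a simple average of the two toxicity probabilities as the ensembling rule. All checkpoint names and the averaging step are assumptions for illustration, not the authors' confirmed pipeline.

```python
# Illustrative two-model ensemble for toxic-comment scoring.
# Assumptions (not stated in the abstract): checkpoint names, the
# zero-shot use of the XNLI model, and probability averaging.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline,
)

# DistilmBERT with a two-class head; in practice this head would be
# fine-tuned on the English toxic-comment training set before use.
distil_name = "distilbert-base-multilingual-cased"
distil_tok = AutoTokenizer.from_pretrained(distil_name)
distil_clf = AutoModelForSequenceClassification.from_pretrained(
    distil_name, num_labels=2
)
distil_clf.eval()

# xlm-roberta-large-xnli used zero-shot: its NLI head scores how well
# the comment supports the candidate label "toxic".
zero_shot = pipeline(
    "zero-shot-classification", model="joeddav/xlm-roberta-large-xnli"
)


def toxic_probability(comment: str) -> float:
    """Average the toxicity probabilities from both branches."""
    # DistilmBERT branch: softmax over the binary classification head.
    inputs = distil_tok(comment, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = distil_clf(**inputs).logits
    p_distil = torch.softmax(logits, dim=-1)[0, 1].item()  # index 1 = toxic

    # XLM-RoBERTa XNLI branch: zero-shot score assigned to "toxic".
    result = zero_shot(comment, candidate_labels=["toxic", "non-toxic"])
    p_xlmr = dict(zip(result["labels"], result["scores"]))["toxic"]

    # Simple unweighted average as the ensembling rule (an assumption).
    return (p_distil + p_xlmr) / 2


if __name__ == "__main__":
    print(toxic_probability("You are an idiot."))
```

Because both backbones are multilingual, an ensemble of this shape can be trained on English comments and still produce toxicity scores for non-English Wikipedia talk page comments, which is the cross-lingual setting described above.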