Abstract
Transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance across various NLP tasks, including the identification of offensive language and hate speech, an important problem in social media. In this paper, we present fBERT, a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances. We evaluate fBERT’s performance on identifying offensive content on multiple English datasets, and we test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics: EMNLP 2021 |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1792-1798 |
Publication status | Published - Nov 2021 |
Event | The 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic. Duration: 7 Nov 2021 → 11 Nov 2021. https://2021.emnlp.org/ |
Conference
Conference | The 2021 Conference on Empirical Methods in Natural Language Processing |
---|---|
Abbreviated title | EMNLP 2021 |
Country/Territory | Dominican Republic |
City | Punta Cana |
Period | 7 Nov 2021 → 11 Nov 2021 |