TY - JOUR
T1 - ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification
T2 - 29th International Conference on Computational Linguistics, COLING 2022
AU - North, Kai
AU - Zampieri, Marcos
AU - Ranasinghe, Tharindu
N1 - Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.
PY - 2022/10/17
Y1 - 2022/10/17
N2 - Lexical simplification (LS) is the task of automatically replacing complex words with simpler ones, making texts more accessible to various target populations (e.g., individuals with low literacy, individuals with learning disabilities, second-language learners). To train and test models, LS systems usually require corpora that feature complex words in context along with their candidate substitutions. To continue improving the performance of LS systems, we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words. ALEXSIS-PT was compiled following the ALEXSIS protocol for Spanish, opening exciting new avenues for cross-lingual models. ALEXSIS-PT is the first LS multi-candidate dataset to contain Brazilian newspaper articles. We evaluated four models for substitute generation on this dataset, namely mDistilBERT, mBERT, XLM-R, and BERTimbau. BERTimbau achieved the highest performance across all evaluation metrics.
UR - http://www.scopus.com/inward/record.url?scp=85165738769&partnerID=8YFLogxK
UR - https://aclanthology.org/2022.coling-1.529/
M3 - Conference article
AN - SCOPUS:85165738769
SN - 2951-2093
VL - 29
SP - 6057
EP - 6062
JO - Proceedings - International Conference on Computational Linguistics, COLING
JF - Proceedings - International Conference on Computational Linguistics, COLING
IS - 1
Y2 - 12 October 2022 through 17 October 2022
ER -