TY - GEN
T1 - Semantic and Heuristic Based Approach for Paraphrase Identification
AU - Mohamed, Muhidin A.
AU - Oussalah, Mourad
PY - 2019/4/30
Y1 - 2019/4/30
AB - In this paper, we propose a semantic-based paraphrase identification approach. The core concept of this proposal is to identify paraphrases when sentences contain a set of named entities and common words. The developed approach computes the semantic similarity of named-entity tokens separately from that of the rest of the sentence text. More specifically, it integrates word semantic similarity derived from WordNet taxonomic relations with named-entity semantic relatedness inferred from the crowd-sourced knowledge in Wikipedia. In addition, we improve the WordNet similarity measure by nominalizing verbs, adjectives and adverbs with the aid of the Categorial Variation database (CatVar). The paraphrase identification system is then evaluated on two datasets, namely the Microsoft Research Paraphrase Corpus (MSRPC) and TREC-9 Question Variants. Experimental results on these datasets show that our system outperforms baselines in the paraphrase identification task.
KW - named-entity relatedness
KW - Paraphrase identification
KW - Sentence semantic similarity
KW - Word category subsumption
UR - http://www.scopus.com/inward/record.url?scp=85065762953&partnerID=8YFLogxK
U2 - 10.1109/SKG.2018.00037
DO - 10.1109/SKG.2018.00037
M3 - Conference publication
AN - SCOPUS:85065762953
SN - 978-1-7281-0442-3
T3 - 2018 14th International Conference on Semantics, Knowledge and Grids (SKG)
SP - 203
EP - 210
BT - Proceedings - 2018 14th International Conference on Semantics, Knowledge and Grids, SKG 2018
PB - IEEE
T2 - 14th International Conference on Semantics, Knowledge and Grids, SKG 2018
Y2 - 12 September 2018 through 14 September 2018
ER -