TY - JOUR
T1 - SRL-ESA-TextSum
T2 - A text summarization approach based on semantic role labeling and explicit semantic analysis
AU - Mohamed, Muhidin
AU - Oussalah, Mourad
PY - 2019/7/1
Y1 - 2019/7/1
AB - Automatic text summarization attempts to provide an effective solution to today's unprecedented growth of textual data. This paper proposes an innovative graph-based text summarization framework for generic single- and multi-document summarization. The summarizer benefits from two well-established text semantic representation techniques, Semantic Role Labelling (SRL) and Explicit Semantic Analysis (ESA), as well as the constantly evolving collective human knowledge in Wikipedia. SRL is used to parse each sentence semantically, and the word tokens of the parsed sentence are represented as vectors of weighted Wikipedia concepts using the ESA method. The essence of the developed framework is to construct a unique concept graph representation underpinned by semantic role-based multi-node (below sentence level) vertices for summarization. We have empirically evaluated the summarization system using the standard publicly available dataset from the Document Understanding Conference 2002 (DUC 2002). Experimental results indicate that the proposed summarizer outperforms all state-of-the-art related comparators in single-document summarization based on the ROUGE-1 and ROUGE-2 measures, while also ranking second in the ROUGE-1 and ROUGE-SU4 scores for multi-document summarization. The testing also demonstrates the scalability of the system: varying the evaluation data size is shown to have little impact on the summarizer's performance, particularly for the single-document summarization task. In short, the findings demonstrate the power of the role-based and vectorial semantic representation when combined with the crowd-sourced knowledge base in Wikipedia.
KW - Concept graphs
KW - Iterative ranking algorithm
KW - Semantic role labeling
KW - Semantic similarity
KW - Text summarization
KW - Wikipedia concepts
UR - http://www.scopus.com/inward/record.url?scp=85064273642&partnerID=8YFLogxK
UR - https://www.sciencedirect.com/science/article/pii/S0306457318306745?via%3Dihub
U2 - 10.1016/j.ipm.2019.04.003
DO - 10.1016/j.ipm.2019.04.003
M3 - Article
AN - SCOPUS:85064273642
SN - 0306-4573
VL - 56
SP - 1356
EP - 1372
JO - Information Processing and Management
JF - Information Processing and Management
IS - 4
ER -