Extracting topical phrases from clinical documents

Yulan He*

*Corresponding author for this work

Research output: Chapter in Book/Published conference output › Conference publication


In clinical documents, medical terms are often expressed as multi-word phrases. Traditional topic modelling approaches that rely on the “bag-of-words” assumption are not effective in extracting topic themes from clinical documents. This paper proposes to first extract medical phrases using an off-the-shelf tool for medical concept mention extraction, and then train a topic model that takes a hierarchy of Pitman-Yor processes as a prior for modelling the generation of phrases of arbitrary length. Experimental results on patients’ discharge summaries show that the proposed approach outperforms the state-of-the-art topical phrase extraction model on both perplexity and topic coherence, and finds more interpretable topics.
Original language: English
Title of host publication: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)
Number of pages: 7
ISBN (Electronic): 9781577357605
Publication status: Published - 12 Feb 2016
Event: 30th AAAI Conference on Artificial Intelligence, AAAI 2016 - Phoenix, United States
Duration: 12 Feb 2016 to 17 Feb 2016


Conference: 30th AAAI Conference on Artificial Intelligence, AAAI 2016
Country/Territory: United States
