Extracting information from free text through unsupervised graph-based clustering: an application to patient incident records

Altuncu, M. Tarik; Sorin, Eloise; Symons, Joshua D.; Mayer, Erik; Yaliraki, Sophia N.; Toni, Francesca; Barahona, Mauricio

Computer Science > Machine Learning

arXiv:1909.00183 (cs)

[Submitted on 31 Aug 2019]

Title: Extracting information from free text through unsupervised graph-based clustering: an application to patient incident records

Authors: M. Tarik Altuncu, Eloise Sorin, Joshua D. Symons, Erik Mayer, Sophia N. Yaliraki, Francesca Toni, Mauricio Barahona

Abstract: The large volume of text in electronic healthcare records often remains underused due to a lack of methodologies to extract interpretable content. Here we present an unsupervised framework for the analysis of free text that combines text-embedding with paragraph vectors and graph-theoretical multiscale community detection. We analyse text from a corpus of patient incident reports from the National Health Service in England to find content-based clusters of reports in an unsupervised manner and at different levels of resolution. Our unsupervised method extracts groups with high intrinsic textual consistency and compares well against categories hand-coded by healthcare personnel. We also show how to use our content-driven clusters to improve the supervised prediction of the degree of harm of the incident based on the text of the report. Finally, we discuss future directions to monitor reports over time, and to detect emerging trends outside pre-existing categories.
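The pipeline described in the abstract has three stages: embed each report as a vector, connect similar reports in a graph, and find communities at several resolutions. The sketch below illustrates those stages under stated assumptions, and should not be read as the authors' implementation. Bag-of-words count vectors stand in for paragraph vectors. The graph is a symmetrised k-nearest-neighbour graph with k = 2 (an assumption). The multiscale step uses a discrete-time Markov Stability matrix B(t) = diag(pi) P^t - pi pi^T, with P = D^-1 A, optimised by greedy single-node moves. At t = 1 this matrix is the modularity matrix divided by 2m, and larger t favours coarser groupings. This quality function and optimiser are illustrative choices, not necessarily the paper's exact formulation. The toy corpus and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for incident reports (not data from the paper).
DOCS = [
    "patient fell from bed",
    "patient fell in bathroom",
    "patient fell near bed",
    "wrong medication dose given",
    "medication dose given late",
    "wrong medication given late",
]


def embed(docs):
    """Bag-of-words count vectors: a simple stand-in for paragraph vectors."""
    tokens = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({w for toks in tokens for w in toks})
    vectors = []
    for toks in tokens:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(vectors, k=2):
    """Symmetric weighted adjacency linking each document to its k most similar ones."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])
        for j in others[:k]:
            adj[i][j] = adj[j][i] = sim[i][j]
    return adj


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def stability_matrix(adj, t):
    """B(t) = diag(pi) P^t - pi pi^T for the random walk P = D^-1 A (t >= 1)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    total = sum(deg)
    pi = [d / total for d in deg]
    # Isolated nodes get a self-loop so every row of P still sums to one.
    P = [[adj[i][j] / deg[i] if deg[i] else float(i == j) for j in range(n)]
         for i in range(n)]
    Pt = P
    for _ in range(t - 1):
        Pt = matmul(Pt, P)
    return [[pi[i] * Pt[i][j] - pi[i] * pi[j] for j in range(n)] for i in range(n)]


def stability(B, labels):
    """Markov Stability of a partition: sum of B_ij over same-community pairs."""
    n = len(B)
    return sum(B[i][j] for i in range(n) for j in range(n) if labels[i] == labels[j])


def link(B, labels, i, c):
    """Contribution of node i's links to community c (excluding i itself)."""
    return sum(B[i][j] + B[j][i] for j in range(len(B)) if j != i and labels[j] == c)


def optimise(B, max_sweeps=100):
    """Greedy single-node moves from singletons; each accepted move raises stability."""
    labels = list(range(len(B)))
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(B)):
            current = link(B, labels, i, labels[i])
            best, best_gain = labels[i], 0.0
            for c in sorted(set(labels)):
                gain = link(B, labels, i, c) - current
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            if -current > best_gain + 1e-12:  # better off alone in a new community
                best = max(labels) + 1
            if best != labels[i]:
                labels[i], moved = best, True
        if not moved:
            break
    first_seen = {}  # relabel communities as 0, 1, 2, ... in order of appearance
    return [first_seen.setdefault(c, len(first_seen)) for c in labels]


adj = knn_graph(embed(DOCS), k=2)
for t in (1, 2, 4, 8):  # Markov times scanned from fine to coarse
    print(t, optimise(stability_matrix(adj, t)))
```

On this toy corpus, the falls reports and the medication reports form two disconnected groups, so the same two-way split is recovered at every scanned time. With real reports, one would look for partitions that stay robust over a range of t rather than fixing a single resolution in advance.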

Comments:	To appear as a book chapter
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR); Spectral Theory (math.SP); Machine Learning (stat.ML)
Cite as:	arXiv:1909.00183 [cs.LG]
	(or arXiv:1909.00183v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.00183

Submission history

From: Muhammed Tarik Altuncu
[v1] Sat, 31 Aug 2019 10:03:11 UTC (7,575 KB)