Two-center external validation of a machine learning natural language processing (NLP) algorithm for venous thromboembolism (VTE) ascertainment

Shengling Ma1, Omid Jafari1, Arash Maghsoudi1, Jennifer La2, 3, Emily Zhou4, Iuliia Kovalenko5, Steven Horng3, 6, Nathanael Fillmore2, 3, Ang Li1, Barbara Lam6.

1Baylor College of Medicine, Houston, TX, USA.
2VA Boston Healthcare System, Boston, MA, USA.
3Harvard Medical School, Boston, MA, USA.
4University of Texas Health Science Center at Houston, Houston, TX, USA.
5UPMC Harrisburg, Harrisburg, PA, USA.
6Beth Israel Deaconess Medical Center, Boston, MA, USA.

Abstract


Background: The ability to identify VTE accurately and quickly in large clinical datasets is paramount for hemostasis and thrombosis research. Previously developed NLP algorithms have often relied on specific keyword extraction that is difficult to generalize across healthcare systems.

Objectives: We recently derived a transformer-based machine learning NLP algorithm from 800 cancer patients at the Harris Health System (HHS) (doi.org/10.1182/blood-2023-184756). We aimed to externally validate this algorithm at two large healthcare systems.

Methods: We previously derived a novel VTE NLP algorithm by combining keyword-based pre-processing with a fine-tuned BioClinicalBERT transformer model on 2,000+ notes from HHS, reaching a positive predictive value (PPV) of 98% and a sensitivity of 98%. For the current study, we utilized free-text data from two clinical datasets: 1) Beth Israel Deaconess Medical Center (MIMIC-IV) and 2) the Veterans Affairs Healthcare System (VA) (Figure 1). For note-level validation (MIMIC-IV), we tested the algorithm on computed tomography angiography (CTA) radiology reports to identify pulmonary embolism (PE) during hospitalization. For patient-level validation (VA), we tested the algorithm on sequential radiology reports, discharge summaries, and outpatient progress notes to identify the first-ever VTE date after cancer diagnosis. Similar data pre-processing and model application were performed at both sites without model retraining. The gold standard, defined as acute PE in MIMIC-IV and PE or deep vein thrombosis (DVT) in VA, was determined by manual chart adjudication by physician researchers. For patient-level validation, we classified an event as false positive if VTE occurred >90 days from the first predicted date.

Results: Free-text clinical notes from MIMIC-IV and VA had different physician/patient characteristics, formatting styles, and line break/return structures. The pre-processing algorithm successfully identified the keyword-containing sentences in each dataset. In MIMIC-IV, we identified 17,438 CTA reports with PE keywords. The transformer NLP model correctly labeled 1,437 out of 1,598 positive predictions (PPV 90%) and identified 1,437 out of 1,531 positive PE events (sensitivity 94%) (Figure 2). In VA, we sampled 800 cancer patients with VTE keywords. After excluding 24 patients with known historic events, the transformer NLP model correctly labeled first VTE events in 410 out of 462 patients (PPV 89%) and identified 410 out of 474 positive VTE events (sensitivity 87%). The majority of false positives were related to sentences describing chronic events such as “Multiple bilateral pulmonary emboli are again seen…” and “There is residual pulmonary embolism...”.

Conclusions: In this external validation study, our recently derived transformer-based NLP algorithm for VTE performed well at two healthcare systems. On a note level, the algorithm correctly labeled 9 out of 10 PE events among 17,438 CTA reports. On a patient level (a more challenging task), the algorithm correctly identified 9 out of 10 first-ever VTE dates among 38,533 notes in 776 patients after cancer diagnosis. The algorithm accomplished these tasks in <2 hours with minimal researcher guidance. It also appeared generalizable to different populations (non-cancer vs. cancer) and outcomes (PE vs. PE/DVT). We believe machine learning-based NLP algorithms have the potential to replace cumbersome billing codes and chart reviews for VTE outcome ascertainment in database studies.
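
As a worked illustration of how the note-level metrics follow from the reported counts, the sketch below recomputes PPV and sensitivity for the MIMIC-IV validation. The counts are taken from the Results; the class and field names are illustrative assumptions and are not part of the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Counts needed for PPV and sensitivity (hypothetical helper, not the authors' code)."""

    true_positives: int       # model-positive notes confirmed by chart adjudication
    predicted_positives: int  # all notes the model labeled positive
    gold_positives: int       # all adjudicated (gold standard) positive events

    @property
    def ppv(self) -> float:
        # Positive predictive value = true positives / all positive predictions
        return self.true_positives / self.predicted_positives

    @property
    def sensitivity(self) -> float:
        # Sensitivity = true positives / all gold-standard positives
        return self.true_positives / self.gold_positives


# MIMIC-IV note-level counts as reported in the Results
mimic = ValidationCounts(
    true_positives=1437,
    predicted_positives=1598,
    gold_positives=1531,
)

print(f"PPV {mimic.ppv:.1%}, sensitivity {mimic.sensitivity:.1%}")
# prints: PPV 89.9%, sensitivity 93.9%
```

Rounded to whole percentages, these values correspond to the PPV of 90% and sensitivity of 94% reported for MIMIC-IV.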
