Published June 2025 | Version v1
Thesis Open

Predicting Clinical Trial Completion and Success using Machine Learning and NLP

Creators

  • 1. University of Chicago

Contributors

Advisor:

Committee member:

Description

This study introduces a dual-task machine learning framework for predicting both the operational completion and scientific success of clinical trials using data from ClinicalTrials.gov. Leveraging structured trial metadata and unstructured textual descriptions, we develop predictive models that assess whether trials are likely to complete and whether they meet their primary endpoints. For the first task, ensemble models like XGBoost significantly outperform traditional baselines, particularly when enriched with contextual embeddings derived from BioLinkBERT. For the second task, we propose a novel large language model (LLM)-driven annotation pipeline using GPT-4o-mini to label trial success based on publication content. Human evaluation confirms its high accuracy. Across both tasks, our framework demonstrates the value of combining structured features, natural language processing, and scalable LLM-based labeling to improve the understanding and forecasting of clinical trial performance. This approach not only enhances predictive accuracy but also contributes to better utilization of large-scale biomedical data.

Files

thesis.pdf (1.2 MB)
md5:712aea1ff71cb7787096c75441188eab

Additional details

Identifiers

Other
oai:uchicago.tind.io:15346

UChicago Information

Division(s)
Social Sciences Division
Department(s)
Computational Social Sciences (MACSS)