

Abstract

This study introduces a dual-task machine learning framework for predicting both the operational completion and scientific success of clinical trials using data from ClinicalTrials.gov. Leveraging structured trial metadata and unstructured textual descriptions, we develop predictive models that assess whether trials are likely to complete and whether they meet their primary endpoints. For the first task, ensemble models like XGBoost significantly outperform traditional baselines, particularly when enriched with contextual embeddings derived from BioLinkBERT. For the second task, we propose a novel large language model (LLM)-driven annotation pipeline using GPT-4o-mini to label trial success based on publication content. Human evaluation confirms its high accuracy. Across both tasks, our framework demonstrates the value of combining structured features, natural language processing, and scalable LLM-based labeling to improve the understanding and forecasting of clinical trial performance. This approach not only enhances predictive accuracy but also contributes to better utilization of large-scale biomedical data.
