In clinical research, databases are currently populated mainly by data specialists: clinical trial technicians (TEC) and clinical research associates (CRA). The information comes chiefly from patient records, medical reports, and biological or imaging examinations. Extracting it is difficult, however, because most health records are stored in unstructured form: they are written in natural language, with terminology, abbreviations, styles, and quality that vary from one center to another. In prospective clinical trials, sensitive data entries are systematically verified by a supervising CRA.
The LUCC project aims to build large-scale clinical databases by using artificial intelligence (AI) to automate the extraction of medical data.
This retrospective study covered 311 patients and 31 clinical variables (demographics, risk factors, genomic biomarkers, treatments) from 10 participating centers (public and private healthcare facilities). It compared three methods of extracting data from medical reports: the manual method, in which experienced clinical research associates extracted the data by hand via an electronic platform; the AI-automated method, in which artificial intelligence performed the extraction automatically; and the hybrid method, which combines AI extraction, AI targeting of the cases to review, and complementary manual review.
The results demonstrate that:
- AI alone systematically outperforms the manual approach on each of the 31 clinical variables studied and for each of the 10 participating centers.
- AI alone roughly halves the error rate compared with manual entry (7.0% for AI versus 14.2% for the manual method). Because the study was retrospective, no human verification by a supervising CRA was performed.
- AI reduces variability between participating centers compared with manual structuring. In other words, AI standardizes the way data are collected, which addresses the challenge of data homogeneity in multicenter studies.
- Most importantly, the hybrid method combining AI and human expertise goes further still: the AI flags the 30% of cases it deems most uncertain for targeted manual review, which brings the error rate down to 4.4% while keeping processing four times faster than a strictly manual analysis.
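The figures above can be made concrete with a short Python sketch. It computes the relative error reductions implied by the reported rates (14.2% manual, 7.0% AI, 4.4% hybrid) and illustrates one plausible way to implement the hybrid method's targeting step: ranking AI extractions by a confidence score and routing the least confident 30% to a human reviewer. The `Extraction` record, its `confidence` field, and the helper names are hypothetical; the study summary does not say how the AI quantifies uncertainty or how between-center variability was measured, so `center_spread` (a population standard deviation of per-center error rates) is only one illustrative choice. The toy confidences are invented and are not study data.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Extraction:
    """One AI-extracted value for one patient and one clinical variable.

    Hypothetical record type: the study summary does not describe the AI's
    output format. `confidence` is an assumed score in [0, 1].
    """
    record_id: str
    value: str
    confidence: float


def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Fraction by which new_rate is lower than baseline_rate."""
    return (baseline_rate - new_rate) / baseline_rate


def select_for_review(extractions: list[Extraction],
                      fraction: float = 0.30) -> list[Extraction]:
    """Return the least confident `fraction` of extractions, i.e. the cases
    sent to a human reviewer under this hypothetical triage rule."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_review = round(len(extractions) * fraction)
    # Lowest confidence first; sorted() is stable, so ties keep input order.
    ranked = sorted(extractions, key=lambda e: e.confidence)
    return ranked[:n_review]


def center_spread(error_rate_by_center: dict[str, float]) -> float:
    """Population standard deviation of per-center error rates: one simple,
    illustrative measure of between-center variability."""
    return pstdev(list(error_rate_by_center.values()))


if __name__ == "__main__":
    # Error rates (%) as reported in the study summary.
    manual, ai, hybrid = 14.2, 7.0, 4.4
    print(f"AI vs manual: {relative_reduction(manual, ai):.1%} fewer errors")
    # AI vs manual: 50.7% fewer errors
    print(f"Hybrid vs manual: {relative_reduction(manual, hybrid):.1%} fewer errors")
    # Hybrid vs manual: 69.0% fewer errors

    # Invented toy confidences for 10 extractions, NOT study data.
    toy = [Extraction(f"r{i}", "value", c) for i, c in enumerate(
        [0.99, 0.42, 0.87, 0.95, 0.31, 0.78, 0.93, 0.60, 0.98, 0.89])]
    print([e.record_id for e in select_for_review(toy)])
    # ['r4', 'r1', 'r7']
```

Reviewing a fixed fraction of the least confident cases keeps the manual workload bounded and predictable; a real pipeline could just as well use a confidence threshold instead, which this sketch does not attempt to reproduce.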
By automating medical data extraction, AI reduces the error rate, accelerates data processing, and ensures the precision and homogeneity needed to manage large volumes of data. The AI approach augmented by human expertise is not only faster but also more accurate and efficient. It also makes it possible to include smaller centers, which are often excluded for lack of resources, thereby increasing patient diversity in multicenter studies. These advances make large-scale research projects easier to conduct and open promising prospects for French clinical research.
Source
Next-Generation Multicenter Studies: Using Artificial Intelligence to Automatically Process Unstructured Health Records of Patients with Lung Cancer across Multiple Institutions
Annals of Oncology, published online December 15, 2025
DOI: https://doi.org/10.1016/j.annonc.2025.12.006