AI combined with human expertise will transform data collection in clinical research, according to a landmark study published in Annals of Oncology

Gustave Roussy

Paris, December 18, 2025 – Gustave Roussy, the leading cancer center in France and Europe, and Lifen, the French leader in health data intelligence, announce the results of a multicenter study on automating data extraction from medical reports, published in Annals of Oncology. The study demonstrates that AI can turn highly heterogeneous data into large databases faster and in a homogeneous, reliable, and secure manner, thereby accelerating disease understanding and the development of new treatments. These results pave the way for a paradigm shift in clinical research. The study is part of the LUCC cohort (Large & Unified Cancer Cohort), a French initiative supported by France 2030 and led by Lifen and Gustave Roussy.
 

AI performs well at extracting and structuring data from medical reports, and performs even better when combined with human expertise.

In clinical research, databases are currently populated mainly by data experts known as clinical trial technicians (TEC) and clinical research associates (CRA). The information comes mostly from patient records, medical reports, and biological or imaging examinations. Extracting it is not simple, because most health records are stored in unstructured formats, written in natural language, with terminology, abbreviations, styles, and quality levels that differ from one center to another. In prospective clinical trials, a supervising CRA systematically verifies the entry of sensitive data.
 
The LUCC project aims to build large-scale clinical databases by automating the extraction of medical data with artificial intelligence (AI).
 
This retrospective study covered 311 patients and 31 clinical variables (demographics, risk factors, genomic biomarkers, treatments) from 10 participating centers (public and private healthcare facilities). It compared three methods of extracting data from medical reports: the manual method, in which experienced clinical research associates extracted data by hand via an electronic platform; the AI-automated method, in which artificial intelligence extracted the data automatically; and the hybrid method, which combines AI extraction, AI targeting of the cases to review, and complementary manual review.
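
The hybrid method described above can be illustrated with a minimal Python sketch. It assumes that each AI extraction carries a confidence score and that the least confident share of extractions is routed to a human reviewer. The `Extraction` type, the `confidence` field, and the `select_for_review` function are hypothetical names introduced for illustration; the release does not say how the AI actually measures uncertainty.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Extraction:
    patient_id: str
    variable: str
    value: str
    confidence: float  # hypothetical score in [0, 1]; higher means more certain


def select_for_review(
    extractions: List[Extraction], review_percent: int
) -> Tuple[List[Extraction], List[Extraction]]:
    """Split extractions into (to_review, auto_accepted).

    The least confident `review_percent` percent (rounded up) go to a human.
    """
    if not 0 <= review_percent <= 100:
        raise ValueError("review_percent must be between 0 and 100")
    # Sort from least to most confident so the uncertain cases come first.
    ranked = sorted(extractions, key=lambda e: e.confidence)
    # Integer ceiling of len * percent / 100, avoiding floating-point rounding.
    n_review = (len(ranked) * review_percent + 99) // 100
    return ranked[:n_review], ranked[n_review:]
```

Ranking by a single score is only one possible design; a per-variable threshold would also fit the description, and the review share is left as a parameter for that reason.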
 
The results demonstrate that:
  • AI alone systematically outperforms the manual approach on each of the 31 clinical variables studied and for each of the 10 participating centers.
  • AI alone roughly halves the error rate compared to manual entry (7.0% for AI versus 14.2% for the manual method). Because this study was retrospective, human verification by a supervising CRA was not performed.
  • AI reduces variability between participating centers compared to manual structuring. This means AI is better able to systematize the way data is collected and addresses the challenge of data homogeneity in multicenter studies.
  • Most importantly, the hybrid AI/human expertise method goes even further: it combines AI extraction with manual review that the AI targets at the 30% of cases it deems most uncertain. This brings the error rate down to 4.4% while remaining four times faster than strictly manual analysis.
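
To put the reported error rates side by side, the short Python sketch below computes the relative reduction between methods. The three rates are taken from the results above; the `relative_reduction` helper is a name introduced here for illustration, and the derived percentages are simple arithmetic rather than figures stated in the study.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


# Error rates (in percent) as reported in the results.
MANUAL, AI_ALONE, HYBRID = 14.2, 7.0, 4.4

ai_vs_manual = relative_reduction(MANUAL, AI_ALONE)
hybrid_vs_manual = relative_reduction(MANUAL, HYBRID)
hybrid_vs_ai = relative_reduction(AI_ALONE, HYBRID)

print(f"AI alone vs manual: {ai_vs_manual:.1%}")    # 50.7%
print(f"Hybrid vs manual:   {hybrid_vs_manual:.1%}")  # 69.0%
print(f"Hybrid vs AI alone: {hybrid_vs_ai:.1%}")    # 37.1%
```

Under these figures, AI alone removes about half of the manual errors, and targeted human review removes a little over a third of the errors that remain.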
The results of this study are unprecedented. They suggest that AI can redefine the way clinical research teams work, enabling them to focus on higher value-added tasks.
 
By automating medical data extraction, AI reduces the error rate, accelerates data processing, and delivers the precision and homogeneity essential for managing large data volumes. The AI approach augmented by human expertise is not only faster but also more powerful and efficient. It also allows the inclusion of smaller centers, often excluded due to lack of resources, thus enriching patient diversity in multicenter studies. These advances facilitate the conduct of large-scale research projects and open promising prospects for French clinical research.
 
Methodology
  • Among the 10,000 patients in the LUCC Lung Cancer cohort, 311 patients from 10 centers were randomly selected.
  • Inclusion criteria for each patient: having at least 5 medical reports in French (excluding laboratory results, imaging, and appointments); no oncological history or concurrent cancer.
  • Three approaches were evaluated: manual extraction by clinical research professionals, automated extraction by AI, and a hybrid approach combining AI and human review.
  • Each group worked on the same pseudonymized documents, following identical extraction rules and applying automatic consistency checks.
  • For each patient and each variable, the values extracted by the three methods were compared.
  • Concordant values between methods were considered correct.
  • In case of divergence, a senior clinical research professional, supervised by oncologists and blinded to the method used, assigned the correct value. The final value could thus differ from all initial assessments.
  • The patients concerned were informed, in accordance with current regulations. The LUCC cohort's scientific and ethics committee approved the study before it began.
  • Lifen's AI was obtained by fine-tuning an open-source Mistral model.
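
The comparison procedure in the methodology (concordant values count as correct, divergent values go to a blinded adjudicator) can be sketched in Python as follows. This is a minimal illustration, not the study's actual pipeline: `score_methods`, `error_rates`, and the `adjudicate` callback are hypothetical names, and exact string equality is an assumed simplification of how values were compared.

```python
from typing import Callable, Dict, Iterable, List

# The adjudicator sees only the candidate values, never the method labels,
# which mirrors the blinding described in the methodology.
Adjudicator = Callable[[List[str]], str]


def score_methods(values: Dict[str, str], adjudicate: Adjudicator) -> Dict[str, bool]:
    """For one patient-variable pair, tell whether each method's value is correct."""
    candidates = sorted(set(values.values()))
    if len(candidates) == 1:
        # All methods agree: the shared value is considered correct.
        return {method: True for method in values}
    # Methods disagree: the adjudicated value may differ from every candidate.
    truth = adjudicate(candidates)
    return {method: value == truth for method, value in values.items()}


def error_rates(cells: Iterable[Dict[str, str]], adjudicate: Adjudicator) -> Dict[str, float]:
    """Share of patient-variable pairs that each method got wrong."""
    errors: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for values in cells:
        for method, ok in score_methods(values, adjudicate).items():
            totals[method] = totals.get(method, 0) + 1
            errors[method] = errors.get(method, 0) + (0 if ok else 1)
    return {method: errors[method] / totals[method] for method in totals}
```

One consequence of this design is that an error shared by all three methods would go undetected, since agreement is treated as correctness.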
 
Source
Next-Generation Multicenter Studies: Using Artificial Intelligence to Automatically Process Unstructured Health Records of Patients with Lung Cancer across Multiple Institutions
Annals of Oncology, online publication December 15, 2025
DOI: https://doi.org/10.1016/j.annonc.2025.12.006
 