AI in Clinical Trials: Training Machine Learning Algorithms
How is artificial intelligence developed for clinical research, and what datasets are machine learning algorithms trained on? This article explains how AI can be trained efficiently for use in clinical trials.
What is Machine Learning?
Machine learning is a subfield of artificial intelligence in which a system derives knowledge from experience. Algorithms learn patterns and rules from the data they are trained on and then apply them to new, unseen data and cases. This is how artificial intelligence can take over complex areas of human work.
In clinical research, AI eases the workload of departments such as data management and pharmacovigilance (drug safety), among others. Here, artificial intelligence analyses data and forwards error messages or forecasts of possible risks to the responsible employees. By flagging such problems, the algorithm helps make the data more reliable, which in turn supports more valid study results.
How is data used to develop AI?
Before artificial intelligence can forecast potential risks from collected information, it needs a body of data on which to train this ability. But where does this data come from?
According to Good Clinical Practice¹ regulations, clinical trial data must be archived for at least ten years. With the patient’s consent, the anonymised patient data could be used to train artificial intelligence. National databases and big data projects could also provide datasets.
Together, these sources would provide a large number of datasets for training the algorithm for clinical studies. In trial runs, the AI’s results could be compared with this data, simulating the course of a real study.
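Such a trial run can be sketched in a few lines. The example below assumes that the AI returns the IDs of the records it flags and that the archived study lists the records in which problems were actually documented. The function name and the record IDs are hypothetical and serve only as an illustration.

```python
def evaluate_flags(flagged_by_model, documented_issues):
    """Compare records flagged by the model with issues documented in an archived study."""
    flagged = set(flagged_by_model)
    documented = set(documented_issues)
    correct = flagged & documented

    # Precision: share of the model's flags that match a documented issue.
    precision = len(correct) / len(flagged) if flagged else 0.0
    # Recall: share of the documented issues that the model found.
    recall = len(correct) / len(documented) if documented else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "missed": sorted(documented - flagged),
        "false_alarms": sorted(flagged - documented),
    }


# Hypothetical record IDs from a replayed study.
result = evaluate_flags(["r1", "r2", "r3"], ["r2", "r3", "r4"])
print(result)
```

False alarms are worth tracking alongside missed issues, because every unnecessary message creates follow-up work at the study centre.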
Development of AI: Why data quality matters for clinical studies
In addition to the amount of data, its reliability is essential in the development of AI. Before artificial intelligence is trained, the data must be examined and described:
· What information is contained in the data?
· Is this data valid?
· For which fields of the clinical study is this information relevant?
The quality of data varies from study to study. Some datasets contain incorrect information, and information can also be missing. Such errors and gaps must be recognized and evaluated. Otherwise, the machine learning algorithm would mistakenly treat the absence of information as a pattern.
For example, hair loss is one of the most well-known and obvious side effects of chemotherapy. In fact, hair loss is so obvious that study doctors often do not document this symptom as an adverse event. Therefore, hair loss does not appear in the datasets used for training.
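A first step toward recognizing such gaps could be a simple audit of how often each field is empty. The sketch below assumes that each patient record is available as a dictionary. The field names and values are invented for illustration and do not come from a real study.

```python
def missing_rate_by_field(records, fields):
    """Return, for each field, the share of records in which it is absent or empty."""
    total = len(records)
    rates = {}
    for field in fields:
        # A field counts as missing if the key is absent, None, or an empty string.
        missing = sum(1 for record in records if record.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


# Hypothetical records with gaps in both fields.
records = [
    {"age": 54, "adverse_event": "nausea"},
    {"age": None, "adverse_event": ""},
    {"age": 61},
]
print(missing_rate_by_field(records, ["age", "adverse_event"]))
```

A high missing rate alone does not say whether the information was never collected or was simply considered too obvious to document. That judgment still requires someone who knows the study.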
How data is prepared for use in clinical studies
Missing or incorrect information in datasets must be recognized, evaluated, and edited before being used for AI training. This includes consultation with medical experts such as medical writers or study doctors who are familiar with the respective studies. They know the common problems with collecting clinical study data.
Why this is important becomes clear during a test run: If the course of a study is simulated with data in which hair loss was not documented as a symptom of chemotherapy, the AI responds to the missing data and sends an error message to the data manager. These messages must be followed up at the study centre. This is an additional and avoidable task that leads to delays.
The solution to the problem: In oncological studies with patients undergoing chemotherapy, hair loss could be given less weight in the algorithm. The AI would then still recognize the missing event but would not send an error message.
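One way to picture this weighting is a table of expected events with a weight for each, combined with a threshold below which a missing event is only noted. The following sketch rests on that assumption. The weights, the threshold, and all names are hypothetical illustration values, not clinical recommendations or the method of any particular system.

```python
# Hypothetical weights: how strongly a missing expected event should count.
EXPECTED_EVENTS = {
    "chemotherapy": {"hair loss": 0.1, "nausea": 0.8},
}
ALERT_THRESHOLD = 0.5  # assumed cut-off for sending a message to the data manager


def review_missing_events(treatment, documented_events):
    """Split expected but undocumented events into alerts and silently noted gaps."""
    expected = EXPECTED_EVENTS.get(treatment, {})
    documented = {event.lower() for event in documented_events}

    alerts, noted = [], []
    for event, weight in expected.items():
        if event in documented:
            continue
        # Low-weight gaps are recorded but do not trigger a message.
        if weight >= ALERT_THRESHOLD:
            alerts.append(event)
        else:
            noted.append(event)
    return alerts, noted


alerts, noted = review_missing_events("chemotherapy", ["Nausea"])
print(alerts, noted)
```

Here the missing hair loss entry is still recognized and ends up in the noted list, but no message is sent for it.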
This example demonstrates that in the development of AI, continuous monitoring and adaptation of data are essential. Only well-crafted datasets and constant feedback create efficient artificial intelligence for use in clinical trials.