AI in Clinical Studies: An Interview with Data Scientists Daniel-Timon and Paul

Created on: 09/20/2023

Their work improves the quality of clinical studies while reducing their duration and cost. In this interview, Dr. Daniel-Timon Spanka and Dr. Paul Wallbott provide valuable insights into their responsibilities and the use of AI in clinical research.


You are part of the Data Science Team at Alcedis. Not everyone is familiar with that term. Please briefly describe your areas of responsibility.


My role as Product Manager Data Analytics is very diverse. A large part of my time is spent on product management of our AI-based software solutions. My goal is to create innovative software products that provide real value to users such as healthcare professionals and patients.

I am also heavily involved in the management of large IT projects – building data warehouses, structuring medical data through AI and making it usable. Apart from that, we also do research in our team. We write scientific publications and give talks at conferences.



My work varies a lot depending on the project and its phase. At the beginning of a project, I am closely involved in developing the use cases. Based on the team's input, I try to identify suitable AI-based approaches to address the users' needs. A common understanding of the processes and technologies is crucial for an effective solution. After that, we examine datasets and pre-select technologies. During the practical implementation step, we design a proper infrastructure, process data, select and train models, and finally embed the AI algorithms in the surrounding software.




What is your goal as a Data Science team in clinical research?


We develop AI solutions and software, but throughout all of this, the focus is on people. We develop software that can be used by as many people as possible. One example is our AI-based software Meteor, which is used in data management and pharmacovigilance. It improves processes, thereby increasing patient safety on the one hand and improving the quality of the data collected in the clinical trial on the other. The solution is, of course, GAMP-5-validated and compliant with the relevant EMA guidelines.

Another concern for us is making medical data usable. When documenting patient data in hospitals and doctors' practices, a wide variety of software systems and file formats are used. This makes it difficult to gain insights from this data pool. We develop project-based solutions to bring together the different data, process them and make them available to authorised users. This enables medical and research teams to perform analyses that were unthinkable until recently.


What projects are you currently working on and what specific problems are you trying to solve?


Currently, one focus is on the processing of text data. We have developed various solutions here. On the one hand, incorrectly entered study data can be recognised automatically. This solution is currently being integrated into our EDC system. It works as follows: when investigators enter patient data in the system, they receive real-time feedback on whether the entered data contains errors or is plausible in the given context. This enables them to correct any errors immediately. By checking data quality at this early step, we avoid additional work later in the process.
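The kind of real-time feedback described above can be illustrated with a deliberately simplified, rule-based sketch. The field names and plausibility ranges below are hypothetical examples, not Alcedis' actual rules; the system in the interview uses AI-based text models rather than fixed ranges.

```python
# Illustrative sketch only: immediate plausibility feedback on data entry.
# Field names and ranges are hypothetical, not the actual EDC rules.

PLAUSIBLE_RANGES = {
    "systolic_bp_mmhg": (60, 250),
    "heart_rate_bpm": (30, 220),
    "body_temp_celsius": (34.0, 42.5),
}

def check_entry(field: str, value: float) -> str:
    """Return real-time feedback for a single data-entry field."""
    if field not in PLAUSIBLE_RANGES:
        return "unknown field"
    low, high = PLAUSIBLE_RANGES[field]
    if low <= value <= high:
        return "ok"
    # The investigator sees this warning immediately and can correct the entry.
    return f"warning: {field}={value} outside expected range [{low}, {high}]"

print(check_entry("heart_rate_bpm", 480))   # flagged at entry time
print(check_entry("systolic_bp_mmhg", 120)) # ok
```

The point of the design is the timing: flagging an implausible value while the investigator is still in the form avoids the follow-up queries that a later batch review would generate.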

On the other hand, we are working on multimodal solutions that enable the detection of erroneous data in data reviews with Meteor. We have developed both an approach that allows intuitive exploration of the data by the user and an approach that automatically identifies erroneous data. Both save our data management and pharmacovigilance teams a lot of routine work. 

Technologically, by the way, various text processing models come into play here, including large language models like ChatGPT. 


But you don't use ChatGPT itself for your applications?


No. Data protection is our top priority. However, we benefit from the rapid progress of the open-source community and now have our own large language models in use. This keeps the data fully secure, because the models are hosted locally and the data therefore never leaves our IT infrastructure.

This development enables us to use this revolutionary technology in other areas as well. However, the training of such models is very costly, and their use currently still requires expensive hardware.


Tell us a little more about how AI is developed. For example, where does the data come from that is used to train the algorithms?


This is a very exciting topic, especially in Germany. Patient data naturally requires special protection, which is why we work according to the highest data protection standards. We can only process personal data if the patient has given their written consent and the sponsor (i.e. the client of the clinical trial) has also approved this. We are seeing more and more innovative sponsors wanting to participate in the development of AI solutions and asking us specifically for such technologies. The necessary data protection concepts are then agreed between all parties right from the start.

The type of data we work with then depends heavily on the project. This can be study data, laboratory values, biometric data and, increasingly, omics data such as genomics and metabolomics. In recent years, patient-generated data, for example from sensors such as smartwatches that measure oxygen saturation, pulse and steps, have also contributed to the data basis. Unfortunately, there are often technical hurdles with care data, especially in hospitals. Here, different data sources have to be integrated, and unstructured data has to be put into a usable form. From time to time, this may also include converting handwritten doctor's letters into machine-readable data.

Fortunately, there are occasionally freely available data sets that are suitable for some issues. Then we can get started straight away and don't have to work out the legal framework individually with patients and sponsors. 


How are models then usually trained and validated?


Training and validation of AI models usually follows a standard procedure, in which the data is divided into a training data set and a test data set. The learning of the AI model takes place on the training data set and the validation on the test data set. This procedure ensures that a model is tested on data that differs from the training data. Like in an exam: you learn with exercise problems, i.e. training data, but the exam consists of new problems you have to solve, i.e. test data. That’s how we test whether AI models have only memorised training data or have understood the underlying mechanisms. 
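The split described above can be sketched in a few lines of Python. This is a generic illustration of the standard procedure, not Alcedis' actual pipeline; in practice a library routine such as scikit-learn's `train_test_split` would typically be used.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle the records and split them into disjoint training and test sets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))           # 80 20
# The model learns only on `train`; `test` stays unseen until validation,
# so memorising the training data cannot inflate the measured performance.
assert not set(train) & set(test)
```

Keeping the two sets disjoint is exactly the exam analogy from above: the "exam questions" must not appear among the "exercise problems".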

During validation, the model is then tested to see how well it performs on the test data set. This usually paints a decent picture of the real-world performance. Furthermore, the performance is monitored on new data in live operation. In this way, performance degradation of the model can be detected at an early stage.


Certainly, it takes medical expertise to develop an AI for use in clinical trials. Do you have this expertise yourselves?


We ourselves are scientists, but not medical experts. It is therefore all the more important that we work closely with experts, for example our colleagues from the data management and pharmacovigilance departments, who know the requirements, expected results and common challenges of the individual indications inside out. They help us with the annotation of special entities in texts, such as adverse events. This collaboration is crucial for obtaining high-quality training data for AI models and for validating the results. Depending on the context, collaborating directly with doctors or study assistants is also helpful.


Thanks for the little insight into your work. One final question: What do you think the future holds for data science and AI in clinical research?


It is likely that the number of AI systems in use will increase significantly. In my view, the rapid development of large language models such as ChatGPT will contribute to this in particular. In terms of design, I expect a high proportion of hybrid human-AI systems, mainly because the two actors, human and AI, can compensate for each other's weaknesses. However, we will also see more autonomously acting AI systems.

Especially in clinical research and other high-risk areas, these systems must be trustworthy: fair, controllable, transparent, robust, and secure. The regulatory framework is currently being worked out. This places high demands on testing authorities and AI developers, but it is necessary. The resulting systems will offer us great opportunities in many areas and enable new developments.



The future of data science and AI in clinical research looks extremely promising. Currently, virtually all major companies are engaged in initiatives to use AI in clinical research. This is because more and more data is available and new technologies, such as generative models, are driving development. In the medium term, I also see human experts and machines working as a team, in which each brings their own special skills. These semi-autonomous systems can be designed in such a way that humans have the final say in decisions. This is of course important in high-risk cases, not least when it comes to the question of responsibility.

Overall, I look forward to the benefits that this development will bring in terms of increased efficiency, patient safety and data quality. By making better use of data from medical care (real-world data), AI will speed up drug development in the future. This will enable life-saving drugs to receive market approval faster and be available to patients sooner.


Text: Alcedis - Editorial Team