DNAlytics: building a bridge between data science and healthcare

DNAlytics, a start-up based in Louvain-la-Neuve, Belgium, applies its expertise in data science and Artificial Intelligence (AI) to improve public healthcare, medical and pharmaceutical R&D and biomanufacturing. Alongside its service activity, its RheumaKit solution uses AI for the early diagnosis of arthritis. Yet the successful application of AI in these areas requires much more than mastery of sophisticated data analysis techniques. Regulatory, legal and ethical obstacles must all be overcome before AI can be applied to its full potential. Co-founder and CEO Thibault Helleputte and business development manager Damien Bertrand talk about their approach to data-driven healthcare. "The medical and healthcare sectors have come to see that data science can create value for them."

How did the story of DNAlytics begin?

Thibault Helleputte: "When I started my engineering master's thesis in 2004, new technologies were entering medical and biology labs: DNA sequencing, transcriptomics and proteomics, RNA analysis, and so on. What they all have in common is that they generate enormous amounts of data. That was quite new to doctors and medical researchers, who had previously run only relatively simple statistical tests and for whom an Excel sheet was enough to analyse the data they got. When they were suddenly flooded with data on thousands of variables, new approaches were needed. During my PhD we developed a number of machine learning methods for this. In 2012, together with Professor Pierre Dupont, I decided to create DNAlytics as a spin-off company."


What specific capabilities did you have at the time that were valuable and promising enough to build a new company on?

Thibault Helleputte: "We had three different things. First, we had developed a lot of know-how and a new culture: how to build a bridge between data science engineering and healthcare. That combination wasn't very common at the time. Second, we had developed a set of reusable pieces of code that serve as building blocks for the models: we still use those pieces of code today at DNAlytics. Third, which in fact came a bit later, but still early in the history of the company, we had a patent on a gene signature for diagnostic application in rheumatology. This gene signature was identified with the use of the machine learning methods I mentioned. Of course, the patent was property of the university at the time, but we in-licensed it and developed it into our RheumaKit platform."

Arthritis is a disease that causes pain and inflammation of the joints. In the EU alone, each year 1.5 million new patients are affected with either rheumatoid arthritis or osteoarthritis, or one of the many other forms of the disease. A particular difficulty with arthritis is that in its early stages, when patients start to experience the first symptoms, for some of those patients, the doctors are not able to identify the underlying form of the disease. For those patients, that means effective treatment cannot be started. In addition, for the patients who are diagnosed with rheumatoid arthritis (RA), various drugs are available, but these only prove effective in roughly 2/3 of the cases. For the remaining third, the choice of treatment thus becomes a matter of trial-and-error, leading to wasteful medical expenses, not to mention the loss in quality of life for patients. In 2015, DNAlytics launched RheumaKit, which was specifically developed to address those issues.

What exactly is RheumaKit?

Thibault Helleputte: "It is an online solution for differential diagnosis of patients with undifferentiated arthritis. In other words: it identifies early on what type of arthritis a patient has, when their doctors are still in the dark about this. It works by doing a genomic analysis of a tissue sample extracted from a patient's joint, such as the knee. The data we collect in this way is combined with other patient data, specifically clinical data provided by his or her rheumatologist. All this data is fed into our mathematical model, which then predicts what underlying form of arthritis the patient has. This functionality is available today for clinical use, and we are continuing to work on developing a second functionality for RA patients, which will indicate the most effective treatment option. In addition, with RheumaKit, doctors can also access standard disease evaluation scores and monitor the treatment response and the progression of the disease online."

How does RheumaKit perform compared to rheumatologists?

Thibault Helleputte: "It is important to realise that RheumaKit is specifically designed to disambiguate complex diagnostic cases about which rheumatologists are unsure, when a patient's symptoms are not sufficiently clear to distinguish between various forms of arthritis. So you could say that in such cases, by definition, we do better than rheumatologists. A more relevant way to evaluate the RheumaKit performance is to note that in about 90 percent of these cases, its early diagnosis was proven correct once the disease had progressed. We think that is a very high number. However, the vision is clearly not to outperform rheumatologists, but to provide them with smart tools that will support and enhance their clinical practice!"

I think so too. Does it get even better over time as more data becomes available to learn from?

Thibault Helleputte: "That is a very interesting point. Ideally, that would indeed be the case if you were able to collect feedback about the performance of your algorithm from the field. With such a feedback loop the tool should get better and better. In our situation however, the possibilities to do so are limited for various reasons. The first is technical: the very fact that RheumaKit makes a prediction will influence the way a patient is cared for, and that in turn will influence the symptoms that patient exhibits. Another reason is that the regulatory framework does not allow it. To validate a diagnostic tool and obtain, in Europe, a CE-marking, the performance of the tool must be assessed in a clinical study and then documented. After that the model has to remain fixed; you cannot make it evolve as you would in principle consider with an AI-based tool. Still, what we can do is to accumulate feedback over time, and then regularly introduce updated versions of our tool. So we can have periodic improvements in performance, but not continuous ones."

You mentioned you are bridging the worlds of engineering and healthcare. How have medical experts received your solution? Do they trust and accept the recommendations from an algorithm when their own knowledge and experience proves insufficient?

Thibault Helleputte: "They are still the experts in their domain and will remain so. We are just bringing them a new tool that enables them to expand their expertise. However, there is a real interaction. When we want to build a new diagnostic tool based on machine learning, the specifications for that tool must come from the medical side: what specificity and sensitivity of the model are acceptable, what are the relevant data that we need to use to teach the algorithm, and so on. Vice versa, we have become more knowledgeable in the treatment of arthritis, so that we can discuss how our technology can best be integrated in the clinical workflow in order to give optimal results. But at the end of the day, it is still the medical specialist who decides how he or she will apply our tool."

Damien Bertrand: "The main issue with the adoption of our technology is not so much on the level of medical specialists. Our interactions with rheumatologists are generally positive. They provide us with input for our technology roadmap and recognise the value in our approach. Our main challenge is to get RheumaKit accepted into the healthcare systems of the various countries we target. As long as our solution is not established as part of standard procedure in the treatment of arthritis, doctors or patients will not be reimbursed for its use, and it becomes very difficult for doctors to include it in their routine practice."
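The sensitivity and specificity that Thibault Helleputte mentions as specifications coming from the medical side can be illustrated with a short calculation. The sketch below is only an illustration: the function name and the labels are made up for this example and have nothing to do with RheumaKit data. Sensitivity is the share of patients with the disease that the tool flags correctly; specificity is the share of patients without the disease that the tool correctly leaves unflagged.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels for ten patients (made up for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Which of the two matters more is a clinical question rather than a statistical one, which is why the acceptable thresholds have to come from the medical specialists.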


Damien Bertrand: "To be honest, the protection of our IP has not been our biggest challenge so far. We rely on a combination of patents, copyrights and also secrecy; the complexity of what we do as well as the importance of our know-how are our main differentiators to protect us from the competition. That said, the work we did with NLO on writing the paper on 'Patenting artificial intelligence in the life sciences: a practical guide' [Ed: read the full article here] gave us a new viewpoint. Typically our mindset has been that when we use well-known and published algorithms in our applications, there is no ground for claiming IP rights. In the future however we will probably review the entire application context more systematically, instead of only the software. The algorithms we use may not always be innovative, but when they are incorporated in a specific approach for a specific purpose, that combination may be innovative and patentable."


Why is it so difficult to include RheumaKit in healthcare systems?

Damien Bertrand: "There are several difficulties. A key one is that the regulatory framework to get these innovative, AI-based technologies approved as medical devices is not straightforward, to say the least. Existing regulations are not geared to such technologies yet, and approval requires a lot of back and forth with the relevant authorities."

Thibault Helleputte: "Fortunately, with new EU-level regulations on medical devices and in-vitro diagnostic tools, the path to approval is becoming clearer. It is now increasingly being accepted that software can be a medical device or a diagnostic tool. There are other challenges too. Some ethical aspects of AI-based tools are very interesting, for example with respect to responsibility for their use. There is a parallel with autonomous driving: if an autonomous vehicle causes a crash and passengers are injured, who is responsible? The same question arises with diagnostic tools driven by AI. If a diagnosis is wrong - which will inevitably occur because no diagnostic tool can be 100 percent perfect - who has the responsibility in case this leads to bad consequences? The doctor or the tool? Today, such AI-based tools provide recommendations and constitute one of many inputs a doctor has to evaluate in his or her final decision. In such a context, the doctor remains responsible. But this might evolve in the future, for example if one imagines a scenario in which hospitals or health authorities stipulated that their doctors had to follow the tool's recommendations, because it had been proven that these lead to better overall results. Those are very interesting questions for the future."

Another ethical aspect of AI is about the transparency of the algorithms that are used: how should humans be able to understand and control how AI generates certain outcomes?

Thibault Helleputte: "Indeed, we often get questions about the understandability or transparency of the decision processes of these tools. If an AI-based tool makes a recommendation to a doctor, is it important or not that this doctor is then able to explain how the system came to that recommendation? The first reaction of healthcare professionals is usually to say: 'Yes, we need to understand.' However, if you then ask them to explain how a painkiller works exactly, they can't either. I think that to a certain extent we must indeed be able to explain how our tools work. But if at some point AI-based tools have been validated through extensive clinical trials, just like any other medical solution, then it is fine not to understand the details of the decision-making process. There is also an important trade-off to be made here. Some AI tools are intrinsically easy to interpret and their decision-making processes can be tracked very easily. Others are much more complex and it is almost impossible to follow exactly how they arrive at their decisions, but in some cases these tools are much more powerful. So sometimes we will have to choose between how effective we want a solution to be versus how intelligible we want it to be."

In addition to the RheumaKit product, your business model also involves consultancy services. There are some impressive examples of consultancy projects on your website. Is there a particular project that best shows how powerful your approach to bringing data science to healthcare can be?

Damien Bertrand: "I remember a project we did for a leading pharmaceutical company in Belgium some years ago. They were running a clinical trial for an immunotherapy targeting melanoma, and were observing a very mixed response from their patient group. They essentially asked us to understand why some patients were responding to the treatment and others were not. We received a huge set of data from the company containing information on over 50,000 biomarkers for each of the 200 to 300 patients. Our job was to find the smallest set of markers that, prior to treatment, would predict if a patient would respond positively to it or not. We had to do this blindfolded, so to speak: labels had been removed from all of the markers and we had no idea what any of the data meant. We were able however to identify a set of 33 markers that predict treatment response very effectively. Some of the results from this collaboration gave rise to a patent application for a method to classify cancer patients as responders or non-responders to immunotherapy. I think this is a very powerful example of the value that advanced data science techniques can bring to medical research."
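The task described here, picking a small set of predictive markers out of tens of thousands of unlabelled ones, is a feature selection problem. The interview does not say which method DNAlytics used, so the sketch below should be read only as one simple, generic approach: a univariate filter that scores each marker by how well it separates responders from non-responders and keeps the top few. The data is synthetic and all names (`rank_markers`, `informative`) are hypothetical.

```python
import random
import statistics


def rank_markers(X, y, k):
    """Return the indices of the k markers that best separate class 1 from class 0.

    Each marker gets a Welch-style t-score: the gap between the class means
    divided by the standard error of that gap.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        gap = statistics.mean(pos) - statistics.mean(neg)
        se = (statistics.variance(pos) / len(pos)
              + statistics.variance(neg) / len(neg)) ** 0.5
        scores.append((abs(gap) / se, j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


# Synthetic data: 40 patients, 200 anonymous markers, 3 of which carry signal.
rng = random.Random(0)
informative = [5, 50, 120]          # hypothetical "true" markers
y = [1] * 20 + [0] * 20             # 1 = responder, 0 = non-responder
X = []
for label in y:
    row = [rng.gauss(0.0, 1.0) for _ in range(200)]
    if label == 1:
        for j in informative:
            row[j] += 4.0           # responders are shifted on these markers
    X.append(row)

selected = rank_markers(X, y, k=3)
print(selected)
```

With far more markers than patients, as in the project described, some markers will look predictive purely by chance, so any selected set has to be checked on patients that played no part in the selection.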

Thibault Helleputte: "Another area where our capabilities can be very powerful is biomanufacturing. For the more complex drugs that are on the market nowadays, the production process often involves the use of living entities like viruses or cell cultures. At that level of complexity, things become less predictable. You do the exact same thing twice, but still the results will exhibit some level of variability. Pulling all the data from a production line about raw materials, automation and quality control data, sensor data and so on, and building a multi-variate model of that production line helps you to improve yield and quality in biomanufacturing. This is a field where we are more and more active."

How can you scale up both the product and the consultancy parts of your business model?

Thibault Helleputte: "There is an ongoing clinical trial for the RheumaKit, and the future will depend on the results of that study, as well as on the possibilities for getting our solution reimbursed for patients. Our consultancy activities are growing and we are introducing products that we can re-use to leverage our capabilities in this area. As an example, the analytics tools that are used for improving biomanufacturing are packaged into a software suite that can be easily deployed and enables us and our clients to automate the data analysis."

Where do you see DNAlytics in five years?

Thibault Helleputte: "We want to grow, of course. If you look back five years and see where we are now, it is clear that the medical and healthcare sectors have come to see that data science can create value for them. Data analytics and AI will be more and more integrated into various parts of their businesses, and our ambition is to play a part in that evolution..."


How is DNAlytics involved in the fight against COVID-19?

Thibault Helleputte: "We have been supporting the Belgian health authorities in monitoring hospital capacity and demand. We have access to the data about the resources of each hospital, we know how much of those resources are being used and what the demand will be based on the expected development of the number of COVID-19 cases. With this information we created a dashboard that provides the authorities with an up-to-date view of the situation. That proved very useful for the management of the outbreak at the peak of the crisis, for example to elaborate scenarios for the optimal distribution of patients over hospitals or to support purchasing decisions for new equipment."

Where is the AI part in that?

"First, one has to realise that before you can begin to apply AI, you need to perform a huge amount of work to clean, validate and understand your data. Only then can you think of doing smarter things with the data. Our expertise in improving data quality in the healthcare environment is why we have been asked to help with monitoring COVID-19. Secondly, of course everyone wants to use AI and predictive analytics to make forecasts, but in the case of COVID-19 we were very reluctant to dive into that. Indeed, a machine learning model requires a constant and stable set of hypotheses. And in this crisis that has clearly not been the case. For example, screening and testing policies may change from one day to the next. We've seen that in Belgium, where at first only medical professionals were tested, but then it was decided to test much larger groups in the population. But if you feed the model the number of positive cases under one testing scenario, the numbers it predicts based on that will not mean much under a different scenario."
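The point about changing testing policies can be made concrete with a small calculation. The numbers below are invented for illustration and are not Belgian data: they describe the same outbreak observed under two testing policies. Under the narrow policy few people are tested but many of them are positive; under the broad policy many more are tested and a smaller share is positive.

```python
def reported_positives(tests_performed, positivity_among_tested):
    """Number of positive cases that show up in the statistics."""
    return round(tests_performed * positivity_among_tested)


# Hypothetical figures for one and the same day of the outbreak.
narrow = reported_positives(2_000, 0.25)    # only a high-risk group is tested
broad = reported_positives(20_000, 0.06)    # much larger groups are tested

apparent_growth = broad / narrow
print(narrow, broad, apparent_growth)
```

A model trained on case counts from the narrow policy would read the jump to the broad-policy count as rapid growth of the outbreak, although in this example the only thing that changed is who gets tested.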

Wim Naudé has published a review article on AI versus COVID-19 in which he concludes that AI so far has not been very effective in predicting the speed and patterns of the outbreak [Ed: read the interview here]. Would you agree with that?

"Yes, for the reasons I just outlined. The best thing we can do now to answer requests for forecasts is to provide scenarios rather than exact predictions. We say: if this and this and this hypothesis is kept constant, then this is what we think will happen. And we do not go further than five or six days ahead, otherwise it would be just guessing."

Please also see "AI versus COVID-19, part I"