Covid-19: AI model by Devoteam highlights the efficiency of large-scale Covid testing strategy

08 May 2020

Covid-19: an AI model built by Devoteam data scientists, based on a scenario without lockdown, projects 1) 59M cases in Germany, with 290,000 deaths, and 2) 46M cases in France, with 800,000 deaths; it also highlights the efficiency of a large-scale Covid testing strategy.

The differences in terms of the number of deaths between France and Germany continue to fuel debates on how to manage the crisis: more than 25,000 deaths in France compared to just over 7,000 in Germany.

Devoteam’s data scientists have modelled the spread of the epidemic in France and Germany, studying scenarios with and without targeted lockdown. The results show that Germany is doing much better, in particular thanks to its strategy of large-scale testing and early detection, with more than 300,000 tests performed per week.

The real gap observed today is confirmed by the Devoteam AI modelling in a scenario without lockdown, which highlights the differences between the strategies adopted by the two governments.

Overview of the situation without lockdown and with total or targeted lockdown, according to Devoteam AI modelling:

NB: the number of confirmed current cases [12] is closer to the actual number for Germany than for France, since Germany carries out more tests per week. Indeed, the AI model predicts that the number of real cases would be 9 times higher than the number of confirmed cases in France, and only 1.8 times higher in Germany.
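As a rough illustration of what these multipliers imply, the sketch below converts them into the share of real cases that testing would actually confirm. The function name is hypothetical, and the calculation is simple arithmetic on the two figures quoted above, not part of Devoteam's model.

```python
def detected_share(real_to_confirmed_ratio: float) -> float:
    """Share of real cases that show up as confirmed cases."""
    return 1.0 / real_to_confirmed_ratio

# Multipliers quoted in the article (real cases / confirmed cases).
multipliers = {"France": 9.0, "Germany": 1.8}

for country, ratio in multipliers.items():
    print(f"{country}: about {detected_share(ratio):.0%} of real cases confirmed")
```

Under this reading, roughly 11% of real cases would be confirmed in France and roughly 56% in Germany.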

According to the AI modelling performed by Devoteam’s data scientists, in a scenario without lockdown, Germany would have had more infections: more than 59 million (out of 83 million inhabitants) compared to more than 46 million in France (out of 67 million inhabitants). However, the number of deaths would have been almost three times lower, with more than 290,000 deaths in Germany compared to more than 800,000 in France.
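The ratios behind these figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above, which the article gives as lower bounds ("more than"), so the results are approximate; the variable and function names are illustrative.

```python
# Figures quoted in the article for the no-lockdown scenario (lower bounds).
scenarios = {
    "Germany": {"population": 83_000_000, "cases": 59_000_000, "deaths": 290_000},
    "France": {"population": 67_000_000, "cases": 46_000_000, "deaths": 800_000},
}

def summarize(s: dict) -> dict:
    """Derive the infected share of the population and deaths per case."""
    return {
        "infected_share": s["cases"] / s["population"],
        "deaths_per_case": s["deaths"] / s["cases"],
    }

summary = {country: summarize(s) for country, s in scenarios.items()}
death_ratio = scenarios["France"]["deaths"] / scenarios["Germany"]["deaths"]

for country, row in summary.items():
    print(f"{country}: {row['infected_share']:.1%} infected, "
          f"{row['deaths_per_case']:.2%} deaths per case")
print(f"France/Germany death ratio: {death_ratio:.2f}")
```

On these inputs, the infected share is about 71% in Germany and 69% in France, while deaths per case come out near 0.5% and 1.7% respectively, which is where the "almost three times lower" death toll comes from.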

Germany, thanks to its large capacity for screening tests and hence early detection, is stricter about isolating people who may be carrying the virus, even asymptomatic ones, which considerably reduces the speed at which it spreads [7]. The number of infections predicted in the no-lockdown scenario is not proportional to each country's population; it is driven by the parameters of the AI model, the most important of which are the speed of spread of the virus [8] in each country and the movement of individuals.

Methodology and data sources 

In this study, Devoteam’s AI teams used data officially released by the French government, the Pasteur Institute and the Robert Koch Institute, data from hospitals and EHPAD nursing homes, open data (INSEE and the German Federal Statistical Office – Statistisches Bundesamt in particular [9]) [1], and data from the primary health insurance fund. These were combined with data on public transport travel and food purchasing habits: sources of information that epidemiologists do not generally use, but which are essential for calculating the probabilities of spread.

The analysis was based on several statistical tests on these data and used Machine Learning approaches to predict certain situations.

It should be noted that the approach relies purely on smart data and on models trained on weighted parameters, which differs from traditional statistical methods based on population samples.

As for the model that predicts the number of deaths, it combines several characteristics, the most influential of which are the capacity to care for severely affected people (number of respiratory beds), the speed at which intensive care units are resupplied, and especially the early management of at-risk patients [11] from the first days of incubation of the virus [8]. This explains the low number of deaths in Germany compared to France: Germany, with 300,000 tests per week compared to 35,000 in France, is able to identify positive individuals very quickly (even without symptoms) and therefore to organise rapid and effective care from the outset.

The approach used to estimate the actual number of cases is based on the SEMMA methodology (sample, explore, modify, model, assess). The data scientists took into consideration the number of infections estimated per day since the beginning of the epidemic (source: Institut Pasteur) and the range of mortality rates reported by the different scientific communities around the world, including the Institut Pasteur (the mortality rate varies from 1 to 2% depending on the studies).
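To show how a range of mortality rates translates into a range of case estimates, here is a deliberately simplified sketch. It divides a death count by an assumed mortality rate and ignores the delay between infection and death, which a real model would have to handle. The function name is hypothetical, the 25,000 figure is the French death toll quoted earlier in this article, and this is not Devoteam's actual method.

```python
def infections_from_deaths(deaths: int, mortality_rate: float) -> float:
    """Back-calculate infections from deaths under an assumed mortality rate."""
    if not 0 < mortality_rate <= 1:
        raise ValueError("mortality_rate must be in (0, 1]")
    return deaths / mortality_rate

deaths_france = 25_000  # "more than 25,000 deaths in France" (article figure)

# The 1% to 2% range quoted above gives the bounds of the estimate:
# a higher assumed mortality rate implies fewer infections.
low = infections_from_deaths(deaths_france, 0.02)
high = infections_from_deaths(deaths_france, 0.01)

print(f"Implied infections: {low:,.0f} to {high:,.0f}")
```

With these inputs the implied range is roughly 1.25 to 2.5 million infections, which shows how sensitive such an estimate is to the assumed mortality rate.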

Devoteam’s data scientists used Machine Learning algorithms such as linear regression, combined with internally developed approaches such as UCB (upper confidence bound), to estimate the number of infections per region and ultimately deduce the total number of infections in France.
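The article does not say how linear regression and UCB were combined, so the following is only a sketch of the two building blocks under stated assumptions. It fits one ordinary least squares line per region and sums the regional predictions into a national total, then shows the textbook UCB1 index separately. The region names and daily counts are made up for illustration and are not real data, and all function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def ucb1(mean, pulls, total_pulls, c=2.0):
    """Textbook UCB1 index: sample mean plus an exploration bonus."""
    return mean + math.sqrt(c * math.log(total_pulls) / pulls)

# Made-up daily counts for two hypothetical regions (NOT real data).
days = [0, 1, 2, 3, 4]
regions = {
    "region_A": [10, 20, 30, 40, 50],
    "region_B": [5, 7, 9, 11, 13],
}

def predict_total(day):
    """Sum the per-region linear predictions into a national estimate."""
    total = 0.0
    for counts in regions.values():
        a, b = fit_line(days, counts)
        total += a + b * day
    return total

print(predict_total(5))
print(ucb1(mean=0.5, pulls=10, total_pulls=100))
```

The UCB1 bonus shrinks as an option is observed more often, which is why it is commonly used to balance well-measured estimates against uncertain ones; whether Devoteam applied it this way is not stated.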

The approach is therefore different from the classical parametric statistical approach. The Machine Learning model used for predictions by Devoteam is an adaptive model, i.e. it adjusts its features according to different age groups [10], different regions and the different behaviours of the inhabitants of each region. Consequently, the model can adapt the mortality rate to the population and features of each region.

The estimation model built by Devoteam has been trained on INSEE [1] [5], Statista [3] [4], planetoscope [2] and Wuhan laboratory [6] data. It embeds hybrid predictive models composed of classical decision tree and regression models combined with conditional symbolic AI under constraints. For the training data sets, data augmentation techniques were applied to some of the data to provide enough examples for training.

In addition, Devoteam validated the robustness of the model on interim results up to early April: the predicted numbers of deaths were compared with the actual numbers and came out very close. The accuracy of the predictions ranged from 90.3% to 96.7% depending on the date. The data scientists then built a confusion matrix to validate the quality of the ML model.
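The article does not define how accuracy was computed, so the sketch below shows one common reading: accuracy as one minus the relative error between a predicted and an actual count, alongside a minimal binary confusion matrix. Both function names and all input values are hypothetical and chosen only to illustrate the calculations.

```python
def relative_accuracy(predicted: float, actual: float) -> float:
    """One minus the relative error of a predicted count (assumed definition)."""
    return 1.0 - abs(predicted - actual) / actual

def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 1 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Hypothetical values, not figures from the study.
print(f"{relative_accuracy(predicted=9_030, actual=10_000):.1%}")
print(confusion_matrix([1, 0, 1, 0], [1, 1, 0, 0]))
```

A confusion matrix applies to classification outputs, so it would complement rather than replace an error measure on predicted death counts.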

Bibliography

[1] Number of employees in France – https://www.insee.fr/fr/statistiques/4466574

[2] Number of people taking public transport per day (every second, 77 people take urban public transport in France) – https://www.planetoscope.com/Mobilite/1334-nombre-de-trajets-en-transports-en-commun-en-france.htm

[3] Number of people in shopping centres and markets per day – https://fr.statista.com/statistiques/529501/centres-commerciaux-francais-nombre-visiteurs-annuel/

[4] Number of people in emergencies per day – https://fr.statista.com/themes/3462/les-urgences-medicales-en-france/

[5] Pollution rate – https://www.insee.fr/fr/statistiques/4160040

[6] Clinical characteristics of 138 patients at hospitals infected with Covid pneumonia in Wuhan, China

  • Virus penetration rate 
  • Lifetime of the virus on obstacles 
  • The speed of transmission of the virus 

[7] https://www.unibe.ch/aktuell/medien/media_relations/medienmitteilungen/2020/medienmitteilungen_2020/coronavirus_berner_forscher_berechnen_die_ausbreitung/index_ger.html

[8] https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsberichte/2020-05-06-en.pdf?__blob=publicationFile

[9] https://www.destatis.de/EN/Themes/Society-Environment/Population/Current-Population/_node.html

[10] https://www.destatis.de/EN/Themes/Society-Environment/Health/Causes-Death/Tables/deaths-cardiovascular-disease-total.html

[11] https://de.statista.com/statistik/daten/studie/707617/umfrage/umfrage-zur-verbreitung-von-chronischen-krankheiten-in-deutschland/

[12] https://www.spiegel.de/consent-a-?targetUrl=https%3A%2F%2Fwww.spiegel.de%2Fwissenschaft%2Fmedizin%2Fcoronavirus-infizierte-genesene-tote-alle-live-daten-a-242d71d5-554b-47b6-969a-cd920e8821f1

 

 
