milan batista
Estimation of coronavirus COVID-19 epidemic size by the logistic model


Updated 14 Apr 2020

Editor's Note: This file was selected as MATLAB Central Pick of the Week

The function fitVirus03 implements a logistic model for estimating the final size of an epidemic from daily data. The model is data-driven, so its forecast is only as good as the data. It is also assumed that the model is a reasonable description of a one-stage epidemic. If, however, the epidemic evolves into a second phase, the model becomes useless. The model is also useless in the initial phase of an epidemic.
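The logistic curve itself can be sketched in a few lines. This is a minimal Python illustration, not the fitVirus03 source. It assumes the parametrization C(t) = K/(1 + A*exp(-r*t)), which matches the K, r and A values the function prints: K is the final epidemic size, r the growth rate and A a constant that fixes when the curve turns. The helper names are hypothetical.

```python
import math


def logistic(t: float, K: float, r: float, A: float) -> float:
    """Cumulative cases at day t, assuming C(t) = K / (1 + A * exp(-r * t))."""
    return K / (1.0 + A * math.exp(-r * t))


def daily_new_cases(t: float, K: float, r: float, A: float) -> float:
    """Analytic derivative dC/dt = r * C * (1 - C / K), i.e. new cases per day."""
    c = logistic(t, K, r, A)
    return r * c * (1.0 - c / K)


def inflection_day(r: float, A: float) -> float:
    """Day at which C = K / 2 and daily new cases peak: t = ln(A) / r."""
    return math.log(A) / r


if __name__ == "__main__":
    # Illustrative values: the initial guess printed for Germany in the error log on this page.
    K, r, A = 126123.0, 0.287012, 18645.0
    tm = inflection_day(r, A)
    print(f"daily cases peak near day {tm:.1f}, when C = {logistic(tm, K, r, A):.0f}")
```

Daily new cases peak at the inflection t = ln(A)/r, where C = K/2. Data that stop well before this point carry little information about K, which is one way to see why the model is of little use early in an epidemic.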

The contribution contains coronavirus data for Austria, Belgium, China, Croatia, Denmark, Germany, Hungary, France, Iran, Italy, Lombardia, Norway, Netherlands, NY State, Portugal, Slovenia, South Korea, Spain, Switzerland, UK, USA and for the region outside China (up to 24 Mar 2020).

The regression may fail to converge for a poor initial guess or a small data set. Therefore, the method does not apply to the early stages of an epidemic. Also, the results are useless if the regression statistics do not meet minimum criteria, say R^2 > 0.8 and p-value < 0.05.
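The R^2 part of this acceptance test can be sketched as follows. The p-value check needs a statistics library (in MATLAB it comes from fitnlm) and is left out. The function names are illustrative, and the 0.8 default is the threshold suggested in the text.

```python
def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def passes_r2_check(observed, predicted, r2_min=0.8):
    """Hypothetical helper: accept a fit only if R^2 exceeds r2_min."""
    return r_squared(observed, predicted) > r2_min
```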

On the epidemic evaluation graph, region colors separate the epidemic phases (these are not standard but arbitrarily chosen for convenience):
red - fast growth phase
yellow - transition to steady-state phase
green - ending phase (plateau stage)
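The text does not say how the boundaries between the colored regions are computed, so the following is only one possible, hypothetical rule under the logistic parametrization C(t) = K/(1 + A*exp(-r*t)). It uses red until the inflection day t_m = ln(A)/r, yellow until the later zero of the third derivative, t_m + ln(2 + sqrt(3))/r, and green afterwards. At that last boundary C has reached (3 + sqrt(3))/6, about 79% of K.

```python
import math

# Hypothetical phase rule: the colors follow the description above,
# the boundaries are an assumption, not the ones used by fitVirus03.


def phase_boundaries(r: float, A: float):
    """Return (t1, tm, t2) for C(t) = K / (1 + A * exp(-r * t)).

    tm is the inflection day; t1 and t2 are the zeros of the third derivative,
    at r * (t - tm) = -ln(2 + sqrt(3)) and +ln(2 + sqrt(3)).
    """
    tm = math.log(A) / r
    dt = math.log(2.0 + math.sqrt(3.0)) / r
    return tm - dt, tm, tm + dt


def phase_color(t: float, r: float, A: float) -> str:
    """Classify day t as 'red' (fast growth), 'yellow' (transition) or 'green' (ending)."""
    _, tm, t2 = phase_boundaries(r, A)
    if t < tm:
        return "red"
    if t < t2:
        return "yellow"
    return "green"
```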

The second figure produced shows the daily evaluation of the epidemic size. If these values do not converge to a constant, then the epidemic is probably not yet stable.
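A simple way to check whether the daily estimates of the final size have settled is to look at their spread over the last few days. The window of 5 days and the 5% tolerance below are hypothetical choices, not values used by fitVirus03.

```python
def estimates_converged(k_estimates, window=5, rel_tol=0.05):
    """Return True if the last `window` daily estimates of the final size K
    lie within rel_tol of each other, relative to the largest of them.

    window and rel_tol are illustrative defaults (assumptions).
    """
    if window < 2 or len(k_estimates) < window:
        return False
    recent = k_estimates[-window:]
    largest, smallest = max(recent), min(recent)
    if largest <= 0:
        return False
    return (largest - smallest) / largest <= rel_tol
```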

A more detailed description can be found in
Examples can be found in

A new version based on SIR model is available at

Data for other countries are available from

DISCLAIMER. The software and data are for educational use only, not for medical or commercial use.

Comments and Ratings

Rock Interpreter

Hi Milan!
First of all, I appreciate your code and congratulate you on it. I have a small doubt regarding the SIR model. As you said, SIR works better than the logistic model and is more robust. I tried your SIR model, but I find it difficult to understand the total population N, which appears to be very small (about 626,000) when analyzing India. Since India has a huge population (1.35 billion), how should the SIR model be applied? Your comment will be highly appreciated. I look forward to your reply.
Shib G
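The role of N can be seen in a minimal SIR sketch. This is a forward-Euler illustration of the textbook SIR equations, not the code of the SIR version of fitVirus, and beta, gamma and the step count are hypothetical. N caps the cumulative number of cases I + R. So when N is estimated from the case data (as the small value reported for India suggests), it describes the population effectively reached by the epidemic rather than the census population.

```python
def simulate_sir(N, beta, gamma, I0, days, steps_per_day=10):
    """Integrate the SIR equations with forward Euler steps of 1/steps_per_day days.

        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - gamma * I
        dR/dt =  gamma * I

    Returns one (S, I, R) tuple per day, starting at day 0.
    """
    dt = 1.0 / steps_per_day
    S, I, R = float(N - I0), float(I0), 0.0
    history = [(S, I, R)]
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * S * I / N * dt
            recoveries = gamma * I * dt
            S -= infections
            I += infections - recoveries
            R += recoveries
        history.append((S, I, R))
    return history


if __name__ == "__main__":
    # Hypothetical parameters; only N differs between the two runs.
    for N in (626_000, 1_350_000_000):
        S, I, R = simulate_sir(N, beta=0.3, gamma=0.1, I0=100, days=365)[-1]
        print(f"N = {N:>13,}: cumulative cases I + R = {I + R:,.0f} ({(I + R) / N:.1%} of N)")
```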

Roberto Parente

Yusuf Kursat Tuncel

Jeta Statovci

Can you add the Kosovo data?

roberto fragoso

Thank you very much. I updated my programs and that fixed the failure.

milan batista

Roberto, do you have the Statistical toolbox installed?

roberto fragoso

Hello, I have a problem running the fitVirus03 function, and MATLAB gives me this error:

>> fitVirus03(@getDataGermany);
**** Estimation of epidemy size for Germany
Initial guess K = 126123 r = 0.287012 A = 18645
Error using optimoptions (line 105)
'SpecifyObjectiveGradient' is not an option for LSQCURVEFIT.
A list of options can be found on the LSQCURVEFIT documentation page.

Error in fitVirus03 (line 50)
opts = optimoptions('lsqcurvefit','Display','off',...

Can you help me find out what I'm doing wrong?
Thank you

David Franco

Thank you!

David Franco

Please update this code with the graphics from fitVirusCOVID19.

David Franco

Thank you!

lue mark


Great, but the model would have a much bigger impact if it also ran in GNU Octave (i.e., optimoptions and nested functions would need compatible versions).

Ricardo Pinheiro

Has anyone tried to port the MATLAB code to another platform, like GNU Octave?

I sent Mr. Batista the data from Brazil, so he can add it to the report.

milan batista

To all. The SIR model version has improved convergence and initial guess calculation. I think it works better than the logistic model and is also more robust.

milan batista

Dear Claudio, thanks for your suggestion. Please keep in mind that the logistic model is very simple. Daily forecasts can be very good: my forecast for Slovenia was accurate to within a few percent up to March 19th. But on that day, we had a local outbreak (a jump). After such an event, the forecasts are useless for a few days because the daily predicted values are below the actual ones. This changed after a few days (as in China on Feb 12). The SIR model has a similar problem.
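The kind of jump described here can be flagged automatically by comparing each day's reported total with the value the model forecast for that day. The helper below is a hypothetical sketch, and the 10% threshold is an arbitrary assumption.

```python
def flag_jumps(actual, forecast, rel_tol=0.10):
    """Return the indices of days on which the reported cumulative cases exceed
    the forecast for that day by more than rel_tol (relative to the forecast).

    rel_tol = 0.10 is an illustrative default, not a value from fitVirus03.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return [
        day
        for day, (a, f) in enumerate(zip(actual, forecast))
        if f > 0 and (a - f) / f > rel_tol
    ]
```

Forecasts made in the few days after a flagged day can then be treated with extra caution.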

Adam Hepworth

Claudio Gelmi

Dear Milan, I have been using your function in Chile, and for the last three days the predictions have been quite good. I added a 95% confidence interval for the "next day" prediction. Since you are already using the Statistics and Machine Learning (SML) Toolbox, it may be useful for more users. Here are my lines of code:

[betaNL,RNL,JNL] = nlinfit(samplaTime(1:n),sampleC(1:n),@fun,coef);
[Ypred,delta] = nlpredci(@fun,[samplaTime(end)+1]',betaNL,RNL,'Jacobian',JNL);
T = table(samplaTime(end)+2,round(Ypred),round(delta),'VariableNames',{'Day','Prediction','CI'})

Thanks again for sharing.

Diego Roldan

Very cool!

Ivo L

Great job, Milan. For Portugal, I suggest checking this source (Portugal's health department):

Sebastian Hölz

'fitnlm' requires the Statistics and Machine Learning Toolbox; you should update the requirements.

Joshua McGee

For an updated version with condensed code (all in a single .m file) and automatic retrieval of COVID-19 data for each country:

Joshua McGee

Great job, Milan!

milan batista

Hi, what do you mean by your last sentence?



First of all, thank you for the MATLAB model. It seems to work perfectly. I updated it with the Portuguese cases, and it appears to predict perfectly as well. How can I update the Portuguese numbers?

Andrea Augello

milan batista

Prof. Rolf Boelens provides the data and scripts for the Netherlands and the USA.

Morgan Evans

Excellent model. I use it every day. Thank you for the hard work. Any idea when we can expect a USA model?

milan batista

Thank you. The intended goal of the program is to help people evaluate when an epidemic will be over and to estimate if the measures are effective. For now, I publish daily reports at the web address above.
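Under the logistic curve, the day on which cumulative cases reach a chosen fraction p of the final size K has a closed form, which gives one way to put a date on "over". This assumes the parametrization C(t) = K/(1 + A*exp(-r*t)), and the 99% default is an arbitrary, hypothetical threshold. Solving p*K = K/(1 + A*exp(-r*t)) for t gives t = ln(p*A/(1 - p))/r.

```python
import math


def day_reaching_fraction(r: float, A: float, p: float = 0.99) -> float:
    """Day t at which C(t) = p * K for C(t) = K / (1 + A * exp(-r * t)).

    Solving p = 1 / (1 + A * exp(-r * t)) gives t = ln(p * A / (1 - p)) / r;
    K cancels out. p = 0.99 is an illustrative default.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p * A / (1.0 - p)) / r
```

With p = 0.5 the formula reduces to the inflection day ln(A)/r.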

Thank you for the update! Great work!
Can we expect new graphs every other day?
Here or somewhere else?

Maurice Politis

Mike Rudolph

Morgan Evans

Peter Graat

Nice, but it requires the Optimization Toolbox.


I tried the model with updated Italian data. Good job. Thanks for sharing.

milan batista

Successive regressions use the MATLAB function lsqcurvefit, which has no statistics output, so another fit is made with the MATLAB function fitnlm. The results may differ (for small data sets), and I don't know why; the warning is therefore just a reminder to be careful when interpreting the results.
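This consistency check can be sketched as a comparison of parameter ratios between the two fits. The function name and the 10% tolerance are hypothetical; the actual test in fitVirus03 may differ. One plausible contributor to the discrepancy on small, early data sets (an assumption, not the author's diagnosis): while A*exp(-r*t) is much larger than 1, the logistic curve is close to the exponential (K/A)*exp(r*t), so the data constrain r and the ratio K/A but hardly K and A separately. In the Germany warning on this page, r and K/A agree between the fits (0.998951, and 358.476/357.319 is about 1.003), while K and A each differ by a factor of about 358.

```python
def compare_fits(first, second, rel_tol=0.10):
    """Compare two parameter sets {'K': ..., 'r': ..., 'A': ...} fitted to the same data.

    Returns (ratios, differs, k_over_a_ratio), where ratios[name] = second/first,
    differs is True if any ratio deviates from 1 by more than rel_tol (an assumed
    tolerance), and k_over_a_ratio compares K/A, the combination that governs
    the early exponential phase C(t) ~ (K / A) * exp(r * t).
    """
    ratios = {name: second[name] / first[name] for name in first}
    differs = any(abs(q - 1.0) > rel_tol for q in ratios.values())
    k_over_a_ratio = (second["K"] / second["A"]) / (first["K"] / first["A"])
    return ratios, differs, k_over_a_ratio


if __name__ == "__main__":
    # Hypothetical parameter sets scaled to reproduce the ratios in the Germany warning.
    lsq = {"K": 100.0, "r": 0.3, "A": 10.0}
    nlm = {"K": 35847.6, "r": 0.3 * 0.998951, "A": 3573.19}
    ratios, differs, ka = compare_fits(lsq, nlm)
    print(ratios, differs, round(ka, 4))
```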

James Gana

Absolutely incredible job and model; I'm just running a couple of countries through it right now. For Germany, although the regression model seems to fit, I get the following message: "***Warning: results of lsqcurvefit and fitnlm differs significantly.
Knlm/Klsq = 358.476
rnlm/rlsq = 0.998951
Anlm/Alsq = 357.319"

I cannot understand the root cause, as the initial guess is successful...

milan batista

I have no experience with GitHub, but I will try to do what you suggested.

Christian Schröder

@milan I was thinking of perhaps putting the code on GitHub/GitLab so others could send pull requests, etc.

Claudio Gelmi

Nice job Milan! Thanks for sharing.

milan batista

They can make their own MATLAB contribution and freely add the fitVirus script. What do you suggest?

Christian Schröder

This is an excellent idea, and a great opportunity for students to learn a bit about both MATLAB and statistics.

What's the best way of contributing data files for other countries?

Gianmarco Zonta



