DANIEL VAUGHAN: Stop scaring people with COVID-19 models. Focus on real data.

Statistical models are great tools that can help us understand large-scale events and how they develop. They can also provide worst-case scenarios for governments and businesses, allowing them to plan and act accordingly to avoid such scenarios.

But with COVID-19, people are treating statistical models of the disease's potential exponential growth as the gospel truth, rather than as one possibility within a wide range of outcomes.

If you’re going around quoting a statistical model, claiming everyone is going to get the virus or that millions of people are going to die — as some of the people in my social media feeds are — you’re just wrong. And you’re wrong because you’re accepting a model as reality when we know, right now, without a doubt, that every statistical model we have is incorrect.

Let's start with the basics. As XLSTAT explains, "statistical modeling is a simplified, mathematically-formalized way to approximate reality (i.e., what generates your data) and optionally to make predictions from this approximation." In its purest form, it's how we generate predictions for things like the outcomes of sporting events: we try to determine which teams will win or lose based on the stats they've achieved up to that point.

A statistical model depends heavily on two things to make these predictions: 1) the data fed into the model, and 2) the assumptions the model makes about how that data will behave in the future. Every model assigns its own weights to different kinds of data and has its own methods for treating them.

Take Nate Silver and FiveThirtyEight's current model of the Democratic primary race. Right now, it gives Joe Biden a more than 99% chance of winning the nomination. The contest is effectively over at this point, unless something extreme happens. But at other times, the model has shown "no one" leading the contest, predicting a contested convention; Bernie Sanders with a sizeable lead; and a few other candidates with plurality leads. Everything in the model changes as we get new polling data and more Democrats cast their ballots.

Now let's jump back to COVID-19. Some people have tossed around models based on artificial intelligence that claim 2.5 billion people globally will get the disease and 53 million will die. Of course, even if no one did anything to stop the virus's spread, that outcome would almost certainly not happen.

This week, others offered a more modest prediction, based on statistical modeling, that 1.1 million people would die. And the media claimed that the Trump administration changed its policies this week based on this one model, ignoring that Trump had restricted travel from China weeks earlier.

While the media breathlessly reported on these models, actual medical professionals, like Dr. Anthony Fauci, director of the National Institute of Allergy and Infectious Diseases, were reminding them that a model's projections are not a certainty. According to The Hill:

Fauci [told ABC on Sunday] that a model is “only as good as the assumptions you put into a model.”

“The worst-case scenario is either you do nothing or your mitigation and containments don’t succeed,” he said. “So although that’s possible, it is unlikely if we do the kinds of things that we’re essentially outlining right now.”

Here's why we know every single model is wrong right now: we haven't been testing anywhere near enough people, so we do not have a firm grasp of how many infections there are across the United States.

We only started ramping up testing this week, according to data from the COVID Tracking Project, which has attempted to determine exactly how many people we've tested for this virus. Tracking from Wednesday to Wednesday, the project reported that as of March 11 the U.S. had tested only 7,617 people in total.

That figure isn't the number of positive or negative results; it's simply the total number of people tested up to that day.

By the end of the day on March 18, however, that total had reached 76,495. In other words, we went from testing a few hundred, maybe 1,000, people per day if we were lucky, to testing nearly 70,000 people in a single week, or close to 10,000 per day on average. And even that average doesn't tell the full story, because capacity goes up every day.

From Sunday to Monday of this week, and again from Monday to Tuesday, we tested around 13,000 people in each period. From Tuesday to Wednesday, we tested approximately 22,000 people. That number should keep climbing, which means we will finally start to learn who does and doesn't have this virus.

That deluge of data is going to radically change every current model, because the models will finally start running on data instead of assumptions.

Indeed, we're now moving from a period in which we could merely guess with models to one in which we have hard data on the coronavirus outbreak. It's easier to plan and instruct people once we have that hard data. Right now, though, every action we're taking is about avoiding a worst-case scenario like the ones Italy, China, and Iran have experienced. And because we're taking preventative measures before the hard data arrives, our end result should be better than what any worst-case statistical model projects.

Models have their place. But people who quote them as gospel truth are doing more scaremongering than anything else.

The more testing we do, the more positive cases we will find. That's a good thing, not something to be feared. We must define the scope of the outbreak in order to plan accordingly. Fortunately, we're just hitting our stride as a country.