Covid postscript

Just a few comments to clarify complex ideas regarding “variances” in limited dependent variable models.

When I say “mountains are flattened”, I mean that disease severity is reduced among the subgroup who were previously “hit hard”, while the burden of disease is spread more evenly across everyone. So in ten parallel universes, instead of the same 10% of people ALWAYS getting very ill, 90% of people will get somewhat ill, and the particular 90% varies from universe to universe. You personally are no longer “assured” of getting (or not getting) covid. The variance goes up across the population, though it goes down for a lot of individuals who previously would have had the same outcome 10 out of 10 times.

Identifying people with zero variance is important. These people are deterministic, not probabilistic. They can’t be in a biostats (logit/probit) model. You just describe them qualitatively, according to what determines their disease status. Don’t attempt to include them, thinking that “they just boost sample sizes and improve precision”. It’s like saying “I’m going to take an average of a bunch of numbers that includes infinity”. Dumb Dumb Dumb. These people, annoying though they are for logit/probit models, are actually useful for policy if you can find out WHY they do/don’t get ill.
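To make the “average that includes infinity” point concrete, here is a minimal sketch on the log-odds scale, which is the scale a logit model works on. It is not a fitted model. The shares below are made-up numbers, and the names (`probabilistic_people`, `deterministic_person`, `mean_log_odds`) are just for illustration.

```python
import math

def logit(p):
    """Log-odds of a proportion; 0 and 1 map to minus/plus infinity."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log(p / (1.0 - p))

# Hypothetical share of parallel universes in which each person gets ill.
probabilistic_people = [0.2, 0.4, 0.5, 0.7]
deterministic_person = 1.0  # ill in every universe: zero variance

def mean_log_odds(shares):
    """Plain average of the log-odds of each share."""
    values = [logit(p) for p in shares]
    return sum(values) / len(values)

without_deterministic = mean_log_odds(probabilistic_people)
with_deterministic = mean_log_odds(probabilistic_people + [deterministic_person])

print("average log-odds, probabilistic people only:", without_deterministic)
print("average log-odds, deterministic person added:", with_deterministic)
```

One deterministic person is enough to push the average to infinity, however many probabilistic people sit alongside them. That is the sense in which adding them does not “improve precision”.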

The KEY factor here is variance heterogeneity. If one group of people, in 100 parallel universes, experiences 40 cases, but it is the SAME 40 people in all 100 universes, then they CANNOT be aggregated with another group of people who, in 100 parallel universes, experience on average 40 cases, but where the 40 people who are cases vary immensely across the 100 universes. Any universe from the first hundred has “consistency”. Any universe from the second hundred has extreme inconsistency. Aggregating them can’t be done. UNLIKE IN A LINEAR MODEL, it doesn’t just mess with standard errors. It causes BIAS.
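A rough way to see the bias is the latent-variable version of the logit model, where the chance of illness is expit(beta * x / sigma) and sigma is the scale of the noise. That formulation is my assumption for this sketch, as are the numbers: a common effect `BETA` of a binary risk factor x, and two hypothetical groups that differ only in sigma. The fully deterministic group is the limiting case of sigma going to zero. Everything is computed from exact probabilities, so there is no simulation noise to hide behind.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

BETA = 2.0  # assumed common effect of a binary risk factor x (illustrative)
SIGMAS = {"consistent": 1.0, "inconsistent": 4.0}  # assumed noise scales

def prob_ill(x, sigma):
    """Latent-variable logit: only the ratio BETA / sigma is identified."""
    return expit(BETA * x / sigma)

def logit_slope(p_at_0, p_at_1):
    """With a binary x, the logit slope is the difference in log-odds."""
    return logit(p_at_1) - logit(p_at_0)

def average(values):
    values = list(values)
    return sum(values) / len(values)

# Slope each group would give if analysed on its own.
within_group = {
    name: logit_slope(prob_ill(0, s), prob_ill(1, s)) for name, s in SIGMAS.items()
}

# Pool two equally sized groups: average the probabilities, then take log-odds.
pooled_p0 = average(prob_ill(0, s) for s in SIGMAS.values())
pooled_p1 = average(prob_ill(1, s) for s in SIGMAS.values())
pooled_logit_slope = logit_slope(pooled_p0, pooled_p1)

def linear_mean(x, sigma):
    """Linear model y = BETA * x + sigma * e: zero-mean noise drops out of the mean."""
    return BETA * x

pooled_linear_slope = (
    average(linear_mean(1, s) for s in SIGMAS.values())
    - average(linear_mean(0, s) for s in SIGMAS.values())
)

print("within-group logit slopes:", within_group)
print("pooled logit slope:", pooled_logit_slope)
print("pooled linear slope:", pooled_linear_slope)
```

In the linear case the pooled slope is still BETA, so unequal variances only affect the standard errors. In the logit case the pooled slope matches neither group's slope and falls short of BETA, which is the bias.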