Covid - death rates - a very simplistic view

 Michael Hood 24 Nov 2020

Just looking at the Gov dashboard https://coronavirus.data.gov.uk/ which gives the following for the most recent week:

     People tested +ve:        219.1 per 100k population

     Deaths within 28 days:      4.2 per 100k population

     Patients admitted:         17.3 per 100k population

Now I know these quantities have different time lags, so it's not really correct to compare them directly, but if this were a steady state (i.e. the same numbers every week) then it would be OK.

So my simplistic reading of the above gives me:

     If you test +ve you've got about an 8% chance of ending up in hospital.

     If you test +ve you've got about a 2% chance of dying.

     If you end up in hospital you've got about a 25% chance of dying.

I know these don't take age, co-morbidities etc, etc, into account but as a rough and ready guide - how far out from a proper analysis of the data are they?
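For what it's worth, here is that back-of-envelope arithmetic as a minimal Python sketch (just the dashboard figures above hard-coded; an illustration, not an analysis):

```python
# Weekly dashboard figures from above, per 100k population.
cases_per_100k = 219.1       # people tested +ve
admissions_per_100k = 17.3   # patients admitted
deaths_per_100k = 4.2        # deaths within 28 days of a +ve test

# Naive steady-state ratios (ignoring the lag between test, admission and death).
print(f"+ve -> hospital:   {admissions_per_100k / cases_per_100k:.0%}")   # ~8%
print(f"+ve -> death:      {deaths_per_100k / cases_per_100k:.0%}")       # ~2%
print(f"hospital -> death: {deaths_per_100k / admissions_per_100k:.0%}")  # ~24%
```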

 balmybaldwin 24 Nov 2020
In reply to Michael Hood:

Not a million miles out I suspect, but I'd also expect the numbers to look different (and higher) if you were to take account of lag, though it would still only be a rough approximation.

If you took today's deaths, the hospitalisations from 10 days ago and the positive tests from 3 weeks ago, it would be closer to reality.
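A minimal sketch of that lag-shifted comparison in Python (the daily series here are made-up placeholders; only the 10-day and 21-day offsets come from the post above):

```python
def lagged_ratio(numerator, denominator, lag_days):
    """Each day's numerator divided by the denominator from lag_days earlier.

    Both inputs are daily series, oldest first; the result is shorter by lag_days.
    """
    return [n / d for n, d in zip(numerator[lag_days:], denominator)]

# Purely illustrative synthetic series (30 days, oldest first) - not real data.
days = range(30)
cases = [20000 + 500 * d for d in days]       # positive tests per day
admissions = [1500 + 40 * d for d in days]    # hospital admissions per day
deaths = [350 + 10 * d for d in days]         # deaths per day

# Today's deaths vs admissions ~10 days ago and positive tests ~3 weeks ago.
print(f"deaths / lagged admissions: {lagged_ratio(deaths, admissions, 10)[-1]:.1%}")
print(f"deaths / lagged cases:      {lagged_ratio(deaths, cases, 21)[-1]:.1%}")
```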

Post edited at 18:17
 wintertree 24 Nov 2020
In reply to Michael Hood:

The "proper" data is the "longitudinal" data - records for each individual with the date of their +ve test, the date of their hospitalisation (if there was one) and the date of their death.  All that data is presumably available to the NHS and as far as I know, they don't publish any outputs from it.  As far as I know, this dataset is not made public, although from what I can tell, an anonymised version could be so published.  Analysing this data would give the "true" case fatality rate.  

So, we're left scrabbling around making assumptions about the lag from infection to death in the data that is published.  I prefer not to assume a probability distribution for the time from infection to death: any choice is provably wrong (the distribution can be shown to change over time) and so introduces some sort of bias.   Instead I do the same measurement as you for every day in a period, and for a range of lags - plot below.  When cases are rising or falling, this approach under- or over-estimates the "true" fatality rate depending on the lag and on what the cases are doing.  All the curves converge on the right - this is because we have had a plateau phase of cases and then a corresponding plateau of deaths, where the rates are ~constant with time.  This means that the measurement is no longer sensitive to lag and so the lag "drops out" of the analysis.   So, from this plot I could say with some certainty that the CFR around 13th November was about 2%.
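A minimal sketch of that kind of lag-sweep (Python with synthetic stand-in data; the function, the 2% rate and the 14-day lag are illustrative, not the actual analysis behind the plots):

```python
import numpy as np

def cfr_by_lag(cases, deaths, lags):
    """For each assumed lag, compute deaths(t) / cases(t - lag) for every day t.

    cases, deaths: daily series, oldest first. Returns {lag: array of estimates}.
    """
    cases = np.asarray(cases, dtype=float)
    deaths = np.asarray(deaths, dtype=float)
    return {lag: deaths[lag:] / cases[:len(cases) - lag] for lag in lags}

# Synthetic stand-in data: a gently varying case rate, deaths = 2% of cases 14 days later.
n_days, true_lag, true_cfr = 120, 14, 0.02
cases = 20000 + 2000 * np.sin(np.arange(n_days) / 20)
deaths = np.empty(n_days)
deaths[:true_lag] = true_cfr * cases[0]
deaths[true_lag:] = true_cfr * cases[:-true_lag]

for lag, estimates in cfr_by_lag(cases, deaths, lags=range(0, 29, 7)).items():
    print(f"assumed lag {lag:2d} days: latest naive CFR = {estimates[-1]:.2%}")
```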

How close is this 2% to the actual CFR?  I think the "plateau" phase means that these simple approaches are currently quite accurate.  But notice how none of the lines remains on 2% throughout the time period - this suggests either the CFR is changing or the distribution of times from test to death is changing; you can infer this from the demographic data showing the age profile of cases changing and from what's known about IFR vs age - so I don't think we can get better estimates of CFR without making a bunch of dodgy assumptions.

I also put a similar plot for IFR below that uses the ONS random sampling data as a measure of true infection levels.  To my shame they're not identically laid out so flip-booking between them isn't so satisfying.  I'm scratching my head about why the CFR is tailing off on the right hand side of my CFR plot; I think it's probably because I'm using too-recent datapoints subject to reporting lag.  It could also be a sign of test-and-trace catching up to real infections as it improves.

Edit:  So a lot of work and words to get to the same place your quick measurement does; but when I gave such a quick measurement for IFR the other day I was snootily informed that my methodology was not anywhere near as good as that of the CEBM.  Hint:  The CEBM method was not good.  Not good at all.

Post edited at 18:48

OP Michael Hood 24 Nov 2020
In reply to wintertree:

Again, me being simplistic and just checking I'm understanding things correctly - looking at your plots, the IFR is very approximately half the CFR - does that imply that they're only detecting (i.e. getting +ve test results) about 50% of cases?

Presumably the "I" in IFR is calculated from the random testing (pillar 4?) done across the UK's population (or is it only England?) to estimate the total number with Covid.

 wintertree 24 Nov 2020
In reply to Michael Hood:

Sorry, my CFR plot is UK level and the IFR plot is England only, using the random sampling data from the ONS (pillar 4, I think?).  The ONS data does indeed suggest testing is getting about half the cases.
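The arithmetic behind that, roughly (the ballpark 2% and "about half" figures are read off the plots as described in this thread, used purely for illustration):

```python
# CFR = deaths / detected cases; IFR = deaths / all infections,
# so IFR / CFR = detected cases / all infections, i.e. the detected fraction.
cfr = 0.02   # ~2% (ballpark, from the CFR plot)
ifr = 0.01   # ~half the CFR (ballpark, from the IFR plot)
print(f"fraction of infections caught by testing: {ifr / cfr:.0%}")   # ~50%
```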

I made a plot dividing a trend line through the ONS England data into the cases data for England to give an idea of how that ratio changes over time.
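A minimal sketch of that ratio-over-time calculation (Python; it assumes the weekly ONS estimates have already been interpolated onto daily dates, and the series below are made-up placeholders):

```python
import numpy as np

def detected_fraction(reported_cases_daily, ons_infections_daily):
    """Reported cases divided by ONS-estimated true infections, day by day.

    Both inputs are aligned daily series; the weekly ONS estimates would first be
    interpolated onto daily dates (e.g. a trend line through the weekly points).
    """
    cases = np.asarray(reported_cases_daily, dtype=float)
    ons = np.asarray(ons_infections_daily, dtype=float)
    return cases / ons

# Purely illustrative numbers: testing catching a growing share of infections.
ons_daily = np.full(28, 50000.0)                 # made-up "true" new infections per day
cases_daily = np.linspace(20000, 30000, 28)      # made-up reported cases per day
ratio = detected_fraction(cases_daily, ons_daily)
print(f"detected fraction, first vs last day: {ratio[0]:.0%} -> {ratio[-1]:.0%}")
```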


 MG 24 Nov 2020
In reply to wintertree:

Excellent work with the data, as always.

Can you expand on the IFR plot? Why the wild swings, and why does the 22-day rate cross the 7-day one to the right?

 wintertree 24 Nov 2020
In reply to MG:

Wild swings - see the plot below: ONS infection data ± 1 standard error (estimated as 1/2 the 95% CI) and deaths, both for England.

  • The initial big decrease on the far left is probably because the trend line interpolation isn't accurate around the first ONS data point and is too low - I should type in some more ONS datapoints or truncate my plot a couple more weeks in.
  • There's a dramatic up-tick in the infection rate around 09-27. For lag times that are too long, this means deaths start rising before the lagged infections do, so you get big peaks that then subside.  The smaller the lag, the sooner and lower the peak.  That's the main effect.  It's an artefact of the assumed lag, not a real change in the fatality rate.

22-day crossing 7-day - in a phase of rising infections, a too-large lag will over-estimate the IFR because it divides deaths by a too-small (over-lagged) infection number.  In a falling phase, the same too-large lag will under-estimate the IFR because it divides deaths by a too-large (over-lagged) infection number.  The opposite is true for a too-small lag, which under-estimates the IFR in a rising phase and over-estimates it in a falling phase.  So, when infections and then deaths tip over from growth to decay, which lags over-estimate and which under-estimate swaps around, meaning the lines cross.  It's a really useful moment as the most-horizontal line indicates the most accurate lag *at that point in time* - it's not generally applicable as the lag (really the full distribution of lag times over all the people) is in constant flux with the demographic shifts going on.  Think of the Lissajous oval for two sine waves out of phase...

Edit: Here's a noddy model of a sinusoidal peak in infections and deaths with a lag of 16 days, and my analysis applied to it.   If you look at where the various curves cross each other on the IFR plot it's quite similar to the analysis of the real data.   You might find it makes more sense if you think in terms of a phase relationship and Lissajous figures.  I've done another plot comparing them; it makes me think a lag of around 16 days is pretty accurate, as the crossing of all the different line-pairs is in the same precedence in X and Y.  This is lag from the mid-point of the ONS sample weeks, which is a slightly nebulous concept.
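A minimal sketch of that sort of noddy model (Python/numpy; the peak shape and the 2% rate are arbitrary choices, only the 16-day lag and the lag-sweep idea come from the posts above):

```python
import numpy as np

# Noddy model: a single smooth peak in infections; deaths = 2% of infections 16 days later.
n_days, true_lag, true_ifr = 120, 16, 0.02
t = np.arange(n_days)
infections = 10000 + 40000 * np.sin(np.pi * t / n_days) ** 2   # sinusoidal-ish peak at day 60
deaths = np.empty(n_days)
deaths[:true_lag] = true_ifr * infections[0]
deaths[true_lag:] = true_ifr * infections[:-true_lag]

# Apply the same lag-sweep: for each assumed lag, deaths(t) / infections(t - lag).
for lag in (7, 12, 16, 22):
    est = deaths[lag:] / infections[:n_days - lag]     # estimates for days lag .. n_days-1
    rising, falling = est[len(est) // 4], est[3 * len(est) // 4]
    print(f"assumed lag {lag:2d}d: rising-phase {rising:.2%}, falling-phase {falling:.2%}")
```

Too-long lags over-estimate on the way up and under-estimate on the way down, and vice versa for too-short lags, so the estimate lines cross as the peak passes - the crossing behaviour described above.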

Post edited at 20:40

