To illustrate the analytic issues related to the NNT, data from a hypothetical trial were generated, involving patients with an (imaginary) serious form of iron overload syndrome (IOS), which results in liver failure in almost 50% of patients within a year of diagnosis. A total of 3,000 patients were studied in this hypothetical three-arm trial, with 1,000 randomized to a treatment called Fedom, 1,000 randomized to another treatment called Feclad, and 1,000 given placebo as the comparison group. All patients were followed for 1 year or until liver failure.
While the trial was intended to follow all patients for 1 year, 60% were censored before that time. Thus, the mean follow-up was 7 months, during which there were 324 liver failures in the placebo group, compared with 230 and 238 in the Fedom and Feclad groups, respectively. Figure 1 displays the results from this trial by presenting the cumulative incidence curves (the complement of the Kaplan–Meier survival curves) of liver failure for the three treatment groups over the 1-year follow-up. We now apply to the data from this trial the different techniques used by the studies with problematic results to calculate the incidence of the outcome and the corresponding NNT.
| Figure 1 Cumulative Incidence of Liver Failure for the Hypothetical Data |
NNT: Using the Simple Proportion?
Several trials have used the simple proportion of patients with the outcome to compute the NNT. Only if all patients are followed for the full study period does the simple proportion equal the cumulative incidence of the outcome at 1 year as computed by the Kaplan–Meier approach. However, when the follow-up times vary, as is generally the case with most trials, the simple proportion is not a valid estimate of the cumulative incidence and can thus lead to erroneous and misleading values of the NNT.
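The gap between the two estimators can be seen in a minimal sketch. The ten follow-up times and event indicators below are invented purely for illustration (they are not the trial data), and the function name `km_cumulative_incidence` is an assumption of this sketch rather than anything used in the studies discussed.

```python
from typing import List

def km_cumulative_incidence(times: List[float], events: List[bool], horizon: float) -> float:
    """Return 1 - S(horizon), with S the Kaplan-Meier survival estimate."""
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        # Events at time t; patients censored at t still count as at risk at t.
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if d:
            survival *= 1.0 - d / at_risk
        at_risk -= leaving
    return 1.0 - survival

# Hypothetical follow-up in months; True = outcome observed, False = censored.
toy_times = [2, 3, 4, 5, 6, 6, 8, 9, 12, 12]
toy_events = [True, False, True, False, False, True, False, True, False, False]

simple_proportion = sum(toy_events) / len(toy_events)
km_estimate = km_cumulative_incidence(toy_times, toy_events, horizon=12)

print(f"simple proportion: {simple_proportion:.4f}")
print(f"Kaplan-Meier cumulative incidence at 12 months: {km_estimate:.4f}")
```

With this much censoring, the simple proportion understates the 12-month cumulative incidence, because censored patients are counted in the denominator as if they had been followed to the end without the outcome.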
Table 1 shows the NNT calculations from the hypothetical trial in iron overload syndrome. First, using the cumulative incidence of liver failure after 1 year of treatment, estimated from the Kaplan–Meier curves (Figure 1), it shows that the number needed to treat for 1 year with Fedom is six to prevent one liver failure over that time, while for Feclad the corresponding number needed to treat is 77 over 1 year. In contrast, using the simple proportion of patients with liver failure, which does not account for the varying follow-up times, leads to values of the number needed to treat for 1 year of 11 with Fedom and 12 with Feclad to prevent one liver failure over that time. These values differ substantially from the NNT properly based on the Kaplan–Meier approach.
| Table 1 Comparison Between NNT Computed from the Simple Proportion of Liver Failure with That from More Proper Cumulative Incidence Based on the Kaplan–Meier Approach for the Hypothetical Three-arm 1-Year Trial of 3,000 Patients with the (Imaginary) Iron Overload ... |
Note that the Kaplan–Meier approach is less crucial in studies where the follow-up is short and mostly complete, in which case the simple proportions and the Kaplan–Meier cumulative incidence are practically equal.19,20 However, as the follow-up times become more variable, the simple proportion can result in distorted values of the NNT.16,17,21,22
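The simple-proportion values quoted for the hypothetical trial can be reproduced from the event counts alone, as in the following sketch. The helper name `nnt_from_risks` is an assumption of this example; the counts (324, 230, and 238 liver failures among 1,000 patients per arm) are those given for the trial.

```python
def nnt_from_risks(risk_control: float, risk_treated: float) -> float:
    """NNT as the reciprocal of the absolute risk reduction."""
    risk_reduction = risk_control - risk_treated
    if risk_reduction <= 0:
        raise ValueError("treated risk must be lower than control risk")
    return 1.0 / risk_reduction

N_PER_ARM = 1000
liver_failures = {"placebo": 324, "Fedom": 230, "Feclad": 238}

# Simple proportions ignore that 60% of patients were censored before 1 year.
proportions = {arm: n / N_PER_ARM for arm, n in liver_failures.items()}

simple_nnt = {
    arm: nnt_from_risks(proportions["placebo"], proportions[arm])
    for arm in ("Fedom", "Feclad")
}

for arm, value in simple_nnt.items():
    print(f"{arm}: NNT from simple proportion = {value:.1f} (rounded: {round(value)})")
```

The same helper applied to Kaplan–Meier cumulative incidences at 1 year, rather than to these crude proportions, is what yields the properly estimated NNT.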
In the trial of nucleoside analogues against herpes simplex virus type 2 (HSV-2), the NNT calculation of 38 persons with recurrent genital herpes treated with valacyclovir for 1 year to prevent one case of HSV-2 infection was also based on the simple proportion of patients with HSV-2 infection.17 It was calculated as 1/[(27/741)–(14/743)]=57 from the 240-day data, which was then extrapolated to a year by multiplying by 240/365 to arrive at the “NNT” of 38 for 1 year. Instead, the Kaplan–Meier values of cumulative incidence of HSV-2 infection at day 240 are 4.3% versus 2.1%, giving a NNT of 45 for 240 days of treatment. In the trial of zoledronic acid in premenopausal women with endocrine-responsive early breast cancer, the authors used the simple proportion of patients whose disease progressed to compute the NNT plainly as 1/[(83/904)–(54/899)]=31.16 This is despite the fact that the Kaplan–Meier curves were estimated and provided and that follow-up varied extensively between patients.
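The arithmetic behind these two examples can be retraced with a short sketch. The function name is an assumption of this example; the counts and percentages are those quoted above. Values are left unrounded because the published figures do not all follow one rounding convention.

```python
def nnt_from_counts(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """NNT from crude proportions: 1 / (events_a/n_a - events_b/n_b)."""
    return 1.0 / (events_a / n_a - events_b / n_b)

# HSV-2 trial: simple proportions over 240 days.
hsv_nnt_240_days = nnt_from_counts(27, 741, 14, 743)

# The reported 1-year figure rescales the 240-day NNT by 240/365.
hsv_nnt_rescaled = hsv_nnt_240_days * 240 / 365

# Kaplan-Meier cumulative incidences at day 240: 4.3% versus 2.1%.
hsv_nnt_km_240_days = 1.0 / (0.043 - 0.021)

# Zoledronic acid trial: simple proportions of disease progression.
zoledronic_nnt = nnt_from_counts(83, 904, 54, 899)

print(f"HSV-2, simple proportion, 240 days: {hsv_nnt_240_days:.1f}")
print(f"HSV-2, rescaled to 1 year:          {hsv_nnt_rescaled:.1f}")
print(f"HSV-2, Kaplan-Meier, 240 days:      {hsv_nnt_km_240_days:.1f}")
print(f"Zoledronic acid, simple proportion: {zoledronic_nnt:.1f}")
```

Note that the Kaplan–Meier value refers to 240 days of treatment; the sketch does not attempt a 1-year Kaplan–Meier figure, since no 1-year cumulative incidence is quoted.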
NNT: Using the Incidence Rate?
Another measure used to quantify the incidence of the outcome when calculating the NNT is the incidence rate, which is intended to account for varying follow-up times. The incidence rate is computed as the number of patients with the outcome divided by the total amount of person-time generated by the follow-up of the study patients. Using this, some authors have computed the NNT as the inverse of the difference between the incidence rates for the two groups under study. However, here again, this can lead to incorrect values of the NNT.
Table 2 displays the NNT calculations from the hypothetical trial using the incidence rate compared with the proper cumulative incidence estimates. Using the incidence rate of liver failure per patient per year leads to values of the number needed to treat for 1 year of six with Fedom, and five with Feclad, to prevent one liver failure over that time. Here, the contrast is particularly striking for Feclad, where the NNT was 77 at 1 year using the proper Kaplan–Meier approach.
| Table 2 Comparison between NNT Computed from the Incidence Rate of Liver Failure Per Patient-Year with That from More Proper Cumulative Incidence Based on the Kaplan–Meier Approach for the Hypothetical Three-arm 1-Year Trial of 3,000 Patients with the (Imaginary) ... |
Several studies have improperly used the incidence rate approach in computing the NNT.18,23–28 An example is from a trial of 1,801 frail elderly adults randomized to a hip protector or to a control group to assess the risk of hip fracture, with varying follow-up times (mean 1.1 years).23 The incidence rate of hip fracture in the hip protector group was 21.3 per 1,000 person-years compared with 46.0 in the control group. The resulting reported “number needed to treat for one year to prevent one hip fracture was 41 persons” was based on these incidence rates rather than the cumulative incidence at 1 year. The paper in fact provided the Kaplan–Meier curves for the cumulative incidence of hip fracture, which indicate a 1-year cumulative incidence of 2.1% for the hip-protector group and 5.0% for the control group, corresponding to a NNT of 35 patients needing to be treated for 1 year to prevent one hip fracture, rather than the reported 41.
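The two calculations for the hip protector trial can be set side by side in a brief sketch. The variable names are assumptions of this example, and rounding up to the next whole patient is also an assumption, chosen because it reproduces both quoted figures.

```python
import math

# Incidence rates of hip fracture per 1,000 person-years, as reported.
RATE_CONTROL_PER_1000_PY = 46.0
RATE_PROTECTOR_PER_1000_PY = 21.3

# NNT as the inverse of the rate difference (the approach used in the trial report).
rate_difference_per_py = (RATE_CONTROL_PER_1000_PY - RATE_PROTECTOR_PER_1000_PY) / 1000
nnt_from_rates = 1.0 / rate_difference_per_py

# NNT from the two 1-year Kaplan-Meier cumulative incidences (5.0% and 2.1%).
km_one_year_risks = (0.050, 0.021)
nnt_from_km = 1.0 / abs(km_one_year_risks[0] - km_one_year_risks[1])

print(f"rate-based NNT:   {nnt_from_rates:.1f} -> {math.ceil(nnt_from_rates)}")
print(f"Kaplan-Meier NNT: {nnt_from_km:.1f} -> {math.ceil(nnt_from_km)}")
```

An incidence rate is a count per unit of person-time rather than a probability over a fixed period, which is why its reciprocal difference need not match the NNT derived from cumulative incidence at 1 year.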
Similarly, in the Collaborative Atorvastatin Diabetes Study (CARDS), the authors used incidence rates to report that “27 patients would need to be treated for 4 years to prevent one (major cardiovascular) event.”27 However, the Kaplan–Meier curves for the cumulative incidence of a major cardiovascular event result in a NNT value closer to 20 patients at 4 years.27
Lastly, the previously mentioned trial of 3,845 very elderly hypertensives randomized to a diuretic or placebo also used incidence rates to compute the NNT of 94 treated for 2 years to prevent one stroke.18 The Kaplan–Meier curves for the cumulative incidence of stroke indicate a 2-year cumulative incidence of stroke of 2.2% for diuretic treatment and 3.8% for placebo, corresponding to a NNT of 63 patients needing to be treated for 2 years to prevent one stroke, rather than the miscalculated 94 patients.
NNT: Using Meta-analyses
The meta-analysis of the 22 trials of the anticholinergic tiotropium that involved over 23,000 COPD patients reported a NNT of 16 patients “over one year” with tiotropium to prevent one exacerbation.13 These trials involved different study durations, varying from 3 to 48 months. However, the calculation of the NNT involved all 22 trials, using the proportion of patients with an exacerbation over the pooled data, irrespective of the study duration. Indeed, 37.7% of the tiotropium patients had an exacerbation, compared with 44.2% on placebo, proportions based on a mix of short-term (3 months) and long-term (48 months) trials. Nevertheless, the reported NNT of 16 referred to the time period of treatment as “one year.”
To evaluate the NNT for 1 year of treatment, the meta-analysis could have restricted its analysis exclusively to the 1-year trials. Indeed, for the six 1-year studies, the proportion of patients with a COPD exacerbation is 37.4% of patients on tiotropium compared with 44.2% on placebo, leading to a NNT over 1 year of 15, which coincidentally is practically equal to the reported NNT of 16 based on all 22 trials of variable duration. Note, however, that the NNT is 72 for the 3-month trials and 250 for the 48-month trials.15
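One way to read the duration-specific values is to convert each NNT back into the absolute risk difference it implies, as in the sketch below. The dictionary keys are labels chosen for this example; the percentages and NNT values are those quoted above.

```python
# Proportions with an exacerbation (tiotropium, placebo), as quoted in the text.
exacerbation_risks = {
    "all 22 trials pooled": (0.377, 0.442),
    "six 1-year trials": (0.374, 0.442),
}

nnt_by_stratum = {
    label: 1.0 / (placebo - tiotropium)
    for label, (tiotropium, placebo) in exacerbation_risks.items()
}

# Reported NNTs for the shortest and longest trials, turned back into
# the absolute risk differences (in percentage points) that they imply.
reported_nnt = {"3-month trials": 72, "48-month trials": 250}
implied_risk_difference_pct = {
    label: 100.0 / nnt for label, nnt in reported_nnt.items()
}

for label, value in nnt_by_stratum.items():
    print(f"{label}: NNT = {value:.1f}")
for label, value in implied_risk_difference_pct.items():
    print(f"{label}: implied risk difference = {value:.1f} percentage points")
```

The pooled and 1-year strata imply risk differences of roughly 6.5 to 6.8 percentage points, whereas the 3-month and 48-month values imply about 1.4 and 0.4 points, which shows how strongly the result depends on which durations are mixed together.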
Lastly, it is important to note that even for the 1-year trials, the follow-up times likely varied between patients, so that the simple proportions used can be inaccurate to compute the NNT, as shown in Table 1. Thus, in this case, one should seek Kaplan–Meier estimates that account for variable follow-up times. Moreover, one could also use data from the longer-term trials, the 4-year trial for instance, identifying the 1-year cumulative incidence values from the Kaplan–Meier estimates that span the 4 years of the study.
NNT in Benefit–Risk Evaluation
The NNT can also be useful in evaluating the balance between the risks and benefits of a drug. Indeed, the number of patients that need to be treated to prevent an outcome of the disease can be compared to the number needed to treat to induce a harmful side-effect in one patient. For example, the NNT was useful in weighing the benefit of inhaled corticosteroids in preventing COPD exacerbations against their risk of inducing pneumonia.29,30 It was a particularly important question since pneumonias are much less frequent than COPD exacerbations, so that one is tempted to assess benefit versus risk simply on the basis of the frequency of these outcomes, rather than the drug effects.29 The NNT approach revealed, however, using the net effect of inhaled corticosteroids, that the risk of inducing pneumonia may outweigh the benefit of preventing exacerbations, particularly over the longer term, even if pneumonia is much less frequent.29,31 Of course, such use of the NNT in a benefit–risk assessment must also make sure that outcomes of similar importance are being compared, such as avoiding the comparison between mild COPD exacerbations that are easily treated in the outpatient setting versus pneumonias that require extended hospitalization and possibly intensive care.
Another example is the previously mentioned meta-analysis of low-dose aspirin use, where prevention of mortality was assessed against causing gastrointestinal bleeding.3 Using the pooled data from six trials, the number needed to treat with low-dose aspirin to prevent one death from any cause was 67, while 100 needed to be treated to induce one non-fatal gastrointestinal tract bleeding. Here again, however, this meta-analysis included trials of varying durations, which can introduce bias in the NNT when not properly accounted for, particularly if the risks or benefits vary with duration of aspirin use. Moreover, the NNT values do not refer to a specific duration of treatment with aspirin.
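The aspirin comparison can be expressed per 1,000 treated patients, which some readers find easier to weigh than two reciprocals. This is only a sketch of the arithmetic: the constant names are assumptions of this example, and the result inherits the limitations just noted, since neither value refers to a specific duration of treatment.

```python
NNT_PREVENT_ONE_DEATH = 67   # reported number needed to treat to prevent one death
NNT_INDUCE_ONE_BLEED = 100   # reported number needed to treat to induce one GI bleed

PATIENTS_TREATED = 1000

# Expected events per 1,000 treated patients implied by each value.
deaths_prevented = PATIENTS_TREATED / NNT_PREVENT_ONE_DEATH
bleeds_induced = PATIENTS_TREATED / NNT_INDUCE_ONE_BLEED

# Deaths prevented for every non-fatal bleed induced.
benefit_per_harm = deaths_prevented / bleeds_induced

print(f"deaths prevented per 1,000 treated: {deaths_prevented:.1f}")
print(f"GI bleeds induced per 1,000 treated: {bleeds_induced:.1f}")
print(f"deaths prevented per bleed induced:  {benefit_per_harm:.2f}")
```

A ratio of this kind is interpretable only when the benefit and the harm are measured over the same treatment duration and are of comparable clinical importance.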
In contrast, an analysis pooling data from four trials to assess the benefit–risk of rivaroxaban versus enoxaparin for the prevention of venous thromboembolism (VTE) after total hip or knee arthroplasty properly used the Kaplan–Meier curves to compute the NNTs.32 These were measured at specific time points, namely 70 days after total hip surgery and 47 days after knee arthroplasty, for the benefit outcome of VTE or death, versus the risk outcome of bleeding.