Understanding Confidence Intervals
A confidence interval (CI) is a range of values, calculated from sample data, that expresses how certain we can be about an estimate of a population value.
- A CI can be constructed at any confidence level between 0% and 100%; 95% and 99% are the most common. The extremes are not useful in practice: a sample never makes you completely certain, so a 100% CI would have to span every possible value, while a 0% CI would collapse to a single point.
- For a given confidence level x%, the CI means that if you were to repeat your sampling an infinite number of times and construct a CI from each sample, x% of those CIs would contain the true population value.
- This point is sometimes difficult to conceptualize. The graph below shows 100 confidence intervals at the 95% level (the vertical lines), each constructed from its own random sample of 20 data points drawn from a standard normal distribution [1]. We can see from the graph that 94 of these lines contain zero (the true population mean) and 6 do not (those marked with red stars). Thus, when repeating our sampling and 95% CI construction 100 times, we find that the CI contains the true value 94 times and misses it 6 times, which is close to the expected 95%.
- If you construct a 95% CI for a difference between groups and the CI does not include zero, the result is statistically significant at the 0.05 level (assuming the CI and the hypothesis test are based on the same statistical method). For example, suppose you calculate a 95% CI for the difference in weight loss between two diets and find that it runs from -15 lbs to +5 lbs. This interval contains zero, so the difference is not statistically significant, and the p-value will be greater than 0.05. If the interval were -15 lbs to -4 lbs, it would not contain zero, and the p-value would be less than 0.05.
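The repeated-sampling idea behind the graph can be reproduced with a short simulation. This is a minimal sketch using only the Python standard library. It assumes t-based 95% intervals for the mean, because the graph's exact method is not stated. The function names and the hard-coded t critical value for 19 degrees of freedom (about 2.093) are our own illustrative choices.

```python
import random
import statistics

N_POINTS = 20        # observations per sample, as in the graph
T_CRIT_DF19 = 2.093  # two-sided 95% critical value of Student's t with 19 df (assumed t-interval)

def t_interval(sample, t_crit=T_CRIT_DF19):
    """Return (low, high) of a t-based CI for the mean of `sample`."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se

def count_covering(n_intervals=100, true_mean=0.0, seed=None):
    """Draw n_intervals independent samples of N_POINTS values from Normal(true_mean, 1),
    build a 95% CI from each sample, and count how many CIs contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(N_POINTS)]
        low, high = t_interval(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits

print("intervals out of 100 containing 0:", count_covering(seed=1))
print("coverage over 10,000 intervals:", count_covering(10_000, seed=2) / 10_000)
```

With only 100 intervals, the count will wander a few points around 95 from one seed to the next, just as the graph shows 94. With many thousands of intervals, the proportion settles close to 0.95.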
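The link between the CI and the p-value can be checked numerically. The sketch below assumes a normal-approximation (z-based) interval and test that share one estimate and one standard error. Sharing them is what guarantees that the two agree. It also treats the weight-loss intervals from the example as symmetric and backs out the estimate and standard error from them. The helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def z_ci_and_p(estimate, se, level=0.95):
    """Normal-approximation CI and two-sided p-value (null hypothesis: true difference = 0),
    both computed from the same estimate and standard error."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    p = 2 * (1 - STD_NORMAL.cdf(abs(estimate) / se))
    return ci, p

def from_interval(low, high, level=0.95):
    """Recover (estimate, standard error) from a symmetric normal-based CI."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return (low + high) / 2, (high - low) / (2 * z_crit)

for low, high in [(-15, 5), (-15, -4)]:
    est, se = from_interval(low, high)
    ci, p = z_ci_and_p(est, se)
    contains_zero = ci[0] <= 0 <= ci[1]
    print(f"CI {low} to {high} lbs: contains zero = {contains_zero}, p = {p:.4f}")
```

The first interval (-15 to +5 lbs) contains zero and gives a p-value well above 0.05. The second (-15 to -4 lbs) excludes zero and gives a p-value well below 0.05. An interval with an endpoint exactly at zero gives p = 0.05.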
Practical examples:
- You have a study where you sampled 100 patients on a chemotherapy regimen and found that 20% of them had nausea. If you were to look at ALL the patients taking that chemotherapy regimen (instead of only the 100 you sampled), what are you likely to find? Assuming the sampling was random, we can build a CI around the 20% found in the sample. For binary outcomes (yes/no), all this requires is the sample estimate (20%) and the number of observations in the sample (100). In this case, the 95% CI is 13% to 29% [2]. Reporting this in a manuscript, we would say “Our results showed that 20% (95% CI: 13% – 29%) of patients on chemotherapy had nausea.” Interpreting this, our sample estimate was 20%, and we can be 95% confident that the interval from 13% to 29% contains the true population value, in the repeated-sampling sense described above.
- The more data we have, the narrower the confidence interval will be for the same estimate. In the example above, if we had sampled only 30 patients, the 95% CI would be 7.7% to 39%; if we had sampled 1,000 patients, it would be 18% to 23%.
- In a study looking at the difference between two groups, we might report that “Drug A had a nausea rate of 20%, while Drug B had a nausea rate of 45%, a statistically significant difference of 25 percentage points (p=0.0174, 95% CI: 11% to 32%)”. We interpret this to mean that we can be 95% confident that the interval from 11% to 32% contains the true difference between the populations.
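The single-proportion intervals in the chemotherapy examples can be reproduced with an exact binomial interval. This is a minimal standard-library sketch of the Clopper-Pearson method. We assume that is the method behind the quoted numbers, since it matches them. It solves the two binomial tail equations by bisection instead of using beta-distribution quantiles. The function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); terms are computed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def _bisect(f, target, increasing, iters=60):
    """Find p in (0, 1) with f(p) == target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Move toward the side where f(p) reaches the target.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, level=0.95):
    """Exact (Clopper-Pearson) CI for a proportion with x successes out of n."""
    alpha = 1 - level
    # Lower bound: P(X >= x | p) = alpha/2; this tail probability increases with p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: P(X <= x | p) = alpha/2; this tail probability decreases with p.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper

for x, n in [(20, 100), (6, 30), (200, 1000)]:
    low, high = clopper_pearson(x, n)
    print(f"{x}/{n}: 95% CI {low:.1%} to {high:.1%}")
```

After rounding, these should match the intervals quoted above: 13% to 29% for 100 patients, 7.7% to 39% for 30, and 18% to 23% for 1,000. This also shows the interval narrowing as the sample grows. If SciPy is available, `scipy.stats.binomtest(20, 100).proportion_ci(method='exact')` computes the same exact interval.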
References
1. Standard Normal means a normal distribution with a mean of zero and an SD of one.
2. Using the exact (Clopper-Pearson) binomial interval.