Let's break down what 95% confidence and 80% power mean in the context of an A/B test.

95% Confidence Level:

Imagine you are conducting an experiment (like testing a new website design) and want to know if the new design is better than the old one.

When you analyze the results, you use statistical methods to make a decision. But there's always some uncertainty – you can't be 100% sure.

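Testing at a “95% confidence level” means you accept a 5% chance of a false alarm. If the new design were actually no different from the old one and you repeated the experiment 100 times (each time with a new set of visitors), only about 5 of those experiments would wrongly show a statistically significant difference. In the same spirit, if you report a 95% confidence interval for the difference between the designs, about 95 out of 100 intervals built this way would contain the true difference.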

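So it's not quite the same as saying, “I'm 95% sure that the conclusions I'm drawing from this test are correct.” The 95% describes how reliable the procedure is when there is no real effect. The remaining 5% is the false positive rate (also called the significance level or Type I error rate, written α). It is not the margin of error, which is the half-width of a confidence interval around your estimate.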
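A quick way to see what that 5% means is to simulate “A/A tests”, where both versions share the same true conversion rate, so any significant result is by definition a false positive. The sketch below is a minimal Python illustration under assumed numbers (a 10% conversion rate, 1,000 visitors per version, 2,000 simulated experiments). It uses a pooled two-proportion z-test, which is one common choice for conversion rates, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test (hypothetical helper)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_false_positive_rate(rate=0.10, n=1000, trials=2000, alpha=0.05, seed=1):
    """Run A/A tests (both versions share one true rate) and return the share
    that come out 'significant'. All default numbers are assumptions."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Each visitor converts with probability `rate` in both versions.
        conv_a = sum(rng.random() < rate for _ in range(n))
        conv_b = sum(rng.random() < rate for _ in range(n))
        if two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
            false_positives += 1
    return false_positives / trials
```

Running `print(simulate_false_positive_rate())` should print a value close to 0.05; the exact number depends on the random seed.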

80% Power (Statistical Power):

Now, let's talk about power, specifically 80% power in this case.

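Power is the test's ability to detect a real difference when there actually is one. It is always calculated for a specific size of difference, usually the smallest improvement you would care about (often called the minimum detectable effect). So, 80% power means that if the new design truly differs from the old one by at least that amount and you repeated the experiment 100 times, about 80 of those experiments would detect the difference as statistically significant. Power goes up with more visitors, with larger true differences, and with a looser significance level.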
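Because power is tied to a specific effect size, it is mostly used before the test starts, to decide how many visitors you need. Here is a minimal sketch of one standard normal-approximation formula for comparing two conversion rates; the function name and the example rates (10% for the old design, 12% for the new one) are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH version for a two-sided
    two-proportion z-test (hypothetical helper, unpooled-variance formula)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Assumed example: 10% baseline conversion, hoping to detect a lift to 12%.
print(sample_size_per_arm(0.10, 0.12))
```

With these inputs the function returns roughly 3,800 visitors per version. Halving the lift to one percentage point roughly quadruples that number, because the required sample grows with 1 / effect².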

Think of it like a metal detector. A metal detector with 80% power finds the metal 80% of the time when it's there, but 20% of the time it misses metal that is actually present. In your experiment, if the new design truly is better by the amount you planned for, you have an 80% chance that your test identifies this improvement and a 20% chance that it misses it. That 20% is the false negative rate (Type II error rate, written β).
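The metal-detector idea can also be checked by simulation: give the new design a real lift and count how often the test catches it. This minimal Python sketch assumes a 10% vs 12% conversion rate and 3,839 visitors per version, which is what the usual normal-approximation sample-size formula gives for α = 0.05 and 80% power at those rates. All names and numbers are illustrative, not a prescribed procedure.

```python
import math
import random
from statistics import NormalDist


def z_test_significant(conv_a, conv_b, n, alpha=0.05):
    """Pooled two-proportion z-test with equal arm sizes (hypothetical helper).
    Returns True if the two-sided p-value is below alpha."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    if se == 0:
        return False
    z = (conv_b - conv_a) / n / se  # difference in conversion rates / SE
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def simulate_power(p_control=0.10, p_treatment=0.12, n=3839, trials=1000, seed=7):
    """Share of simulated A/B tests that detect a difference that really exists.
    Default rates and sample size are assumed example values."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        conv_a = sum(rng.random() < p_control for _ in range(n))
        conv_b = sum(rng.random() < p_treatment for _ in range(n))
        if z_test_significant(conv_a, conv_b, n):
            detected += 1
    return detected / trials
```

`simulate_power()` should come out near 0.80, while `simulate_power(n=1000)` drops to around 0.3: the same real improvement is missed most of the time when the sample is too small.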

In summary:

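  • 95% confidence is about controlling false positives: when there is no real difference, there is only about a 5% chance that random variation alone makes your test declare one.
  • 80% power is the probability that your test correctly identifies a real difference or improvement of the size you planned for, if it exists.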

Both are important in determining the reliability and effectiveness of your A/B test. A high confidence level reduces the chance of false positives (thinking there is a difference when there isn't), and high power reduces the chance of false negatives (failing to detect a real difference).
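The two error rates can be written compactly with the conventional symbols α (false positive rate) and β (false negative rate); the numbers below are the 95% / 80% settings discussed above.

```latex
\begin{aligned}
\alpha &= P(\text{test shows a difference} \mid \text{no real difference}) = 0.05,
  &\quad \text{confidence level} &= 1 - \alpha = 0.95 \\
\beta  &= P(\text{test misses the difference} \mid \text{real difference of the planned size}) = 0.20,
  &\quad \text{power} &= 1 - \beta = 0.80
\end{aligned}
```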