Today, I am going to be talking about the all-powerful p-value. But before I continue, let me preface this by saying I am a graduate student, not a statistician, and I am still in the process of learning!
Like most graduate students, I worshiped the p-value and only cared about what the p-value result was. It was honestly the only part of statistics I really understood. If I had a p-value below 0.05, I had a fantastic and informative research study. If not, I didn’t have significant results, and it would be harder to prove my research had any value.
But the fact is, none of that is actually true. It wasn’t until I started taking my current statistics course that I realized how strange it is to put such a huge amount of weight on one simple value, and how one number came to define whether research is considered valuable or not.
"For decades, the conventional p-value threshold has been 0.05, but it is extremely important to understand that this 0.05, there's nothing rigorous about it. It wasn't derived from statisticians who got together, calculated the best threshold, and then found that it is 0.05. No, it's Ronald Fisher, who basically said 'Let's use 0.05,' and he admitted that it was arbitrary." — Dr. Paul Wakim, Chief of the Biostatistics and Clinical Epidemiology Service at the National Institutes of Health Clinical Center
The p-value isn’t worshipped only by graduate students, however; many journals love seeing reported p-values as well, especially significant ones. This encourages the measure and helps it survive and thrive.
But what makes the p-value a poor measurement tool? First, its threshold grew out of a tea-drinking experiment (learn more at the link below). But aside from the origins, let’s think about p = 0.04. This would be a significant result if evaluating on a basis of p < 0.05. But what if I get a result of p = 0.06? Why does that two hundredths of a point make a difference? In fact, it really doesn’t, because 0.05 is an arbitrary cutoff. Instead, I am trying to move my research towards evaluating point estimates, or a “best estimate”, and confidence intervals.
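To make this concrete, here is a minimal sketch (using only Python's standard library) of how close the "evidence" behind p = 0.04 and p = 0.06 really is. The two z-statistics below are hypothetical, chosen just to land on either side of the cutoff; the point is that nearly identical test statistics get labeled "significant" and "not significant."

```python
from math import erfc, sqrt

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return erfc(abs(z) / sqrt(2))

# Two hypothetical studies with nearly identical evidence against the null:
p_a = two_sided_p(2.06)   # works out just under the 0.05 cutoff
p_b = two_sided_p(1.88)   # works out just over it

print(p_a, p_b)  # almost the same value, yet only one is "significant"
```

Nothing about the underlying data changed meaningfully between those two cases; only which side of an arbitrary line the number fell on.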
For example, I recently analyzed data from a large population research study I had completed. I had between 30,000 and 300,000 cases for each test. Because I had such large sample sizes, even a 1-percentage-point difference was significant with a p-value < 0.001. Therefore, p-values were not a very valuable tool to help me interpret my data. Instead, I used epidemiology measures with confidence intervals, which helped me establish a more precise view of my data.
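This large-sample effect is easy to demonstrate. The sketch below (standard library only) runs a two-proportion z-test on hypothetical counts — 150,000 cases per group, with outcomes of 51% versus 50% — not my actual study data. The p-value comes out far below 0.001, while the confidence interval does the more useful job of showing how small the difference actually is.

```python
from math import erfc, sqrt

def two_prop_test(x1, n1, x2, n2):
    """Two-proportion z-test plus a 95% CI for the difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value
    # Unpooled standard error for the 95% confidence interval
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (p1 - p2 - 1.96 * se, p1 - p2 + 1.96 * se)
    return p_value, ci

# Hypothetical numbers: 51% vs 50% outcomes in two groups of 150,000
p_value, ci = two_prop_test(76_500, 150_000, 75_000, 150_000)

print(p_value < 0.001)  # a mere 1-point gap is "highly significant"
print(ci)               # the CI pinpoints the size of that tiny effect
```

With samples this large, almost any difference clears the significance bar, which is exactly why the interval estimate tells you more than the p-value does.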
Unfortunately, many believe that if they get a p-value above 0.05 and their results are not significant, they wasted their time. But it is just as important to share non-significant results as it is to share significant ones! We need to understand what doesn’t work in order to discover what does! Check out this quote below.
"People say, 'Ugh, it's above 0.05, I wasted my time.' No, you didn't waste your time. If the research question is important, the result is important. Whatever it is." — Dr. Wakim, NOVA, Season 45, Episode 6, "Prediction by the Numbers"
Math is not one of my strong suits, so when I learned I would be taking a year of statistics as part of my PhD program, I was nervous. I won’t lie, this class has been a ton of work, but I also understand more statistics now than I ever have before, and I feel more confident thinking through statistical design for future research studies. You too can do statistics, and I highly encourage you to do so! Until this class, I never realized that placing so much weight on one number is not a useful way to review my results.

Now, with one semester of graduate statistics under my belt, I realize how much I have learned, but I still feel rusty. We need to remember that statistics is a skill, and skills take time to develop. We need to give ourselves the grace to make mistakes, ask questions, and learn.