This summer, I’ve been reflecting quite a bit on analytics. I’ve been trying to get a better feel for what I can expect of my research students and where I can better train them. I think that my research students are now getting to the point that they can often (but not always) generate meaningful figures on their own and interpret them properly. Can I expect more? Should I expect more?

Over the summer, I’ve seen a high number of papers and blog posts on the problem of reporting p-values in published research without some estimate of effect size (my favorites include the February write-up in Nature and the very recent post on FiveThirtyEightScience). I’ll be honest: I’ve been slow to get on board with reporting effect sizes in **my own research**, and I’ve done a worse job incorporating this into the way I train my research students. In part, I’ve not done this because the simplest ways of calculating effect sizes (e.g., Cohen’s d) don’t always translate directly to the type of statistics that I use with my data (i.e., multiple variables that are analyzed using regression-based statistics such as mixed-effects models), and obtaining and correctly interpreting things like beta coefficients from these types of models is difficult [at least in R]. Nonetheless, I understand and agree with the critiques associated with the search for p < 0.05… but do we properly train our students in statistical interpretation, or are we facilitating the problem in the way that we teach statistics?

In a brief “survey” on this topic via Twitter and Facebook, a few [N=5; admittedly low power] of my colleagues from Allegheny and other institutions chimed in, and I was happy to see that 3 of them stated that they have at least 1 lecture in which they explicitly teach basic effect sizes using Cohen’s d and/or the use of confidence intervals; 1 stated that there is an emphasis on de-emphasizing p-values when they are reported alone. However, I imagine that if I were to choose a Biology student at Allegheny, provide them with a figure and a p-value of 0.03, and ask them to interpret the figure, they would proudly exclaim, with little hesitation, that there is a significant difference between treatment x and treatment y [and I have a hunch that this type of answer would extend to Psychology students and probably Economics students as well]. I’d be very surprised if they ever questioned statistical power or even brought up anything close to a critique of the result because of the lack of an effect size.
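To make the distinction concrete, here is a minimal Python sketch of how Cohen’s d is computed for two treatments using a pooled standard deviation. The measurements are made up purely for illustration, and `cohens_d` is a hypothetical helper, not a function from any library. A p-value speaks to whether a difference is distinguishable from zero; d expresses how large that difference is in standard-deviation units.

```python
import math
import statistics

def cohens_d(group_x, group_y):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_x, n_y = len(group_x), len(group_y)
    var_x = statistics.variance(group_x)  # sample variance (n - 1 denominator)
    var_y = statistics.variance(group_y)
    pooled_sd = math.sqrt(
        ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)
    )
    return (statistics.mean(group_x) - statistics.mean(group_y)) / pooled_sd

# Made-up measurements for two treatments (illustrative only, not real data)
treatment_x = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9]
treatment_y = [4.6, 4.4, 5.0, 4.7, 4.3, 4.8]

d = cohens_d(treatment_x, treatment_y)
print(f"Cohen's d = {d:.2f}")
```

As a rough guide, Cohen’s conventional benchmarks treat d of about 0.2 as small, 0.5 as medium, and 0.8 as large, although what counts as meaningful depends on the study system.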

So, for those of us who do not teach stats, how do we emphasize the importance of effect sizes with our research students? Is it something that we attempt to train them in? I can’t imagine even one of my better students sitting down and running a mixed-effects statistical model, such as a zero-inflated negative binomial model, on their own. Asking them to do that plus correctly pull out the standardized regression coefficients for each predictor would be insane. Or, do we train them with the concept and provide them with a completed statistical analysis?

For those of you who do teach stats, how do you incorporate this into undergraduate-level statistics courses? If you do teach effect sizes, do you integrate them into every analysis or do you show them effect sizes on one or two occasions and leave it at that? If the latter, what prevents you from training students correctly?

For those of you who are faculty who mentor our undergraduates as graduate students, what are your expectations for statistical knowledge for an incoming student? What would you recommend as focal points for statistical training at the undergraduate level?

Best wishes to all of you as you start the new academic year.

-Venesky

*UPDATE (8/27)*

*Apparently I wasn’t the only one thinking about stats with undergrads over the summer. Joan Strassman over at Sociobiology blogged on a similar topic earlier in August that I did not see (I was largely out of the blogging scene over the summer while working with my students).*

Nice post that raises important points and concerns. I do not understand why stats is such a difficult topic for undergrads, and for that matter grad students, to grasp. There seems to be a fear, but also an over-reliance on what we were first taught. For example, I was first taught as an undergrad that p-values were the end-all be-all, and that chi-square, ANOVA, or t-tests were valid for most questions. This seems common: I have helped a lot of grad students with stats questions before they took a grad stats course, and many times they want to use a t-test or ANOVA for everything.

In my opinion, p-values can be useful, but as pointed out, effect sizes and summary stats are just as important. What I have learned trudging through my own analyses and helping undergrads is that finding the test most appropriate for the data in question is necessary to get a confident result. Different tests can give different results, and applying the wrong method can give a spurious result. To me, this is what needs to be taught to undergrads. For example, count data on animal abundances can easily be analyzed with ANOVA or Kruskal-Wallis, but a zero-inflated model is usually more appropriate. The problem is that many students are not familiar with zero-inflated models and many stats programs do not provide that platform. The increased use of R is great, but it requires a knowledge of coding and stats, which may be a lot to ask of undergrads.
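As a rough illustration of how the choice of test can change the answer, the Python sketch below runs two permutation tests on the same made-up, zero-heavy counts: one on the raw values (in the spirit of a t-test or ANOVA) and one on their ranks (in the spirit of Kruskal-Wallis). Permutation tests are a stand-in chosen here so the example needs only the standard library; this is not a zero-inflated model, and the counts and helper names are hypothetical.

```python
import random
import statistics

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def permutation_p(values, n_first, n_perm=5000, seed=1):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)

    def diff(v):
        return abs(statistics.mean(v[:n_first]) - statistics.mean(v[n_first:]))

    observed = diff(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign observations to groups at random
        if diff(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up animal counts per plot, with many zeros (illustrative only)
plots_a = [0, 0, 1, 0, 2, 0, 1, 0]
plots_b = [0, 3, 1, 0, 14, 2, 0, 4]
pooled = plots_a + plots_b

p_raw = permutation_p(pooled, len(plots_a))          # test on raw counts
p_rank = permutation_p(ranks(pooled), len(plots_a))  # test on ranks
print(f"raw counts: p = {p_raw:.3f}; ranks: p = {p_rank:.3f}")
```

The two p-values generally differ because a single large count pulls on the mean-based test much more than on the rank-based one, which is the kind of sensitivity to method described above.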

I have had luck explaining all the different types of tests to analyze lizard densities among plots or treatments, for example. Then we work through the process and compare the results (p-value, summary stats, and effect sizes) to see which provides the most robust results as a whole. While doing this exercise, my goal is not to teach them all the nuances of stats etc. Instead, I simply try to introduce them to the multitude of methods available and how this can impact a result and interpretation. At the undergrad level I try not to overwhelm a student, because I know that as an undergrad I may have become overwhelmed and discouraged. Just cracking the door to the immense statistical toolbox available seems important for their development later on. Again, my hope is that when they get to grad school and take a grad-level bio-stats class they’ll be slightly more prepared than I was!

Cheers.

Mason

Thanks for the comment, Mason. I think that part of the problem is that your toolkit for understanding different types of stats is much larger than the toolkit that most undergrads have. You can look at data and jog your memory of the many types of statistical procedures. Those took time (years… think of all the projects and/or courses that you took to learn those) to acquire.

I think that your approach, teaching them that the choice among the many ways of approaching stats can affect the statistical values one produces, is a very useful one. What would it look like if we built an entire class around that? I think that it could be very powerful for some yet unbearable for others.