American Society For Nutrition

Efficiency in Data Analysis

Excellence in Nutrition Research and Practice
Posted on 06/15/2009 at 03:57:48 PM by Student Blogger

By: Matt T.

I'm a big fan of statistics. I wouldn't admit that just anywhere, but since my target audience consists of scientists and scientists-in-training, I know I'm among my nerd-peers. You are all part of my nerd herd, if you will.

With this nerdly confession out of the way, let me step up on one of my soapboxes. Occasionally a paper describes cutting-edge lab techniques capable of producing highly sophisticated data, only to then describe an unsophisticated and generic statistical model.

While I recognize my penchant for stats makes me a “special” kind of nerd, even among ASN members, I think it's important for all of us to realize that the tools we use to analyze the data are ultimately just as important as the tools we use to collect the data.

This is not to say analysis should always be complicated; sometimes the simplest model is the best. However, when we place so much effort into maximizing precision from our instruments, shouldn't we do the same for our statistics? After all, we lose part of the benefit of gold-standard lab techniques if we analyze with methods that are, shall we say, inefficient.

“Efficiency” in stats-speak refers to getting as much power out of the data as possible. An efficient model is one that is precise, that is, one that minimizes the standard error of its estimates.

It's often frustrating to find many statistics for the same thing. Which of the 20 similar tests provided by my stats program should I use? The answer depends largely on the specific assumptions you can impose, but it is also a question of efficiency. That is, other things equal, which statistic is more precise for my data, or which test will give me the smallest standard error?
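To make the idea concrete, here is a minimal sketch in Python. It is my own toy illustration, not anything taken from a stats package: it draws many samples of 30 values from a standard normal distribution (an assumption) and compares two estimators of the center, the mean and the median. Both aim at the same quantity, but the spread of each estimator across repeated samples, its empirical standard error, differs. The function name `empirical_se` and all settings are hypothetical.

```python
import random
import statistics

def empirical_se(estimator, n=30, n_sims=5000, seed=1):
    """Return the SD of an estimator across repeated simulated samples.

    This SD is the estimator's empirical standard error. The data are
    assumed to be standard normal; n, n_sims, and seed are arbitrary.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)

# Same seed, so both estimators see exactly the same simulated samples.
se_mean = empirical_se(statistics.mean)
se_median = empirical_se(statistics.median)

print(f"Empirical SE of the mean:   {se_mean:.3f}")
print(f"Empirical SE of the median: {se_median:.3f}")
```

For normally distributed data the mean should come out with the smaller standard error, so it is the more efficient of the two here. With heavy-tailed or outlier-prone data the ranking can flip, which is why the answer depends on the assumptions you can impose.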

In honor of my alma mater (Illinois), consider a straightforward example: I simulated a data set of 30 subjects randomized to consume all-corn or all-soy diets. We simply want to know which diet produces greater weight loss. The figures illustrate predicted values and p-values using several different analytical approaches.

Fig 1

For the sake of argument, 20% of participants drop out with no differences between dropouts and completers (missing completely at random), and there are no confounding variables.

Fig 2

Even though the underlying data are the same, choosing a more efficient statistical model increases power, and depending on the data, may make the difference between a “trend” and a significant result.

Fig 3
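For readers who want to try this themselves, here is a rough sketch of this kind of simulation in Python. It is not the code or the models behind the figures. It assumes 30 subjects randomized to two diets, exactly 6 dropouts (20%) chosen completely at random, and made-up values for baseline weight, weight change, and the diet effect. It also assumes weight change is independent of baseline weight. It compares two analyses of the same completers: an endpoint-only comparison of final weights, and a pre-post comparison of within-subject change. All names (`simulate_trial`, `compare_models`) and numbers are hypothetical.

```python
import random
import statistics

def simulate_trial(rng, n=30, n_dropouts=6, true_effect=-2.0):
    """Simulate one trial and return (endpoint_estimate, change_estimate).

    Each estimate is the soy-minus-corn difference in means among
    completers. Every numeric value here is an assumption.
    """
    arms = ["corn"] * (n // 2) + ["soy"] * (n - n // 2)
    rng.shuffle(arms)                                  # randomize subjects to diets
    dropouts = set(rng.sample(range(n), n_dropouts))   # missing completely at random
    final = {"corn": [], "soy": []}
    change = {"corn": [], "soy": []}
    for i, arm in enumerate(arms):
        baseline = rng.gauss(90.0, 12.0)   # baseline weight in kg (assumed)
        delta = rng.gauss(-3.0, 3.0)       # weight change in kg (assumed)
        if arm == "soy":
            delta += true_effect           # assumed extra loss on the soy diet
        if i in dropouts:
            continue                       # dropouts contribute no outcome data
        final[arm].append(baseline + delta)
        change[arm].append(delta)
    endpoint_est = statistics.mean(final["soy"]) - statistics.mean(final["corn"])
    change_est = statistics.mean(change["soy"]) - statistics.mean(change["corn"])
    return endpoint_est, change_est

def compare_models(n_sims=2000, seed=2009):
    """Repeat the trial many times and summarize both estimators."""
    rng = random.Random(seed)
    results = [simulate_trial(rng) for _ in range(n_sims)]
    endpoint = [r[0] for r in results]
    change = [r[1] for r in results]
    return {
        "endpoint_mean": statistics.mean(endpoint),
        "endpoint_se": statistics.stdev(endpoint),   # empirical standard error
        "change_mean": statistics.mean(change),
        "change_se": statistics.stdev(change),       # empirical standard error
    }

summary = compare_models()
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Both analyses target the same diet effect, so their average estimates should land near the assumed value. The difference is in the standard errors: under these assumptions the endpoint-only comparison has to absorb all of the between-subject variation in body weight, while the pre-post comparison removes it. How large the gain is in real data depends on how strongly baseline and final measurements are correlated.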

I am by no means a statistician, but as I learn, I realize more and more that statistics is a language rather than a cookbook. As we become more fluent, we become more flexible in mixing techniques to obtain the most efficient model available for a unique dataset. In turn, we obtain more precise estimates, and a sharper depiction of the world we are all trying to understand.

3 Comments
Posted Jul 06, 2009 12:31 PM by Harini Sampath
An excellent topic! One of the things that my colleagues and I often discuss is the importance of establishing statistical power. For instance, in conducting rodent studies, it is common to see cohorts of 6-10 animals per treatment. However, every once in a while, you come across a paper that uses 30+ animals in a group and shows a

Posted Jul 06, 2009 12:31 PM by M@
Yes, I agree. In performing the analyses above I was impressed with how much power was gained by the pre-post analyses vs. the endpoint only comparisons. I expected the trend, but I was surprised at what a large difference it made.

Pre-post testing in animal models is difficult for many outcomes, but it seems we can use smaller groups if we can figure out a way to measure within-animal change rather than end values only.

It's also, of course, a balancing act - the only thing worse than using more animals than necessary would be wasting animals in an underpowered and consequently inconclusive study.

Posted Jul 06, 2009 12:32 PM by BB
This is fascinating - I think computer science has some interesting answers for you too...