So, you have collected some data, you have a few treatments which you’d like to compare, and you want to analyze it all using Proc GLIMMIX. How do you start?
My goal is to provide steps to tackle these types of analyses, whether you are working with weed data, or animal data, or yield data. I suspect I’ll be updating this post as we clarify these steps.
First Step – your experimental design
Ah yes! Despite popular belief you DO have an experimental design! Find it or figure it out now before you go any further. Why? Because your model depends on this! Your analysis comes down to your experimental design.
Second Step – build your MODEL statement
You know what your outcome variable is and you know what your experimental design is, which means you know which factors you’ve measured and whether they are fixed or random. So… you now know the basis of your MODEL statement and your initial RANDOM statement.
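As a rough sketch, a randomized complete block design with a fixed treatment effect and a random block effect might start out like this. The variable names block, treatment, and outcome are placeholders I've assumed for illustration; first is the dataset name used in the code later in this post.

```sas
/* Sketch only: RCBD with a fixed treatment and a random block.     */
/* block, treatment, and outcome are assumed placeholder names.     */
proc glimmix data=first;
  class block treatment;
  model outcome = treatment;   /* fixed effects go on the MODEL statement   */
  random block;                /* random effects go on the RANDOM statement */
run;
```

Your own design may need more terms (a second factor, an interaction, a nested random effect), but the pattern stays the same: fixed effects on MODEL, random effects on RANDOM.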
Third Step – expected distribution of your outcome variable
You already know whether your outcome variable comes from a normal distribution or not. Chances are it does not, but what distribution does it follow? Check out the post on Non-Gaussian Distributions to get an idea of what distribution your outcome variable may have. Think of it as the starting point.
Add this distribution and the appropriate LINK to the end of your MODEL statement, after a slash (/).
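For instance, if the outcome were a count (weeds per plot, say), a Poisson distribution with a log link would be one reasonable starting point. This is only an illustration of where the options go, with assumed variable names block, treatment, and outcome; substitute the distribution and link that suit your own data.

```sas
/* Sketch only: count outcome, Poisson distribution, log link.      */
/* Variable names are assumed placeholders.                         */
proc glimmix data=first;
  class block treatment;
  model outcome = treatment / dist=poisson link=log;
  random block;
run;
```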
Fourth Step – run model and check residuals
Remember that when we run Proc GLIMMIX we need to check our assumptions, which means looking at the residuals. How do they look? How’s the variation between your fixed effect levels? Homogeneous or not? Are the residuals evenly distributed around zero? Are the residuals normally distributed?
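One way to get at these questions is to ask GLIMMIX for its residual graphics. The sketch below requests the studentized residual panel (residuals against the linear predictor, a histogram, a Q-Q plot, and a box plot) plus box plots of studentized residuals for each fixed classification effect. Variable names are assumed placeholders, and the plot options shown are one choice among several.

```sas
/* Sketch only: residual diagnostics through ODS Graphics.          */
ods graphics on;

proc glimmix data=first plots=(studentpanel boxplot(fixed student));
  class block treatment;
  model outcome = treatment;   /* add your / DIST= and LINK= options from Step 3 */
  random block;
run;
```

The box plots by treatment are the quickest place to spot treatment levels whose spread is clearly larger or smaller than the others.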
Fifth Step – residuals NOT normally distributed
Is there another LINK for the DISTribution that you selected? If so, please try it.
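As an illustration, a binomial outcome uses the logit link by default in GLIMMIX, but probit and complementary log-log links are also available. The sketch below assumes hypothetical variables events and trials (number of successes out of number of attempts per plot), which are not from the original post.

```sas
/* Sketch only: same binomial distribution, different link.         */
/* events and trials are hypothetical variable names.               */
proc glimmix data=first;
  class block treatment;
  model events/trials = treatment / dist=binomial link=probit;
  random block;
run;
```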
Sixth Step – fixed treatment effects not homogeneous
Now the fun begins. To fix this one, we need to add a second RANDOM statement, which tells SAS to estimate a separate residual variance for each treatment level rather than one pooled residual variance. As an example, for a design that has a random block effect, this RANDOM statement would be as follows:
RANDOM _residual_ / subject = block*treatment group=treatment;
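To show where that statement sits, here is a sketch of the whole procedure for the same kind of blocked design. The variable names block, treatment, and outcome are assumed placeholders.

```sas
/* Sketch only: heterogeneous residual variances by treatment.      */
proc glimmix data=first;
  class block treatment;
  model outcome = treatment;   /* add your / DIST= and LINK= options as needed */
  random block;                /* first RANDOM statement: the block effect     */
  /* second RANDOM statement: one residual variance per treatment level */
  random _residual_ / subject=block*treatment group=treatment;
run;
```

With GROUP=treatment, the Covariance Parameter Estimates table should list a separate residual variance for each treatment level instead of a single residual line.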
Seventh Step – try another distribution
Now – we do NOT want you trying ALL the distributions possible – this just doesn’t make sense. Think back to the distributions that are plausible for your outcome variable, and please use the link provided in Step 3 as a guide. However, one distribution I have found works for many situations is the lognormal distribution. At the end of your MODEL statement you would add / DIST=lognormal LINK=identity.
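A sketch of the lognormal version follows, again with assumed placeholder variable names. Two things to keep in mind: the outcome has to be strictly positive, and GLIMMIX fits the model to the log of the outcome, so the estimates and LSMEANS are reported on the log scale.

```sas
/* Sketch only: lognormal distribution with identity link.          */
/* outcome must be > 0; results are on the log scale.               */
proc glimmix data=first;
  class block treatment;
  model outcome = treatment / dist=lognormal link=identity;
  random block;
  lsmeans treatment / cl;      /* means and confidence limits, log scale */
run;
```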
Another option is to transform the data within the GLIMMIX procedure. One transformation that researchers like is the arcsine square root transformation, which applies to proportions (values between 0 and 1). To try this one please use the following code.
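proc glimmix data=first;
  /* create the transformed outcome; outcome must be a proportion between 0 and 1 */
  trans = arsin(sqrt(outcome));
  …
  /* use the transformed variable as the response */
  model trans = …;
  …
run;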
Last Step – results will not always be perfect!
You will do the best that you can when analyzing your data. But please recognize that you may not be able to meet all the assumptions every time. Go back, review your data, and review your experimental design to ensure you have the correct Proc GLIMMIX coding.
As I noted earlier, this post will probably be updated to include and/or refine these steps as we continue to learn more about GLIMMIX.