Writing about your Experimental Design and Statistical Analysis for Publications

Reproducibility

All experiments and trials should be reproducible!  No questions about it.  A reader should be able to read the materials and methods section of your report or journal publication and replicate your experiment or trial.  Think of this as the recipe for your trial: just as you can read and follow the instructions in a recipe, a reader should be able to follow your materials and methods section and replicate the trial.

But this isn’t always true in publications.  There are a number of thoughts as to why this may happen:

  • word limitations – should the words be used in the materials and methods, or should we reserve them for our Results and Discussions?
  • uncomfortable talking about the statistical analysis – lack of knowledge or too much knowledge?
  • lack of confidence – so let’s skip it, nobody is going to read it?

Examples – can you replicate the following studies?

Read through the Materials and Methods section of these papers and decide whether you have enough information to replicate the study.

  1. Mist Blowing versus Other Methods of Foliar Spraying for Hardwood Control (1968)
  2. Seedling year management of Alfalfa-grass mixtures established without a companion crop (1969)
  3. Leaf and stem nutritive value of timothy cultivars differing in maturity (1996)

Goals of writing:

Our readers should be able to determine:

  1. that the analysis fits the objective stated for your experiment or trial
  2. that your research methodology and data collection processes match the analyses
  3. that your data management processes ensure data quality

Checklist:

  • Objectives clearly stated
  • Materials and Methods:
    • Identify your target population
    • State your treatment effects
    • Identify how you selected your experimental units and your sample size
    • State your experimental design, be sure to include number of replicates
    • Identify your analysis variables:
      • Are you using your raw data?
      • Are you using derived variables?  If so, what are they?
      • Are you using transformed variables?  If so, how and why were they transformed?
    • State your statistical model – don’t be shy about this!
    • State your method of analysis – a statement such as “model was produced using stepwise regression” is a red flag for a statistical reviewer!!  Provide your reader with clear directions
    • State the significance level (the p-value threshold) you used
    • State the software that was used.
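
As an illustration of what stating your statistical model can look like, here is a sketch for one common case.  It assumes a randomized complete block design with a fixed treatment effect and a random block effect; the symbols are my own choice for the example, and your own design will dictate the terms that belong in your model.

```latex
y_{ij} = \mu + \tau_i + b_j + e_{ij},
\qquad b_j \sim N(0, \sigma^2_b),
\qquad e_{ij} \sim N(0, \sigma^2)
```

Here y_ij is the observation on treatment i in block j, mu is the overall mean, tau_i is the fixed effect of treatment i, b_j is the random effect of block j, and e_ij is the residual error.  One sentence defining each term, as above, is usually enough for the reader.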

Additional items to consider and add to your description

  • If you have several trials, consider combining them rather than reporting several single trials.
    • Remember that you must account for your error variances if they are heterogeneous.
  • If the data were transformed for your statistical analysis, back-transform your results for presentation in the publication
  • When your Null hypothesis is NOT rejected (in other words, when there are no differences observed), report the observed p-value, for example p = 0.35
  • “No significant difference” is correct and accepted in publications.  Do NOT say the difference was “non-significant”
  • Report means with their standard errors.
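
To make the back-transformation point concrete, here is a sketch for two common transformations.  The notation is my own: m stands for the estimated mean on the transformed scale, and L and U for its confidence limits on that scale.

```latex
\text{log transformation: } \hat{y} = \exp(m),
\qquad
\text{arcsine square root transformation: } \hat{p} = \sin^2(m)
```

The confidence limits are back-transformed the same way, for example exp(L) and exp(U) for the log transformation.  Back-transform the limits rather than the standard error, since a standard error on the transformed scale does not carry over directly to the original scale.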

Can you replicate the following trial?

Control of glyphosate-resistant Canada fleabane in soybean with preplant herbicides (2017)

 

Data Visualization: The Graph

Before we start on the adventure of creating different visualizations, let’s continue to talk about forms of visualization.  Last time we talked about the Table, now let’s take a similar approach and discuss the Graph or the Chart.

Why use a Graph / Chart?

First let me note that I will more than likely use the words graph and chart interchangeably.

  • When you want to communicate a trend or a pattern in your data
  • Graphs/Charts are more interesting than a table of numbers – let’s be honest here
  • Your reader will tend to remember a graph / chart more readily than a table

Types of Charts

I have been asked on many an occasion, what type of chart should I use to represent my data?  There are so many different types of charts, and which one you choose will always depend on what you are trying to communicate to your reader.  Here is a partial list of the different types of charts:

  • Line chart
  • Bar chart
  • Histogram
  • Scatterplot
  • Pie chart
  • Bubble scatterplot
  • Heat chart
  • Area chart
  • Box and whisker chart
  • Radar chart
  • …..

Anatomy of a Chart

Similar to a table, a chart should be able to stand on its own.  A title, legend, and footnotes should provide the reader with enough information to interpret the graph as it was meant to be.  There are many guidelines available in textbooks and online on the proper construction of a chart, but I will highlight one.

However, let’s first start with the title:

  • It should be clear and concise
  • Also known as the HEADING – according to the Operations Manual for the Canadian Journal of Plant Science, Canadian Journal of Soil Science, and the Canadian Journal of Animal Science.
  • Capitalize the heading in sentence format with no period at the end
  • Do not indent the second and any subsequent lines
  • No units of measure in the title

In the Designing Science book (http://dx.doi.org/10.1016/B978-0-12-385969-3.00008-8) you can find a wonderful diagram on the Anatomy of the Chart in Figure 1.  It highlights the structure of a chart, which may include the following items:

  • y-axis
  • y-axis label
  • x-axis
  • x-axis label
  • key to symbols used in the chart
  • Statistical symbols
  • Major ticks
  • Minor ticks
  • Error bars
  • Symbols on the chart

Depending on the type of chart you are creating, all or some of the above features will be extremely important.

Best Practices for creating a Chart

  • 2D chart is always better and easier to understand than a 3D chart
  • Background colour – keep it simple – white works great or the colour of your presentation
  • Axes colour – Use the highest contrast colour – black?
  • Data colour – keep group colours consistent – distinct from each other
  • Gridlines – only if you need them – they can be very distracting
  • Font – use sans serif type fonts.  Helvetica is recommended
  • Significance – use *, **, *** – consistent
  • Error/variability – do not clutter your chart
  • Ticks on your scales – use a natural count

Selecting a Graph or a Table?

When should you use which one?

  • Remember a table is appealing to the “reader”
  • If you need your audience to remember a number or if you need to highlight a number/value – a table may be best
  • Graphs / charts – may have that impact that tables do not.
  • Each has their place and purpose

Examples of Charts

While reviewing these examples of different types of charts published in journals and on the web – think about the best practices listed above.  Do all of these examples follow them?  Are they easily read?  Do they stand alone?

When you embark on creating your own table and/or chart, think about these guidelines and best practices.


Tackling an analysis using GLIMMIX

So, you have collected some data, you have a few treatments which you’d like to compare, and you want to analyze them using Proc GLIMMIX.  So how do you start?

My goal is to provide steps to tackle these types of analyses, whether you are working with weed data, or animal data, or yield data.  I suspect I’ll be updating this post as we clarify these steps.

First Step – your experimental design

Ah yes!  Despite popular belief you DO have an experimental design!  Find it or figure it out now before you go any further.  Why?  Because your model depends on this!  Your analysis comes down to your experimental design.

Second Step – build your MODEL statement

You know what your outcome variable is and you know what your experimental design is, which means you know which factors you’ve measured and whether they are fixed or random.  So…  you now know the basis of your MODEL statement and your initial RANDOM statement.
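
As a minimal sketch, suppose the design is a randomized complete block design with a fixed treatment effect and a random block effect.  That design is an assumption for this example, and the variable names outcome, treatment, and block are placeholders for your own.

```sas
proc glimmix data=first;
  class block treatment;       /* classification (grouping) variables */
  model outcome = treatment;   /* fixed effects go on the MODEL statement */
  random block;                /* random effects go on the RANDOM statement */
run;
```

A different design, such as a split-plot, would change which terms appear on the MODEL and RANDOM statements.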

Third Step – expected distribution of your outcome variable

You already know whether your outcome variable comes from a normal distribution or not.  Chances are it is not, but what is it?  Check out the post on Non-Gaussian Distributions to get an idea of what distribution your outcome variable may be.  Think of it as the starting point.

Add this distribution and the appropriate LINK to the end of your MODEL statement.
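
For example, if the outcome were a count (an assumption made only for this sketch), a Poisson distribution with a log link would be a reasonable starting point.  The design and variable names are again placeholders.

```sas
proc glimmix data=first;
  class block treatment;
  /* The distribution and link follow a slash at the end of the MODEL statement */
  model outcome = treatment / dist=poisson link=log;
  random block;
run;
```

Swap in the DIST= and LINK= values that suit your own outcome variable.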

Fourth Step – run model and check residuals

Remember that when we run the Proc GLIMMIX – we need to check our assumptions – the residuals!  How do they look?  How’s the variation between your fixed effect levels?  Homogeneous or not?  Are the residuals evenly distributed?  Are the residuals normally distributed?
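
One way to get at these questions, offered as a sketch rather than the only approach, is to ask GLIMMIX for its residual graphics.  The design and variable names below are placeholders for your own.

```sas
ods graphics on;

/* STUDENTPANEL requests a panel of studentized residual plots;
   BOXPLOT(FIXED STUDENT) requests box plots of studentized residuals
   for the fixed effects, to compare spread between treatment levels */
proc glimmix data=first plots=(studentpanel boxplot(fixed student));
  class block treatment;
  model outcome = treatment;
  random block;
run;
```

The panel helps you judge whether the residuals are evenly and normally distributed, while the box plots show whether the variation looks similar across the levels of your fixed effect.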

Fifth Step – residuals NOT normally distributed

Is there another LINK for the DISTribution that you selected?  If so, please try it.

Sixth Step – fixed treatment effects not homogeneous

Now the fun begins.  To fix this one, we need to add a second RANDOM statement – essentially telling SAS that we need it to use the variation of the individual treatment levels rather than the residual variation.  As an example, a RANDOM statement, for a design that has a random block effect, would be as follows:

RANDOM _residual_ / subject = block*treatment group=treatment;
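
In context, the statement sits alongside the original RANDOM statement for block.  The sketch below assumes the same randomized complete block layout, with placeholder variable names.

```sas
proc glimmix data=first;
  class block treatment;
  model outcome = treatment;
  random block;                                   /* random block effect */
  /* GROUP=treatment estimates a separate residual variance per treatment level */
  random _residual_ / subject=block*treatment group=treatment;
run;
```

Comparing the fit statistics, such as AIC, with and without the second RANDOM statement can help you decide whether the extra variance parameters are worth keeping.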

Seventh Step – try another distribution

Now – we do NOT want you trying ALL the distributions possible – this just doesn’t make sense.  Remember you need to think back to the distribution possibilities for our outcome variable.  Please use the link provided in Step 3 as a guide.  However, one distribution I have discovered works for many situations is the lognormal distribution.  At the end of your model statement you would add / DIST=lognormal LINK=identity.
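
A sketch of the lognormal option in a full procedure call might look like the following.  The design and variable names are placeholders, and the outcome must be strictly positive for this distribution to apply.

```sas
proc glimmix data=first;
  class block treatment;
  model outcome = treatment / dist=lognormal link=identity;
  random block;
  lsmeans treatment / cl;   /* means and confidence limits are on the log scale */
run;
```

With DIST=lognormal the analysis is carried out on the log of the outcome, so the treatment means it reports need to be back-transformed before you present them.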

Another option is to transform the data in the GLIMMIX procedure.  The one transformation that researchers like is the arcsine square root transformation.  To try this one please use the following code.

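Proc GLIMMIX data=first;
  /* arcsine square root transformation; outcome must be a proportion between 0 and 1 */
  trans = arsin(sqrt(outcome));
  /* model the transformed variable in place of the original outcome */
  model trans = …;
Run;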

Last Step – results will not always be perfect!

You will do the best that you can when analyzing your data.  But please recognize that you may not be able to match all the assumptions every time.  Go back, review your data, review your experimental design, to ensure you have the correct Proc GLIMMIX coding.

As I’ve noted earlier, as we continue to learn more about GLIMMIX this post will probably be updated to include and/or refine these steps.
