Welcome to the Documenting Your Data Analysis workshop. The notes for this workshop are available here. Please note that by attending the workshop you will be creating your own version of the notes.

Oh my!!! I updated the registration page but forgot to update the blog. Oish!!! Summer workshops are happening and they start next week!! We will be using Teams for the workshops. On the Friday before the workshops, you will receive an invitation to the meeting space. Hope to see you there!!
To register please visit https://oacstats_workshops.youcanbook.me/
A few more workshops have been added to this summer’s roster:
Tuesday, July 7: Regression in SAS: Nonlinear
Wednesday, July 8: Regression in R: Nonlinear
Tuesday, July 14: PCA and Cluster Analysis in SAS
Wednesday, July 15: PCA and Cluster Analysis in R
Tuesday, July 21: GLMM using Multinomial data in SAS
Wednesday, July 22: Visualizing your analysis results in R
_______________________________________________
Tuesday, May 5: Starting your research off on the right foot. How to organize and collect your data to help make your adventure into statistics a little easier.
Wednesday, May 6: Documenting your data analysis. Whether you are planning on using R or SAS to conduct your analysis, come learn how to use R Markdown to document your syntax and output. If you’re curious what this is – check out the Workshop notes on the OACStats Blog
Tuesday, May 12: Intro to SAS
Wednesday, May 13: Intro to RStudio
Tuesday, May 19: Getting Comfortable with your data in SAS
Wednesday, May 20: Getting Comfortable with your data in R
Tuesday, May 26: ANOVA in SAS – CRD and RCBD
Wednesday, May 27: ANOVA in R – CRD and RCBD
Tuesday, June 2: Regression in SAS: Linear and Multiple regression
Wednesday, June 3: Regression in R: Linear and Multiple regression
Tuesday, June 9: ANOVA in SAS: GLMM
Wednesday, June 10: ANOVA in R: GLMM
Tuesday, June 16: Regression in SAS: Nonlinear
Wednesday, June 17: Regression in R: Nonlinear
Tuesday, June 23: ANOVA in SAS: Repeated Measures
Wednesday, June 24: ANOVA in R: Repeated Measures
Tuesday, July 7: PCA and Cluster Analysis in SAS
Wednesday, July 8: PCA and Cluster Analysis in R

Ever wondered how you can write an R script, document it, run the script, and document the output – all in one file? Come join us on February 18 in Crop Science Rm 121a, starting at 9am, as we learn all about R Markdown. I’ll introduce R Markdown and then encourage everyone to use it as we learn more about ANOVAs and GLMMs.
The Excel file that we will use for the second half of the workshop is downloadable here.
If you can’t make it here is a copy of the notes (created with R Markdown).
If you are a SAS user – keep an eye on this webpage for an upcoming workshop on how to use R Markdown with SAS.

For those that attended the RDM workshop on January 8, 2020, here is a copy of the slides we used as talking points. Please note that if you have specific questions regarding RDM you should contact the Library at lib.research@uoguelph.ca
Continuing on from our last blog post R vs SAS Series: Statistical Models Review – ANOVA, let’s take a look at how we need to get the data ready for our analysis.
Let’s review our statistical model.
$$\text{Nitrate}_{ij} = \mu + \text{trmt}_i + e_{ij}$$
Where:
$\text{Nitrate}_{ij}$ = Stem nitrate amount of the $j$th observation in the $i$th trmt
$\mu$ = Overall mean or model intercept
$\text{trmt}_i$ = the effect of the $i$th treatment group
$e_{ij}$ = random error or experimental error
This means that in order to run our analysis, we need to have stem nitrate measures and information about our treatments. Specifically, our dataset needs a column with the nitrate measures and a second column that tells us which treatment each nitrate measure came from. You may also have a column that is an identifier – in this case Plot_ID, which helps me to identify which plot the measurements were taken from. A sample data table or Excel file may look like this:
| Plot_ID | Treatment | Nitrate |
| --- | --- | --- |
| 101 | 1 | 34.98 |
| 102 | 2 | 40.89 |
| 103 | 3 | 42.07 |
| … | … | … |
| 124 | 6 | 43.29 |
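If you would like to see what this layout looks like once it is in R, here is a minimal sketch. It types in only the four rows shown in the sample table (the rows hidden behind the "…" are left out), and the data frame name `nitrate_data` is just a name picked for this illustration. In practice you would more likely import your Excel file than type the values in.

```r
# A small illustration of the data layout: one row per plot,
# one column per variable. Only the rows visible in the sample
# table are included; the elided rows are not filled in.
nitrate_data <- data.frame(
  Plot_ID   = c(101, 102, 103, 124),
  Treatment = c(1, 2, 3, 6),
  Nitrate   = c(34.98, 40.89, 42.07, 43.29)
)

# Look at the structure: note that Treatment is stored as a number
str(nitrate_data)
```

Notice that R reads Treatment in as a numeric column, simply because we typed the treatment levels in as numbers.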
Now we need to do a little bit of background work. We’ve all heard of FIXED and RANDOM effects. These should be driven by your statistical model! In the example we are currently working with, we only have one effect: Treatment. Is it a FIXED or is it a RANDOM effect?
Let’s go back and look at some definitions and examples of these 2 terms.
Fixed effects are something you want to study – you set out the levels that you are interested in. You “fix” the levels. The results from your experiment can only talk about the levels you studied.
Random effects are factors in your design that may contribute variation to your outcome measure, but you are not interested in them. You only want to account for them before looking at your treatment effects.
Back to our example – what do you think our Treatment effect is? If you said FIXED – you are correct!
Alrighty – so Treatment is a FIXED effect. In our dataset, we entered the Treatment levels as 1, 2, 3, 4, 5, or 6 – in other words, we used numbers. We could have used letters / alphanumeric / strings – doesn’t matter. However, because we used numbers, we need to let our programs know that these values are not quantities that we will calculate means from or manipulate in any way. They are to be used as a grouping, classification, or factor variable: something that tells us and the program which treatment each of our nitrate values comes from.
In SAS – we can do this very simply by including the Treatment variable in a CLASS statement. However, in R, we need to change the format of the variable to a factor. To do this we need to use the following R script:
Treatment <- as.factor(Treatment)
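The line above works when Treatment is a stand-alone object. If your data lives in a data frame, which is the usual case after importing an Excel file, the conversion might look like the sketch below. The data frame name `nitrate_data` is an assumption made for this illustration, and only the four rows visible in the sample table are used.

```r
# Hypothetical data frame holding the rows shown in the sample table
nitrate_data <- data.frame(
  Treatment = c(1, 2, 3, 6),
  Nitrate   = c(34.98, 40.89, 42.07, 43.29)
)

# Before the conversion, R sees Treatment as a number
is.factor(nitrate_data$Treatment)   # FALSE

# Convert the column in place so R treats it as a grouping variable
nitrate_data$Treatment <- as.factor(nitrate_data$Treatment)

# After the conversion, Treatment is a factor with one level per treatment
is.factor(nitrate_data$Treatment)   # TRUE
levels(nitrate_data$Treatment)      # "1" "2" "3" "6"
```

A quick check with `is.factor()` or `levels()` before running your model is an easy way to confirm that R will use Treatment as groups rather than as a numeric value.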
We’ll see how this fits in with our ANOVA coding in the next Blog post. For now – remember:
Everything is based on that statistical model – please remember what it is for your trial
Factors in our model may be FIXED or RANDOM
In SAS we can tell the program which variables are factors by listing them in a CLASS statement.
In R, we need to use the as.factor() function to change the format of our factor variables to a factor
