ARCHIVE: Summer 2018 – Special Topics Workshops in June

In response to requests during the SAS and SPSS workshops, I will be offering the following 4 Special Topics workshops in June.  If you are interested in adding R to any of these workshops, please email oacstats@uoguelph.ca to let me know.

Special Topics Workshops

  • Principal Component Analysis – examples will be demonstrated in SAS, SPSS, and R
  • Date: Jun 13, 2018  9am-12Noon
  • Location: ANNU Rm 102
  • Please register here.
  • Experimental Designs – Split-plot, Split-block, Split-split-plot, Latin Squares. Examples will be demonstrated in SAS and R.
  • Date: Jun 14, 2018  9am-12Noon
  • Location: ANNU Rm 102
  • Please register here.
  • Linear and Nonlinear Regression for SAS.  We will use PROC REG, PROC NLIN, PROC GLIMMIX, and possibly PROC NLMIXED.
  • Date: Jun 27, 2018  9am-12Noon
  • Location: ANNU Rm 102
  • Please register here.
  • CANCELLED – Surface Analysis in SAS
  • Date: Jun 28, 2018  9am-12Noon
  • Location: ANNU Rm 102

ARCHIVE: Summer 2018: R-users and SASsy Fridays Hiatus!

Good day everyone!

I’ve been talking with a few folks about holding these sessions over the summer, and the consensus is that everyone is busy and it will be tough to maintain attendance. So, with that in mind, both the R-Users and SASsy Fridays sessions will be back in the Fall!

Apologies to anyone who was planning on coming. The topics already listed will be presented in September.

Have a great summer!  See you all in the Fall!
Name

R: Getting the data in, merging files, and creating new variables

PDF copy of the Getting the data in, merging files, and creating new variables notes

As we begin the coding part of the R workshop, let’s try to bring things together and make it as easy as possible for you to run the code and work along with me as we progress through the topics of the next 2 days. On that note, for each session of the R workshop I will have an R script prepared for you, complete with comments for every step. Please download it and open it in RStudio at the beginning of each workshop section.

Please download the following:

Getting the Data in

As with any program out there today, there are several ways to bring data into the R platform.  We will work through 2 different ways to accomplish this task, and once we’ve worked through both, I would like to hear which one you prefer!

Reading a CSV file gets you in the habit of creating a preservation-ready format for your data. As you’ve probably already figured out, though, it also means having documentation at the ready so you remember which variable is which, and, when reading the file into R, you need to either specify its location or make sure your working directory has been set at the beginning. Reading an Excel file is just SOooo much easier, and probably the way most of us like to work. Just remember to save your data as you work!
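The point about file locations applies in any language. As a rough illustration only (this is Python, not the workshop’s R script), the sketch below reads a CSV file either from an explicit folder or relative to the current working directory. The file name january.csv and the folder paths are hypothetical.

```python
import csv
import os
from typing import Optional


def read_csv_rows(filename: str, folder: Optional[str] = None) -> list:
    """Read a CSV file into a list of dictionaries, one per row.

    If `folder` is given, the file is read from that location. Otherwise the
    name is resolved against the current working directory, which is why the
    working directory needs to be set before reading.
    Note: csv returns every value as text, so numbers still need converting,
    and nothing in the file says what each column means.
    """
    if folder is not None:
        path = os.path.join(folder, filename)
    else:
        path = os.path.abspath(filename)
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical usage: either point at the folder explicitly...
# rows = read_csv_rows("january.csv", folder="/path/to/my/data")
# ...or change the working directory first and use the bare file name:
# os.chdir("/path/to/my/data"); rows = read_csv_rows("january.csv")
```

Either way the program has to be told where the file lives, which is exactly the bookkeeping the Excel route tends to hide.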

Merging files

Merging files sounds like such an innocent task. I have an Excel file with 4 monthly worksheets, and all I want to do is put them together into 1 file so I can analyse the data. Easy peasy, right??

There are a few ways of merging files in R. The most straightforward method is to use the merge() function available in Base R. Try it out with our data: what happens when you merge the January data, which has 25 observations, with the February data, which has only 23 observations?
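To see why rows can go missing, here is a minimal Python sketch of an inner join, which is the behaviour Base R’s merge() uses by default. The tables, the id key, and the column names are made up for illustration, and the sketch assumes each id appears at most once per table.

```python
def inner_join(left: list, right: list, key: str) -> list:
    """Keep only rows whose key appears in BOTH tables (an inner join).

    Assumes each key appears at most once in `right`.
    """
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the January columns with the matching February columns.
            joined.append({**row, **match})
    return joined


# Made-up data: animal 3 was measured in January but not in February.
jan = [{"id": 1, "wt_jan": 14}, {"id": 2, "wt_jan": 18}, {"id": 3, "wt_jan": 22}]
feb = [{"id": 1, "wt_feb": 15}, {"id": 2, "wt_feb": 19}]

print(inner_join(jan, feb, "id"))
# [{'id': 1, 'wt_jan': 14, 'wt_feb': 15}, {'id': 2, 'wt_jan': 18, 'wt_feb': 19}]
```

Animal 3 silently disappears, just like the 2 January-only observations in our data.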

So, we’ve noticed that there’s something NOT quite right with this merge. The 2 observations that had a measure in January but not in February were not included in our final dataset, because by default merge() keeps only the observations that appear in both files. What happens if later on, say in March or April, we do have measures for these individuals? We want them to be included, so we need to consider other methods of merging our files.

We will use the join functions available in the dplyr package. To do this, we need to take a quick little detour to remind ourselves about sets, unions, and joins, since this is the approach R takes when merging, or rather joining, datasets. You’ll also see that with this approach we can merge all of our data using this one function, unlike in SAS and SPSS.
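A quick way to revisit sets is with the animal IDs themselves. In this minimal Python sketch (the IDs are made up), the intersection corresponds to the rows an inner join keeps, the union to the rows a full join keeps, and the difference to what an anti join returns. dplyr’s inner_join(), full_join(), and anti_join() follow the same logic.

```python
# Made-up animal IDs measured in each month.
jan_ids = {1, 2, 3, 4, 5}
feb_ids = {1, 2, 3, 6}

both = jan_ids & feb_ids        # intersection: IDs an inner join keeps
either = jan_ids | feb_ids      # union: IDs a full join keeps
jan_only = jan_ids - feb_ids    # difference: IDs an anti join returns

print(sorted(both))      # [1, 2, 3]
print(sorted(either))    # [1, 2, 3, 4, 5, 6]
print(sorted(jan_only))  # [4, 5]
```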

This is a perfect opportunity to show you the Cheatsheets in R.  In RStudio follow these steps:

  • Help
  • Cheatsheets
  • Data Transformation with dplyr

Let’s work through the examples in the Combine Tables section to get a better understanding of how to merge in R.

Based on these examples, we are interested in performing a full join, using full_join(). Did the code in the R script work for you? Can you see how this might work for your own research data?
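For comparison, here is a minimal Python sketch of a full join, with missing measurements filled in as None (the role NA plays in R). The monthly tables and column names are hypothetical, and each id is assumed to be unique within a table. Because a join combines 2 tables at a time, the months are chained together, much like piping one full_join() into the next in R.

```python
from functools import reduce


def full_join(left: list, right: list, key: str) -> list:
    """Keep every key found in EITHER table (a full join).

    Columns a table has no row for are filled with None, playing the role of
    NA in R. Each key is assumed to appear at most once per table.
    """
    # Collect all column names in first-seen order.
    columns = []
    for row in left + right:
        for col in row:
            if col not in columns:
                columns.append(col)
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    # Keys from the left table first, then keys that only the right table has.
    keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]
    joined = []
    for k in keys:
        merged = {col: None for col in columns}
        merged.update(left_by_key.get(k, {}))
        merged.update(right_by_key.get(k, {}))
        joined.append(merged)
    return joined


# Made-up monthly tables: animal 3 is missing in February, animal 4 first appears in March.
jan = [{"id": 1, "wt_jan": 14}, {"id": 3, "wt_jan": 22}]
feb = [{"id": 1, "wt_feb": 15}]
mar = [{"id": 3, "wt_mar": 23}, {"id": 4, "wt_mar": 17}]

# Join 2 tables at a time, carrying the combined result forward.
combined = reduce(lambda acc, nxt: full_join(acc, nxt, "id"), [jan, feb, mar])
for row in combined:
    print(row)
# {'id': 1, 'wt_jan': 14, 'wt_feb': 15, 'wt_mar': None}
# {'id': 3, 'wt_jan': 22, 'wt_feb': None, 'wt_mar': 23}
# {'id': 4, 'wt_jan': None, 'wt_feb': None, 'wt_mar': 17}
```

Nobody is dropped: animal 3 keeps its January and March weights, and animal 4 appears even though it was never measured in January.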

Creating new variables

Creating a new variable is very straightforward: Ynew = Var1 + Var2, or whatever variable you need to create. The tricky part is ensuring that it becomes part of your dataset. Let’s work through the examples in the R script.
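The "tricky part" is easy to see in a small Python sketch with made-up values: computing the sum on its own produces a separate list, while storing it in each row makes Ynew part of the dataset. The R analogue of the second approach is an assignment such as data$Ynew <- data$Var1 + data$Var2, where data is a placeholder name for your data frame.

```python
# Made-up data: each dictionary is one row (observation) of the dataset.
data = [{"Var1": 2, "Var2": 3}, {"Var1": 10, "Var2": 5}]

# Computing the values on their own gives a separate list that is NOT in the dataset.
ynew_loose = [row["Var1"] + row["Var2"] for row in data]

# Storing the result in each row makes Ynew part of the dataset itself.
for row in data:
    row["Ynew"] = row["Var1"] + row["Var2"]

print(ynew_loose)  # [5, 15]
print(data)        # [{'Var1': 2, 'Var2': 3, 'Ynew': 5}, {'Var1': 10, 'Var2': 5, 'Ynew': 15}]
```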

Now, what if we want to recode a variable rather than just calculate a new one? For example, we want to create a new variable called wtclass that takes the weights measured in January and groups them into 3 weight classes: 1 = 13-16; 2 = 17-20; 3 = 21-24.
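One way to express this recode is sketched below in Python. It assumes weights may include decimals, so the classes are treated as half-open ranges (13 up to but not including 17, and so on, with 24 as the top of class 3); that interpretation is an assumption, not part of the workshop script. In R, cut() or dplyr’s case_when() are common ways to do the same thing.

```python
def weight_class(weight: float):
    """Recode a January weight into wtclass: 1 = 13-16, 2 = 17-20, 3 = 21-24.

    Assumption: classes are half-open so decimal weights (e.g. 16.5) still land
    in a class. Weights outside 13-24 return None, similar to NA in R.
    """
    if 13 <= weight < 17:
        return 1
    if 17 <= weight < 21:
        return 2
    if 21 <= weight <= 24:
        return 3
    return None


# Made-up January weights, recoded and stored back in each row.
rows = [{"id": 1, "wt_jan": 14}, {"id": 2, "wt_jan": 19}, {"id": 3, "wt_jan": 23}]
for row in rows:
    row["wtclass"] = weight_class(row["wt_jan"])

print([row["wtclass"] for row in rows])  # [1, 2, 3]
```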

Quick recap

Getting your data into R can be as easy as using the readxl package to import your Excel worksheets directly into R.

Once you have your dataset in R, you can merge files using the join functions available in the dplyr package.

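Creating new variables and recoding variables is straightforward; just remember to make sure that you have added them to your R data frame. The attach() and detach() functions make it easier to refer to variables by name, but a new variable only becomes part of the data frame when you assign it there (for example, data$Ynew <- data$Var1 + data$Var2). Note that there are other ways of doing this as well, such as dplyr’s mutate(); this is just one.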

Don’t be afraid to check out the Help resources – Cheatsheets are fun and very informative.


SPSS Workshop: GLMM and Non-Gaussian Distributions

When we collect data from our research trials, we do not always have data that is “well-behaved” or that comes from the traditional normal distribution curve. Data can follow a number of other distributions, and with recent statistical methodological advances, SPSS can handle some of these as well. Your first job is to recognize when a normal distribution is NOT appropriate and which distribution is an appropriate starting place. These other distributions are referred to as non-Gaussian distributions; remember, Gaussian is just another name for Normal.

Where do we start?  Think about your data – what is it?

  • A percentage?
  • A count?
  • A score?
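As a rough starting point only, the Python lookup below pairs each kind of data with a distribution and link function commonly used for a generalized linear model. These pairings are conventional defaults, not rules; checking your residuals still has the final say.

```python
# Common starting points for a generalized linear model, keyed by the kind of
# data collected. These are conventional defaults, not rules.
STARTING_POINTS = {
    "percentage": ("binomial, or beta for continuous proportions", "logit"),
    "count": ("Poisson, or negative binomial if overdispersed", "log"),
    "score": ("multinomial (ordinal)", "cumulative logit"),
}


def starting_point(kind: str) -> str:
    """Describe a suggested starting distribution and link for a data type."""
    distribution, link = STARTING_POINTS[kind]
    return f"{kind}: distribution = {distribution}; link = {link}"


for kind in STARTING_POINTS:
    print(starting_point(kind))
```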

How do we know that our data is not from a normal distribution?

  • Always check your residuals!
  • Remember the assumptions of your analyses?
    • Normally distributed residuals are one of them!
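Checking residuals does not require anything fancy. The Python sketch below, for a balanced RCBD with one observation per treatment and block, computes residuals from the additive model (observation minus treatment mean minus block mean plus grand mean) and a simple skewness value. The function names and data layout are illustrative assumptions. A skewness far from 0 is one hint that a normal distribution may not be appropriate; plots of the residuals remain the main check.

```python
from statistics import mean


def rcbd_residuals(y: dict) -> dict:
    """Residuals from the additive RCBD model (grand mean + treatment + block).

    `y` maps (treatment, block) -> observation and must be balanced: every
    treatment appears exactly once in every block.
    """
    treatments = sorted({t for t, _ in y})
    blocks = sorted({b for _, b in y})
    grand = mean(y.values())
    trt_mean = {t: mean(y[t, b] for b in blocks) for t in treatments}
    blk_mean = {b: mean(y[t, b] for t in treatments) for b in blocks}
    return {(t, b): y[t, b] - trt_mean[t] - blk_mean[b] + grand for (t, b) in y}


def skewness(values) -> float:
    """Sample skewness (third standardized moment); about 0 for symmetric residuals."""
    vals = list(values)
    m = mean(vals)
    m2 = sum((v - m) ** 2 for v in vals) / len(vals)
    m3 = sum((v - m) ** 3 for v in vals) / len(vals)
    return m3 / m2 ** 1.5


# Made-up plant counts for 2 treatments in 2 blocks, just to show the calls.
counts = {("A", "1"): 1, ("A", "2"): 2, ("B", "1"): 3, ("B", "2"): 10}
res = rcbd_residuals(counts)
print(res)                      # residuals of +1.5 and -1.5
print(skewness(res.values()))   # 0.0 (perfectly symmetric)
```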

Let’s work with the following example. We have another RCBD (randomized complete block design) trial with 4 blocks, with the 4 treatments randomly assigned within each block. Two outcome measures were taken: the proportion of the plot that flowered and the number of plants in each plot at the end of the trial.
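For anyone curious how the "randomly assigned within each block" part works, here is a minimal Python sketch of randomizing an RCBD layout. The treatment labels T1 to T4 and the seed are hypothetical.

```python
import random


def randomize_rcbd(treatments: list, n_blocks: int, seed=None) -> dict:
    """Randomly order every treatment once within each block (an RCBD layout)."""
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # a fresh, independent randomization per block
        layout[block] = order
    return layout


plan = randomize_rcbd(["T1", "T2", "T3", "T4"], n_blocks=4, seed=2018)
for block, order in plan.items():
    print(f"Block {block}: {order}")
```

Each block contains every treatment exactly once, which is what makes the design "complete"; only the order within a block is random.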

Generalized Linear Models in SPSS

Those of you who follow the terminology commonly used when talking about ANOVAs (Analysis of Variance) may notice that one term is NOT included in this heading. Any guesses? MIXED. In SAS, we talk about and work with Generalized Linear Mixed Models, or GLMMs (PROC GLIMMIX), but I just realized that the MIXED part is missing here. That’s why I could not figure out how to add a RANDOM effect to these models!
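A sketch of the difference, assuming the blocks are treated as random in the SAS (GLIMMIX) model and as fixed here: both models relate the expected response for treatment i in block j through a link function g (for example, log for counts or logit for proportions), with an intercept α and treatment effects τ_i, but only the mixed model gives the block effects a distribution.

```latex
% Generalized linear model (blocks fixed), as fitted with SPSS Generalized Linear Models
g(\mu_{ij}) = \alpha + \tau_i + \beta_j

% Generalized linear mixed model (blocks random), as assumed for SAS PROC GLIMMIX
g(\mu_{ij}) = \alpha + \tau_i + b_j, \qquad b_j \sim N(0, \sigma^2_b)
```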

We will, however, still use the RCBD data that was used in the SAS workshop. As noted above, it contains proportion data and count data: perfect examples of non-Gaussian data.

Please download the Excel spreadsheet here and load it into SPSS.

We will work through each of the outcome variables in the workshop using the following steps:

  • Analyze
  • Generalized Linear Models
    • Generalized Linear Models
    • We will work through the tabs for each variable together and discuss the outputs

Please note that if you attended both the SAS workshop and this one and are trying to compare the results, they will NOT match, because we are fitting a different model in each program: this SPSS procedure does not let us add a RANDOM effect. You can, however, create the same fixed-effects model in both programs, and the results match perfectly.

See you later in May!
