ARCHIVE: S19 R Workshop

To complete the contents of the day-long R workshop offered on June 11, 2019, we will work through the following sessions:

  1. R Workshop:  Introduction to R and Definitions
  2. R Workshop:  Introduction to RStudio and R packages
  3. R Workshop:  Cleaning and Tidying your data
  4. R Workshop:  Getting the data in, merging files, and creating new variables
  5. R Workshop:  Getting comfortable with your data:  Descriptive statistics, Normality, and Plotting
  6. R Workshop:  ANOVA / Partitioning of Variance with an RCBD

Labels in SAS – Variable and Value

Adding variable labels

Do you know what group and trmt represent?  We can probably guess what age, height, and eye_colour mean, but would you know what units age and height were measured in?  Without a codebook, or other information such as variable labels and value labels, you would be guessing!

In SAS, and with many other statistical programs, you can add both a variable label and value labels.

Whenever you make changes to the data, you need to be working in a DATA step.  Drawing a parallel to Excel, you would open a dataset or Excel worksheet, make the changes, and then save it.  In SAS, you create a new DATA step, make the changes to the variable(s), and save the result as a dataset.

Data tuesday_new;
  set tuesday;        * this tells SAS that you want to use the dataset called tuesday that you created earlier;
label
  group = "Individuals on the trial were randomly assigned to 4 groups"
  trmt = "Treatments were assigned within each group"
  age = "Age of the participant in years"
  height = "Height taken of the participants at the end of the trial, measured in cm"
  eye_colour = "Colour of the participants' eyes";
Run;

To view these changes, try a Proc print.  What happens?

Try the following:

Proc Contents data=tuesday_new;
Run;

What do you see?
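As a follow-up to the Proc print question above: by default, Proc print uses the variable names as column headings, so the new labels do not appear.  The label option asks it to show the labels instead.  A minimal sketch, assuming tuesday_new was created with the DATA step shown earlier:

```sas
* Print the data using variable labels as column headings;
* (assumes the dataset tuesday_new already exists in this SAS session);
Proc print data=tuesday_new label;
Run;
```

Proc Contents, by contrast, lists each variable together with its label without any extra option.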

Adding value labels

Sometimes you will collect variables that are coded.  Rather than writing out blue eyes, brown eyes, and so on, you might record a code such as 1, 2, etc.  But how do you remember which code you gave to which value?  Writing it down on a piece of paper is fine, but what if you misplace that paper?  Adding value labels to your data is a great way to keep all the information together.

In SAS, this is a 2-step process.  We need to create the codes and their labels first, and then we need to apply these to the variables in the dataset.  Keeping the two steps separate allows you to re-use the labels.

CREATING THE VALUE LABELS

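Proc format;
  * character format: the name starts with $ and the codes are quoted (and case-sensitive);
  value $groupformat
                'a' = "Group A – Monday morning"
                'b' = "Group B – Monday afternoon"
                'c' = "Group C – Tuesday morning"
                'd' = "Group D – Tuesday afternoon";

  * numeric format: no $ in the name and the codes are unquoted numbers;
  value trmtformat
               1 = "Treatment 1 – Placebo"
               2 = "Treatment 2 – Vitamin C";
Run;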

This creates two SAS formats: one called $groupformat (a character format, hence the $) and another called trmtformat.  Think of these as boxes that say a represents Group A – Monday morning, and so on.
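Before applying the formats to a dataset, it can be reassuring to check that they return the labels you expect.  The sketch below is one way to do this, assuming the Proc format step above has already been run in the current session; the variable names g and t are made up for this illustration only.

```sas
* Quick check of the two formats, without touching any dataset;
Data _null_;
  g = put('a', $groupformat.);   * look up the label for group code a;
  t = put(1, trmtformat.);       * look up the label for treatment code 1;
  put g= t=;                     * write both results to the SAS log;
Run;
```

If the log shows the raw code instead of a label, the code was probably not matched by the format, for example because of a difference in upper and lower case.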

APPLYING THE VALUE FORMATS TO THE DATA

Remember that we are touching the data or making changes to the data, so we need to use a Data Step.  Let’s re-use the one where we added variable labels:

Data tuesday_new;
  set tuesday;

label
  group = "Individuals on the trial were randomly assigned to 4 groups"
  trmt = "Treatments were assigned within each group"
  age = "Age of the participant in years"
  height = "Height taken of the participants at the end of the trial, measured in cm"
  eye_colour = "Colour of the participants' eyes";

format
  group $groupformat.
  trmt trmtformat.;

Run;
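To see the value labels in action, you can run any procedure on the new dataset.  The following is a minimal sketch, assuming tuesday_new and both formats exist in the current SAS session.  Note that formats created with Proc format are stored in the temporary WORK library by default, so the Proc format step needs to be re-run in each new session.

```sas
* Frequency tables now show the value labels instead of the raw codes;
Proc freq data=tuesday_new;
  tables group trmt;
Run;

* Listing of the data showing both variable labels and value labels;
Proc print data=tuesday_new label;
Run;
```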

ARCHIVE: S19 Workshops

A couple of workshops are now available for booking.  I will be hosting two 1-day workshops in June:  a SAS workshop on June 4, followed by an R workshop on June 11.  The workshops will be held in ANNU Rm 102, starting at 9am and ending by 4pm at the latest.

Please register for the one(s) you would like to attend by visiting https://oacstats_workshops.youcanbook.me/.  Please note that you will need to bring a laptop with the software already installed.  If you do not have the software, you may watch the demos; however, I will not be able to help you with any software installations.

June 4 – SAS:  We will begin by touring the different versions of the SAS program that are available to us on campus.  Our next stop will be getting data into SAS, followed by some descriptive statistics.  We will then move on to Regression and ANOVAs, and if time permits, PCA and/or Factor analysis.  If you have a particular analysis in mind that you would like to work through in SAS, please let me know beforehand – email oacstats@uoguelph.ca.

June 11 – R/RStudio:  We will again begin with a tour, this time of RStudio, and discuss the merits and challenges of using the R software.  We will then work through a number of ways to get data into RStudio, followed by some descriptive statistics and data visualization options.  We will move on to Regression and ANOVAs, and if time permits we will try our hand at some on-demand analyses.  If you have a particular analysis in mind that you would like to work through in R, please let me know beforehand – email oacstats@uoguelph.ca.