ARCHIVE: S20 Workshops – UPDATED

Oh my!!!  I updated the registration page but forgot to update the blog.  Oish!!!  Summer workshops are happening and they start next week!!  We will be using Teams for the workshops.  On the Friday before the workshops, you will receive an invitation to the meeting space.  Hope to see you there!!

To register please visit https://oacstats_workshops.youcanbook.me/

A few more workshops have been added to this summer’s roster:

Tuesday, July 7: Regression in SAS: Nonlinear

Wednesday, July 8: Regression in R: Nonlinear

Tuesday, July 14: PCA and Cluster Analysis in SAS

Wednesday, July 15: PCA and Cluster Analysis in R

Tuesday, July 21: GLMM using Multinomial data in SAS

Wednesday, July 22: Visualizing your analysis results in R

_______________________________________________

Tuesday, May 5: Starting your research off on the right foot. How to organize and collect your data to help make your adventure into statistics a little easier.

Wednesday, May 6: Documenting your data analysis. Whether you are planning on using R or SAS to conduct your analysis, come learn how to use R Markdown to document your syntax and output. If you’re curious what this is – check out the Workshop notes on the OACStats Blog

Tuesday, May 12: Intro to SAS

Wednesday, May 13: Intro to RStudio

Tuesday, May 19: Getting Comfortable with your data in SAS

Wednesday, May 20: Getting Comfortable with your data in R

Tuesday, May 26: ANOVA in SAS – CRD and RCBD

Wednesday, May 27: ANOVA in R – CRD and RCBD

Tuesday, June 2: Regression in SAS: Linear and Multiple regression

Wednesday, June 3: Regression in R: Linear and Multiple regression

Tuesday, June 9: ANOVA in SAS: GLMM

Wednesday, June 10: ANOVA in R: GLMM

Tuesday, June 16: Regression in SAS: Nonlinear

Wednesday, June 17: Regression in R: Nonlinear

Tuesday, June 23: ANOVA in SAS: Repeated Measures

Wednesday, June 24: ANOVA in R: Repeated Measures

Tuesday, July 7: PCA and Cluster Analysis in SAS

Wednesday, July 8: PCA and Cluster Analysis in R


R – Documenting in R and ANOVA/GLMM analyses

Ever wondered how you can write an R script, document it, run it, and document the output, all in one file?  Come join us on February 18 in Crop Science Rm 121a, starting at 9am, as we learn all about R Markdown.  I’ll introduce R Markdown and then encourage everyone to use it as we learn more about ANOVAs and GLMMs.

The Excel file that we will use for the second half of the workshop is downloadable here.

If you can’t make it here is a copy of the notes (created with R Markdown).

If you are a SAS user – keep an eye on this webpage for an upcoming workshop on how to use R Markdown with SAS.


Getting comfortable with your data in R: Descriptive statistics and visualizing your data

We will continue our journey with R and data with a workshop that concentrates on data visualization and descriptive statistics.  These are the steps we would undertake as we begin working with our research data.

The notes and accompanying R code are available here.

The podcast for this workshop is available here.  Please note that you must have a UGuelph account and you must either be on campus or using the UG VPN to view the podcast.


Data types in R

When you think about conducting any statistical analysis, your starting point is data.  R has a slightly different way of working with your data.  Being aware of the different types of data in R can save a little time when a new package asks you about your data.  So let’s review a few definitions of the different data types used in R.

Numeric, Character, or Logical

A quick overview of the different types of data you can work with in R.

  • Numeric = numbers
  • Character = words
  • Logical = TRUE or FALSE – not all data is in the form of numbers or letters. Sometimes you might have data that has been collected as matching a criterion (TRUE) or not matching a criterion (FALSE).  We’ll work through examples of this in another session; for now just be aware that this type of data is commonly used in R.
  • How do you find out what form your data are in?
    • class(…)
    • The result of this statement will tell you exactly what form your data are in.
    • Example:

testform <- c(12, 13, 15)
class(testform)

> class(testform)
[1] "numeric"
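The same check works for the other two forms. Here is a small sketch; the vector names testwords, testflags, and over12 are made up for illustration and do not come from the workshop files.

```r
# Hypothetical example vectors (names chosen for illustration only)
testwords <- c("apple", "pear")
testflags <- c(TRUE, FALSE, TRUE)

class(testwords)   # "character"
class(testflags)   # "logical"

# Logical data often comes from a comparison:
# which values of testform are greater than 12?
testform <- c(12, 13, 15)
over12 <- testform > 12
over12             # FALSE TRUE TRUE
class(over12)      # "logical"
```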

Numeric Classes in R

Numbers are handled in a couple of ways in R.  These are referred to as the numeric classes of R, and the two that we will use are known as integer and double.  Having a basic understanding of these numeric classes will come in handy.

  • Integer:
    • If you think back to high school math, you’ll probably remember the term “integer”.  The first thing that comes to my mind when I think of an integer is a whole number: no fractions, no decimal places.
    • As you can imagine, storing numeric data as integers does not require a lot of space.  So, in terms of computing, if you do not foresee your analysis needing decimals, then integers are the way to go.
  • Double:
    • Double precision floating point numbers – think of this as the decimals side of your numeric data.
    • Storing double numeric data takes up more space than integer data.  But sometimes you’re just not sure what you will need, so R stores numbers as doubles by default and converts integers to doubles when your analysis requires it.
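To see the two numeric classes side by side, here is a minimal sketch. The names whole, decimal, and halved are made up for this example. typeof() reports how R stores the value, while class() reports "numeric" for doubles.

```r
whole   <- 5L    # the L suffix asks R to store an integer
decimal <- 5     # without the suffix, R stores a double

typeof(whole)    # "integer"
typeof(decimal)  # "double"
class(whole)     # "integer"
class(decimal)   # "numeric"

# Dividing integers gives a double, because the result may need decimals
halved <- whole / 2L
halved           # 2.5
typeof(halved)   # "double"

# Both are still numeric as far as R is concerned
is.numeric(whole)    # TRUE
is.numeric(decimal)  # TRUE
```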

Data Types in R

Let’s review the different data types available to you in R.

VECTORS

  • Let’s not panic at some of these terms, but work through examples of each.  Think of a vector as a column of data or one variable.
  • Vectors can be numeric, character, or logical.
  • How to create a vector:

# a numeric vector
a = c(2, 4.5, 6, 12)

# a character vector
b = c("green", "blue", "yellow")

# a logical vector
c = c(TRUE, TRUE, FALSE, TRUE)

Coding Explanation:

a = ; b = ; c = ;  creating vectors called a, b, c respectively.  Please note that a <- is the same as a =

c(x, x, x  )  tells R that we are creating a vector or a column with the contents found in the parentheses.  The , tells R to drop to the next row in the vector/column being created.

Character values must be enclosed in quotation marks (" "), but logical values must not be.
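Once a vector exists, you can ask R about it. The sketch below also shows what happens if you mix types in one vector: R converts everything to a single type. The name mixed is made up for illustration.

```r
a <- c(2, 4.5, 6, 12)

length(a)   # 4 values in the vector
a[2]        # 4.5, the second value

# A vector holds only one type of data.
# Mixing types makes R convert everything to character.
mixed <- c(1, "green", TRUE)
mixed         # "1" "green" "TRUE"
class(mixed)  # "character"
```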

MATRICES (MATRIX)

  • Think of a matrix as an object made up of rows and columns.
  • The vectors within a matrix must all be the same type, so all numeric, or all character, or all logical.
  • How to create a matrix:

# creates a 5 x 4 numeric matrix – 5 rows by 4 columns
y <- matrix(1:20, nrow=5,ncol=4)

Coding Explanation:

y = or y <- create a matrix called y
matrix(  )  – call the function matrix to create the matrix y
1:20 – the values of the matrix
nrow = lets R know how many rows are in the matrix that you are creating
ncol = lets R know how many columns are in the matrix that you are creating.

Resulting matrix y will look like:

> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
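Notice that R filled the matrix down the columns. A short sketch of how you might check the size of the matrix, pull out single values, and fill by rows instead; the name yrow is made up for this example.

```r
y <- matrix(1:20, nrow = 5, ncol = 4)

dim(y)     # 5 4 (rows, then columns)
y[2, 3]    # 12, the value in row 2, column 3
y[, 1]     # 1 2 3 4 5, the whole first column

# byrow = TRUE fills across the rows instead of down the columns
yrow <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)
yrow[1, ]  # 1 2 3 4, the whole first row
```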

ARRAYS

  • Arrays are very similar to matrices.  Think of an array as a matrix with an added dimension.  For example, we may have a matrix that contains data for 2015, and we want to add the same data for 2016 in the same format.  We can create an array made up of one matrix that contains the 2015 data and a second matrix that contains the 2016 data.
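Here is a minimal sketch of the 2015/2016 idea. The numbers are placeholders, not real data, and the names data2015, data2016, and yearly are made up for illustration.

```r
# Two small matrices with the same layout (placeholder values)
data2015 <- matrix(1:6,  nrow = 2, ncol = 3)
data2016 <- matrix(7:12, nrow = 2, ncol = 3)

# Stack them into an array: 2 rows x 3 columns x 2 years
yearly <- array(c(data2015, data2016),
                dim = c(2, 3, 2),
                dimnames = list(NULL, NULL, c("2015", "2016")))

dim(yearly)           # 2 3 2
yearly[, , "2016"]    # the 2016 matrix
yearly[1, 2, "2015"]  # 3, row 1, column 2 of the 2015 matrix
```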

DATA FRAMES

  • A Data Frame is a general form of a matrix.  What this really means is that a data frame is like a dataset that we use in other programs such as SAS and SPSS.  The columns or variables do not need to be the same type, as is required in a matrix.
  • We can have one vector/column/variable in a data frame that is integer (numeric), followed by a second one that is character, followed by a third that is logical.  But in a matrix, all three vectors/columns/variables must be the same type: numeric, character, or logical.
  • How to create a data frame:

d <- c(10, 12, 31, 4)
e <- c("blue", "green", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
sampledata <- data.frame(d, e, f)
names(sampledata) <- c("ID", "Colour", "Passed") # variable names

Coding Explanation:

sampledata <- or sampledata = name of the data frame that we are creating
data.frame(  )  calling on the function that creates a data frame
d, e, f  tells R that we are creating the data frame with the 3 vectors in the order of d, followed by e, followed by f

names(sampledata) – providing variable names within the data frame
c("ID", "Colour", "Passed")  – creating or identifying the 3 variable names within the data frame:  ID, Colour, Passed are the variable names
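To confirm that each column kept its own type, you can inspect the data frame. This sketch builds the same data frame in one step with the names supplied directly. It also sets stringsAsFactors = FALSE explicitly, because versions of R before 4.0 would otherwise turn the Colour column into a factor.

```r
sampledata <- data.frame(
  ID     = c(10, 12, 31, 4),
  Colour = c("blue", "green", "red", NA),
  Passed = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep Colour as character in older R versions
)

str(sampledata)             # compact summary of every column
sapply(sampledata, class)   # "numeric" "character" "logical"

sampledata$Colour           # pull out one column by name
nrow(sampledata)            # 4 rows
ncol(sampledata)            # 3 columns
```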

LISTS

  • an ordered collection of objects.
  • objects in the list do not have to be the same type.
  • You can create a list of objects and store them under one name.
  • How to create a list:

# a string, a numeric vector, a matrix, and a scalar
wlist <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)

Coding Explanation:

wlist <- or wlist =  creating a list called wlist
list(  )  – calling the function to create a list
name="Fred", mynumbers=a, mymatrix=y, age=5.3  values that are to be contained in the list called wlist
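Getting things back out of a list is done by name, using either $ or [[ ]]. A short sketch, which recreates the vector a and matrix y from earlier so that it runs on its own:

```r
a <- c(2, 4.5, 6, 12)
y <- matrix(1:20, nrow = 5, ncol = 4)
wlist <- list(name = "Fred", mynumbers = a, mymatrix = y, age = 5.3)

names(wlist)    # "name" "mynumbers" "mymatrix" "age"
length(wlist)   # 4 objects in the list

wlist$name                # "Fred"
wlist[["mynumbers"]][2]   # 4.5, second value of the numeric vector
wlist$mymatrix[5, 4]      # 20, row 5, column 4 of the matrix
```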

FACTORS

Factors are categorical variables in your data.  You can have a nominal factor or an ordinal factor.  Yup, those words again – remember that nominal and ordinal data are categorical, so each observation falls into one group or another.  With nominal data there is no relationship or order among the categories, whereas with ordinal data there is an order to the levels.
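A minimal sketch of both kinds of factor, using the factor() function. The values and the names colour and rating are made up for illustration.

```r
# Nominal factor: categories with no order
colour <- factor(c("blue", "green", "red", "green"))
levels(colour)      # "blue" "green" "red"
is.ordered(colour)  # FALSE

# Ordinal factor: tell R the order of the levels
rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)
is.ordered(rating)     # TRUE
rating[1] < rating[2]  # TRUE, because low comes before high

table(rating)          # counts per level, shown in level order
```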

Questions or Homework for Self-study work:

  1. Create examples of a vector, matrix, data frame, and a list.
  2. Using the following dataset, identify the type of data:
    • cars sample found in R
  3. Create a data frame with the following information:
    • column 1:  13, 14, 15, 12
    • column 2:  Male, Female, Male, Male
    • column 3: TRUE, TRUE, FALSE, FALSE
    • column 4: 26, 44, 77, 31
  4. Can I create a matrix with the information listed in #3 above?  Why or Why not?

 
