R: ANOVA with an RCBD (updated 20181120)

PDF copy of ANOVA with an RCBD notes

Analysis of Variance (ANOVA) is probably one of the most commonly used statistical analyses in our field.  In R, there are many different ways to conduct an ANOVA.  The key, as with any analysis, is to know your statistical model, which is based on your experimental design, which in turn is based on your research question and hypothesis.  We will work through an RCBD (randomized complete block design) using 2 commonly used ANOVA functions in R, to see the differences and how each function handles a mixed model.

Please download the Excel dataset and R script to be used for this workshop.

RCBD example

Let’s review the data collected from a small RCBD trial.  There were 4 blocks, with 6 treatments randomly assigned within each block.  The statistical model for this experimental design is:

RCBD Statistical Model
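The figure is not reproduced here, but written out in one common notation (a standard RCBD formulation, not copied from the original figure), the model is:

y_{ij} = \mu + \rho_i + \tau_j + \varepsilon_{ij}

where \mu is the overall mean, \rho_i is the (random) effect of block i (i = 1, …, 4), \tau_j is the (fixed) effect of treatment j (j = 1, …, 6), and \varepsilon_{ij} is the random error.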

aov()

Whenever you run an ANOVA or any statistical analysis in R, it is recommended that you save the results as an object; that way you can run summary() on the object and refer to it later for plotting and other future analyses.  To conduct the ANOVA, we will use the statistical model provided above and essentially translate it into R.

model <- aov(nitrogen ~ block + trmt, data=rcbd_data)

We are saving the results into an object called model.  Inside the aov() you can see our model, which matches the statistical model: nitrogen ~ block + trmt.  Notice that you do not use an ‘=’ but rather a tilde or ‘~’.  Also note that, contrary to SAS, you include the + signs.  Our model is a basic one where we have the random effect of block and the fixed effect of trmt.

Once you run this, run the object model to see what you have as results.  Then, to see the ANOVA table, run summary(model).
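In other words, the two calls look like this:

model            # print the stored aov() results
summary(model)   # print the ANOVA table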


Challenges with this analysis

Has anyone noticed how R has treated our random effect of block?  Yes, it is treating it as a fixed effect.  What about our trmt effect – notice anything odd about this one?  Why is there only 1 df when we have 6 treatment levels – shouldn’t we have 5 df and not 1?

We need to set the block and trmt variables as factors before we run the ANOVA in R.  Remember, we used this coding before; we will create 2 new objects for this and call them block_fac and trmt_fac.

block_fac = as.factor(rcbd_data$block)
trmt_fac = as.factor(rcbd_data$trmt)

Now let’s try the ANOVA again.  Do these results make more sense?
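A sketch of that re-run, using the factor objects created above (the object name model2 is my own choice – your R script may differ):

model2 <- aov(nitrogen ~ block_fac + trmt_fac, data=rcbd_data)
summary(model2)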

ASIDE: CALCULATING p-values

As we continue to work on ANOVAs I wanted to point out different ways to calculate your p-value.  Ideally the p-value will accompany your analysis, but situations exist when that may not happen.  A quick way to calculate the p-value is by using the pf() function in R.  Let’s test this out with our example above.  We are going to calculate the p-value for our trmt effect.

1-pf(0.474,5,15)

pf(F-statistic value, numerator df, denominator df) – in our case the numerator df is 5 for the trmt effect and the denominator df is the df associated with the residual, which is 15 in this case.  When you run this you should get the same p-value you get when you run the summary() on the model.
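Equivalently, you can ask pf() for the upper-tail probability directly instead of subtracting from 1:

pf(0.474, 5, 15, lower.tail=FALSE)   # same result as 1-pf(0.474, 5, 15)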


Residual analysis following aov()

One of the assumptions of any ANOVA is to ensure that the residuals are normally distributed.  Let’s take a look at the R script to try some plots to see what we can do.

Also recall the shapiro.test that we used earlier – try running this on the residuals of our model.
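The details are in the accompanying R script, but a minimal sketch of the kinds of calls involved might look like this:

plot(model)                      # built-in diagnostic plots (residuals vs fitted, Q-Q plot, etc.)
qqnorm(residuals(model))         # normal Q-Q plot of the residuals
qqline(residuals(model))
shapiro.test(residuals(model))   # Shapiro-Wilk test on the residuals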

lmer()

One of the challenges with our previous analysis is that aov() treated our block effect as a fixed effect.  aov() uses ordinary least squares as the methodology for calculating the ANOVA table.  In other words, we could easily calculate all the SS, MS, and F values by hand if we wanted to.  When adding a random effect to our models we need to take a slightly different approach.  This is the first change in the methodology of our ANOVAs – moving to REML, or Restricted Maximum Likelihood.  This is where the lmer() function, from the lme4 package, comes into play.

Let’s start by reviewing our statistical model above, recognizing that we need a way to let R know that the block effect is a random effect in our model.  Let’s create a new object called rmodel with our new model:

library(lme4)   # lmer() is part of the lme4 package

rmodel <- lmer(nitrogen ~ (1|block) + trmt, data=rcbd_data)

Notice how we set out the random effect of block in this model – (1|block).

As we’ve done previously, review the results in the object we called rmodel, then try the summary() function on rmodel.  Can you see the differences in the results?  To see the ANOVA table for the fixed effects we need to use a new function, anova(), with the rmodel object we created by running lmer() for the mixed model.
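In other words:

summary(rmodel)   # variance components and fixed-effect estimates
anova(rmodel)     # ANOVA table for the fixed effect (trmt)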

Now you should see the fixed effect trmt results – but what is missing?  How do we obtain the p-value?

1-pf(0.4719,5,15)  – which gives us a p-value of 0.79, indicating that we fail to reject the null hypothesis that the nitrogen is the same for all treatments OR, in other words, that there are no differences between the treatments.

Residual analysis following lmer()

As we did with the first model let’s check our residuals.  Since we ran a proper mixed model, we will not have access to the same set of plots we did earlier.  Please refer to the accompanying R script for the plot and normality test syntax.
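As a rough sketch of what those checks might look like (the exact plots in the script may differ):

plot(rmodel)                     # fitted values vs residuals for the lmer() fit
qqnorm(residuals(rmodel))        # normal Q-Q plot of the residuals
qqline(residuals(rmodel))
shapiro.test(residuals(rmodel))  # Shapiro-Wilk test on the residuals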

After reviewing both, would you conclude that the residuals are normally distributed and that we did a good job with this model?

nlme package

Sometimes we may be working with data that is not as normal as we would like it to be, and we want the option to specify the distribution of the incoming data.  The nlme package will allow us to do this, amongst many other options.  We will NOT use the added functionality of the nlme package in this workshop, but I want to work through our RCBD example using this package, so that you are aware of it and can work through your own data when the time comes.

We are using the same data and same statistical model, however, our R script will change due to the new package:

library(nlme)   # lme() is part of the nlme package

newmodel <- lme(nitrogen ~ trmt_fac, random = ~1|block_fac, data=rcbd_data)

Notice the changes in the syntax when using the nlme package.  This is the identical model that we used with lmer() – just different syntax.  Please note that it is recommended to create the treatment variable as a factor before you include it in the model.

As we did above, we will check the residuals, and we will also use the emmeans package to look at means comparisons.
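A sketch of what the means comparisons might look like, assuming the newmodel object created above (the exact call in your script may differ):

library(emmeans)
emmeans(newmodel, pairwise ~ trmt_fac)   # estimated marginal means and pairwise comparisons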

Conclusion

We worked through 1 example using 3 different packages that can run an ANOVA analysis.  The first, aov(), only ran our model as a fixed effects model, which was incorrect for our RCBD.  The second analysis used lmer() from the lme4 package – which handled our mixed model correctly but left us calculating the p-value for our fixed effect separately.  In the third analysis, we used the nlme package to run our mixed model, followed by the means comparison tests.  There are other packages available that include an ANOVA analysis.  However, you should do your homework and seek out any accompanying documentation, ideally a refereed journal article discussing its development and use, before using it for your own research.


R: Getting comfortable with your data – Descriptive statistics, Normality, and Plotting

PDF copy of the Getting comfortable with your data – Descriptive statistics, Normality, and Plotting

Once we have our dataset cleaned and tidied, it is time to start getting familiar with it.  Running descriptive statistics and plotting the data will give you a great sense as to what your data looks like and what type of analysis you can conduct on it.  Let’s start by ensuring that we are all working with the same dataset – the woodchips.xlsx file.

Please also download the R script used in the Getting comfortable with your data.

Descriptive Statistics

Summary

R has a versatile function called summary() that provides a summary of an object – a dataset or the results of an analysis.  It is a function that we will use on a regular basis as we move to working more with our data and its analysis.  For now we have a dataset called woodchips – let’s try out the summary function on the dataset without any analysis:

summary(woodchips)

The results provide the mean, median, min, max, and quartiles for numeric variables, and the data types for character or string variables.  These values are a great start to working with your data.

Normality test – Shapiro-Wilk

When running ANOVAs, which we will tackle later on, one of the assumptions is that our residuals have a normal distribution.  We may also be interested to test the distribution of our outcome variable.  A common test for this is the Shapiro-Wilk test.  To run this test on our woodwt variable, we use the following:

shapiro.test(woodchips$woodwt)

Notice that by running this small piece of code we ONLY get our Shapiro-Wilk statistic and associated p-value.  Unlike other statistical packages, we only have the 1 test result and not 3-4 other tests.

Frequency and Cross-tabulations

Not all the data we collect will be continuous; we may have categorical data, such as the Quality score and SampleID in our woodchips dataset.  Calculating a mean or testing for a normal distribution doesn’t make any sense for these 2 variables.  We may be interested in their frequencies – for instance, how many observations have a quality score of 2 or 5.  To calculate a simple frequency we use the table() function in R.

table(woodchips$quality.factor)

Let’s try a crosstabulation between the SampleID and Quality score:

table(woodchips$sampleID, woodchips$quality.factor)

Remember, we can always save our results as an object simply by naming it – for example, let’s save this frequency table in an object called mytable.

mytable <- table(woodchips$sampleID, woodchips$quality.factor)

To view its contents we simply type the name of the object and run it.

mytable

Two more options that accompany table() are margin.table() and prop.table().  If you want row or column totals for your crosstabulations, you can run the margin.table() function.  To obtain row and column proportions you would run the prop.table() function.  Try it out and see what you get as results.
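For example, using the mytable object we created above:

margin.table(mytable, 1)   # row totals
margin.table(mytable, 2)   # column totals
prop.table(mytable, 1)     # proportions within each row
prop.table(mytable, 2)     # proportions within each column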

Plotting your data

Along with running descriptive statistics we may also want to plot our data to get a visual representation of it.  There are a number of functions available to you for plotting.  Let’s start by using some of the base plotting functions and then move on to playing around with one of the more popular plotting packages, called ggplot2.

Base R plotting functions are:

  • plot() for scatterplots and x-y plots
  • boxplot() for a box plot
  • barplot()  for a barchart

Review the attached R script file for the examples and explanation of the code used.
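As a rough sketch of the kinds of calls the script uses with the woodchips data (your exact variables and options may differ):

plot(woodchips$woodwt)                       # scatter/index plot of the weights
boxplot(woodwt ~ quality, data=woodchips)    # box plot of weight by quality score
barplot(table(woodchips$quality))            # bar chart of the quality score frequencies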

Adding labels to the plot()

The syntax for the plot() is:

plot(woodchips$woodwt, main="Weight of woodchips", xlab="Individual Samples", ylab="Weight in grams")

plot() calls the function and will produce a plot that is available in the Plots window in RStudio

woodchips$woodwt – the outcome variable that we want to plot

main="…"  creates the main plot title.  The title must be contained in between the " "

xlab="…"  creates the x-axis label

ylab="…"  creates the y-axis label

Aside:  Working with dataset names

You may have noticed that whenever recalling a variable name, you must include the name of the dataset with the variable, separated by a $.  There are different ways of referring to the dataset name.  Here are 2 lines of syntax that perform the same task but refer to the dataset in different ways:

with(woodchips, plot(quality, woodwt, pch=as.integer(quality)))
plot(woodchips$quality, woodchips$woodwt, pch=as.integer(woodchips$quality))

We can use these two methods with any R syntax – the way you choose to work will be personal preference.

ggplot2 package

The ggplot2 package is an extremely versatile package used to create plots in R.  During the Winter 2018 semester we had the pleasure of having Andrew Frewi, Ph.D., talk to us and show us how to use ggplot2.  Let’s review some of the plots that were created with Andrew, highlighting some of the features of ggplot2.

As we work through the next sections of the R workshop, we will try to incorporate the plots as we go along.


R: Cleaning and tidying data

PDF copy of the Cleaning and tidying data NOTES

Quite often when you work with folks who are R experts or aficionados, you hear the term “data wrangling”.  To wrangle your data involves 3 steps according to Wickham and Grolemund (2017), the authors of R for Data Science:  Importing your data, Tidying your data, and Transforming your data.  We have already discussed importing or bringing your data into R, and we have discussed some transformations of the data as well – creating new variables and recoding.  However, we have not discussed tidying your data.  To me, this is synonymous with cleaning and reshaping your dataset.  Depending on your analysis, you may need your data in a long columnar format or you may need your data in a broad and short format.

In this section of the workshop we will use data that accompanies the tidyverse package.

Please download the R script used in the cleaning and tidying data.

What in the world is a Tibble?

No, it is not an imaginary creature from Fraggle Rock created by the Jim Henson Company.  That’s the first thing that comes to my mind 🙂  A tibble in R is a newer version of the classic data frame in R.  It is a more efficient way of handling your data, especially when printing it to your screen.  There are many more advantages as well, but for now, we just need to recognize that it is a new and more efficient way of working with data frames in R.

Tidy Data

Tidy data is another term that you may hear when working with data in R.  There are 3 “rules” to having tidy data and they are:

  1. Each variable must have its own column
  2. Each observation must have its own row
  3. Each value must have its own cell

To me, this is a clean dataset and nothing too new – just new terms for the concept of a clean dataset.  But be aware of the term “tidy data” and the 3 rules, in case someone asks you.

Reshaping our data – Gathering

Many of us who have used SAS for years may see this particular aspect of R as transposing our data – going from wide to long.  We will be using the table4a dataset available in the tidyverse package.

Notice that this table lists a country followed by 2 columns, one for 1999 data and a second for 2000 data.  Our goal with this exercise is to create a tidy dataset that contains the following 3 variables:  Country,  Year,  Cases.

newtable4a <- gather(table4a, "1999", "2000", key="year", value="cases")

newtable4a – the name of the new tidy dataset we are creating
gather() is the function we are using to gather the data in the original columns labelled 1999 and 2000 into new variables called year and cases

We first need to tell the gather() the name of the original table – table4a in our case

We then list the original column names that we wish to gather into one column – so 1999 and 2000

key="year" – this is the name of the new variable that will contain the values of 1999 and 2000

value="cases" – the new variable that will hold the values currently stored under the 1999 and 2000 columns will be called cases; each value is placed here alongside its country and the year it was associated with in the original table.

Try it out and see what happens.  As you work through this example, I would like you to think about your own research data.  Can you see a situation where this would be helpful for you?

Try replicating our example with the table4b – this table contains the country population.
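If you get stuck, a sketch of one possible answer (the object name newtable4b is my own choice):

newtable4b <- gather(table4b, "1999", "2000", key="year", value="population")
newtable4b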

Reshaping our data – Spreading

If I make the comparison to SAS again – spreading is also like transposing, but now we are going from a columnar (long) dataset to a wide dataset.  Imagine that you have collected data on some variable for a number of months.  Now we want to see if there is a correlation between the measures at each month – in order to perform the correlation analysis, we need each row or observation to contain the measurements taken at each month – so, a wide dataset.

In R, we refer to this action as spreading the data.  We will again use the sample data that accompanied the tidyverse package.  Let’s start by looking at table2.   This table contains the variables:  country, year, type, count.  We want to create a table that is a tidy dataset and contains the variables:  country, year, cases, and population.

newtable2 <- spread(table2, key=type, value=count)

newtable2 – the name of the new tidy dataset we are creating
spread() is the function we are using to spread the data

Similar to above, within our spread() function we first tell R the name of the original table we want to work with – table2

We then identify the key variable – the variable whose values will be broken out into new columns.  You’ll notice that the values of the type variable in table2 are cases and population – and these will become the variable names in our newtable2.

The last piece of information we need to give R is that the values of our new variables are the values contained in the variable called count in the original table2.

Try it out and see what happens.  As you work through this example, I would like you to think about your own research data.  Can you see a situation where this would be helpful for you?

Changing Data Types

When we introduced R in the first section of the workshop, I defined the different types of data that R can use.  Depending on how you create your data in R, whether it is by importing it from Excel or a CSV file, or creating it directly in R, there may be situations where you may need to change a variable from one data type to another.

Let’s try a few conversions with a new data file that you will first import from Excel.  The data file is called woodchips.xlsx and can be downloaded directly from this link.  Please import it into your R workspace by a method you prefer.  I used the following code:

library(readxl)   # read_excel() is part of the readxl package

woodchips <- read_excel("woodchips.xlsx", sheet="Sheet1", col_names=TRUE)

To see what is inside the dataset, simply type the name that you assigned to the dataset – in my case that would be:

woodchips

You should see 3 variables:  sampleID, woodwt, and quality, along with a snippet of the data.  You should also see that sampleID is a <chr> or character data type, and that both woodwt and quality are <dbl> or double precision floating point numbers (think of this as your decimal data).

In RStudio, in your Environment tab, you should also see the name of your dataset – woodchips.  If you select the arrow to open the contents, you will see the names of the variables along with their structures and a few observation values.  We see that sampleID is character and that both woodwt and quality are numeric.  Let’s replicate this in our Console window by using str(), the structure function found in base R.

str(woodchips)

Notice that the results are the same as in the Environment window.

Changing characters to factors

Let’s take a look at our sampleID variable – it is a character but we want it to be treated as a factor – a classification variable.  R is very particular about which types of variables can be used in which analyses.  In this case we know we want to use sampleID as a factor.  To convert it we need to use the as.factor() function as follows:

as.factor(woodchips$sampleID)

What do we need to do if we want R to replace the current contents of our variable sampleID in the dataset woodchips with the new factor version??

woodchips$sampleID <- as.factor(woodchips$sampleID)

Can we take a shortcut and drop the dataset designation?  Why or why not?

Changing numeric to factors

Same concept, except now we have a numeric piece of information that we want to be treated as a factor.  For illustration purposes, let’s use the quality variable.  Why would we do this with our research data?  When you set out your trial you may have labelled your plots or your treatments using numbers – we are not going to do any calculations with these numbers, but they will be used as a classification variable.  In R, we need to set these as factors in our dataset.  Again, for illustration purposes, let’s use the quality variable and convert it to a factor, following the same process we used above.

First check the structure of your variable, then change it to factor using the as.factor() function and check to make sure it worked.

str(woodchips$quality)
woodchips$quality <- as.factor(woodchips$quality)
woodchips$quality
str(woodchips$quality)

Changing factors to numeric

The reverse may happen as well.  We may have a variable in our dataset that we need to convert from factor back to numeric.  Our little example is a perfect one.  In this case you would use the as.numeric() function – but be careful: calling as.numeric() directly on a factor returns the underlying level codes, not the original values, so we convert to character first.  Let’s try it on our quality variable.

str(woodchips$quality)
# as.numeric() on a factor returns the level codes (1, 2, 3, ...), so
# convert to character first to recover the original numeric values
woodchips$quality <- as.numeric(as.character(woodchips$quality))
woodchips$quality
str(woodchips$quality)



R: Introduction to R and Definitions

PDF copy of the Introduction to R and Definitions notes

Before I learn new software or new skills, I often like to do some homework and ask the silly questions – what, when, why, and how – to give me a base understanding of the software.  So, let’s work through these questions for R.

What is R?

R is a system that is used for statistical computation and graphics.  It has a number of aspects to it, including a programming language, graphics, interfaces or connection opportunities with other languages, and debugging capabilities.  I have found that many do not refer to R as a statistical software package, because it can do so much more.

What does this all mean?  It means that R is a very robust program that folks use for a variety of reasons, it’s not just for statistical analysis!

Where did R come from?

The history of software packages can be quite interesting to learn about.  For instance, R has been described as a “dialect” of S.  Some of you may remember the statistical software called S-Plus?  That was a commercial implementation of the same S language.  S was developed at Bell Labs in the 1970s and 1980s, and R, which first appeared in the 1990s, has been one of the fastest growing open-source software packages since.

What does “open-source” mean?

I’m sure you’ve heard of this term in the past or in different contexts.  One thing that you will hear when people talk about R is that it is free or that it is open-source.  Keep in mind that open-source means that it is freely available for people to use, modify, and redistribute – which usually translates to: there is no cost to acquire and use the R software!  Another aspect of open-source is that it is, or rather can be, community-driven.  So, any and all modifications to the software, and the subsequent documentation (if it exists), are driven by the community.

Please note that R has matured over the years; today’s R community is extremely strong and encourages documentation for anything that is released, making it a very desirable product.  This may not always be the case with open-source software.

Who uses R?

Business, academia, statisticians, data miners, students, and the list goes on.  Maybe we should ask the question, who is NOT using R, and then ask the question Why?

There are so many different statistical software options today and which one you choose to use will depend on several different factors:

  • What does your field of study use?
  • If you are a graduate student, what does your supervisor suggest and use?
  • What type of analyses are you looking to perform and does your program of choice offer those analyses?
  • What types of support do you have access to?

How does R work?

If you’re looking for a statistical package that is point-and-click, R is not for you!  R is driven by coding.  YES!  You will have to learn how to write syntax in R.  You can use R interactively by using RStudio, and you may never reach a point in your studies or your research where you will move away from the interactive capabilities of R – so no big worries!  Besides, today there are a lot of resources available to help you learn how to use R.  So don’t let that stop you!

Base R and Packages

When you download and install R, the Base R program is installed.  To run many analyses you may be required to install a package.  What is a package?  It is a collection of functions, data, and documentation that extends the current capabilities of the Base R program.  Packages are what make R so versatile!  As we work through our workshops and associated code, I will provide you with the name of the package.  There are a number of ways to acquire and install packages; we will review these as we work through them.  Please note that there may be several packages that perform a similar analysis – please read all the documentation before selecting a package to use.
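For example, one common way to install and load a package from within R (using the tidyverse package we will meet later as an example):

install.packages("tidyverse")   # download and install the package from CRAN (one time)
library(tidyverse)              # load the package in each new R session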

I will add a page to this Blog in the near future (Summer 2018) that will list the packages and associated documentation that I have used and recommend.

How do I acquire R? Where can I download it?

Visit the Comprehensive R Archive Network (CRAN) website to download the R software:  https://cran.r-project.org/   Please note that this is also the website used to download the packages used in future analyses.

To download RStudio, visit the RStudio website at https://www.rstudio.com/

Both websites have comprehensive instructions to assist you with the installation on your own computers.

Let’s get started by reviewing some definitions

When you think about conducting any statistical analysis, your starting point is data.  So let’s start with a few definitions of the different data types observed in R.

Numeric, Character, or Logical

A quick overview of the different types of data you can work with in R.

  • Numeric = numbers
  • Character = words
  • Logical = TRUE or FALSE – not all data is in the form of numbers or letters; sometimes you might have data that has been collected as matching a criterion (TRUE) or not matching a criterion (FALSE).  We’ll work through examples of this in another session; for now just be aware that this type of data is commonly used in R.
  • How do you find out what form your data are in?
    • class(…)
    • The results of this statement will tell you exactly what form your data are in.
    • Example:

testform <- c(12, 13, 15)
class(testform)

> class(testform)
[1] "numeric"

Numeric Classes in R

Numbers are handled in a couple of ways in R.  These are referred to as the numeric classes of R, and the two that we will use are known as integer and double.  Having a basic understanding of these different numeric classes will come in handy.

  • Integer:
    • If you think back to high school math, you’ll probably remember the term “integer”.  First thing that comes to my mind when I think of integer – is Whole number, no fractions, no decimal places.
    • As you can imagine storing numeric data as integers does not require a lot of space.  So, in terms of computing, if you do not foresee your analysis needing decimals and precision numbers, then integers are the way to go.
  • Double:
    • Double precision floating point numbers – think of this as the decimals side of your numeric data.
    • Storing Double numeric data takes up more space than Integer data.  But sometimes you’re just not sure what you will need, so R will switch between the 2 numeric classes as it is required for your analysis.

Data Types in R

Let’s review the different data types available to you in R.

Vectors

  • Let’s not panic at some of these terms, but work through examples of each.  Think of a vector as a column of data or one variable.
  • Vectors can be numeric, characters, or logical format.
  • How to create a vector:

# a numeric vector
a = c(2, 4.5, 6, 12)

# a character vector
b = c("green", "blue", "yellow")

# a logical vector
c = c(TRUE, TRUE, FALSE, TRUE)

Coding Explanation:

a = ; b = ; c = ;  creating vectors called a, b, c respectively.  Please note that a <- is the same as a =

c(x, x, x)  tells R that we are creating a vector or a column with the contents found in the parentheses.  Each , separates the values – think of it as telling R to drop to the next row in the vector/column being created.

character values must be contained in " ", but logical values must not be.

Matrices (matrix)

  • Think of a matrix as an object made up of rows and columns.
  • The vectors within a matrix must all be the same type, so all numeric, or all character, or all logical.
  • How to create a matrix:

# creates a 5 x 4 numeric matrix – 5 rows by 4 columns
y <- matrix(1:20, nrow=5,ncol=4)

Coding Explanation:

y = or y <- creates a matrix called y
matrix(  )  – calls the function matrix to create the matrix y
1:20 – the values of the matrix
nrow= lets R know how many rows are in the matrix that you are creating
ncol= lets R know how many columns are in the matrix that you are creating.

Resulting matrix y will look like:

> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

Arrays

  • Arrays are very similar to matrices.  Think of an array as a matrix with an added dimension.  For example, we may have a matrix that contains data for 2015, and we want to add the same data for 2016 in the same format.  So we can create an array with one matrix that contains the 2015 data and a second matrix that contains the 2016 data – see the sketch below.
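A minimal sketch (the object names m2015, m2016, and myarray are my own):

# two 5 x 4 matrices stacked into one 5 x 4 x 2 array
m2015 <- matrix(1:20, nrow=5, ncol=4)
m2016 <- matrix(21:40, nrow=5, ncol=4)
myarray <- array(c(m2015, m2016), dim=c(5, 4, 2))
myarray[, , 1]   # the 2015 "layer"
myarray[, , 2]   # the 2016 "layer"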

Data Frames

  • A Data Frame is a more general form of a matrix.  What this really means is that a data frame is like a dataset that we use in other programs such as SAS and SPSS.  The columns or variables do not need to be the same type as is required in a matrix.
  • We can have one vector/column/variable in a data frame that is integer (numeric), followed by a second one that is character, followed by a third that is logical.  But in a matrix, all three vectors/columns/variables must be the same type: numeric, character, or logical.
  • How to create a data frame:

d <- c(10, 12, 31, 4)
e <- c("blue", "green", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
sampledata <- data.frame(d, e, f)
names(sampledata) <- c("ID", "Colour", "Passed") # variable names

Coding Explanation:

sampledata <- or sampledata = name of the data frame that we are creating
data.frame(  )  calling on the function that creates a data frame
d, e, f  tells R that we are creating the data frame with the 3 vectors in the order of d, followed by e, followed by f

names(sampledata) – providing variable names within the data frame
c("ID", "Colour", "Passed")  – creating or identifying the 3 variable names within the data frame:  ID, Colour, Passed are the variable names

Lists

  • an ordered collection of objects.
  • objects in the list do not have to be the same type.
  • You can create a list of objects and store them under one name.
  • How to create a list:

# a string, a numeric vector, a matrix, and a scalar
wlist <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)

Coding Explanation:

wlist <- or wlist =  creating a list called wlist
list(  )  – calling the function to create a list
name="Fred", mynumbers=a, mymatrix=y, age=5.3  values that are to be contained in the list called wlist

Factors

Factors are categorical variables in your data.  You can have a nominal factor or an ordinal factor.  Yup, those words again – remember, nominal and ordinal data are categorical pieces of data, so each observation falls into one group or another.  With nominal data there is no relationship or order to the categories, whereas with ordinal data there is an order to the different levels.
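For example (the variable names here are made up for illustration):

# a nominal factor - no order to the categories
colour <- factor(c("blue", "green", "blue", "red"))

# an ordinal factor - the levels have an order
rating <- factor(c("low", "high", "medium", "low"),
                 levels=c("low", "medium", "high"), ordered=TRUE)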

Questions or Homework for Self-study work:

  1. Create examples of a vector, matrix, data frame, and a list.
  2. Using the following file, identify the type of data:
    • cars sample found in R
  3. Create a data frame with the following information:
    • column 1:  13, 14, 15, 12
    • column 2:  Male, Female, Male, Male
    • column 3: TRUE, TRUE, FALSE, FALSE
    • column 4: 26, 44, 77, 31
  4. Can I create a matrix with the information listed in #3 above?  Why or Why not?


ARCHIVE: Summer 2018 – Special Topics Workshops in June

In response to requests during the SAS and SPSS workshops, I will be offering the following 4 Special Topics workshops in June.  If you are interested in adding R to any of these workshops, please email oacstats@uoguelph.ca to let me know.

Special Topics Workshops

  • Principal Component Analysis – examples will be demonstrated in SAS, SPSS, and R
  • Date: Jun 13, 2018  9am-12Noon
  • Location: ANNU Rm 102
  • Please register here.
  • Experimental Designs – Split-plot, Split-block, Split-split-plot, Latin Squares. Examples will be demonstrated in SAS and R.
  • Date: Jun 14, 2018  9am-12Noon
  • Location: ANNU Rm 102
  • Please register here.
  • Linear and Nonlinear Regression for SAS.  We will use PROC REG, PROC NLIN, PROC GLIMMIX, and possibly PROC NLMIXED.
  • Date: Jun 27, 2018  9am-12Noon
  • Location: ANNU Rm 102
  • Please register here.
  • CANCELLED – Surface Analysis in SAS
  • Date: Jun 28, 2018  9am-12Noon
  • Location: ANNU Rm 102