Principal Component Analysis – SPSS, SAS, R

Many statistical procedures test specific hypotheses.  Principal Component Analysis (PCA), Factor Analysis, and Cluster Analysis are examples of analyses that explore the data rather than test a specific hypothesis.  PCA looks for common components in the data by summarizing the pattern of correlations among the variables.  It is often used to reduce several variables to 2-3 components.
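To make the idea of a "component" concrete, here is a minimal Python sketch (not the SPSS, SAS, or R code used in the workshop).  It assumes the simplest possible case: two standardized variables with correlation r.  In that case the correlation matrix is [[1, r], [r, 1]] and its eigenvalues are 1 + r and 1 - r, so the share of variance carried by each component can be written down directly.  The function name two_variable_pca and the value r = 0.8 are made up for illustration.

```python
def two_variable_pca(r):
    """Return (eigenvalue, share of variance) pairs, largest first,
    for the 2x2 correlation matrix [[1, r], [r, 1]]."""
    eigenvalues = sorted([1 + r, 1 - r], reverse=True)
    total = sum(eigenvalues)  # always 2: one unit of variance per standardized variable
    return [(value, value / total) for value in eigenvalues]


# Hypothetical correlation between two variables.
components = two_variable_pca(0.8)
for number, (eigenvalue, share) in enumerate(components, start=1):
    print(f"Component {number}: eigenvalue {eigenvalue:.2f}, share of variance {share:.2f}")
```

The stronger the correlation, the more of the total variance the first component carries, which is why PCA only pays off when the variables are related.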

When running a PCA, you need to consider a couple of questions:  How many factors/components should be used, and how do you interpret the factors/components?

Before running a PCA, one of the first things you will need to do is to determine whether there is any relationship among the variables you want to include in a PCA.  If the variables are not related then there’s no reason to run a PCA.  The data that we will be working with is a sample dataset that contains the 1988 Olympic decathlon results for 33 athletes.  The variables are as follows:

run100m:  time it took to run 100m
longjump:  distance attained in the Long Jump event
shotput:  distance reached with ShotPut
highjump:  height reached in the High Jump event
run400m: time it took to run 400m
hurdles110m:  time it took to run 110m of hurdles
discus:  distance reached with Discus
polevault:  height reached in the Pole Vault event
javelin:  distance reached with the Javelin
run1500m:  time it took to run 1500m
score:  overall score for decathlon
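One way to check whether the variables are related, as described above, is to look at their pairwise correlations.  The Python sketch below is only an illustration of that check, not the workshop's SPSS, SAS, or R code.  The helper names pearson and correlation_matrix are hypothetical, and the numbers are made-up values for five imaginary athletes, NOT the 1988 decathlon data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for illustration only (not the real dataset).
toy = {
    "run100m": [11.0, 11.2, 10.8, 11.5, 11.1],
    "longjump": [7.4, 7.3, 7.6, 6.9, 7.2],
    "shotput": [14.0, 15.5, 13.8, 14.9, 15.1],
}

corr = correlation_matrix(toy)
for a in toy:
    print(f"{a:10s}", "  ".join(f"{corr[a][b]:6.2f}" for b in toy))
```

If most of the off-diagonal correlations are close to zero, the variables share little common variation and a PCA is unlikely to be useful.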

Download the data in an Excel spreadsheet here.

For this workshop we will conduct the same analysis in 3 commonly used statistical packages:  SPSS, SAS, and R.  We will start with SPSS, then progress to SAS, and finally to R.

If you are using SAS, please download the SAS program.

If you are using R, please download the R Script file.


ARCHIVE: Summer 2018 – Special Topics Workshops in June

In response to requests during the SAS and SPSS workshops, I will be offering the following 4 Special Topics workshops in June.  If you are interested in adding R to any of these workshops, please email oacstats@uoguelph.ca to let me know.

Special Topics Workshops

  • Principal Component Analysis – examples will be demonstrated in SAS, SPSS, and R
    • Date: Jun 13, 2018  9am-12Noon
    • Location: ANNU Rm 102
    • Please register here.
  • Experimental Designs – Split-plot, Split-block, Split-split-plot, Latin Squares. Examples will be demonstrated in SAS and R.
    • Date: Jun 14, 2018  9am-12Noon
    • Location: ANNU Rm 102
    • Please register here.
  • Linear and Nonlinear Regression for SAS.  We will use PROC REG, PROC NLIN, PROC GLIMMIX, and possibly PROC NLMIXED.
    • Date: Jun 27, 2018  9am-12Noon
    • Location: ANNU Rm 102
    • Please register here.
  • CANCELLED – Surface Analysis in SAS
    • Date: Jun 28, 2018  9am-12Noon
    • Location: ANNU Rm 102

SPSS Workshop: GLMM and Non-Gaussian Distributions

When we collect data from our research trials, we do not always have data that is “well-behaved” or that follows the traditional normal distribution curve.  The data can follow a number of other distributions, and with the latest advances in statistical methodology, SPSS can handle some of these as well.  Your first job is to recognize when a normal distribution is NOT appropriate and which distribution is an appropriate starting place.  These are referred to as non-Gaussian distributions.  Remember that Gaussian is just another name for Normal.

Where do we start?  Think about your data – what is it?

  • A percentage?
  • A count?
  • A score?

How do we know that our data is not from a normal distribution?

  • Always check your residuals!
  • Remember the assumptions of your analyses?
    • Normally distributed residuals is one of them!

Let’s work with the following example.  We have another RCBD trial with 4 blocks and 4 treatments randomly assigned to each block.  There were 2 outcome measures taken:   proportion of the plot that flowered, and the number of plants in each plot at the end of the trial.
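To show what "check your residuals" means for a layout like this one, here is a small Python sketch.  It is an illustration under stated assumptions, not the model we fit in SPSS: it uses an ordinary additive model with block and treatment as fixed effects, where the fitted value for each plot is block mean + treatment mean - grand mean.  The counts are made up for a 4-block by 4-treatment trial; they are NOT the workshop data.

```python
# Hypothetical plant counts: one row per block, one column per treatment.
counts = [
    [12, 15, 9, 20],
    [10, 14, 8, 18],
    [13, 17, 11, 22],
    [11, 13, 7, 19],
]

n_blocks = len(counts)
n_treatments = len(counts[0])

grand_mean = sum(sum(row) for row in counts) / (n_blocks * n_treatments)
block_means = [sum(row) / n_treatments for row in counts]
treatment_means = [
    sum(counts[i][j] for i in range(n_blocks)) / n_blocks
    for j in range(n_treatments)
]

# Residual = observed - fitted, with fitted = block mean + treatment mean - grand mean.
residuals = [
    [
        counts[i][j] - block_means[i] - treatment_means[j] + grand_mean
        for j in range(n_treatments)
    ]
    for i in range(n_blocks)
]

for row in residuals:
    print("  ".join(f"{value:6.2f}" for value in row))
```

These are the values you would then look at, for example in a histogram or a normal probability plot, to judge whether the normality assumption is reasonable.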

Generalized Linear Models in SPSS

Those of you who follow the terminology commonly used when talking about ANOVAs – Analysis of Variance – may notice that there is one term that is NOT included in this heading.  Any guesses?  MIXED.  In SAS, we talk about and work with Generalized Linear Mixed Models or GLMM (GLIMMIX), but I just realized that the MIXED part is missing here.  That’s why I could not figure out how to add a RANDOM effect to the models!

We will, however, still use the RCBD data that was used in the SAS workshop.  As noted above, it contains proportion data and count data – perfect examples of non-Gaussian data.

Please download the Excel spreadsheet here and load it into SPSS.

We will work through each of the outcome variables in the workshop using the following steps:

  • Analyze
  • Generalized Linear Models
    • Generalized Linear Models
    • We will work through the tabs for each variable together and discuss the outputs

Please note that if you have attended the SAS workshop and this workshop – and are trying to compare the results – they will NOT match as we are creating a different model in each program – since we cannot add a RANDOM effect in SPSS yet.  You can, however, create the same fixed effects model and the results match perfectly.

See you later in May!


SPSS: Introduction and Getting the Data In

SPSS software

One of the biggest PROs to using SPSS is its familiar look and feel.  When you open it – it almost feels like home – so familiar – so much like Excel!  The rows represent individual cases or observations and the columns represent variables.  The one difference is that in SPSS the Data View is just that – a view of the data, a container for the data.  You cannot create any new variables in this view – in Excel we can move our cursor to a new cell and type in a formula – you CANNOT do this in SPSS.

Note at the bottom there are 2 tabs: Data View and Variable View.  We are currently in the Data View.  If you select the Variable View tab – things look a lot different from Excel.  We now have rows to match our variables (or columns in the Data View) and the columns in the Variable View represent different pieces of information about our variables (or metadata).

Excel Data into SPSS

Let’s take a step back and work through an example of bringing a dataset into SPSS.  For the purposes of this part of the workshop, please download the following 2018 Trial Data Excel file.  It is a small file with 10 observations and 4 variables.  Two of the variables contain letters and 2 contain numbers for weight and height measurements.

Let’s work together and bring the file into SPSS.  Believe it or not, it is as simple as doing the following:

  • File
  • Open
    • Data
      • Navigate to the folder where you saved the data
      • Change the Files of Type: to Excel
      • Select the file
    • Open
    • Answer the Wizard questions
      • Select the correct worksheet – if there are more than 1
      • Double check the Preview to ensure that it looks correct
      • When you are satisfied – Click OK

That was pretty straightforward, wasn’t it?

Variable View

Two things happened when you Clicked OK – SPSS opened a second window – the SPSS Statistics Viewer or Output window and your data popped into SPSS as we expected.  In the Data View, notice the icons associated with each variable – we will talk more about these in a moment.

Click on Variable View – notice how your 4 variables are now listed as the rows and some of the information regarding the variables has been filled in.  Let’s work through each one separately.  I will talk about the first variable here, and you will fill in the remaining 3 variables as an exercise.

Name – lists the name of the variable as it was read in from Excel.  Please note that you should keep these as short and as informative as possible.  Please review the Best Practices for entering your data in Excel here.

The second column is the Type – if you select the word Type you will be presented with the options that are available to you in SPSS.  Generally speaking we tend to primarily use the String and Numeric types.  However, you may also have the occasion to use the Date type.

Width – is the width of the variable – that is, how many characters we are willing to accept as data.  For ID it is set to 3.  You can change this at any time.  Decimals goes along with the Width.  Our data has no decimals, but you can change this at any time as well.  Note that when we create new variables, SPSS will, by default, assign 2 decimal places to the new variable.

Label – Ah…  the lovely label.  I highly recommend that you complete the Label and the next Values column if you plan to use SPSS for your analysis.  It will save you a lot of time.  So, what is it?  Our variable names are short and sometimes not very informative.  In this case, Weight?  Weight of what?  What units of measure were used?  You can add all this information in the label field.  How about for weight – Weight measured at 24 months of age in lbs.  This comes in extremely handy if you are working with surveys and are using Q1, Q2, etc. as variable names.  In the label field you can add the entire question.  What is really nice about this – when you run your statistics, in the output window rather than seeing weight or Q1, you will see the label you typed in here.  So, if you’re like me – no more loose papers with notes!

I referred to the next column, Values, as another really handy one to have and to complete.  When you click on Values you are presented with a small dialogue box where you can enter the Value and its Label.  Let’s use an example:  Treatment, many folks may use a numbering system for their treatments, so a 1, 2, 3, 4 or a lettering system, A, B, C, D.  But no one knows what each treatment refers to.  This is the spot where you can add that label.  So if treatment 1 = 20% Iron; 2 = 25% Iron; 3 = No Iron; 4 = Control.  We can add all of that here!
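The idea behind value labels is just a lookup from the stored code to a readable description.  Here is a small Python sketch of that idea (an analogy only; this is not SPSS syntax).  The labels come from the example above, while the treatments list is a made-up column of codes.

```python
# Code -> label lookup, using the treatment example above.
treatment_labels = {
    1: "20% Iron",
    2: "25% Iron",
    3: "No Iron",
    4: "Control",
}

# Hypothetical column of treatment codes as they might be stored in the data.
treatments = [1, 3, 2, 4, 1]

# The data keep the short codes; the labels are attached only for display.
labelled = [treatment_labels[code] for code in treatments]

for code, label in zip(treatments, labelled):
    print(code, "=", label)
```

The stored values stay as 1 to 4, so nothing changes in the analysis; only what you see in the output does.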

Missing is our next one.  Another example – let’s say I have collected data and for one reason or another I have some missing data – maybe the equipment broke down one day, and one of the areas I was supposed to go measure was flooded and I couldn’t get to it.  Chances are you are going to document that somewhere – in a lab book or your notes.  But what about the data?  We tend to leave it blank, right?  We have no data so we’ll leave it blank.  Another way to handle it is to provide a value that is NOT a possible value for your data – so 999 for age or maybe 9999 – and add that value to your dataset.  Now, we don’t want it to be included in any statistical analysis, so in SPSS we add that value to the Missing column in Variable View.
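To see why the missing-value code has to be declared, here is a short Python sketch (an illustration of the idea, not SPSS syntax).  The ages are made up, and 999 is the missing-value code from the example above.

```python
# Codes that stand for "no data"; 999 is the example code used above.
MISSING_CODES = {999}

# Hypothetical ages, with 999 entered where no measurement was taken.
ages = [34, 999, 41, 29, 999, 38]

# Drop the missing-value codes before computing any statistics.
valid_ages = [age for age in ages if age not in MISSING_CODES]

mean_with_codes = sum(ages) / len(ages)
mean_valid = sum(valid_ages) / len(valid_ages)

print(f"Mean with the 999s left in: {mean_with_codes:.1f}")
print(f"Mean of valid values only:  {mean_valid:.1f}")
```

Leaving the codes in pulls the mean far away from any real age, which is exactly what declaring them as missing prevents.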

Columns and Align – are ways to make your Data View prettier or neater.

Measure – ah the important one!  You have 3 options:

  • Ordinal
  • Nominal
  • Scale

We will review these in the workshop as they are VERY important when you go to run any statistics or create Charts in SPSS.

Output Window

The output window is where you will see all of your results and the log.  The Log part of SPSS tells you what SPSS has done every step of the way and it also provides you with the syntax that SPSS used to get the results.  Yes, one of the strengths to SPSS is its user-friendly interface, but if you want to learn to code in SPSS, that option is there as well!  I’ll talk more about that in a bit.

How can you save your results in SPSS Output window?  Well, you can just save them, but they will only be accessible when you are using the SPSS software.  You can export them to PDF, Word, Excel, etc…  by using the File – Export option.  But, personally I really like the Copy and Paste option.

If you select the tables and/or charts you want to save for later and Copy the selected materials, you can then simply Paste them into your Word document.  Works like a charm EXCEPT on a Mac.  On a Mac, if you are trying to Copy and Paste a Chart you will need to Copy Special – select image, and then Paste it into Word.  This has happened on a PC as well at times.

OK     Paste     Reset     Cancel     Help

When you start running any analysis in SPSS, you will recognize these 5 words – they appear everywhere.  So a quick definition may be handy.

OK – go ahead and run the analysis

Paste – opens a Syntax window and will “paste” the syntax that SPSS will use to run the analysis you are trying to set up.  If you want to learn how to write syntax in SPSS, this is a great way to do that!

Reset – reset the dialogue box to its original state

Cancel – we know this one

Help – we know this one as well.

Conclusion

A quick overview to get you started using SPSS.


ARCHIVE: Summer 2018 Workshops

Workshops for Summer 2018 have just been posted and are now available for Registration.  Please register if you are planning on attending.  If you register and need to cancel, please do so with the link on the confirmation email you receive when registering or by emailing oacstats@uoguelph.ca.  The registration link for each workshop is listed below and is unique to that workshop.

SAS

A 2-day workshop will be held on May 8-9, 2018 from 9am – 4pm in ANNU Rm 102.  Topics covered will include:

  • Getting the data in
  • Merging datasets and creating new variables
  • Descriptive statistics
  • ANOVA using GLIMMIX – we will work through a number of examples

If you are new to SAS, please plan on attending both days.  Anyone interested only in learning more about GLIMMIX is invited to attend Day 2 (May 9) only.  However, please note that any material covered on the first day will NOT be repeated on Day 2.

To register for this workshop, please register for each day separately here.

Date: May 8 – 9, 2018 9am – 4pm
Location:  ANNU Rm 102

SPSS

A 2-day workshop will be held on May 16-17, 2018 from 9am – 4pm in ANNU Rm 102.  Topics covered will include:

  • Getting the data in
  • Merging datasets and creating new variables
  • Descriptive statistics
  • ANOVA and GLMM
  • Non-parametric analyses, including Kruskal-Wallis, and Friedman ANOVA

If you want to follow along with the workshop, please ensure that you have the SPSS Software installed on your laptop.  You always have the option to follow the instructor if you do not have the software on your laptop.  Please plan to attend the 2 days to learn all about SPSS and how you can use it for your research project.

To register for this workshop, please register for each day separately here.

Date: May 16-17, 2018  9am-4pm
Location: ANNU Rm 102

R workshop

A 2-day workshop will be held on May 22-23, 2018 from 9am – 4pm in ANNU Rm 102.  Topics covered will include:

  • Getting your data into R
  • Working with your data – cleaning and tidying
  • Descriptive statistics
  • Packages performing ANOVA
  • Packages performing Regression
  • ggplot2

If you want to follow along with the workshop, please ensure that you have R and RStudio installed on your laptop.  You always have the option to follow the instructor if you do not have the software on your laptop.  Please plan to attend the 2 days to learn all about R and how you can use it for your research project.

To register for this workshop, please register for each day separately here.

Date: May 22-23, 2018  9am-4pm
Location: ANNU Rm 102

RDM: Starting your Research on the Right Foot!

Join Carol Perry from the Library and Michelle Edwards to learn how to start your research on the right foot.  If you are just starting your graduate work or if you’re an experienced researcher, join us to learn all about the best practices to help you organize and document your project data, store and analyze your data, and secure and preserve your data legacy.  This day-long workshop is filled with hands-on exercises to encourage you to treat your data as a valuable commodity.  At the end of this workshop, every participant will complete a Data Management Plan and will be all set to tackle their research data.

This is a one-day workshop held on Tuesday, June 5, 2018 in ANNU Rm 102.  The workshop starts at 9am and will be finished at 4pm.  Please register here.

Date: June 5, 2018  9am-4pm
Location: ANNU Rm 102

Thank you, and hope to see you in a workshop!
