SPSS: Introduction and Getting the Data In

SPSS software

One of the biggest pros to using SPSS is its familiar look and feel.  When you open it, it almost feels like home: so familiar, so much like Excel!  The rows represent individual cases or observations and the columns represent variables.  The one catch is that in SPSS the Data View is just that, a view of the data, a container for the data.  You cannot create any new variables in this view.  In Excel we can move our cursor to a new cell and type in a formula; you CANNOT do this in SPSS.

Note that at the bottom there are two tabs: Data View and Variable View.  We are currently in the Data View.  If you select the Variable View tab, things look a lot different from Excel.  We now have rows to match our variables (the columns in the Data View), and the columns in the Variable View represent different pieces of information about our variables (their metadata).

Excel Data into SPSS

Let’s take a step back and work through an example of bringing a dataset into SPSS.  For the purposes of this part of the workshop, please download the following 2018 Trial Data Excel file.  It is a small file with 10 observations and 4 variables: two of the variables contain letters, and two contain numbers, the weight and height measurements.

Let’s work together and bring the file into SPSS.  Believe it or not, it is as simple as doing the following:

  • File
  • Open
    • Data
      • Navigate to the folder where you saved the data
      • Change the Files of Type: to Excel
      • Select the file
    • Open
    • Answer the Wizard questions
      • Select the correct worksheet, if there is more than one
      • Double check the Preview to ensure that it looks correct
      • When you are satisfied – Click OK

That was pretty straightforward, wasn’t it?

Variable View

Two things happened when you clicked OK: SPSS opened a second window, the SPSS Statistics Viewer (or Output window), and your data popped into SPSS as expected.  In the Data View, notice the icons associated with each variable; we will talk more about these in a moment.

Click on Variable View and notice how your 4 variables are now listed as rows, with some of the information about them already filled in.  Let’s work through each column separately.  I will talk through the first variable here, and you will fill in the remaining 3 variables as an exercise.

Name – lists the name of the variable as it was read in from Excel.  Please note that you should keep these as short and as informative as possible.  Please review the Best Practices for entering your data in Excel here.

The second column is Type.  If you select the word Type, you will be presented with the options available to you in SPSS.  Generally speaking, we primarily use the String and Numeric types, but you may also have occasion to use the Date type.

Width – the width of the variable, that is, how many characters we are willing to accept as data.  For ID it is set to 3.  You can change this at any time.  Decimals goes along with Width: our data has no decimals, but you can change this at any time as well.  Note that when we create new variables, SPSS will by default assign 2 decimal places to the new variable.

Label – ah, the lovely label.  I highly recommend that you complete the Label and the next column, Values, if you plan to use SPSS for your analysis; it will save you a lot of time.  So, what is it?  Our variable names are short and sometimes not very informative.  In this case, Weight?  Weight of what?  What units of measure were used?  You can add all of this information in the Label field.  For weight, something like: Weight measured at 24 months of age in lbs.  This comes in extremely handy if you are working with surveys and are using Q1, Q2, etc. as variable names; in the Label field you can add the entire question.  What is really nice about this is that when you run your statistics, the output window will show the label you typed in here rather than weight or Q1.  So, if you’re like me, no more loose papers with notes!

I referred to the next column, Values, as another really handy one to complete.  When you click on Values you are presented with a small dialogue box where you can enter each Value and its Label.  Let’s use an example: Treatment.  Many folks use a numbering system for their treatments (1, 2, 3, 4) or a lettering system (A, B, C, D), but no one else knows what each treatment refers to.  This is the spot where you can add that label.  So if treatment 1 = 20% Iron, 2 = 25% Iron, 3 = No Iron, and 4 = Control, we can add all of that here!

Missing is our next one.  Another example: let’s say I have collected data and, for one reason or another, some of it is missing.  Maybe the equipment broke down one day, or one of the areas I was supposed to measure was flooded and I couldn’t get to it.  Chances are you are going to document that somewhere, in a lab book or your notes.  But what about the data?  We tend to leave it blank, right?  We have no data, so we leave it blank.  Another way to handle it is to provide a value that is NOT a possible value for your data, say 999 or 9999 for age, and enter that value in your dataset.  Now, we don’t want it included in any statistical analysis, so in SPSS we add that value to the Missing column in Variable View.

Columns and Align are ways to make your Data View look neater.

Measure – ah the important one!  You have 3 options:

  • Ordinal
  • Nominal
  • Scale

We will review these in the workshop as they are VERY important when you run any statistics or create charts in SPSS.

Output Window

The output window is where you will see all of your results and the log.  The log tells you what SPSS has done every step of the way, and it also provides the syntax that SPSS used to get the results.  Yes, one of the strengths of SPSS is its user-friendly interface, but if you want to learn to code in SPSS, that option is there as well!  I’ll talk more about that in a bit.

How can you save the results in your SPSS Output window?  Well, you can just save them, but they will only be accessible when you are using the SPSS software.  You can export them to PDF, Word, Excel, etc. by using the File > Export option.  But, personally, I really like the Copy and Paste option.

Select the tables and/or charts you want to save for later, copy the selected materials, then simply paste them into your Word document.  It works like a charm, EXCEPT on a Mac.  On a Mac, if you are trying to copy and paste a chart, you will need to use Copy Special, select Image, and then paste it into Word.  This has happened on a PC as well at times.

OK     Paste     Reset     Cancel     Help

When you start running any analysis in SPSS, you will recognize these 5 words – they appear everywhere.  So a quick definition may be handy.

OK – go ahead and run the analysis

Paste – opens a Syntax window and will “paste” the syntax that SPSS will use to run the analysis you are trying to set up.  If you want to learn how to write syntax in SPSS, this is a great way to do that!

Reset – reset the dialogue box to its original state

Cancel – we know this one

Help – we know this one as well.

Conclusion

A quick overview to get you started using SPSS.


SAS Syntax: Best Practices

There are many different ways to write your SAS code or syntax.  After working in SAS for a few decades now, I would like to offer a series of best practices when it comes to writing your own syntax.  Remember that how you write your syntax or code is up to you, as long as it works, right? 🙂

  1. At the end of any DATA step or PROC, add a Run; statement.  This will allow you to select pieces of your code and run them separately from your whole program.
  2. Always specify the name of the dataset you are using when running a PROC.  This avoids any confusion in the results when you run parts of your program: SAS will otherwise run the analysis on the last dataset it used or created, which may not be the one you want it to use.
  3. When using PROC GLIMMIX, take advantage of the * or the commenting ability of SAS.  Include all of your models in the PROC, but add a * in front of all the model statements you are not using at the moment.  For example:
      Proc glimmix data=flowers plots=studentpanel;
          class block trmt;
          model count_flower = trmt;
          * model prop_flower = trmt;
          random block;
      Run;
  4. Try to add more comments to your program.  This way you will remember what you did in a couple of months.  (A short sketch pulling these practices together follows this list.)
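
To pull these together, here is a minimal sketch of a short program with the practices above applied.  The dataset name flowers and the variable names are carried over from the example in item 3 purely for illustration; substitute your own.

/* Descriptive check first: data= is named and the step ends with a Run statement */
Proc means data=flowers;
    var count_flower;
Run;

/* Model the flower counts and keep the proportion model commented out for later */
Proc glimmix data=flowers plots=studentpanel;
    class block trmt;
    model count_flower = trmt;
    * model prop_flower = trmt;
    random block;
Run;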

If I think of any more items, I will come back and add them at a later date.  If you have any suggestions, please drop me a line and I’ll add them in.


SAS Workshop: GLIMMIX and non-Gaussian Distributions

One of the biggest advantages of using PROC GLIMMIX is that the data coming into the analysis no longer needs to be normally distributed.  It can follow a number of distributions, and SAS can handle it.  Our job now is to recognize when a normal distribution is NOT appropriate and which distribution is an appropriate starting place.  These are referred to as non-Gaussian distributions; remember that Gaussian is just another name for Normal.

Where do we start?  Think about your data – what is it?

  • A percentage?
  • A count?
  • A score?

How do we know that our data is not from a normal distribution?

  • Always check your residuals!
  • Remember the assumptions of your analyses?
    • Normally distributed residuals is one of them!

Let’s work with the following example.  We have another RCBD trial with 4 blocks and 4 treatments randomly assigned within each block.  There were 2 outcome measures: the proportion of the plot that flowered, and the number of plants in each plot at the end of the trial.

Please copy and paste the attached code to create the SAS dataset on your computer.

We will work through the output and how/when you need to add the DIST= option to your MODEL statement.  We will also talk about the LINK= function and what it does.
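
As a preview, here is a minimal sketch of what those options might look like for the two outcomes described above.  The dataset and variable names (flowers, prop_flower, plant_count) are assumptions for illustration, and the distributions shown, beta for a proportion and Poisson for a count, are common starting points rather than the only choices.

/* Proportion of the plot that flowered: beta distribution with a logit link as a starting point */
Proc glimmix data=flowers plots=studentpanel;
    class block trmt;
    model prop_flower = trmt / dist=beta link=logit;
    random block;
Run;

/* Number of plants per plot: Poisson distribution with a log link as a starting point */
Proc glimmix data=flowers plots=studentpanel;
    class block trmt;
    model plant_count = trmt / dist=poisson link=log;
    random block;
Run;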


 

SAS Workshop: ANOVAs

ANOVA, or Analysis of Variance, is one of the "classic" or standard statistical analyses that you will complete at one time or another during your research career.  The statistical methodology behind ANOVA has changed a great deal over the past 40-50 years, and SAS has done its part by creating new PROCs to match the statistical advances.

This part of the SAS Workshop will start by reviewing the SAS PROCedures that were created and used over the years.  This will give you a better sense of why we are using the newest PROCedure, GLIMMIX, and will hopefully provide insight into why some researchers are still using the older PROCedures GLM and MIXED.

History of ANOVA analyses in SAS

1966 – SAS is released with Proc ANOVA, which is to be used with:

  • balanced data ONLY!
  • FIXED effects ONLY!
  • NOTE from the SAS online documentation: "Caution: If you use PROC ANOVA for analysis of unbalanced data, you must assume responsibility for the validity of the results."

1976 – SAS released Proc GLM

  • balanced (Type I SS) and unbalanced (Type III SS)
  • RANDOM statement introduced: provides EMS (expected mean squares) equations, but you need to do the calculations yourself!

1992 – Proc MIXED

  • RANDOM statement incorporated
  • REPEATED statement introduced
  • "Normally distributed" data ONLY
  • linear effects

1992 – Proc GENMOD

  • Non-normal data
  • Fixed effects ONLY

xxxx? – Proc NLMIXED

  • normal, binomial, Poisson distributions
  • nonlinear effects

2005 – Proc GLIMMIX

  • combines the capabilities of Proc MIXED and Proc NLMIXED
  • Non-normal data

 

Randomized Complete Block Design (RCBD)

We will start by analyzing the data collected from a small RCBD trial.  There were 4 blocks, with 6 treatments randomly assigned within each block.  To run these analyses, please copy and paste the following code into your SAS program.  There may be edits that you will need to make when you paste it into your program.

Data rcbd;
  input block trmt Nitrogen;
  datalines;
1 1 34.98
1 2 40.89
1 3 42.07
1 4 37.18
1 5 37.99
1 6 34.89
2 1 41.22
2 2 46.69
2 3 49.42
2 4 45.85
2 5 41.99
2 6 50.15
3 1 36.94
3 2 46.65
3 3 52.68
3 4 40.23
3 5 37.61
3 6 44.57
4 1 39.97
4 2 41.9
4 3 42.91
4 4 39.2
4 5 40.45
4 6 43.29
;
Run;

Submit this DATA step and use a Proc Print to ensure that the data was read correctly.
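
A minimal check might look like this:

/* Quick check that the rcbd dataset was read correctly */
Proc print data=rcbd;
Run;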

Once we have the data in our SAS program, let’s start with PROC GLM:

/* Proc GLM Statements */
Proc glm data=rcbd;
    class block trmt;
    model Nitrogen = block trmt;
    random block;
    title "Proc GLM Results";
Run;
Quit;

Here is a PDF copy of the output created by the above code.  I will review the output and the code that is used to generate it during the workshop.

Let’s move onto PROC MIXED:

/* Proc MIXED Statements with an LSMEANS for treatment differences */
Proc mixed data=rcbd;
    class block trmt;
    model Nitrogen = trmt;
    random block;
    lsmeans trmt / pdiff;
    title "Proc MIXED Results";
Run;

Here is a PDF copy of the output created by the above code.  I will review the output and the code that is used to generate it during the workshop.

Now let’s do the same analysis for a third time using PROC GLIMMIX.  The code is:

/* Proc GLIMMIX Statements with an LSMEANS for treatment differences */
Proc glimmix data=rcbd;
    class block trmt;
    model Nitrogen = trmt;
    random block;
    lsmeans trmt / pdiff;
    title "Proc GLIMMIX Results";
Run;

Here is a PDF copy of the output created by the above code.  I will review the output and the code that is used to generate it during the workshop.

So…  if you have used PROC MIXED in the past, moving to GLIMMIX is easy and highly recommended!

 


SAS Workshop: Getting Comfortable with Your Data

PDF version of the workshop notes

Before we start any statistical analysis, we should really take a step back and get familiar and comfortable with our data, "playing" around with it to ensure that we know what's in there.  This may sound funny, but getting comfortable with your data by running descriptive statistics really does two things: first, you understand what has been collected and how; and second, it gives you the opportunity to review the data and find any errors in it.  Sometimes you may find an extra 1 added to the front of a number, or a 6 instead of a 9, or any number of other data entry errors.  By playing around with your data and getting comfortable with it before running your analysis, you may catch some of these anomalies.

For this workshop, I will provide you with a starting SAS program, which you can download here.  You will be asked to type in the PROCs as we work through them, but if you would rather, you always have the option of copying them from this post and pasting them into your SAS editor or code window.  Please note that there may be some nuances when you copy and paste: any curly quotation marks will need to be changed to straight quotes in your SAS program!

My goals for this session are to review the following PROCedures:

  • Proc Contents
  • Proc Univariate
  • Proc Freq
  • Proc Means

PROC CONTENTS

PROC CONTENTS provides you with the back-end information on your dataset.  One of the challenges of working with SAS is that you do not have your dataset in front of you all the time.  You read it in and it gets sucked into what I call the "Blackbox of SAS".  Sometimes we want to see the data, either to ensure it is still there or simply to be comforted by the sight of it (we use PROC PRINT), and sometimes we want to see the contents of the dataset, that is, the formats of the variables and information about the dataset itself.

To do this we need to run a Proc CONTENTS on our file.  This is the equivalent of the Variable View in SPSS.

Proc contents data=woodchips;
Run;

What information were you able to see?  Information about the actual SAS datafile along with formatting information about the variables contained in the datafile.  View the output here as a PDF.

If you make changes to the variables along the way, or if you add labels, rerun the Proc CONTENTS to ensure the changes were applied.
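
As a small sketch of what that might look like, the DATA step below adds a label and then reruns Proc CONTENTS to confirm it.  The label text is purely illustrative, and the variable name weight matches the Proc UNIVARIATE example that follows.

/* Add a descriptive label to a variable, then confirm it with Proc CONTENTS */
Data woodchips;
    set woodchips;
    label weight = "Weight of each wood chip sample";
Run;

Proc contents data=woodchips;
Run;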

PROC UNIVARIATE

Proc UNIVARIATE will be familiar to many of you as the PROC we use to see whether our data is normally distributed or not.  That is one use for this PROCedure, but it is also very handy for getting a sense of your data.  In my opinion, it is one PROC that isn't used to its full capability.

Let’s try running it as follows:

Proc univariate data=woodchips;
var weight;
Run;

Here is a link to the output saved as a PDF file.

As you review the output you can see the variety of descriptive statistics that this PROC provides you.  You should now have a very good feel for the data we are working with.
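
Since many of you will also use this PROC to check normality, here is a minimal sketch of that use: the NORMAL option requests formal normality tests, and the HISTOGRAM statement overlays a fitted normal curve (the variable name weight matches the code above).

/* Check whether weight looks normally distributed */
Proc univariate data=woodchips normal;
    var weight;
    histogram weight / normal;
Run;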

PROC FREQ

Proc FREQ is used to create frequencies and cross-tabulations.   In our dataset we only have one categorical variable, quality.  To create a frequency table use the following code:

Proc freq data=woodchips;
table quality;
Run;

Here is the link to the output saved as a PDF file.

Should you run a Proc FREQ on a variable such as weight?  Why or why not?
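
For reference, a cross-tabulation is written with an asterisk between two categorical variables.  The second variable here, supplier, is hypothetical, since our woodchips dataset only has the one categorical variable.

/* Hypothetical two-way table: quality by supplier (supplier is not in the woodchips data) */
Proc freq data=woodchips;
    table quality*supplier;
Run;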

PROC MEANS

Proc MEANS is a fabulous and very versatile PROC for getting a sense of your continuous variables (weight, in our example).  Let’s start with the overall mean by using this code:

Proc means data=woodchips;
var wood_weight;
Run;

Here is the link to the output saved as a PDF file.

Note the default measures – N, Mean, StdDev, Min, Max

To add other descriptive measures, list them at the end of the Proc MEANS statement.  For example, we want the standard error and the Sum:

Proc means data=woodchips mean stderr sum;
var wood_weight;
Run;

Here is the link to the output saved as a PDF file.

One last piece of code for Proc MEANS:  We want to see the means for each quality group.

Proc means data=woodchips;
class quality;
var wood_weight;
Run;

Here is the link to the output saved as a PDF file.
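
These pieces can be combined.  As one last sketch, the statistic keywords and the CLASS statement work together, and the MAXDEC= option (a standard Proc MEANS option not used above) keeps the decimal places tidy.

/* Means and standard errors by quality group, rounded to 2 decimal places */
Proc means data=woodchips mean stderr maxdec=2;
    class quality;
    var wood_weight;
Run;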

For more ways to use Proc MEANS, visit the following blog entry on SASsyFridays: