ARCHIVE: W18 SPSS Workshop: Creating Charts

Chart Builder in SPSS

Numbers and statistics can be fun, but putting those numbers into context with a chart or graph can make them understandable to a broader audience.  What do I mean by that?  How many of you will remember a number, and how many of you will remember a graph that shows a trend?

Building charts in SPSS is quite straightforward.  The dataset we will use for this workshop is one of the many Sample datasets that accompany your SPSS program.  For ease of this workshop, I’ve saved the DEMO dataset as an Excel file.  Please download this file and open it in your SPSS program.

Let’s start by creating a barchart for our job satisfaction variable.  We want to see a bar for each level and we want to see the count.

In SPSS:

  • Graphs
  • Chart Builder – this will open a dialogue box
    • Notice on bottom half – a gallery of all the different types of charts you can create in SPSS.
    • We want a simple barchart
      • Select bar
      • Then double-click on the first barchart listed
      • Once you do this you should see the skeleton of a bar chart appear in the top half of your dialogue box.
      • All you need to do now, is to drag and drop the variables where they are appropriate.
      • For this example:
        • Select Job satisfaction and drag it to the x-axis
        • On the right, you may see an Element Properties dialogue box (if you do not see this – Click on the Element Properties button to open it).
        • Note that under Statistics, Count is selected – this is what we want.  But click on this to see what other statistics are available.
      • To create the graph Click OK

You should now see a very plain barchart with frequencies.

Let’s create a chart that shows the average income for each level of job satisfaction.  I’m curious to see whether the folks that are not satisfied with their job have a lower average income.

So, let’s start this again:

  • Graphs
  • Chart Builder – this will open a dialogue box
    • Select Barchart again
    • Drag and drop Job Satisfaction to the x-axis
    • Now drag and drop Household income to the y-axis
    • Notice how the Statistic changed to Mean.  This is what we want.
    • Let’s run it by clicking OK

Hmm…  now that’s an interesting graph!

One last piece is missing from this graph – error bars!  Whenever you have charts with means, you should ALWAYS provide some measure of variability.  So let’s add some error bars, and we’ll try standard error.

  • Graphs
  • Chart Builder – this will open a dialogue box
    • Select Barchart again
    • Drag and drop Job Satisfaction to the x-axis
    • Now drag and drop Household income to the y-axis
    • Ensure that the statistic is mean
    • Under the statistics box in the Element Properties box, check the Display Error Bars box
      • Now you have a few options, as stated above let’s use the Standard Error option – select Standard Error
      • Click Apply
    • Click OK to run chart

Providing the error bars gives the reader a “fuller” picture of the data.  Although in this case it does not change the story!
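To make clear what the bars and error bars represent, here is a minimal Python sketch (outside SPSS) that computes the mean and the standard error of the mean for each group.  The group labels and income values are hypothetical and are not taken from the DEMO dataset; the function name mean_and_se is also just an illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical (job satisfaction level, household income) pairs.
# These values are made up for illustration; they are NOT the DEMO data.
records = [
    ("Low", 40.0), ("Low", 50.0), ("Low", 60.0),
    ("High", 70.0), ("High", 80.0), ("High", 90.0), ("High", 100.0),
]


def mean_and_se(rows):
    """Return {level: (mean, standard error of the mean)} for each level."""
    groups = defaultdict(list)
    for level, value in rows:
        groups[level].append(value)
    summary = {}
    for level, values in groups.items():
        # Standard error = sample standard deviation / sqrt(n)
        se = stdev(values) / sqrt(len(values))
        summary[level] = (mean(values), se)
    return summary


summary = mean_and_se(records)
for level, (group_mean, se) in summary.items():
    print(f"{level}: mean = {group_mean:.2f}, SE = {se:.2f}")
```

The bar height corresponds to the group mean, and a standard error bar extends one SE above and below it.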

Try:

  1. Create a barchart that shows the mean household income by job satisfaction for the 2 levels of marital status.  Be sure to include error bars.
  2. What question does this barchart answer?

More types of charts

We’ll investigate different types of charts based on what you are looking for.

ARCHIVE: W18 SPSS Workshop: T-tests, ANOVAs, and GLMs

This week we’ll take a quick tour of the classic T-tests, ANOVAs, and GLMs in SPSS.  The dataset we will use will match up with the one that was used in the SAS workshop.  Please download and open this Excel file into SPSS.  This data is fictional and contains 4 variables:

  • Field:  A nominal piece of information indicating what field the data was collected in
  • Block:  A nominal piece of information indicating which block the data was collected in
  • Trmt:  Treatment – a nominal piece of data indicating which treatment was applied
  • Nitrogen:  A continuous or scale piece of data indicating the amount of N expressed in the sample taken

This trial was designed as a Random Complete Block Design and should be analyzed as such.  However, to showcase the t-test in SPSS we will take a step back and play with the data to start.

T-test

SPSS makes it easy for us to conduct t-tests on our data.  If you go to the Analyze -> Compare Means menu, you will see 3 different types of t-tests available to you.  You should be comfortable with each one in order to be able to choose the correct one for your analysis.

Quick review of the different t-tests and examples of each

One sample t-test:  This test will compare the mean of your data to 1 value.  Examples of this may be – you have collected %protein data on a number of different brands of adult dog food.  The recommended %protein in adult dog feed is 25% and you want to check whether your samples are equal to the recommended amount.  For this type of test you would use a One-sample t-test.

Independent Samples t-test:  This test allows you to compare 2 means from 2 independent groups.  Examples:  Average age between males and females.

Paired-Samples t-test:  This test allows you to compare 2 means taken on the same experimental units.  Examples: Average weight before a treatment and average weight after a treatment on the same 20 experimental units.

These 3 tests are used primarily when the outcome variable you are testing is continuous or scale.  There are nonparametric equivalents of the t-test for outcome variables that may not follow a normal distribution.  These are:

  • Independent samples:  Mann-Whitney, Kolmogorov-Smirnov Z
  • Paired samples:  Wilcoxon, Sign, McNemar

Independent samples t-test

With our sample data – we have a variable called Field.  We want to see whether there are any differences between the 2 fields where the data was collected.  Please note that we would NOT do this for our trials; we are only doing this for the purposes of this workshop.  Ideally we would have a separate dataset that would be more appropriate, but in the interest of efficiency, I have chosen to use the same dataset and create a fictitious variable, for demonstration purposes only!

To conduct the Independent Samples t-test:

  • Analyze
    • Compare Means
      • Independent Samples T Test
      • Select Nitrogen as the Test variable and Field as the Grouping Variable
        Notice that the Grouping Variable has ?, ? – you need to tell SPSS what the values for Field are.  Click on Define Groups and set 1 as Group1 and 2 as Group2.  Note that you would put in the values that are represented in your dataset – ours just happen to be “1” and “2”
      • Continue
    • OK

In the output window you should now see 2 tables.  The first one displays the mean, standard deviation, and standard error for the Nitrogen variable for each group – so each Field.

The second table provides the t-test results.  Note that the first half of this table contains Levene’s test for equality of variances.  One of the assumptions of a t-test is that the variance of your outcome variable is equal in both groups.  Lucky for us, SPSS provides t-test results both for the situation where we have equal variances and for when we do not.  In our case, Levene’s test gives p = 0.840, so we fail to reject the null hypothesis that the variances of our groups are equal, and we read the “Equal variances assumed” row.  We also see that there are indeed differences in Nitrogen between our 2 fields, p < 0.001 – note that the output says p = 0.000 because it only shows the first 3 decimal places.  P is NEVER = 0!!
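For readers who want to see what sits behind the equal-variances row, here is a minimal Python sketch of the pooled-variance t statistic.  The nitrogen values are hypothetical and the function name pooled_t is an illustration only.  It returns the t statistic and degrees of freedom; the p-value needs the t distribution, which the Python standard library does not provide, so it is left out.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic assuming equal variances.

    Returns (t, degrees of freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: a weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two means
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se_diff, df


# Hypothetical nitrogen values for two fields (not the workshop data)
field_1 = [10.0, 12.0, 14.0]
field_2 = [16.0, 18.0, 20.0]

t_stat, df = pooled_t(field_1, field_2)
print(f"t = {t_stat:.3f}, df = {df}")
```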

One-way ANOVA

Now let’s assume that we conducted a Completely Randomized Design (CRD) where we randomly selected our experimental units and assigned 4 to each of the 6 treatments.  If this was our experimental design then we would conduct a One-way ANOVA.  There are 2 ways to do this in SPSS.  Here is the first method:

  • Analyze
    • Compare Means
      • One-way ANOVA
      • Select Nitrogen as your Dependent Variable and Treatment as your Factor
    • OK

Your output window should provide you with the matching ANOVA table.  In our example, the Between Groups effect is non-significant, with a p-value of 0.959.  The table also shows us our Within Groups SS, df, and MS.
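The Between Groups and Within Groups rows of that table can be reproduced by hand.  The following Python sketch shows the arithmetic on hypothetical data (three small groups, not the workshop's 6 treatments); the function name one_way_anova is an illustration only.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, and F ratio."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Between Groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within Groups: how far each observation is from its own group mean
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "F": f_ratio,
    }


# Hypothetical measurements for three treatments
treatments = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
anova = one_way_anova(treatments)
print(anova)
```

Each mean square (MS) is the SS divided by its df, and F is the Between Groups MS divided by the Within Groups MS.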

The second method:

  • Analyze
    • General Linear Model
      • Univariate
      • Select Nitrogen as your Dependent Variable and Treatment as your Fixed Factor
    • OK

Your output window will now provide you with 2 tables.  The first is a Between-Subjects Factor table – showing you where your observations are in relation to the fixed effect of treatment.  In our example we can confirm that we have 4 observations (experimental units) on each of the 6 treatments.  This is a great way of checking that SPSS has read your data correctly.

Your second table is the ANOVA table – labelled Tests of Between-Subjects Effects.  Notice that it matches the 1st ANOVA table you saw above, but it provides more information, such as the Intercept – which is the overall mean.  The same conclusions are drawn from this table as from the One-way ANOVA table.  I would recommend that you perform any ANOVAs using this method.

General Linear Model

So we know that our data was collected by implementing an RCBD, and we have a variable called Block in our dataset that is a RANDOM effect.  How do we implement this aspect in SPSS?

The proper statistical model is:

[Figure: Statistical model used in an RCBD]
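The model itself is only available as a figure.  As a hedged reminder, the usual textbook form of an RCBD model with a fixed treatment effect and a random block effect is sketched below; the symbols are assumptions and may differ from the notation in the original figure.

```latex
Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij},
\qquad \beta_j \sim N(0, \sigma^2_{\beta}),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here Y_{ij} is the Nitrogen value for treatment i in block j, \mu is the overall mean, \tau_i is the fixed effect of treatment i, \beta_j is the random effect of block j, and \varepsilon_{ij} is the residual error.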

To do this in SPSS:

  • Analyze
    • General Linear Model
      • Univariate (because we only have 1 outcome variable we are working with)
      • Select Nitrogen as your Dependent Variable and Treatment as your Fixed Factor
      • Now we will add our Block variable as our Random Factor
        • Model
        • Notice that by default SPSS is using a Full Factorial model.  As a basic RCBD we only want the main effects of Treatment and Block included in our model
        • Select Custom at the top.  Then select both Trmt and Block – Select Main Effects in the Build Term(s) dropdown menu and click the arrow to place the 2 main effects in the Model: box
        • Continue
    • OK

In our output window you should now see 3 tables.  The first one – Between Subjects Factor, lists the Treatments and the Blocks.  Note that you have 6 observations in each block.

The second table presents our Tests of Between-Subjects Effects or our ANOVA table.  Notice that each factor in our model lists the Hypothesis and the Error.  This is because of our model.  In our model – the error term has been corrected for the 2 effects in our model.  Note that the p-value for Treatment is different from our fixed effects model p=0.787 – the model now incorporates our random Block factor – so it has adjusted or accounted for the variation due to Block before looking at the Treatment differences.

PostHoc tests

In our example, our treatments were not significant, so we have no evidence that the means of our 6 treatments differ – no need to run any PostHoc or means comparison tests.  However, you should know how to run these in case your research data shows otherwise.  For demonstration purposes we will conduct PostHoc tests on our Treatments.  Select the following:

Analyze

  • General Linear Model
    • Univariate (because we only have 1 outcome variable we are working with)
    • Select Nitrogen as your Dependent Variable and Treatment as your Fixed Factor
    • Now we will add our Block variable as our Random Factor
      • Model
        • Notice that by default SPSS is using a Full Factorial model.  As a basic RCBD we only want the main effects of Treatment and Block included in our model
        • Select Custom at the top.  Then select both Trmt and Block – Select Main Effects in the Build Term(s) dropdown menu and click the arrow to place the 2 main effects in the Model: box
      • PostHoc
        • Select the Trmt Factor from the left hand box and add to the PostHoc Tests for box.  Once you do this the tests in the bottom half of this dialogue box become available.
        • Select Tukey
      • Continue
    • OK

You will now have 2 additional tables in your output.  The first one shows you each pairwise combination of treatments along with a difference, a standard error for the difference, a p-value, and 95% confidence limits for the difference.  The bottom table summarizes this table.

NOTE:  if you only have 2 levels in your treatment or fixed effect factor, SPSS will NOT run the PostHoc tests.  It’s telling you that if the ANOVA says they’re different – then it doesn’t have to run the extra test because you already know the answer.

Conclusion

This workshop reviewed the use of t-tests, one-way ANOVA, and a GLM in SPSS.  As an FYI, there is a lot of talk about GLIMMIX in the SAS side of the house and SPSS can do similar analyses – I will propose a workshop in the upcoming Summer session that will showcase GLMMs in SPSS.

Remember your research question when conducting any analysis and match the analysis to your research question – always!!

ARCHIVE: W18 SPSS Workshop: Getting Comfortable with your Data

Before we start any statistical analysis, we should really take a step back and get familiar and comfortable with our data.  “Playing” around with it to ensure that you know what’s in there.  This may sound funny, but getting comfortable with your data by running descriptive statistics really does two things:  first, you understand what’s been collected and how; and second, it gives you the opportunity to review the data and find any errors in it.  Sometimes you may find an extra 1 added to the front of a number, or maybe a 6 instead of a 9, or any combination of data entry errors.  By playing around with your data and getting comfortable with it before running your analysis, you may find some of these anomalies.

For this workshop, we will use a fictitious dataset looking at 25 samples of woodchips, their weight and a quality score for the woodchips within each sample.  Please download the dataset here.  Once you have downloaded the Excel file, open it into SPSS.

My goals for this session are to review the use of the Descriptive Statistics in SPSS and some file information.

DATA FILE INFORMATION

When you receive a file from a colleague, labmate, website, or repository, it is often very handy to take a look at the Data File Information, to give you a sense as to what is contained in the file.  To accomplish this follow these steps:

  • File
    • Display Data File Information
    • Working File – which is the file that is currently open in SPSS

The data file information will now be available in the SPSS Statistics Viewer.  Notice that the information is very similar to what we see in the Variable View, with the exception of the last 2 columns:  Print Format and Write Format.  These two columns show us the internal formatting of the variables.  Note that the two formats are, and should be, the same within each variable.  The PRINT format is the format of the variable for output.  To change either FORMAT you will need to use the FORMATS command.  For more information on this please visit this page on the IBM Knowledge Center.

If there are any values set up in the dataset, the data file information will provide you with a small table with the values and their respective labels.  To test this out add the following labels to the Quality variable:

1 = Low Quality
2 = Regular Quality
3 = High Quality
4 = Exceptional Quality

Once you’ve added these to your dataset, save it on your computer, and try running the Data File Information again to see how the output changes.

Descriptive Statistics

Descriptive statistics are essentially that – they describe your data, or they summarize your data to give you a good, solid base understanding of what you have collected.  The type of descriptive statistics you will conduct will depend on the type of variable you have.  Remember the 3 types of variables that SPSS distinguishes between?

  • Scale – a continuous piece of information, also referred to as Interval or Ratio.  Examples: age, weight, height
  • Nominal – a categorical piece of data – there is NO relationship between the categories.  Examples:  religion, colour, gender
  • Ordinal – a categorical piece of data – this time there is a relationship or order to the categories.  Examples:  Year of study, age group, likert scales

Each of these data types will use a different type of descriptive statistic.  For instance, calculating the mean of colour makes no sense at all, but a frequency count of colour does work.

Frequency

To calculate the frequency of a categorical variable (nominal OR ordinal) in SPSS:

  • Analyze
  • Descriptive Statistics
  • Frequencies
    • Select the variables in question and drag to the right hand side
      • As an example, select Quality
    • Click OK to run

You should now have a frequency table of the variable, Quality

The first column lists the categories of the variable.  If you had not provided the value labels, you would see 1; 2; 3; 4 as the categories with no explanation as to what they represent.

The table lists Frequency – the actual count of observations in each category; Percent – the percent of all observations, including any missing ones; Valid Percent – the percentage of the observations that have values for Quality, which differs from Percent only if you have missing observations; and Cumulative Percent – the running total of the Valid Percent.
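The difference between Percent and Valid Percent is easiest to see with a missing value in the data.  The Python sketch below builds the same four columns by hand; the quality codes are hypothetical, None stands in for a missing observation, and the function name frequency_table is an illustration only.

```python
from collections import Counter

# Hypothetical quality codes; None marks a missing observation
quality = [1, 2, 2, 3, 3, 3, 4, None]


def frequency_table(values):
    """Return rows of (category, frequency, percent, valid %, cumulative %)."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        n = counts[category]
        percent = 100 * n / len(values)        # out of ALL observations
        valid_percent = 100 * n / len(valid)   # out of non-missing observations
        cumulative += valid_percent
        rows.append((category, n, percent, valid_percent, cumulative))
    return rows


table = frequency_table(quality)
for category, n, percent, valid_percent, cumulative in table:
    print(f"{category}: n={n}, {percent:.1f}%, "
          f"valid {valid_percent:.1f}%, cumulative {cumulative:.1f}%")
```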

Mode

Mode is the value in the data that appears the most.  When you run the frequency you have a table that shows you the 4 levels of wood quality:

  • Low Quality = 5
  • Regular Quality = 6
  • High Quality = 8
  • Exceptional Quality = 6

By looking at these results I can see that High Quality appears to be the category that was selected the most.  But let’s get SPSS to do the hard work for us and confirm whether this is correct or not.

To obtain the MODE of a variable:

  • Analyze
  • Descriptive Statistics
  • Frequencies
    • Select the variables in question and drag to the right hand side
    • Click on the Statistics button on the right
      • Select Mode
      • Click Continue
      • Click OK

You should now see the Mode in the first table of the Frequency output.
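As a quick cross-check outside SPSS, this small Python sketch rebuilds the 25 observations from the frequency counts listed above and asks for the mode.

```python
from statistics import mode

# Frequency counts from the Quality frequency table above
counts = {
    "Low Quality": 5,
    "Regular Quality": 6,
    "High Quality": 8,
    "Exceptional Quality": 6,
}

# Expand the counts back into one label per sample
observations = [label for label, n in counts.items() for _ in range(n)]

most_common = mode(observations)
print(len(observations), most_common)
```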

Median

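The median of a variable is the middle value once the observations are sorted.  If you have an odd number of observations, the median is the single middle observation; if you have an even number, it falls between the two middle observations and is usually reported as their average.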

To obtain the MEDIAN in SPSS, follow the same instructions as the MODE, but select the MEDIAN in the Statistics dialogue box.
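The odd and even cases can be illustrated with a short Python sketch.  The values are hypothetical; median_low is shown as one possible choice when an average of two ordinal codes would not be meaningful.

```python
from statistics import median, median_low

odd_sample = [3, 1, 2]        # 3 observations: one middle value
even_sample = [4, 1, 3, 2]    # 4 observations: two middle values

median_odd = median(odd_sample)          # the single middle observation
median_even = median(even_sample)        # average of the two middle observations
median_even_low = median_low(even_sample)  # lower of the two middle observations

print(median_odd, median_even, median_even_low)
```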

Mean

The mean or average is calculated on a scale variable or continuous variable.  It just doesn’t make sense to calculate the mean of a categorical variable.

To obtain the MEAN in SPSS:

  • Analyze
  • Descriptive Statistics
  • Descriptives
    • Select the variable in question and drag to the right hand side
      • Click OK to run

You should now have a table with N, Minimum, Maximum, Mean, and Standard Deviation for the variable you selected, for example the woodchip weight.  These are the default values you obtain when you run this analysis.  But, what happens if you want the Sum or the Standard Error of this variable?

  • Analyze
  • Descriptive Statistics
  • Descriptives
    • Select the variable in question and drag to the right hand side
    • Select the Options button – this will open another dialogue box that has a list of statistics to select from
      • Select Sum and S.E. mean (standard error of the mean)
    • Click Continue
    • Click OK to run

Your output table will now contain these added statistics.
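For reference, the statistics in that table can be computed by hand as in the following Python sketch.  The weights are hypothetical values, not the woodchip dataset.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical weights (not the workshop data)
weights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(weights)
sd = stdev(weights)  # sample standard deviation (n - 1 in the denominator)
descriptives = {
    "N": n,
    "Minimum": min(weights),
    "Maximum": max(weights),
    "Sum": sum(weights),
    "Mean": mean(weights),
    "Std. Deviation": sd,
    "S.E. mean": sd / sqrt(n),  # standard error of the mean
}

for name, value in descriptives.items():
    print(f"{name}: {value:.3f}")
```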

Explore Function in SPSS

Sometimes you may want to determine the mean of a variable within each level of a categorical variable, for example the mean woodchip weight by quality category.  Till now, we’ve been looking at the entire dataset.  There are a few ways to do this, but the most direct way is to use the Explore function in SPSS.

  • Analyze
  • Descriptive Statistics
  • Explore
    • In the Dependent List box, add the variables for which you would like to calculate the means
    • In the Factor List box, add the variable by which you would like to see the means – for example: Quality
    • Click Ok to run.

You will now see a much larger table than we have seen to date.  SPSS provides you with a long list of descriptive statistics for wood chip weight by each quality category.

You will also see a Stem and Leaf plot along with a Boxplot to provide you with a sense of the distribution of the data.  More information to help you get a better feeling for the data that you are working with.

Summary

The common descriptive statistics that are used include: frequency, median, mode, mean, and measures of variation (standard deviation, standard error, etc..).  Each of these statistics should be run on the appropriate types of data – keep in mind, that a frequency on a variable such as age will give you a long table with meaningless information.

SPSS OUTPUT WINDOW

As we’ve been working along, you’ve already noticed that all the output or results can be found in a second window – referred to as the SPSS Statistics Viewer window.  If you want to save your work here, using the File -> Save or Save As option will save the entire output window as an .SPV file which is an SPSS format.  This means that if you want to re-open this file you must have SPSS installed on your computer.

If you only want to save a table or a chart, you have a couple of options:

  1. Export the parts you want to save as a Word, Excel, PDF, amongst a few more options.  To accomplish this, follow these steps:
    • select the tables, graphs that you want to export
    • File
      • Export… you should see a new dialogue box open.
        • At the top, ensure that you select “Selected”.  If you leave it as the default ALL, you will be exporting everything in the SPSS output window including the Notes for each analysis.
        • Select the Type of Document you wish to export to – PDF, Excel, etc…
        • Select the location and name for the file you will be exporting in the File Name box
        • Click OK to run
        • This will result in a new file in the location you set out – with the SPSS results you selected.
  2. Copy and Paste
    • This is probably the easiest way to save the tables or charts you want.  On a WINDOWS computer, simply select the table or chart, Copy (either by using the Menubar option or Ctrl-C), move to the document you want to paste the results into – Word, PPT, Excel, etc..  and Paste (either by using the Menubar option or Ctrl-V).
    • On a MAC, you will need to use the Menubar option and select Copy Special and check Image.  Move to the document you want the selected table or graph and Paste or Cmd-V.


ARCHIVE: W18 SPSS Workshop: Merging datasets and creating new variables

This workshop will walk you through the steps in SPSS to help you merge datasets, create new variables, and recode variables in your current dataset.  Many of these functions are easily performed in Excel, but I’d like to show you how to take advantage of some of the data manipulation options available to you in SPSS.  Pick the method that is the most comfortable for you, when working with your data.

Let’s start by downloading a dataset to use for this workshop, a dummy dataset I created in Excel.  Download the file and save it on your computer.  There are 4 worksheets in this file.  Weight measures were taken on 25 individuals in January, February, and March.  For the month of April we have additional individuals with weight measures taken in January and April.  Our goal is to create one SPSS dataset that contains all 4 Excel worksheets.

First, let’s open each worksheet in SPSS and save them using the dataset names of January, February, March, and April.  Remember that although an Excel file can hold several worksheets with different data, statistical packages such as SPSS and SAS cannot bring in the entire file at the same time; you will need to bring in each worksheet separately.

You should now have 4 SPSS datasets saved on your computer called:  January.sav, February.sav, March.sav, and April.sav

Merging Datasets

There are 2 ways to merge datasets in SPSS:  Add Cases or Add Variables.  I really like how SPSS states these, since it makes it clear as to how you are adding the data from one file to the next.  Add Cases – means that you will be adding more observations to the current dataset.  If you think about this, this means that you would be adding more respondents, more animals, more plots, etc…  Essentially adding more observations that are unique to what is in your current dataset.

Add Variables means you have the same set of observational units, so respondents, animals, plots, etc… and you have new measurements that you want to add to your current dataset.

Adding Variables

Let’s start with our datasets – adding variables.  We have 3 files, January, February, and March, that have the same 25 individuals and weight measures taken in each of the 3 months.  We want all of these to be in the same SPSS dataset so we can do an analysis across these 3 months.

What’s the first thing we need to do before adding February data to January?  If we were to try it as it is – what could happen?

HINT:  look at the variable names for the 2 files

Make the appropriate changes in the Variable View of SPSS.

Before we can merge any files, we need to ensure that all the datasets are sorted.  It looks like they are currently sorted, but let’s double-check by getting SPSS to run a sort anyway.  Sort all the datasets so we are all set for the next steps.

  • Data
  • Sort Cases
    • Put ID into the Sort by: Box
    • Sort Order – Ascending
  • OK

Let’s start with adding February data to January:

Make sure you are in the January file

  • Data
  • Merge Files
    • Add Variables
    • Since all of our datasets are open – you should see a list of the currently open files.  Select February.  If you closed all the datasets, the top box will be empty, then you will select External SPSS dataset and navigate to where you saved your file and open it
    • Continue
      • Your new dialogue box – you should see all of the variables available in both datasets.  * is the active dataset (January), + is the DataSet 2 or new combined dataset.
      • Review what variables are where.  New Active Dataset is the new merged dataset – think about what you want to see included in here.  Does it match your expectations?  Notice the list of variables in the Excluded box – do these make sense?  Notice that these are the ID and TRMT – variables which were duplicated in the 2 datasets.
      • We can add a key variable – the variable by which your cases are sorted.  Let’s do this to make sure weights are matched to the appropriate ID.
      • Select Match cases on key variables
        • Select that the variables are sorted in both files – Select both files provide cases – Then select ID from the Excluded box and add it to the Key Variables box – notice how it disappears from the  new Active Dataset box too
  • Ok to run  – you will get a warning message – click ok

Take a look at the January file.  Does it look correct?  Remember you can double-check by looking at the February file.  If you are happy with the way it looked, save it under a new name – maybe Jan_Feb

Repeat this process to add the March dataset as well and save it as a file called Jan_Mar.sav
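Conceptually, Add Variables with a key is a match on ID.  The Python sketch below illustrates the idea on two tiny, hypothetical datasets (the weights are made up; the variable names follow the ones used in this workshop).  It is only an illustration of the logic, not how SPSS implements the merge, and the function name add_variables is hypothetical.

```python
# Hypothetical rows; the second dataset is deliberately in a different order
january = [
    {"ID": 1, "Trmt": "A", "Weight_jan": 15},
    {"ID": 2, "Trmt": "B", "Weight_jan": 18},
]
february = [
    {"ID": 2, "Trmt": "B", "Weight_feb": 19},
    {"ID": 1, "Trmt": "A", "Weight_feb": 16},
]


def add_variables(active, other, key="ID"):
    """Match rows on the key and add the variables that are new."""
    merged = {row[key]: dict(row) for row in active}
    for row in other:
        # Both files provide cases: create the row if the key is new
        target = merged.setdefault(row[key], {key: row[key]})
        for name, value in row.items():
            # Duplicated variables (ID, Trmt) keep the active dataset's copy
            target.setdefault(name, value)
    return [merged[k] for k in sorted(merged)]


jan_feb = add_variables(january, february)
for row in jan_feb:
    print(row)
```

Because the match is made on ID rather than on row position, each February weight lands on the correct individual even when the two files are in a different order.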

Close the February and March files, leaving the new Jan_Mar and April datasets open.

Adding Cases

So now we have a dataset with weight measures taken from January to March on the same individuals.  Now we need to add the new observations that have weight measures taken in January and April – this is a great example of Adding Cases to a dataset.

With the Jan_Mar dataset open:

  • Data
  • Merge Files
    • Add Cases
    • As in the first case above, because our April dataset is open, you will see it listed in the dialogue box
    • Select April and Continue
      • You should now see a box that lists variables that are Paired and Unpaired.   Essentially think of these as the variables that are present in both files (paired) and the variables that are unique to one file (unpaired)
      • Since we want ALL the variables in our new dataset, select all the unpaired Variables and put into the Variables in the New Active Dataset – our new dataset
      • OK

Review the dataset, is it what you were expecting?

If it is, save it as Jan_Apr.sav.  Close the April dataset
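Add Cases stacks the rows of one dataset under the other.  The Python sketch below shows the idea with hypothetical rows: variables that exist in only one file are kept, and the cases from the other file get a missing value (None here) for them.  The function name add_cases is an illustration only.

```python
# Hypothetical rows (values made up for illustration)
jan_mar = [
    {"ID": 1, "Weight_jan": 15, "Weight_feb": 16, "Weight_mar": 17},
]
april = [
    {"ID": 26, "Weight_jan": 20, "Weight_apr": 22},
]


def add_cases(top, bottom):
    """Stack two datasets, keeping every variable from both."""
    columns = []
    for row in top + bottom:
        for name in row:
            if name not in columns:
                columns.append(name)
    # A variable that a row does not have becomes a missing value
    return [{name: row.get(name) for name in columns} for row in top + bottom]


jan_apr = add_cases(jan_mar, april)
for row in jan_apr:
    print(row)
```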

Creating New Variables

Adding a new variable

Let’s start by creating a new variable called Wtgain that is the difference between the weight measured in January and the weight measured in March.

  • Transform
  • Compute Variable
    • This will open a new dialogue box
    • In the Target Variable enter a new variable name: wtgain
    • In the Numeric Expression: enter the calculation – select the Weight_mar and place in this box – then add the “-” followed by the Weight_jan variable.  So it should read  weight_mar - weight_jan
    • OK

SPSS will add the new variable at the end of the current dataset.  Review and decide whether it completed the action you were expecting.  Since the original weight measures had no decimal places, let’s remove the decimal places added to the new variable.  Add a label to this new variable.
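The calculation itself is a simple row-by-row subtraction.  A minimal Python sketch with hypothetical weights follows; a row with a missing weight (None) gets a missing wtgain, which is how a result should be treated when one of its inputs is missing.

```python
# Hypothetical rows; the second individual has no March weight
rows = [
    {"ID": 1, "Weight_jan": 15, "Weight_mar": 18},
    {"ID": 2, "Weight_jan": 20, "Weight_mar": None},
]

for row in rows:
    jan, mar = row["Weight_jan"], row["Weight_mar"]
    # wtgain = weight_mar - weight_jan, missing if either weight is missing
    row["wtgain"] = None if jan is None or mar is None else mar - jan

for row in rows:
    print(row["ID"], row["wtgain"])
```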

Save the dataset.

Recoding a variable

Sometimes we have a variable that we want to recode – so in our case we are going to create a new variable called wtclass_jan that will take the weights measured in January and put them into 3 weight classes:  1 = 13-16; 2 = 17-20;  3 = 21-24

SPSS has 2 functions that allow us to recode variables.  One is called, Recode into Same Variables… and the second, Recode into Different Variables…  In the interest of NOT writing over any data, I recommend that you use the Recode Into Different Variables… option.  As you work more and more with your data in SPSS, there may come a time when you may want to use the Recode Into Same Variables.. option, but understand that you will lose any data that you overwrite.

We want to create the new variable called wtclass_jan as described above.  To do this:

  • Transform
  • Recode Into Different Variables…
    • You will now see a dialogue box.  Select the Weight_jan variable from the left box and throw it over to the Input Variable ->Output Variable box
    • You will now provide a new name wtclass_jan in the Name box under the Output Variable section.
    • Add an appropriate label – maybe something like January Weight Class Group
      • Change
      • you should now see weight_jan -> wtclass_jan in the middle box
    • Now select the Old and New Values button
      • This is where you will tell SPSS that wtclass 1 contains any weight measures taken in January that include 13 – 16
      • There are different ways to do this – select the one that works best for you
      • Old value – will be the values in weight_jan – I will use Range – 13 – 16
      • New value – will be 1
      • Click Add
      • Create the next 2 groups
    • Continue
  • OK

Remember SPSS creates any new variable and adds it to the end of the current dataset.  We have a small dataset, so it is easy to find, but when you start to work with your own data, you may have 100s of variables – so remember that the new variables are added to the end.

Check the new wtclass_jan variable to see if it worked.
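One way to check the recode is to write the same rule down explicitly.  The Python sketch below uses the three ranges defined above; the function name weight_class is hypothetical.  Any value that falls outside every range, including a non-integer such as 16.5, is left without a class, which is worth keeping in mind if your weights have decimal places.

```python
# The three weight classes described above: (low, high, class code)
RANGES = [(13, 16, 1), (17, 20, 2), (21, 24, 3)]


def weight_class(weight):
    """Return the weight class code, or None if no range applies."""
    if weight is None:
        return None
    for low, high, code in RANGES:
        if low <= weight <= high:
            return code
    return None  # outside all ranges: left unclassified


for weight in (13, 16, 17, 24, 25, 16.5):
    print(weight, weight_class(weight))
```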

Save your dataset

What can SPSS do?

There are many data manipulations that can be performed in SPSS.  Whether you do these in Excel or SPSS it does not matter.  Document any changes you made to remember what you did.


SPSS: Descriptive Statistics and Charts

Last week in the RPD6380 class, we talked about how to enter data into an Excel spreadsheet and then open it in the SPSS software.

As a recap:

  • Create variable names at the top of each column in Excel to match your variables/questions.  Follow these best practices for naming your variables:
    • Keep it short (SPSS accepts names of up to 64 characters, but shorter names are easier to work with)
    • Start with a letter – can contain numbers
    • NO funny characters – %,$,#, etc…
    • NO blank spaces – use an _ if you want
  • Save your file in Excel
  • To open in SPSS
    • File
    • Open Data
    • Navigate to where you saved your file
    • Change File Type to Excel
    • Select your file and click Open
    • Answer the questions in the dialogue box
  • Make sure you save your data in SPSS
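
If you have a long list of column names, you could check them against the naming rules in the recap before importing.  The Python sketch below is only one way to do it: the function name is_tidy_name is hypothetical, and the length cap is left as a parameter so you can set whatever limit you follow.

```python
import re

# Starts with a letter, followed only by letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Assumed cap for this sketch; set it to whatever limit you follow.
MAX_LENGTH = 32


def is_tidy_name(name: str, max_length: int = MAX_LENGTH) -> bool:
    """Check a name against the recap rules: short, starts with a letter,
    no special characters, no blank spaces."""
    return len(name) <= max_length and NAME_PATTERN.fullmatch(name) is not None


# Made-up candidate names.
for candidate in ["weight_jan", "2nd_weight", "income$", "job sat"]:
    print(candidate, is_tidy_name(candidate))
```

Only weight_jan passes here: the others start with a number, contain a $, or contain a blank space.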

Variable Labels

In the Variable View of SPSS, take the time to fill in the Labels for each variable.  This way you won’t have to remember what those shortened variable names are in a couple of months or years.

Value Labels

We worked through an exercise where we coded some of our data.  Males/Females were m/f or 1/2.  Be sure to add all these labels in the Values section of the Variable View in SPSS. The time you spend doing this at the start of your research will save you a LOT of time when you do your analysis.

Missing Values

Be sure to add any missing-value codes to the Missing column in the Variable View.

Descriptive Statistics

At the beginning of any statistical analysis, learning more about your data is a great place to start.  Descriptive statistics are essentially that – they describe your data, or they summarize your data to give you a good, solid base understanding of what you have collected.  The type of descriptive statistics you will conduct will depend on the type of variable you have.  Remember the 3 types of variables that SPSS distinguishes between?

  • Scale – a continuous piece of information, also referred to as Interval or Ratio.  Examples: age, weight, height
  • Nominal – a categorical piece of data – there is NO relationship between the categories.  Examples:  religion, colour, gender
  • Ordinal – a categorical piece of data – this time there is a relationship or order to the categories.  Examples:  Year of study, age group, Likert scales

Each of these data types will use a different type of descriptive statistic.  For instance, calculating the mean of colour makes no sense at all, but a frequency count of colour does work.

Frequency

To calculate the frequency of a categorical variable (nominal OR ordinal) in SPSS:

  • Analyze
  • Descriptive Statistics
  • Frequencies
    • Select the variables in question and drag to the right hand side
      • As an example, select Income Category
    • Click OK to run

You should now have a frequency table of the variable, Income Category

The table lists the categories of the variable, in this case: Below $25; $25-$49; $50-$74; $75+.  If you had not provided the value labels, you would see 1; 2; 3; 4 as the categories with no explanation as to what they represent.

The table has four columns.  Frequency is the actual count of observations in each category.  Percent is each count as a percentage of all observations, including any missing ones.  Valid Percent is each count as a percentage of the observations that have a value for Income Category, so it differs from Percent only when you have missing observations.  Cumulative Percent is the running total of Valid Percent down the table.
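
To see how the four columns relate to each other, here is a small Python sketch that builds the same kind of table by hand.  The responses are made up (with None standing in for a missing value), and frequency_table is a hypothetical helper, not anything SPSS provides.

```python
from collections import Counter


def frequency_table(values, order):
    """Return one row per category:
    (category, frequency, percent, valid percent, cumulative percent)."""
    total = len(values)                           # every case, missing included
    valid = [v for v in values if v is not None]  # cases that have a value
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for category in order:
        count = counts[category]
        percent = 100 * count / total
        valid_percent = 100 * count / len(valid)
        cumulative += valid_percent
        rows.append((category, count, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


# Made-up coded responses; None marks a missing observation.
responses = [1, 1, 2, 2, 2, 3, 4, 4, None, None]

for row in frequency_table(responses, order=[1, 2, 3, 4]):
    print(row)
```

With 2 of the 10 made-up cases missing, category 2 is 30% of all cases but 37.5% of the valid ones, which is exactly the gap between Percent and Valid Percent.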

Try:

  • Run the Frequency procedure on the variable called Internet
  • Can you describe what you see?

Mode

Mode is the value in the data that appears the most.  So let’s switch variables and run a frequency on the variable Job Satisfaction.  When you run the frequency you have a table that shows you how many people answered each of the 5 levels of this Likert Scale:

  • Highly dissatisfied = 1109
  • Somewhat dissatisfied = 1268
  • Neutral = 1393
  • Somewhat satisfied = 1406
  • Highly satisfied = 1224

By looking at these results I can see that Somewhat satisfied appears to be the category that people selected the most.  But let’s get SPSS to do the hard work for us and confirm whether this is correct or not.

To obtain the MODE of a variable:

  • Analyze
  • Descriptive Statistics
  • Frequencies
    • Select the variables in question and drag to the right hand side
      • As an example, select Job Satisfaction
    • Click on the Statistics button on the right
      • Select Mode
      • Click Continue
      • Click OK

You should now see the Mode in the first table of the Frequency output.
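
As a quick cross-check outside SPSS, the mode is simply the category with the largest count.  Here is a minimal Python sketch using the Job Satisfaction counts listed above; the variable names are my own.

```python
# Counts copied from the Job Satisfaction frequency table in this section.
counts = {
    "Highly dissatisfied": 1109,
    "Somewhat dissatisfied": 1268,
    "Neutral": 1393,
    "Somewhat satisfied": 1406,
    "Highly satisfied": 1224,
}

highest = max(counts.values())

# Keep every category that reaches the highest count, in case of a tie.
modes = [category for category, n in counts.items() if n == highest]
print(modes)  # ['Somewhat satisfied']
```

If two categories shared the highest count, this list would contain both of them.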

Try:

  1. What is the mode?
  2. Would you calculate the MODE on a variable such as income?  Why or Why not?

Median

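The median of a variable is the middle value once the observations are sorted: half of the observations fall at or below it and half at or above it.  With an odd number of observations, the median is the single middle observation; with an even number, it is conventionally taken as the average of the two middle observations.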

To obtain the MEDIAN in SPSS, follow the same instructions as the MODE, but select the MEDIAN in the Statistics dialogue box.
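
If you would like to check the middle-value idea outside SPSS, here is a minimal Python sketch using the standard statistics module.  The numbers are made up.  Note that when the number of observations is even, statistics.median averages the two middle values, while statistics.median_low returns the lower of the two, which may suit coded ordinal data better.

```python
import statistics

# Made-up observations; the functions sort them internally.
odd_count = [2, 5, 1, 4, 3]
even_count = [2, 5, 1, 4, 3, 6]

print(statistics.median(odd_count))       # 3   (the single middle value)
print(statistics.median(even_count))      # 3.5 (average of 3 and 4)
print(statistics.median_low(even_count))  # 3   (lower of the two middle values)
```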

Try:

  1. What is the median for Job Satisfaction?
  2. What is the median value for Level of Education?

Mean

The mean or average is calculated on a scale variable or continuous variable.  It just doesn’t make sense to calculate the mean of a categorical variable.

To obtain the MEAN in SPSS:

  • Analyze
  • Descriptive Statistics
  • Descriptives
    • Select the variable in question and drag to the right hand side
      • use income as an example
      • Click OK to run

You should now have a table with N, Minimum, Maximum, Mean, and Standard Deviation for the household income variable.  These are the default values you obtain when you run this analysis.  But, what happens if you want the Sum or the Standard Error of this variable?

  • Analyze
  • Descriptive Statistics
  • Descriptives
    • Select the variable in question and drag to the right hand side
    • Select the Options button – this will open another dialogue box that has a list of statistics to select from
      • Select Sum and S.E. mean (standard error of the mean)
    • Click Continue
    • Click OK to run

Your output table will now contain these added statistics.
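
For anyone curious how these statistics fit together, here is a minimal Python sketch that computes the same set by hand.  The income values are made up, and the standard error of the mean is calculated as the sample standard deviation divided by the square root of N.

```python
import math
import statistics

# Made-up household incomes (in thousands).
income = [72, 153, 28, 26, 23, 76, 40, 57]

n = len(income)
total = sum(income)
mean = statistics.mean(income)
sd = statistics.stdev(income)   # sample standard deviation (n - 1 denominator)
se_mean = sd / math.sqrt(n)     # standard error of the mean

print("N:", n, "Minimum:", min(income), "Maximum:", max(income))
print("Sum:", total, "Mean:", mean)
print("Std. Deviation:", round(sd, 2), "S.E. mean:", round(se_mean, 2))
```

The standard error shrinks as N grows, which is why it is smaller than the standard deviation for anything but a single observation.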

Try:

  1. Select another Scale variable from your dataset and calculate the mean, variance, and standard deviation.

Explore Function in SPSS

Sometimes you may want to determine the mean household income by marital status or by another categorical variable.  Until now, we’ve been looking at the entire dataset.  There are a few ways to do this, but the most direct way is to use the Explore function in SPSS.

  • Analyze
  • Descriptive Statistics
  • Explore
    • In the Dependent List box, add the variables for which you would like to calculate the means – for example:  household income
    • In the Factor List box, add the variable by which you would like to group the means – for example: marital status
    • Click OK to run.

You will now see a much larger table than we have seen to date.  SPSS provides you with a long list of descriptive statistics for household income by each level of marital status.

You will also see a Stem and Leaf plot along with a Boxplot, which give you a sense of the distribution of the data and help you get a better feeling for the data that you are working with.
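
The idea behind Explore, splitting the cases by a factor and then describing each group separately, can be sketched in a few lines of Python.  The cases and group labels below are made up, and this covers only a handful of the statistics that SPSS reports.

```python
from collections import defaultdict
import statistics

# Made-up (marital status, household income) pairs.
cases = [
    ("Married", 72), ("Unmarried", 28), ("Married", 153), ("Unmarried", 26),
    ("Married", 40), ("Unmarried", 57), ("Unmarried", 23),
]

# Split the dependent variable (income) by the factor (marital status).
groups = defaultdict(list)
for status, income in cases:
    groups[status].append(income)

# Describe each group separately.
summary = {
    status: {
        "n": len(incomes),
        "mean": statistics.mean(incomes),
        "median": statistics.median(incomes),
        "min": min(incomes),
        "max": max(incomes),
    }
    for status, incomes in groups.items()
}

for status, stats in summary.items():
    print(status, stats)
```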

Summary

The common descriptive statistics include: frequency, median, mode, mean, and measures of variation (standard deviation, standard error, etc.).  Each of these statistics should be run on the appropriate type of data – keep in mind that a frequency on a variable such as age will give you a long table with meaningless information.

Chart Builder in SPSS

Numbers and statistics can be fun, but sometimes putting these numbers into context with a chart or graph helps them reach, and be understood by, a broader audience.  What do I mean by that?  How many of you will remember a number vs how many of you will remember a graph that shows a trend?

Building charts in SPSS is quite straightforward and fun!  You’ll see!!

Let’s start by creating a barchart for our job satisfaction variable.  We want to see a bar for each level and we want to see the count.

In SPSS:

  • Graphs
  • Chart Builder – this will open a dialogue box
    • Notice on bottom half – a gallery of all the different types of charts you can create in SPSS.
    • We want a simple barchart
      • Select bar
      • Then double-click on the first barchart listed
      • Once you do this you should see the skeleton of a bar chart appear in the top half of your dialogue box.
      • All you need to do now, is to drag and drop the variables where they are appropriate.
      • For this example:
        • Select Job satisfaction and drag it to the x-axis
        • On the right, you may see an Element Properties dialogue box (if you do not see this – Click on the Element Properties button to open it).
        • Note that under Statistics, Count is selected – this is what we want.  But click on this to see what other statistics are available.
      • To create the graph, click OK

You should now see a very plain barchart that matches the Frequency counts we created earlier.

Let’s create a chart that shows the average income for each level of job satisfaction.  I’m curious to see whether the folks that are not satisfied with their job have a lower average income.

So, let’s start this again:

  • Graphs
  • Chart Builder – this will open a dialogue box
    • Select Barchart again
    • Drag and drop Job Satisfaction to the x-axis
    • Now drag and drop Household income to the y-axis
    • Notice how the Statistic changed to Mean.  This is what we want.
    • Let’s run it by clicking OK

Hmm…  now that’s an interesting graph!

One last piece is missing from this graph – error bars!  Whenever you have charts with means, you should ALWAYS provide some measure of variability.  So let’s add some error bars and we’ll try standard error.

  • Graphs
  • Chart Builder – this will open a dialogue box
    • Select Barchart again
    • Drag and drop Job Satisfaction to the x-axis
    • Now drag and drop Household income to the y-axis
    • Ensure that the statistic is mean
    • Under the statistics box in the Element Properties box, check the Display Error Bars box
      • Now you have a few options, as stated above let’s use the Standard Error option – select Standard Error
      • Click Apply
    • Click OK to run chart

Providing the error bars gives the reader a “fuller” picture of the data.  Although in this case it does not change the story!
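
To make it concrete what a standard error bar represents, here is a minimal Python sketch that computes the bar height (the mean) and the two ends of the error bar for each group.  The incomes are made up, error_bar is a hypothetical helper, and the multiplier of 1 is an assumption of this sketch; set it to match whatever multiplier you choose in the dialogue box.

```python
import math
import statistics


def error_bar(values, multiplier=1.0):
    """Return (mean, lower, upper), where the bar spans
    mean - multiplier * SE to mean + multiplier * SE."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - multiplier * se, mean + multiplier * se


# Made-up household incomes (in thousands) for three satisfaction levels.
income_by_satisfaction = {
    "Highly dissatisfied": [30, 42, 36],
    "Neutral": [50, 62, 56],
    "Highly satisfied": [80, 98, 92],
}

for level, incomes in income_by_satisfaction.items():
    mean, low, high = error_bar(incomes)
    print(f"{level}: mean {mean:.1f}, error bar {low:.1f} to {high:.1f}")
```

When the error bars of two groups overlap heavily, the difference between their means deserves a closer look before you read much into it.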

Try:

  1. Create a barchart that shows the mean household income by job satisfaction for the 2 levels of marital status.  Be sure to include error bars.
  2. What question does this barchart answer?

More charts

I used the example of a barchart, but the more you use the Chart Builder, the more you will see how straightforward it is to create charts in SPSS.  Try playing around with a different chart and see what happens.

Summary

  • Barchart for counts
  • Barchart to show means of groups
  • Side-by-side barchart to show means of groups

 
