Agricultural Statistics

This workshop will walk you through the steps in SPSS to help you merge datasets, create new variables, and recode variables in your current dataset. Many of these functions are easily performed in Excel, but I’d like to show you how to take advantage of some of the data manipulation options available to you in SPSS. Pick the method that is the most comfortable for you, when working with your data.

Let’s start by downloading a dataset to use for this workshop, a dummy dataset I created in Excel. Download the file and save it on your computer. There are 4 worksheets in this file. Weight measures were taken on 25 individuals in January, February, and March. For the month of April we have an additional individuals with weight measures taken in January and April. Our goal is to create one SPSS dataset that contains all 4 Excel worksheets.

First, let’s 0pen each worksheet in SPSS and save them using the dataset names of January, February, March, and April. Remember that although the Excel file can hold several worksheets with different data, statistical packages such as SPSS and SAS, cannot bring in the entire file at the same time, you will need to bring in each worksheet separately.

You should now have 4 SPSS datasets saved on your computer called: January.sav, February.sav, March.sav, and April.sav

Merging Datasets

There are 2 ways to merge datasets in SPSS: Add Cases or Add Variables. I really like how SPSS states these, since it makes it clear as to how you are adding the data from one file to the next. Add Cases – means that you will be adding more observations to the current dataset. If you think about this, this means that you would be adding more respondents, more animals, more plots, etc… Essentially adding more observations that are unique to what is in your current dataset.

Add Variables means you have the same set of observational units, so respondents, animals, plots, etc… and you have new measurements that you want to add to your current dataset.

Adding Variables

Let’s start with our datasets – adding variables. We have 2 files January, February, and March that have the same 25 individuals and weight measures taken in 3 months. We want all of these to be in the same SPSS dataset so we can do an analysis across these 3 months.

What’s the first thing we need to do before adding February data to January? If we were to try it as it is – but could happen?

HINT: look at the variable names for the 2 files

Make the appropriate changes in the Variable View of SPSS.

Before we can merge any files, we need to ensure that all the datasets are sorted. It looks like they are currently sorted, but let’s double-check by getting SPSS to run a sort anyway. Sort all the datasets so we are all set for the next steps.

Data
Sort Cases
- Put ID into the Sort by: Box
- Sort Order – Ascending
OK

Let’s start with adding February data to January:

Make sure you are in the January file

Data
Merge Files
- Add Variables
- Since all of our datasets are open – you should see a list of the currently open files. Select February. If you closed all the datasets, the top box will be empty, then you will select External SPSS dataset and navigate to where you saved your file and open it
- Continue
  - Your new dialogue box – you should see all of the variables available in both datasets. * is the active dataset (January), + is the DataSet 2 or new combined dataset.
  - Review what variables are where. New Active Dataset is the new merged dataset – think about what you want to see included in here. Does it match your expectations? Notice the list of variables in the Excluded box – do these make sense? Notice that these are the ID and TRMT – variables which were duplicated in the 2 datasets.
  - We can add a key variable – the variable on which your variables is sorted by. Let’s do this to make sure weights are matched to the appropriate ID.
  - Select Match cases on key variables
    - Select that the variables are sorted in both files – Select both files provide cases – Then select ID from the Excluded box and add it to the Key Variables box – notice how it disappears from the new Active Dataset box too
Ok to run – you will get a warning message – click ok

Take a look at the January file. Does it look correct? Remember you can double-check by looking at the February file. If you are happy with the way it looked, save it under a new name – maybe Jan_Feb

Repeat this process to add the March dataset as well and save it as a file called Jan_Mar.sav

Close the February and March files, leaving the new Jan_Mar and April datasets open.

Adding Cases

So now we have a dataset with weight measures taken from January to March on the same individuals. Now we need to add the new observations that have weight measures taken in January and April – this is a great example of Adding Cases to a dataset.

With the Jan_Mar dataset open:

Data
Merge Files
- Add Cases
- As this first case above, because our April dataset is open, you will see it listed in the dialogue box
- Select April and Continue
  - You should now see a box that lists variables that are Paired and Unpaired. Essentially think of these as the list of variables that are unique and present in both files
  - Since we want ALL the variables in our new dataset, select all the unpaired Variables and put into the Variables in the New Active Dataset – our new dataset
  - OK

Review the dataset, is it what you were expecting?

If it is, save it as Jan_Apr.sav. Close the April dataset

Creating New Variables

Adding a new variable

Let’s start by creating a new variable called Wtgain that is the difference between the weight measured in January and the weight measured in March.

Transform
Compute Variable
- This will open a new dialogue box
- In the Target Variable enter a new variable name: wtgain
- In the Numeric Expression: enter the calculation – select the Weight_mar and place in this box – then add the “-” following by the Weight_jan variable. So it should read weight_mar – weight_jan
- OK

SPSS will add the new variable at the end of the current dataset. Review and decide whether is completed the action you were expecting. Since the original weight measures had no decimal places, let’s remove the decimal places added to the new variable. Add a label to this new variable.

Save the dataset.

Recoding a variable

Sometimes we have a variable that we want to recode – so in our case we are going to create a new variable called wtclass_jan that will take the weights measured in January and put them into 3 weight classes: 1 = 13-16; 2 = 17-20; 3 = 21-24

SPSS has 2 functions that allow us to recode variables. One is called, Recode into Same Variables… and the second, Recode into Different Variables… In the interest of NOT writing over any data, I recommend that you use the Recode Into Different Variables… option. As you work more and more with your data in SPSS, there may come a time when you may want to use the Recode Into Same Variables.. option, but understand that you will lose any data that you overwrite.

We want to create the new variable called wtclass_jan as described above. To do this:

Transform
Recode Into Different Variables…
- You will now see a dialogue box. Select the Weight_jan variable from the left box and throw it over to the Input Variable ->Output Variable box
- You will now provide a new name wtclass_jan in the Name box under the Output Variable section.
- Add an appropriate label – maybe something like January Weight Class Group
  - Change
  - you should now see weight_jan -> wtclass_jan in the middle box
- Now select Old and New Variables button
  - This is where you will tell SPSS that wtclass 1 contains any weight measures taken in January that include 13 – 16
  - There are different ways to do this – select the one that works best for you
  - Old value – will be the values in weight_jan – I will use Range – 13 – 16
  - New value – will be 1
  - Click Add
  - Create the next 2 groups
- Continue
OK

Remember SPSS creates any new variable and adds it to the end of the current dataset. We have a small dataset, so it is easy to find, but when you start to work with your own data, you may have 100s of variables – so remember that the new variables are added to the end.

Check the new wtclass_jan variable to see if it worked.

Save your dataset

What can SPSS do?

There are many data manipulations that can be performed in SPSS. Whether you do these in Excel or SPSS it does not matter. Document any changes you made to remember what you did.

Name

Last time we met we took a closer look at bar charts and how we can create bar charts in the different software packages we have access to at the University of Guelph. This time I’d like to change gears a bit and take a look at qualitative data and some of the options that are available to visualize this type of data.

First, qualitative data is a broad term, and the more I work in the field of data & statistics, the more I learn how people define and use the term “qualitative data”. So, I am not going to try to define this term for this session, but rather concentrate on how we can use different types of visualization for this broadly defined type of data.

Resources, there are a large number of resources available to you to help you determine what the best option for your visualization is. I will highlight a couple that I’m currently using. If you are using others, please let me know and I can add them to the list of resources on this site.

Chart Suggestions – Chart suggestions presented by Andrew Abela from Extreme Presentations. A decision chart based on the types of data you are using and the story you want to tell or visualize.

Qualitative Chart Chooser 3.0 by Jennifer Lyons and Stephanie Evergreen. I just came across this wonderful resource. I really like their approach of “What story are you trying to tell?” Let’s work through a couple of examples and discuss your thoughts on this resource.

Books: Data Visualization, A Handbook for Data Driven Design. Andy Kirk (2016).

As you peruse these resources, the one thing that you will notice is that you have to be very comfortable with the type of data you are working with. A quick review:

Categorical:
- Nominal – groups or levels that do you have any relationship between them or any order
  - Examples may include: gender, religion, Yes/No
- Ordinal – groups or levels that have an order to them
  - Examples may include: level of education, size, Likert scales
Continuous, Scale, Interval, Ratio
- Data that was collected on a scale, interval, or ratio
  - Examples may include: age, weight, temperature, weight gain

Bring examples of qualitative data to this session and we will look at possible options to visualize the data, the pros, and the cons.

Name

Agricultural Statistics

R: RStudio and Importing Data

Andrew

ARCHIVE: W18 SPSS Workshop: Merging datasets and creating new variables