Come and join us as we continue ‘R’ adventures!

This workshop will walk you through the steps in SPSS to help you merge datasets, create new variables, and recode variables in your current dataset. Many of these functions are easily performed in Excel, but I’d like to show you how to take advantage of some of the data manipulation options available to you in SPSS. Pick the method that is most comfortable for you when working with your data.
Let’s start by downloading a dataset to use for this workshop: a dummy dataset I created in Excel. Download the file and save it on your computer. There are 4 worksheets in this file. Weight measures were taken on 25 individuals in January, February, and March. For the month of April, we have additional individuals with weight measures taken in January and April. Our goal is to create one SPSS dataset that contains the data from all 4 Excel worksheets.
First, let’s open each worksheet in SPSS and save them using the dataset names January, February, March, and April. Remember that although an Excel file can hold several worksheets with different data, statistical packages such as SPSS and SAS cannot bring in the entire file at once; you will need to bring in each worksheet separately.
You should now have 4 SPSS datasets saved on your computer: January.sav, February.sav, March.sav, and April.sav.
There are 2 ways to merge datasets in SPSS: Add Cases or Add Variables. I really like how SPSS names these, since it makes clear how you are adding the data from one file to the next. Add Cases means that you will be adding more observations to the current dataset: more respondents, more animals, more plots, etc. Essentially, you are adding observations that are not already in your current dataset.
Add Variables means you have the same set of observational units (respondents, animals, plots, etc.) and you have new measurements that you want to add to your current dataset.
Let’s start with our datasets by adding variables. We have 3 files, January, February, and March, that have the same 25 individuals with weight measures taken in each of the 3 months. We want all of these to be in the same SPSS dataset so we can do an analysis across these 3 months.
What’s the first thing we need to do before adding the February data to January? If we were to try it as it is, what could happen?
HINT: look at the variable names for the 2 files
Make the appropriate changes in the Variable View of SPSS.
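To see why the variable names matter, here is a minimal sketch in plain Python rather than SPSS. The variable names `id` and `weight` and all of the values are made up for illustration; check the Variable View for the real names in your files. When both files use the same name for the measurement, only one column can survive the merge (which one depends on the tool), so one month of data is silently lost.

```python
# Made-up rows: "id", "weight" and the values are assumptions for illustration,
# not the contents of the workshop file.
january = [{"id": 1, "weight": 15}, {"id": 2, "weight": 19}]
february = [{"id": 1, "weight": 16}, {"id": 2, "weight": 18}]


def rename_variable(rows, old, new):
    """Return copies of the rows with one variable renamed."""
    return [
        {(new if name == old else name): value for name, value in row.items()}
        for row in rows
    ]


# Merging as-is: both files call the measurement "weight", so only one
# "weight" value per individual survives.
naive = [{**jan, **feb} for jan, feb in zip(january, february)]

# Renaming first keeps both months side by side.
january_renamed = rename_variable(january, "weight", "weight_jan")
february_renamed = rename_variable(february, "weight", "weight_feb")
safe = [{**jan, **feb} for jan, feb in zip(january_renamed, february_renamed)]

print(naive[0])
print(safe[0])
```

Giving each month its own variable name before merging is the safe habit, whichever package you use.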
Before we can merge any files, we need to ensure that all the datasets are sorted. It looks like they are currently sorted, but let’s double-check by getting SPSS to run a sort anyway. Sort all the datasets so we are all set for the next steps.
Let’s start with adding February data to January:
Make sure you are in the January file
Take a look at the January file. Does it look correct? Remember you can double-check by looking at the February file. If you are happy with the way it looks, save it under a new name, maybe Jan_Feb.
Repeat this process to add the March dataset as well and save it as a file called Jan_Mar.sav
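The idea behind adding variables can be sketched in plain Python as follows. This is not how SPSS does it internally, only an illustration of matching rows on a key variable; the names `id`, `weight_jan`, `weight_feb`, and `weight_mar`, the helper `add_variables`, and the values are all assumptions made up for the example.

```python
# Made-up rows, deliberately out of order to show why sorting matters.
january = [{"id": 2, "weight_jan": 19}, {"id": 1, "weight_jan": 15}, {"id": 3, "weight_jan": 22}]
february = [{"id": 3, "weight_feb": 21}, {"id": 1, "weight_feb": 16}, {"id": 2, "weight_feb": 18}]
march = [{"id": 1, "weight_mar": 17}, {"id": 3, "weight_mar": 23}, {"id": 2, "weight_mar": 20}]


def add_variables(left, right, key="id"):
    """Match rows on the key and add the variables from `right` to `left`."""
    lookup = {row[key]: row for row in right}
    merged = []
    # Sort by the key so the result is in a predictable order.
    for row in sorted(left, key=lambda r: r[key]):
        merged.append({**row, **lookup.get(row[key], {})})
    return merged


jan_feb = add_variables(january, february)
jan_mar = add_variables(jan_feb, march)

for row in jan_mar:
    print(row)
```

Each individual ends up on one row with a separate weight variable for each month, which is the layout we want in Jan_Mar.sav.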
Close the February and March files, leaving the new Jan_Mar and April datasets open.
So now we have a dataset with weight measures taken from January to March on the same individuals. Now we need to add the new observations that have weight measures taken in January and April – this is a great example of Adding Cases to a dataset.
With the Jan_Mar dataset open:
Review the dataset. Is it what you were expecting?
If it is, save it as Jan_Apr.sav. Close the April dataset
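Adding cases can be pictured as stacking the rows of one file underneath the other. The sketch below is plain Python, not SPSS; the variable names, the helper `add_cases`, and the values are made up for illustration. Notice that the original individuals have no April weight and the new individuals have no February or March weight, so those cells are left empty (`None` here, a missing value in SPSS).

```python
# Made-up rows for illustration only.
jan_mar = [
    {"id": 1, "weight_jan": 15, "weight_feb": 16, "weight_mar": 17},
    {"id": 2, "weight_jan": 19, "weight_feb": 18, "weight_mar": 20},
]
april = [{"id": 26, "weight_jan": 14, "weight_apr": 18}]


def add_cases(first, second):
    """Stack the rows of two datasets, filling absent variables with None."""
    # Collect every variable name in order of first appearance.
    names = []
    for row in first + second:
        for name in row:
            if name not in names:
                names.append(name)
    return [{name: row.get(name) for name in names} for row in first + second]


jan_apr = add_cases(jan_mar, april)

for row in jan_apr:
    print(row)
```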
Let’s start by creating a new variable called Wtgain that is the difference between the weight measured in January and the weight measured in March.
SPSS will add the new variable at the end of the current dataset. Review it and decide whether it completed the action you were expecting. Since the original weight measures had no decimal places, let’s remove the decimal places added to the new variable. Add a label to this new variable.
Save the dataset.
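The calculation behind Wtgain can be sketched in plain Python like this. I am assuming that "gain" means the March weight minus the January weight; the names `weight_jan` and `weight_mar`, the helper `weight_gain`, and the values are made up for illustration. Individuals with a missing weight get a missing Wtgain rather than a misleading number.

```python
# Made-up rows; the second individual has no March measurement.
rows = [
    {"id": 1, "weight_jan": 15, "weight_mar": 17},
    {"id": 26, "weight_jan": 14, "weight_mar": None},
]


def weight_gain(row):
    """March weight minus January weight, or None if either is missing."""
    jan, mar = row["weight_jan"], row["weight_mar"]
    if jan is None or mar is None:
        return None
    return mar - jan


# The new variable is added at the end of each row.
for row in rows:
    row["Wtgain"] = weight_gain(row)

for row in rows:
    print(row)
```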
Sometimes we have a variable that we want to recode. In our case, we are going to create a new variable called wtclass_jan that takes the weights measured in January and puts them into 3 weight classes: 1 = 13-16; 2 = 17-20; 3 = 21-24
SPSS has 2 functions that allow us to recode variables. One is called Recode into Same Variables… and the second is called Recode into Different Variables… In the interest of NOT writing over any data, I recommend that you use the Recode into Different Variables… option. As you work more and more with your data in SPSS, there may come a time when you want to use the Recode into Same Variables… option, but understand that you will lose any data that you overwrite.
We want to create the new variable called wtclass_jan as described above. To do this:
Remember SPSS creates any new variable and adds it to the end of the current dataset. We have a small dataset, so it is easy to find, but when you start to work with your own data, you may have 100s of variables – so remember that the new variables are added to the end.
Check the new wtclass_jan variable to see if it worked.
Save your dataset
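The recoding rule itself can be written out as a small Python function, which is a handy way to check a few values by hand. This is a sketch, not SPSS syntax: the helper `weight_class`, the name `weight_jan`, and the example values are assumptions. I have also assumed that weights outside 13-24, and missing weights, should be left without a class.

```python
def weight_class(weight):
    """Recode a January weight into class 1 (13-16), 2 (17-20) or 3 (21-24)."""
    if weight is None:
        return None
    if 13 <= weight <= 16:
        return 1
    if 17 <= weight <= 20:
        return 2
    if 21 <= weight <= 24:
        return 3
    # Outside all three ranges: leave unclassified (assumption).
    return None


# Made-up rows. The original variable is kept and the class goes into a new
# variable, mirroring Recode into Different Variables.
rows = [{"id": 1, "weight_jan": 15}, {"id": 2, "weight_jan": 19}, {"id": 3, "weight_jan": 22}]
for row in rows:
    row["wtclass_jan"] = weight_class(row["weight_jan"])

for row in rows:
    print(row)
```

Checking the boundary values (13, 16, 17, 20, 21, 24) is a quick way to confirm the recode did what you intended.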
There are many data manipulations that can be performed in SPSS. It does not matter whether you do these in Excel or SPSS. Either way, document any changes you make so that you can remember what you did.

Last time we met we took a closer look at bar charts and how we can create bar charts in the different software packages we have access to at the University of Guelph. This time I’d like to change gears a bit and take a look at qualitative data and some of the options that are available to visualize this type of data.
First, qualitative data is a broad term, and the more I work in the field of data & statistics, the more I learn how people define and use the term “qualitative data”. So, I am not going to try to define this term for this session, but rather concentrate on how we can use different types of visualization for this broadly defined type of data.
Resources: there are a large number of resources available to help you determine the best option for your visualization. I will highlight a couple that I’m currently using. If you are using others, please let me know and I can add them to the list of resources on this site.
Chart Suggestions – Chart suggestions presented by Andrew Abela from Extreme Presentations. A decision chart based on the types of data you are using and the story you want to tell or visualize.
Qualitative Chart Chooser 3.0 by Jennifer Lyons and Stephanie Evergreen. I just came across this wonderful resource. I really like their approach of “What story are you trying to tell?” Let’s work through a couple of examples and discuss your thoughts on this resource.
Books: Data Visualisation: A Handbook for Data Driven Design. Andy Kirk (2016).
As you peruse these resources, the one thing that you will notice is that you have to be very comfortable with the type of data you are working with. A quick review:
Bring examples of qualitative data to this session and we will look at possible options to visualize the data, the pros, and the cons.

Please visit the SASsyFridays blog for this session’s content on Subsetting.

This workshop is the third in a series of 4 offered in partnership with Carol Perry, Associate Librarian Research and Scholarship. These workshops are hands-on and have exercises associated with each aspect being covered in the workshop.
This workshop explores the areas of data anonymisation (making data secure) and data preservation in the context of Research Data Management. The PowerPoint presentation is available here. Please review it for more information, and contact either Carol Perry or Michelle Edwards with any questions.