This workshop will walk you through the steps in SPSS to help you merge datasets, create new variables, and recode variables in your current dataset. Many of these functions are easily performed in Excel, but I’d like to show you how to take advantage of some of the data manipulation options available to you in SPSS. Pick the method that is the most comfortable for you, when working with your data.
Let’s start by downloading a dataset to use for this workshop, a dummy dataset I created in Excel. Download the file and save it on your computer. There are 4 worksheets in this file. Weight measures were taken on 25 individuals in January, February, and March. For the month of April we have an additional individuals with weight measures taken in January and April. Our goal is to create one SPSS dataset that contains all 4 Excel worksheets.
First, let’s 0pen each worksheet in SPSS and save them using the dataset names of January, February, March, and April. Remember that although the Excel file can hold several worksheets with different data, statistical packages such as SPSS and SAS, cannot bring in the entire file at the same time, you will need to bring in each worksheet separately.
You should now have 4 SPSS datasets saved on your computer called: January.sav, February.sav, March.sav, and April.sav
There are 2 ways to merge datasets in SPSS: Add Cases or Add Variables. I really like how SPSS states these, since it makes it clear as to how you are adding the data from one file to the next. Add Cases – means that you will be adding more observations to the current dataset. If you think about this, this means that you would be adding more respondents, more animals, more plots, etc… Essentially adding more observations that are unique to what is in your current dataset.
Add Variables means you have the same set of observational units, so respondents, animals, plots, etc… and you have new measurements that you want to add to your current dataset.
Let’s start with our datasets – adding variables. We have 2 files January, February, and March that have the same 25 individuals and weight measures taken in 3 months. We want all of these to be in the same SPSS dataset so we can do an analysis across these 3 months.
What’s the first thing we need to do before adding February data to January? If we were to try it as it is – but could happen?
HINT: look at the variable names for the 2 files
Make the appropriate changes in the Variable View of SPSS.
Before we can merge any files, we need to ensure that all the datasets are sorted. It looks like they are currently sorted, but let’s double-check by getting SPSS to run a sort anyway. Sort all the datasets so we are all set for the next steps.
- Sort Cases
- Put ID into the Sort by: Box
- Sort Order – Ascending
Let’s start with adding February data to January:
Make sure you are in the January file
- Merge Files
- Add Variables
- Since all of our datasets are open – you should see a list of the currently open files. Select February. If you closed all the datasets, the top box will be empty, then you will select External SPSS dataset and navigate to where you saved your file and open it
- Your new dialogue box – you should see all of the variables available in both datasets. * is the active dataset (January), + is the DataSet 2 or new combined dataset.
- Review what variables are where. New Active Dataset is the new merged dataset – think about what you want to see included in here. Does it match your expectations? Notice the list of variables in the Excluded box – do these make sense? Notice that these are the ID and TRMT – variables which were duplicated in the 2 datasets.
- We can add a key variable – the variable on which your variables is sorted by. Let’s do this to make sure weights are matched to the appropriate ID.
- Select Match cases on key variables
- Select that the variables are sorted in both files – Select both files provide cases – Then select ID from the Excluded box and add it to the Key Variables box – notice how it disappears from the new Active Dataset box too
- Ok to run – you will get a warning message – click ok
Take a look at the January file. Does it look correct? Remember you can double-check by looking at the February file. If you are happy with the way it looked, save it under a new name – maybe Jan_Feb
Repeat this process to add the March dataset as well and save it as a file called Jan_Mar.sav
Close the February and March files, leaving the new Jan_Mar and April datasets open.
So now we have a dataset with weight measures taken from January to March on the same individuals. Now we need to add the new observations that have weight measures taken in January and April – this is a great example of Adding Cases to a dataset.
With the Jan_Mar dataset open:
- Merge Files
- Add Cases
- As this first case above, because our April dataset is open, you will see it listed in the dialogue box
- Select April and Continue
- You should now see a box that lists variables that are Paired and Unpaired. Essentially think of these as the list of variables that are unique and present in both files
- Since we want ALL the variables in our new dataset, select all the unpaired Variables and put into the Variables in the New Active Dataset – our new dataset
Review the dataset, is it what you were expecting?
If it is, save it as Jan_Apr.sav. Close the April dataset
Creating New Variables
Adding a new variable
Let’s start by creating a new variable called Wtgain that is the difference between the weight measured in January and the weight measured in March.
- Compute Variable
- This will open a new dialogue box
- In the Target Variable enter a new variable name: wtgain
- In the Numeric Expression: enter the calculation – select the Weight_mar and place in this box – then add the “-” following by the Weight_jan variable. So it should read weight_mar – weight_jan
SPSS will add the new variable at the end of the current dataset. Review and decide whether is completed the action you were expecting. Since the original weight measures had no decimal places, let’s remove the decimal places added to the new variable. Add a label to this new variable.
Save the dataset.
Recoding a variable
Sometimes we have a variable that we want to recode – so in our case we are going to create a new variable called wtclass_jan that will take the weights measured in January and put them into 3 weight classes: 1 = 13-16; 2 = 17-20; 3 = 21-24
SPSS has 2 functions that allow us to recode variables. One is called, Recode into Same Variables… and the second, Recode into Different Variables… In the interest of NOT writing over any data, I recommend that you use the Recode Into Different Variables… option. As you work more and more with your data in SPSS, there may come a time when you may want to use the Recode Into Same Variables.. option, but understand that you will lose any data that you overwrite.
We want to create the new variable called wtclass_jan as described above. To do this:
- Recode Into Different Variables…
- You will now see a dialogue box. Select the Weight_jan variable from the left box and throw it over to the Input Variable ->Output Variable box
- You will now provide a new name wtclass_jan in the Name box under the Output Variable section.
- Add an appropriate label – maybe something like January Weight Class Group
- you should now see weight_jan -> wtclass_jan in the middle box
- Now select Old and New Variables button
- This is where you will tell SPSS that wtclass 1 contains any weight measures taken in January that include 13 – 16
- There are different ways to do this – select the one that works best for you
- Old value – will be the values in weight_jan – I will use Range – 13 – 16
- New value – will be 1
- Click Add
- Create the next 2 groups
Remember SPSS creates any new variable and adds it to the end of the current dataset. We have a small dataset, so it is easy to find, but when you start to work with your own data, you may have 100s of variables – so remember that the new variables are added to the end.
Check the new wtclass_jan variable to see if it worked.
Save your dataset
What can SPSS do?
There are many data manipulations that can be performed in SPSS. Whether you do these in Excel or SPSS it does not matter. Document any changes you made to remember what you did.
One thought on “ARCHIVE: W18 SPSS Workshop: Merging datasets and creating new variables”