ARCHIVE: Summer 2018: R-users and SASsy Fridays Hiatus!

Good day everyone!

I’ve been talking with a few folks about running these sessions during the summer, and the consensus is that everyone is busy and it will be tough to maintain attendance.  So, with that in mind, both the R-Users and SASsy Fridays sessions will be back in the Fall!

Apologies to anyone who was planning on coming.  Topics listed will be presented in September.

Have a great summer!  See you all in the Fall!
Name

R: Getting the data in, merging files, and creating new variables

PDF copy of the Getting the data in, merging files, and creating new variables notes

As we begin the coding part of the R workshop, let us try to bring things together and make it as easy as possible for you to run the code and work with me as we progress through the topics of the next 2 days.  On that note, for each session of the R workshop, I will have an R script already prepared for you, complete with comments for every step.  You will need to download it and open it in RStudio at the beginning of every workshop section.

Please download the following:

Getting the Data in

As with any program out there today, there are several ways to bring data into the R platform.  We will work through 2 different ways to accomplish this task, and once we’ve worked through both, I would like to hear which one you prefer!

Reading a CSV file gets you in the habit of creating a preservation-ready format for your data.  As you’ve probably already figured out, it also means having documentation at the ready, so you remember what variable is what.  To read the file into R, you need to either pick the file’s location or make sure your working directory has been set at the beginning.  Reading an Excel file is just SOooo much easier and probably the way most of us like to work.  Just remember to save your data as you work!
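Here is a minimal sketch of the CSV route, not the workshop script itself.  The data, the file name trial_jan.csv, and the object names are all made up for illustration, and the file is written to a temporary folder so the example runs on its own.

```r
# Hypothetical example data, written to a temporary folder so the sketch is self-contained
trial <- data.frame(
  id     = 1:4,
  trmt   = c("A", "B", "A", "B"),
  weight = c(14, 18, 21, 16)
)
csv_path <- file.path(tempdir(), "trial_jan.csv")
write.csv(trial, csv_path, row.names = FALSE)

# Option 1: read the CSV by giving its full location
jan <- read.csv(csv_path)

# Option 2: set the working directory first, then use just the file name
old_wd <- setwd(tempdir())
jan_wd <- read.csv("trial_jan.csv")
setwd(old_wd)  # put the working directory back where it was

str(jan)

# The Excel route needs the readxl package (file and sheet names are hypothetical):
# library(readxl)
# feb <- read_excel("trial_2018.xlsx", sheet = "February")
```

Both options give the same data frame; the only difference is whether you spell out the location every time or set it once at the top of your script.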

Merging files

Merging files sounds like such an innocent task.  I have an Excel file with 4 monthly worksheets and all I want to do is put them all together into 1 file, so I can analyse the data.  Easy peasy right??

There are a few ways of merging files in R.  The most straightforward method is to use the merge function available in base R.  Try it out with our data and tell me what happens when you merge the January data, which has 25 observations, with the February data, which only has 23 observations.

So, we’ve noticed that there’s something NOT quite right with this merge.  The 2 observations that had a measure in January but not in February were not included in our final dataset. What happens if later on, say in March or April, we do have measures for these individuals?  We want them to be included.  So we need to consider other methods of merging our files.
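To see the same behaviour on a small scale, here is a sketch with made-up data (5 ids in January, 3 in February, rather than the workshop's 25 and 23).  The column names are assumptions for illustration only.

```r
# Hypothetical data: 5 individuals in January, only 3 of them measured in February
jan <- data.frame(id = 1:5, wt_jan = c(14, 18, 21, 16, 23))
feb <- data.frame(id = c(1, 2, 4), wt_feb = c(15, 19, 17))

# By default merge() keeps only the ids that appear in BOTH files
both <- merge(jan, feb, by = "id")
nrow(both)

# all = TRUE keeps every id; a missing measurement is filled with NA
everyone <- merge(jan, feb, by = "id", all = TRUE)
nrow(everyone)
```

The default behaviour is what quietly drops the unmatched individuals.  The all = TRUE argument is one base R way around it, shown here only for comparison with the join functions.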

We will use the joining functions available in the dplyr package.  To do this, we need to take a quick little detour to remind ourselves about sets, unions, and joins, because this is the approach R takes when merging, or rather joining, datasets.  You’ll also see that by taking this approach we can merge all of our data using this one function, unlike SAS and SPSS.

This is a perfect opportunity to show you the Cheatsheets in R.  In RStudio follow these steps:

  • Help
  • Cheatsheets
  • Data Transformation with dplyr

Let’s work through the examples of Combine Tables to get a better understanding of how to merge in R.

Based on these examples, we are interested in performing a full_join.  Did the coding in the R script work for you?  Can you see how this might work for your own research data?
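A full join across several months might look something like the sketch below.  It assumes the dplyr package is installed, and the three small monthly data frames and their column names are invented for illustration; they are not the workshop data.

```r
library(dplyr)

# Hypothetical monthly files; not every id is measured every month
jan <- data.frame(id = 1:5,                wt_jan = c(14, 18, 21, 16, 23))
feb <- data.frame(id = c(1L, 2L, 4L),      wt_feb = c(15, 19, 17))
mar <- data.frame(id = c(1L, 3L, 5L, 6L),  wt_mar = c(16, 22, 24, 13))

# full_join() keeps every id that appears in ANY of the files,
# filling in NA wherever a month has no measurement for that id
all_months <- jan %>%
  full_join(feb, by = "id") %>%
  full_join(mar, by = "id")

all_months
```

Notice that id 6, which only shows up in March, is still included, with NA for January and February.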

Creating new variables

Creating a new variable is very straightforward:  Ynew = Var1 + Var2, or whatever variable you need to create.  The tricky part is ensuring that it becomes a part of your dataset.  Let’s work through the examples in the R script.
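As a small illustration of the "tricky part", here is a sketch using a made-up data frame called dat.  It shows one common way of keeping the new variable inside the dataset, which may differ from the approach taken in the workshop script.

```r
# Hypothetical dataset with the two variables from the formula above
dat <- data.frame(Var1 = c(1, 2, 3), Var2 = c(10, 20, 30))

# This creates a free-standing vector; it is NOT a column of dat
Ynew <- dat$Var1 + dat$Var2

# Assigning with dat$ stores the result inside the data frame itself
dat$Ynew <- dat$Var1 + dat$Var2

names(dat)
```

Checking names(dat) afterwards is a quick way to confirm that the new variable really is part of the dataset.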

Now what if we want to recode a variable rather than just creating a new one?  For example:  we want to create a new variable called wtclass that will take the weights measured in January and put them into 3 weight classes:  1 = 13-16; 2 = 17-20;  3 = 21-24
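One possible way to do this recode is the base R cut() function, sketched below.  The data frame and the wt_jan column name are assumptions, and the sketch assumes the weights are recorded as whole numbers.

```r
# Hypothetical January weights, recorded as whole numbers
jan <- data.frame(id = 1:6, wt_jan = c(13, 16, 17, 20, 21, 24))

# cut() uses intervals that are open on the left and closed on the right:
# (12,16] -> class 1, (16,20] -> class 2, (20,24] -> class 3
jan$wtclass <- cut(jan$wt_jan,
                   breaks = c(12, 16, 20, 24),
                   labels = c(1, 2, 3))

table(jan$wtclass)
```

Any weight outside the break points (12 or lower, or above 24) ends up as NA, so it is worth checking the table for missing values after recoding.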

Quick recap

Getting your data into R can be as easy as using the readxl package and importing your Excel worksheets directly into R.

Once you have your dataset in R, you can merge files using the join functions available in the dplyr package.

Creating new variables and recoding variables is straightforward.  Just remember to make sure that you have added them to your R datafile by using the attach() and detach() functions.  Note that there are other ways of doing this as well; this is just one.

Don’t be afraid to check out the Help resources – Cheatsheets are fun and very informative.

Name

 

SPSS Workshop: GLMM and Non-gaussian Distributions

When we collect data from our research trials, we do not always have data that is “well-behaved” or that comes from the traditional normal distribution curve.  The data can follow a number of other distributions, and with the latest advances in statistical methodology, SPSS can handle some of these as well.  Your first job is to recognize when a normal distribution is NOT appropriate and which distribution is an appropriate starting place.  These are referred to as non-Gaussian distributions.  Remember, Gaussian is just another name for Normal.

Where do we start?  Think about your data – what is it?

  • A percentage?
  • A count?
  • A score?

How do we know that our data is not from a normal distribution?

  • Always check your residuals!
  • Remember the assumptions of your analyses?
    • Normally distributed residuals is one of them!

Let’s work with the following example.  We have another RCBD trial with 4 blocks and 4 treatments randomly assigned to each block.  There were 2 outcome measures taken:   proportion of the plot that flowered, and the number of plants in each plot at the end of the trial.

Generalized Linear Models in SPSS

Those of you who follow the terminology commonly used when talking about ANOVAs (Analysis of Variance) may notice that there is one term that is NOT included in this heading.  Any guesses?  MIXED.  In SAS, we talk about and work with Generalized Linear Mixed Models or GLMM (GLIMMIX), but I just realized that the MIXED part is missing here.  That’s why I could not figure out how to add a RANDOM effect to the models!

We will, however, still use the RCBD data that was used in the SAS workshop.  As noted above, it contains proportion data and count data, which are perfect examples of non-Gaussian data.

Please download the Excel spreadsheet here and load it into SPSS.

We will work through each of the outcome variables in the workshop using the following steps:

  • Analyze
  • Generalized Linear Models
    • Generalized Linear Models
    • We will work through the tabs for each variable together and discuss the outputs

Please note that if you have attended both the SAS workshop and this workshop and are trying to compare the results, they will NOT match.  We are creating a different model in each program, since we cannot add a RANDOM effect in SPSS yet.  You can, however, create the same fixed effects model in both, and the results match perfectly.
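For anyone following along from the R sessions, a fixed effects model of this kind can also be sketched in R.  The example below is only an illustration: the counts are simulated, not the workshop data, and the variable names (block, trmt, count) and the choice of a Poisson distribution with a log link for the count outcome are assumptions.

```r
# Simulated RCBD layout: 4 blocks x 4 treatments, one plot each (NOT the workshop data)
set.seed(2018)
rcbd <- expand.grid(block = factor(1:4),
                    trmt  = factor(c("A", "B", "C", "D")))
rcbd$count <- rpois(nrow(rcbd), lambda = 20)

# Generalized linear model with block and treatment both as FIXED effects
fit <- glm(count ~ block + trmt,
           family = poisson(link = "log"),
           data   = rcbd)

# Tests for block and treatment, added to the model one after the other
anova(fit, test = "Chisq")

# Always check your residuals!
summary(residuals(fit, type = "pearson"))
```

Because block is entered as a fixed effect here, this corresponds to the fixed effects model described above rather than to a mixed model with a RANDOM block effect.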

See you later in May!

Name

 

SPSS: Introduction and Getting the Data In

SPSS software

One of the biggest PROs to using SPSS is its familiar look and feel.  When you open it, it almost feels like home: so familiar, so much like Excel!  The rows represent individual cases or observations and the columns represent variables.  The only difference is that in SPSS the Data View is just that: a view of the data, a container for the data.  You cannot create any new variables in this view.  In Excel we can move our cursor to a new cell and type in a formula; you CANNOT do this in SPSS.

Note at the bottom there are 2 tabs: Data View and Variable View.  We are currently in the Data View.  If you select the Variable View tab – things look a lot different from Excel.  We now have rows to match our variables (or columns in the Data View) and the columns in the Variable View represent different pieces of information about our variables (or metadata).

Excel Data into SPSS

Let’s take a step back and work through an example of bringing a dataset into SPSS.  For the purposes of this part of the workshop, please download the following 2018 Trial Data Excel file.  It is a small file with 10 observations and 4 variables.  Two of the variables contain letters and 2 are numbers for weight and height measurements.

Let’s work together and bring the file into SPSS.  Believe it or not, it is as simple as doing the following:

  • File
  • Open
    • Data
      • Navigate to the folder where you saved the data
      • Change the Files of Type: to Excel
      • Select the file
    • Open
    • Answer the Wizard questions
      • Select the correct worksheet – if there are more than 1
      • Double check the Preview to ensure that it looks correct
      • When you are satisfied – Click OK

That was pretty straightforward, wasn’t it?

Variable View

Two things happened when you Clicked OK – SPSS opened a second window – the SPSS Statistics Viewer or Output window and your data popped into SPSS as we expected.  In the Data View, notice the icons associated with each variable – we will talk more about these in a moment.

Click on Variable View – notice how your 4 variables are now listed as the rows and some of the information regarding the variables has been filled in.  Let’s work through each one separately.  I will talk about the first variable here, and you will fill in the remaining 3 variables as an exercise.

Name – lists the name of the variable as it was read in from Excel.  Please note that you should keep these as small and as informative as possible.  Please review the Best Practices for entering your data in Excel here.

The second column is the Type – if you select the word Type you will be presented with the options that are available to you in SPSS.  Generally speaking we tend to primarily use the String and Numeric types.  However, you may also have the occasion to use the Date type.

Width – is the width of the variable we have – so how many spaces we are willing to accept as data.  For ID it is set to 3.  You can change this at any time.  Decimals goes along with the Width.  Our data has no decimals, but you can change this at any time as well.  Note that when we create new variables, SPSS will, by default, assign 2 decimal places to the new variable.

Label – Ah…  the lovely label.  I highly recommend that you complete the Label and the next Values column if you plan to use SPSS for your analysis.  It will save you a lot of time.  So, what is it?  Our variable names are short and sometimes not very informative.  In this case, Weight?  Weight of what?  What units of measure were used?  You can add all this information in the label field.  How about for weight – Weight measured at 24 months of age in lbs.  This comes in extremely handy if you are working with surveys and are using Q1, Q2, etc..  as variable names.  In the label field you can add the entire question.  What is really nice about this – when you run your statistics, in the output window rather than seeing weight or Q1, you will see the label you typed in here.  So, if you’re like me – no more loose papers with notes!

I referred to the next column, Values, as another really handy one to have and to complete.  When you click on Values you are presented with a small dialogue box where you can enter the Value and its Label.  Let’s use an example:  Treatment, many folks may use a numbering system for their treatments, so a 1, 2, 3, 4 or a lettering system, A, B, C, D.  But no one knows what each treatment refers to.  This is the spot where you can add that label.  So if treatment 1 = 20% Iron; 2 = 25% Iron; 3 = No Iron; 4 = Control.  We can add all of that here!

Missing is our next one.  Another example – let’s say I have collected data and for one reason or another I have some missing data – maybe the equipment broke down one day, or one of the areas I was supposed to go measure was flooded and I couldn’t get to it.  Chances are you are going to document that somewhere – in a lab book or your notes.  But what about the data?  We tend to leave it blank, right?  We have no data so we’ll leave it blank.  Another way to handle it is to provide a value that is NOT a possible value for your data – so 999 for age or maybe 9999, and add that value to your dataset.  Now, we don’t want it to be included in any statistical analysis, so in SPSS we add that value to the Missing column in Variable View.

Columns and Align – are ways to make your Data View prettier or neater.

Measure – ah the important one!  You have 3 options:

  • Ordinal
  • Nominal
  • Scale

We will review these in the workshop as they are VERY important when you go to run any statistics or create Charts in SPSS.

Output Window

The output window is where you will see all of your results and the log.  The Log part of SPSS tells you what SPSS has done every step of the way and it also provides you with the syntax that SPSS used to get the results.  Yes, one of the strengths of SPSS is its user-friendly interface, but if you want to learn to code in SPSS, that option is there as well!  I’ll talk more about that in a bit.

How can you save your results in SPSS Output window?  Well, you can just save them, but they will only be accessible when you are using the SPSS software.  You can export them to PDF, Word, Excel, etc…  by using the File – Export option.  But, personally I really like the Copy and Paste option.

If you select the tables and/or charts you want to save for later and Copy the selected materials, you can then simply Paste them into your Word document.  Works like a charm EXCEPT on a Mac.  On a Mac, if you are trying to Copy and Paste a Chart, you will need to use Copy Special, select image, and then Paste it into Word.  This has happened on a PC as well at times.

OK     Paste     Reset     Cancel     Help

When you start running any analysis in SPSS, you will recognize these 5 words – they appear everywhere.  So a quick definition may be handy.

OK – go ahead and run the analysis

Paste – opens a Syntax window and will “paste” the syntax that SPSS will use to run the analysis you are trying to set up.  If you want to learn how to write syntax in SPSS, this is a great way to do that!

Reset – reset the dialogue box to its original state

Cancel – we know this one

Help – we know this one as well.

Conclusion

A quick overview to get you started using SPSS.

Name