ARCHIVE: Ridgetown – Workshop data – November 6, 2017

We’ll continue to work with the data that Brittany provided me earlier this semester.  I’d like you all to try developing a model for the CONTROL_56D data.  Using GLIMMIX, try to find the best fitting model.

On Monday, November 6, we will work together and discuss how everyone approached this challenge.

Please download the data and program here.  Since I cannot link to .sas files, I have provided you with the PDF file.  You’ll need to copy and paste the contents into a SAS editor and work from there.

Also note that I will be available for one-on-one consultations in the afternoon of Monday, November 6, 2017.  To book a timeslot, please visit:  http://rt_oacstats.youcanbook.me 

Crimes of Statistics: Power

To consider the POWER of your statistical analysis, we need to take a step back and talk briefly about Hypothesis tests and their relationship with POWER.

Remember how you start your research?  With a hypothesis.  For our little example we will have a null hypothesis that says the mean height of cats is equal to the mean height of dogs.  The alternate hypothesis would then say that the mean height of cats is not equal to the mean height of dogs.

Ho: µcats  = µdogs
Ha: µcats  ≠ µdogs

We are using an alpha value of 5%, therefore we will reject Ho if our p-value is below 0.05.  We went out to measure 4 cats and 4 dogs and their height measurements (inches) are:
Cats:  11, 13, 11, 14
Dogs:  24, 21, 18, 28

The mean height for cats is 12.25 with a standard deviation of 1.5
The mean height for dogs is 22.8 with a standard deviation of 4.3

I can conduct a t-test and it provides me with a p-value of about 0.004 (pooled two-sample t-test).  With data such as this I can also calculate the variation around the mean (mean ± 1 standard deviation), such that I have 10.75-13.75 (12.25 ± 1.5) for the cats and 18.5-27.1 (22.8 ± 4.3) for the dogs.  Do the ranges overlap? No.

What conclusion do we draw?
That we will reject the Null hypothesis and state that dogs are significantly taller than cats, by an average of about 10.5″.
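The arithmetic behind this first comparison can be sketched in a few lines of Python.  This is only an illustration, not the procedure used in the workshop: it assumes a pooled-variance two-sample t-test, and the helper names `pooled_t_test` and `two_sided_p` are made up for this example.  The p-value uses a closed form of the t distribution that only works for even degrees of freedom (here 4 + 4 - 2 = 6), so a statistical package using a different variant (Welch, for example) may report a somewhat different number.

```python
import math
import statistics

cats = [11, 13, 11, 14]   # heights in inches, first sample
dogs = [24, 21, 18, 28]


def two_sided_p(t, df):
    """Two-sided p-value for a t statistic (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    cos2 = 1 / (1 + t * t / df)          # cos^2(theta), with tan(theta) = t / sqrt(df)
    sin_theta = math.sqrt(1 - cos2)
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        total += term
    return 1 - sin_theta * total         # 1 - P(|T| < |t|)


def pooled_t_test(x, y):
    """Return (t, df, two-sided p) for a pooled-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    t = (statistics.mean(y) - statistics.mean(x)) / se
    return t, df, two_sided_p(t, df)


t, df, p = pooled_t_test(cats, dogs)
print("cats: mean", statistics.mean(cats), "sd", round(statistics.stdev(cats), 2))
print("dogs: mean", statistics.mean(dogs), "sd", round(statistics.stdev(dogs), 2))
print("t =", round(t, 2), "df =", df, "p =", round(p, 4))
print("reject Ho at alpha = 0.05?", p < 0.05)
```

The same function can be pointed at any other pair of small samples, as long as the two group sizes add up to an even number of degrees of freedom.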

Sounds great, right?  We did expect that the dogs would be taller than the cats.  So right from the beginning, in this example, our experience and knowledge of cats and dogs told us that the Null hypothesis was false – and our little sample supported it!

Let’s review this table – in our case we were working with a Ho that we knew to be false and we rejected the Ho – so we have NO ERROR.

                               Ho is TRUE               Ho is FALSE
REJECT the NULL Hypothesis     Type I error (ALPHA)     No error (POWER = 1-BETA)
ACCEPT the NULL Hypothesis     No error (1-ALPHA)       Type II error (BETA)

We’re going to repeat this experiment and measure another 8 animals – 4 cats and 4 dogs.

Ho: µcats  = µdogs
Ha: µcats  ≠ µdogs

We are again using an alpha value of 5%, therefore we will reject Ho if our p-value is below 0.05.  We have height measurements (inches) of 4 cats and 4 dogs:
Cats:  21, 13, 11, 14
Dogs:  23, 21, 18, 14

The mean height for cats is 14.8 with a standard deviation of 4.3
The mean height for dogs is 19.0 with a standard deviation of 3.9

I can conduct a t-test and it provides me with a p-value of about 0.20.  With data such as this I can calculate the variation around the mean (mean ± 1 standard deviation), such that I have 10.5-19.1 (14.8 ± 4.3) for the cats and 15.1-22.9 (19.0 ± 3.9) for the dogs.  Do the ranges overlap?  Yes.

What conclusion do we draw?
That we will NOT reject the Null hypothesis and state that we found no significant difference between the average heights of cats and dogs.

Are we comfortable with this?  If you review the table presented above, we still have a FALSE Ho, but this time around we did NOT reject the Null hypothesis – leading us to commit a Type II (Beta) error.

A Type II error is directly related to the POWER of the test: POWER = 1 – BETA.  By definition, the power of a statistical test is the probability that the test will correctly reject the null hypothesis when it is false.
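One way to get a feel for this definition is to simulate the cats-and-dogs experiment many times and count how often Ho is rejected.  The sketch below is only a rough illustration: the true differences (0, 5 and 10 inches) and the common standard deviation (4 inches) are assumed values picked for the example, not estimates from the workshop data, and the critical value 2.447 is the two-sided 5% table value for a t distribution with 6 degrees of freedom (4 cats + 4 dogs - 2).

```python
import math
import random
import statistics

N_PER_GROUP = 4        # animals per group, as in the example
T_CRIT_DF6 = 2.447     # two-sided 5% critical t value for 6 degrees of freedom


def rejects_null(x, y):
    """True when a pooled-variance t-test rejects Ho at alpha = 0.05."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    t = (statistics.mean(y) - statistics.mean(x)) / se
    return abs(t) > T_CRIT_DF6


def simulated_power(true_diff, sd, n_experiments=4000, seed=1):
    """Share of simulated experiments in which Ho is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        cats = [rng.gauss(0.0, sd) for _ in range(N_PER_GROUP)]
        dogs = [rng.gauss(true_diff, sd) for _ in range(N_PER_GROUP)]
        if rejects_null(cats, dogs):
            rejections += 1
    return rejections / n_experiments


# Assumed scenarios: no true difference, a 5 inch and a 10 inch difference
rate_no_diff = simulated_power(true_diff=0, sd=4)
power_5 = simulated_power(true_diff=5, sd=4)
power_10 = simulated_power(true_diff=10, sd=4)
print("rejection rate when Ho is TRUE (should be near ALPHA):", rate_no_diff)
print("estimated POWER for a true difference of 5 inches:", power_5)
print("estimated POWER for a true difference of 10 inches:", power_10)
```

When Ho is true, the rejection rate estimates ALPHA; when Ho is false, it estimates POWER, and 1 minus that value estimates BETA.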

POWER is related to a number of factors:

  • sample size
  • effect size – or the size of the difference between treatment groups
  • variation of our outcome variable
  • level of significance – alpha

Consider our example above: which factors could we change to increase the POWER of our test and make results like those from our second data collection less likely?

  • Sample size
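To see how strongly sample size drives POWER, here is a small sketch using a normal approximation to the two-sample test.  It is a back-of-the-envelope calculation under assumed values (a true difference of 5 inches and a standard deviation of 4 inches, both chosen for illustration), and the function name `approx_power` is made up for this example.  The approximation tends to overstate power for very small samples, so treat the numbers as rough and use a proper tool such as Proc POWER for real planning.

```python
import math
from statistics import NormalDist


def approx_power(true_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # 1.96 for alpha = 0.05
    se = sd * math.sqrt(2 / n_per_group)       # standard error of the difference
    return z.cdf(abs(true_diff) / se - z_crit)


# Assumed effect size and variation; only the sample size changes
for n in (4, 8, 16, 32):
    print("n per group =", n, " approximate POWER =", round(approx_power(5, 4, n), 2))
```

Changing `true_diff`, `sd` or `alpha` in the call shows how the other three factors in the list above move POWER as well.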

There are several ways to calculate the POWER of a statistical test.  SAS has 2 PROCs – Proc POWER and Proc GLMPOWER.  Review the SASsy Fridays post on these.  There are many links to online calculators as well.  Please choose one that is defensible.

 

 

Data Visualization: The Table

Data Analysis Tasks and Methods

What a great visual to help you decide how to visualize your data.   By studying this “visualization” you can see that tables and graphs are primarily used to summarize data and to find relationships.

Tables vs Graphs

Tables:

  • Verbal representation
  • Read the information in rows or columns

Graphs:

  • Visual representation
  • See patterns or relationships

Neither one is better than the other – they each have their own merits and purposes.  It is up to you as the researcher to decide which is more appropriate for your story.

When would you use a Table?

  • Look up individual values
  • Compare pairs of related values
  • Need precision
  • Multiple sets of values in different measures
  • Show summary and detailed information

When would you use a Graph?

  • Show relationships among and between sets of values by giving them shape
  • Patterns, trends and exceptions are more easily seen rather than read
  • Series of values – seen as a whole

Different types of Tables

  1. Data table
    • Show rows and columns of data
    • Very difficult if not impossible to see any trends or relationships by looking at raw data
  2. Contingency Table – or a Crosstab(ulation) Table
    • Can show the relationship between two variables
    • Variables MUST be categorical!
  3. Summary or Aggregate Tables
    • Show descriptive statistics such as: mean, minimum, maximum, standard deviation, standard error, etc…
    • Can also group these by a categorical variable
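As a small illustration of the last two table types, the sketch below builds a contingency table and a summary table from a handful of records.  The heights are the first cats-and-dogs sample used in the Power post; the `sex` labels are invented purely so there is a second categorical variable to cross-tabulate.

```python
import statistics
from collections import Counter

# (species, sex, height in inches); the sex labels are hypothetical
records = [
    ("cat", "F", 11), ("cat", "M", 13), ("cat", "F", 11), ("cat", "M", 14),
    ("dog", "M", 24), ("dog", "F", 21), ("dog", "F", 18), ("dog", "M", 28),
]

# Contingency (crosstab) table: counts for two categorical variables
counts = Counter((species, sex) for species, sex, _ in records)
print("species   F   M")
for species in ("cat", "dog"):
    print(f"{species:<8}{counts[(species, 'F')]:>3}{counts[(species, 'M')]:>4}")

# Summary table: descriptive statistics grouped by a categorical variable
summary = {}
for species in ("cat", "dog"):
    heights = [h for s, _, h in records if s == species]
    summary[species] = {
        "n": len(heights),
        "mean": statistics.mean(heights),
        "min": min(heights),
        "max": max(heights),
        "sd": round(statistics.stdev(heights), 2),
    }

print()
print("species   n   mean   min   max    sd")
for species, row in summary.items():
    print(f"{species:<8}{row['n']:>3}{row['mean']:>7}{row['min']:>6}{row['max']:>6}{row['sd']:>6}")
```

Note that the crosstab only makes sense because both `species` and `sex` are categorical; the heights themselves go into the summary table.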

Anatomy of a Table

Remember that a table should stand on its own!!!

Highlights of the anatomy of a table.  If you are publishing, please check with the publication you are submitting to.  The guidelines listed below have been pooled from a number of different sources and are meant to be used as a teaching tool and guide only.

TITLE:  Should be clear and concise
Also known as the HEADING – according to the Operations Manual for the Canadian Journal of Plant Science, Canadian Journal of Soil Science, and the Canadian Journal of Animal Science.

  • Capitalize the heading in sentence format with no period at the end
  • Do not indent the second and any subsequent lines
  • No units of measure in the title

COLUMN TITLES: Visible and concise
Also known as COLUMN HEADINGS 

  • Capitalize only the first word
  • Units of measure in parentheses on the last line of the subheading
  • If several headings share the same unit of measure, place it below the headings, centred

LINES:  to separate different parts of the table

BODY: 

  • headings within the body of the table need to be italicized
  • centre entries under the column headings
  • centre data within the columns on decimal point, dashes, etc.

FOOTNOTES: used to clarify information in the table and should always appear at the bottom of the table!

  • Footnotes start with the letter a as a superscript
  • Each footnote is on a separate line
  • Asterisk – * to designate statistical significance

Examples of Tables

Data Table

Crosstabulation Table

  • Select Add/Remove Data
  • Under Geography – select all provinces
  • Select Apply at the bottom of the page
  • This will create a Crosstab table of geography vs Quarter/year

Research results table

Designing a table

Items to think about when you are designing a table.  A statistical package may not always provide you with the ideal table 🙂

  1. If you are comparing categories, these should be presented vertically in columns rather than rows.
  2. Row entries of data should not be random – order them by importance or alphabetically.
  3. If you are presenting more than one level of categories, arrange the hierarchy to emphasize the categories you think are most important

Example:

Steer weight                                                   Heifer weight
1981     1991     2001    2011                          1981    1991    2001    2011

Versus

1981                                                      1991
Steer weight     Heifer weight          Steer weight    Heifer weight

Summary

  • Often used to present a lot of data
  • Audience will glaze over the table and may not remember the message behind it.
  • Not recommended to use a table to show patterns, trends, or interactions between values – this may be easier to see and remember by using a more visual object
  • Remember who your audience is!!

 

 

 

Data Visualization: The Basics

Data visualization can mean many different things to different people.  To me, it all comes down to the purpose.  WHY?  and to WHOM?

Our first data viz meeting, held on October 17, was a quick review of why we use data visualizations and of the general principles of visualization.  We then reviewed some of the graphic design principles.  Some of these come naturally to folks, while others may be a challenge.  I want to highlight the principles here and will provide a link where you can download the PowerPoint that was used during the session.

Four main purposes of Data Visualization:

  1. Analysis
  2. Communication
  3. Monitoring
  4. Planning

Remember that your audience will play a major role in helping you define how you present your data and/or your results.

Five General Principles behind Data Visualization:

  1. Show the data
  2. Simplify – you want to keep the message simple and remove all the flowery bits of your visualizations
  3. Reduce the clutter – do you REALLY need all those grids or ticks on your graph??
  4. Revise your visualizations – creating a graph or a table should be viewed as part of your writing.  You don’t write something and leave it.  You write, you revise, you may write again and revise again.  Treat your visualizations in the same manner.
  5. Be HONEST!  This may sound funny – but let’s face it, there are times where the visualizations we create may have an element of exaggeration added in.  Those y-axes – where do they start?  At 0 or somewhere else?  Are we exaggerating the differences between those lines?

The attached PowerPoint presentation also demonstrates some of the graphic design principles that we should be aware of as we create tables and graphs.  Please review these as you work on your visualizations.

 

R: Keyboard Shortcuts

As I continue to learn about R and RStudio in particular, I will add to this ongoing list of keyboard shortcuts.  Some of these may also be listed in other posts.  But I’ll try to update this one as I learn new ones.

Keyboard Shortcut                Function

Ctrl-Enter                       Submit code
Ctrl-1                           Move to Source window
Ctrl-2                           Move to Console window
Ctrl-L                           Clear Console window
Alt-Minus (Alt and the - key)    Insert the assignment operator <-