Powerpoint Used during the 20170607 workshop
Commonly Used Statistical Packages
For the purposes of this workshop, the following statistical packages were reviewed:
It is recognized that there are many more available and used in the OAC community. If you have questions regarding other packages not included here, please email oacstats@uoguelph.ca .
Commonly Used Statistical Packages: Variable name restrictions and limits
LENGTH OF THE VARIABLE NAME
SAS – 32 characters long
SPSS – 64 bytes long
• 64 characters in English
• 32 characters in Chinese
Stata – 32 characters long
R – 10,000 characters long
Matlab – 63 characters lo
1ST CHARACTER OF THE VARIABLE NAME
SAS – 1st character MUST be:
• a letter (English) OR
• an underscore “_”
SPSS – 1st character MUST be:
• a letter (English) OR
• an underscore “_” OR
• “@”,“#”,“$”
Stata – 1st character MUST be:
• a letter (English) OR
• an underscore “_”
R – NA
Matlab – 1st character MUST be:
• a letter
BLANKS IN VARIABLE NAMES
SAS – NO Blanks!
SPSS – NO Blanks!
Stata – NO Blanks!
R – NO Blanks!
Matlab – NO Blanks!
SPECIAL CHARACTERS IN VARIABLE NAMES
SAS – NO Special characters with the exception of:
• “_”
SPSS -NO Special characters with the exception of:
• “_”
• “.”
• “@”
Stata -NO Special characters with the exception of:
• “_”
R -NO Special characters with the exception of:
• “_”
• “.”
Matlab – NA
CASE IN VARIABLE NAMES
SAS – Mixed case – for presentation only
SPSS – Mixed case – for presentation only
Stata – Mixed case – for presentation only
R – Mixed case – for presentation only
Matlab – Case sensitive
NAMES/WORDS TO AVOID IN VARIABLE NAMES
SAS – SAS keywords
SPSS – SPSS Reserve words
Stata – NA
R – R function words
Matlab – Function names
GENERAL NOTES ABOUT VARIABLE NAMES
SAS – Libref names can only be 8 characters long
SPSS – #variable – is a scratch variable used in syntax
• $variable – is a system variable
• Do NOT end variable name with a “.” OR “_”
Stata – NA
R – R Community recommends that you develop a naming convention for your data
• Use of “_” is faster (10-20%) than the use of “.”
Matlab – NA
Commonalities across the Statistical Packages – Recommended Best Practices for Excel – Variable Names
LENGTH RECOMMENDATION
- Maximum length: 32 characters
- Keep the variable names short and use a variable label to provide more information. Remember you need to type these variable names in and you will need to remember them.
1ST CHARACTER OF A VARIABLE NAME
- ALWAYS start variable names with a letter
VARIABLE NAMES AND SPECIAL CHARACTERS
- Numbers may be used anywhere in the variable name AFTER the first character
- Only use underscores “_”
- Do NOT use BLANKS – replace blanks with an underscore “_”
CASE
- Use lowercase
- Case doesn’t matter for most packages.
- If you are using MatLab – please be aware that the variable names are case sensitive – if you use lowercase as a Best Practice you won’t forget which ones are Capitals and which ones are NoT.
FAMILIARITY WITH STATISTICAL PACKAGE NOMENCLATURE
- As you work with a particular package you will become familiar with keywords or reference words that are reserved for the program to use.
- As a general rule keep away from Statistical terms as variable names. If you REALLY want to use “mean” qualify it with your data, so wt_mean or concentration_mean
Commonalities across the Statistical Packages – Recommended Best Practices for Excel – Variable Labels
Variable names are often short and may not reflect the contents of the data collected. Trying to create a variable name that is a descriptive summary of the data can be extremely challenging. Recommendation is to create short, concise variable names and to create variable labels that are descriptive for each variable name.
Variable name: wt28
Variable label: Weight (kg) at 28 days of age
CREATING VARIABLE LABELS – SAS
Data first;
Infile …
Input …
Label wt28 = “Weight (kg) at 28 days of age”;
Run;
CREATING VARIABLE LABELS – SPSS
- Variable View in the SPSS Data viewer
- Find the variable called wt28
- In the Column called Label – Type: Weight (kg) at 28 days of age
- Syntax Window:
VARIABLE LABELS
Wt28 “Weight (kg) at 28 days of age”.
CREATING VARIABLE LABELS – R
Apply the appropriate function for the space you are working in. For instance the Dataframe, Vector, etc..
Lapply function
CREATING VARIABLE LABELS – STATA
label variable wt28 “Weight (kg) at 28 days of age”
CREATING VARIABLE LABELS – MATLAB
T.Properties.VariableDecsriptions{‘wt28’}=”Weight (kg) at 28 days of age”;