Saving your work as R scripts

Add
module 1
week 3
R
scripts
reproducibility
Author
Affiliation

School of Life Sciences, University of Hawaii

Published

February 7, 2023

✏️

Learning objectives

Learning objectives

At the end of this lesson you will:

  • Know how to build good scripts
  • Be able to run source code
  • Be able to verify your script
  • Know how to start debugging scripts
  • Know how to clear your workspace
  • Know how to use print() and cat(), functions and use your history file

Introduction

Because R is interactive, it is tempting to simply play with code until you get the results you want. The problem with this is that you may not be able to reproduce it. Also, you may have made many manipulations of your data, some of which you’ve lost track of, and so your data objects may not really be what you think they are. This makes it impossible to repeat your analysis with confidence.

A key part of any analysis is verification:
  • Did you do what you really think you did?
  • Was the input free of error?
  • Did the steps of your analysis work without error?
  • And perhaps most importantly – can you reproduce it?

Clean Scripts

Writing clean scripts help you to accomplish these goals. Scripts are lines of code saved in an ordinary text file with a .R or .r ending. (Make sure it is plain text, and NOT an .rtf, .doc, or other type of binary file.

All good script follows the first three R’s, as you increase along the path of R jedi-hood, you will add on the 4th R
  • Readable – If you look at the script in a month or 6 months, will you be able to easily understand it?
  • Right – Does it run free of error, and does it produce correct results?
  • Repeatable – Can you reproduce your results from your input data?
  • Reusable – Is your coding modular and designed well so that your code can interact with other scripts, and/or use it for other purposes?

The mac interface has a very nice text editor, and the newer Windows R interfaces are getting better too (or install Notepad++). From the R menu, choose File > New Document (or command-N). Simply type or cut and paste your code from your history file into here. Let’s make a script for the analyses we’ve done thus far.

Script Template

First, make sure that you are in the directory that you want the script to execute from (Rclass\...). Start off with any packages that you wish to load, then begin to cut and paste your code.

Make sure to add comments indicated by the # symbol so that you know what the code does:

Here is the basic structure of a script:

#require( ...addonpackage... )    # anything between ... needs to be changed 
      # if none, then you don't need that line

#dat <- read.csv(..."your input file.csv"... )       # input data

# Your lines of code to run analyses
# You may have output or processed data that you want to save, 
# create an object for it and write it out to a csv file at the end 

# plot graphics

write.csv(out, file="myoutput.csv")      # output data 

And here is a simple example script that reads in data, calculates summary statistics, a linear regression, and a couple of figures.

#require(stats)        # stats is part of the base package and doesn't need to be loaded, 
    # but if you need an add-on package, you would require it here.

dat <- read.csv ("Data/morphpre.csv")  # read in data

lm.HLSVL <- lm(dat$HandL ~ dat$SVL)   # run a linear model

summary(lm.HLSVL) # get summary statistics
str(lm.HLSVL) # look at the linear model object
coef(lm.HLSVL)[2] # get the slope of the regression

plot(dat$HandL ~ dat$SVL, cex=2)    # make a plot with big dots (cex controls size of symbols)
abline(lm.HLSVL, col="red")              # plots the regression line, in red
  title("Microhylid Hand Length vs Body Size")  # add a title
  text(x=15, y=13, paste("slope = ", coef(lm.HLSVL)[2]))   
      # add important info to the text

###
# please insert your other lines of code here -- enough 
#   to save a meaningful analysis
###

Note that I have used spacing and indents to increase the ``readability’’ of the code. Use it to set of blocks of code that accomplish one task, with indents to indicate heirarchy. We will talk more about this in the functions section.

Save the script file as testScript.R or a title of your choice in your class folder. Now if you want to run the code, you simply type at the R console (from within your Rclass directory):

source("testScript.R")

When I am trying to develop a script, I often work by having the script window open next to the R console, and once a bit of code is working, I cut and paste it directly into the script. Save the script and source it. Once you have a good amount of code, you can work by making changes to the script, saving, and sourcing, over and over again.

Writing pdf to file

If you’d like to print your pdf to a file instead of to the screen, you can add the following code into your script:

pdf(file="MicrohylidHandLvsSize.pdf") # open pdf device for printing
  plot(dat$HandL ~ dat$SVL, cex=2)  # remake plot as before
  abline(lm.HLSVL, col="red")              
    title("Microhylid Hand Length vs Body Size")  
    text(x=15, y=13, paste("slope = ", coef(lm.HLSVL)[2]))  
dev.off() # turn off pdf device so future plots go back to screen

Just to clarify, the syntax is:

pdf() # open pdf device for printing
 
 # lines of code that print to pdf such as a plot
 #   any additional plot elements such as title, text, etc. 

dev.off() # close pdf device, completing the file

If you forget dev.off() your pdf will be corrupted. If you donʻt print anythying to the pdf, it may be blank or it may also be corrupted.

History file

Another handy feature of R is that it automatically saves a history file. That is, a file that has a list of every command you’ve executed in your sessions.

It is saved by default as .history in your working directory. Because the file name begins with a period, it is not visible normally (although it is there – you can see it from the terminal by using the ls -a command). It is great to have in case you are in a bind, but a better practice is to save any important history explicitly with your own filename, either click on the history button on the R gui (box with yellow and blue lines), and click on “save history” at the bottom of the side window, or type the code:

savehistory(file = "date_today.Rhistory")

This is an ordinary text file, which you can open up and edit (removing all the mistakes), and save as a scriptname.R file.

Another helpful tip when writing source code is to use print() and cat() functions to print out your output to the console. When you are using R in interactive mode, when you type the name of a variable, you get a print of its contents. However, when you source the same code, the variable does not print to the screen. You have to explicitly put a print() or cat() function around it.

Let’s use a built-in dataset called iris, which is the famous Fisher iris dataset. Make a test script file and save it as test.R:

test.R
names(iris)   # will not print to console when sourced

spp <- unique(iris$Species)  # only unique values
spp <- as.character(spp)     # factor -> character
spp   # will not print to console when sourced

print('Species names')  # will print
print(spp)              # will print

cat('\n', 'Species names =', spp)  # concatenate 
    # \n is a carriage return character
summary(iris)

Then test it by running:

source("test.R")

You can see that print() just makes a rough dump of the variables onto the screen. I added a character string so that we would know what variable was being printed to screen. cat() makes a nicer, more customized display (it turns everything into a character vector, then pastes them together [i.e., concatenates them] before printing). They both do the same basic job, however. Notice also that summary() does print to screen. Usually you only need to use these explicit print statements to see the contents of your variables as you are debugging.

Remember the workspace

Finally, remember that R is interactive, and the objects you create during a session are still around even after you’ve run your source code and forgotten about them. So to really check that your script is complete, you should shut down R (don’t save the workspace), double click on the name of your script to restart R in the correct directory, and then source the program again. Does it work? Great!!

You could also try clearing all the objects from your workspace using the command:

rm(list=ls())     # remove a list of objects consisting of the entire workspace

But this doesn’t unload your packages, and there is still a danger that the script won’t run in a fresh session if you forgot to include loadig the packages in your script. It’s OK to use for minor incremental changes, but the best thing for a real test when you are done drafting your script is to quit R and retry with a blank slate.

In general, most of my analyses are pretty quick in terms of computer time (not coding and debugging time!). So I never save my workspace, because I don’t want to deal with any ghost objects I have forgotten about. Instead, I write a nice script that will generate the whole analysis. If it’s a really big complex analysis, you can save intermediate output as R data files (more on this later).

Try to create a script file for all the analyses we’ve done so far (and for every session throughout the course).

Letʻs get Organized

Make sure you can see file endings

Does your MacOS or Windows environment show you the file endings (i.e., .R, .pdf, .csv, etc.)? If not be sure to turn them on. Try the instructions below or you can google for “show all file exensions in” (Mac or Windows, etc.).

Mac OS

This is a Finder preference. From any Finder window, click on the menu bar: Finder > Preferences > Advanced > Click on Show all finename extensions.

Windows

This is in File Explorer (Windows key + E). Click on the menu: View > Show > File Name Extensions. You can also choose to show all hidden files if you wish. For pictures see here

Text editor environment

While I love the R text editor for writing R scripts, for working with multiple .qmd and other files I find it helpful to have a full-featured plain text editor. A new tool that I discovered is the Sublime text editor. If youʻd like to try it out, you can download it here: https://www.sublimetext.com

A couple of features I like is that you can have multiple panes open. For example if you want to copy text from an old script to a new script, you can easily see and do that.

It also allows you to organize Projects, various folders that will appear on the sidebar to preserve your workspace. This helps when you are writing text documents across folders. So for example if you have your Rclass folder in one place and your website folder in another, you can have both open within the Sublime project. When you finish working on it you can save the project and reopen it later.

To create a project start by opening a new file. Then choose the Project > Add folder to Project... on the menu bar. You can load mulitple folders.

It has contextual highlighting for Quarto as well as GitHub markdown.

It also has integration with command line R https://bishwarup-paul.medium.com/a-guide-to-using-r-in-sublime-text-27f78b33f872. You can run R commands in a lower terminal pane, sent directly from your text document in sublime.

Different Desktop Windows

Itʻs also nice to have multiple desktops to organize your work. - It makes it easier to find your different apps. - You may have one workspace for your text editor, and another for your Terminal or CMD prompt, for example. - If I am working with multiple git repos, I might have one desktop just for my Terminal windows with a separate Terminal open for each one.

To use multiple desktops: On Windows On Mac

One the Mac, you can open new desktops by using three fingers to swipe up on the trackpad. Switch between them by swiping left or right with three fingers.

Organize your Projects into Folders

Weʻve been learning about reproducibility. One important aspect is file organization. Each project should be organized into one folder that contains:

  • All input data (usually in a Data folder)
  • All code and documentation
  • All output

The idea is to keep everything complete, self-contained, and clear. Move old versions into a “Trash” folder. If you donʻt end up looking back at it, then delete it! (Or if you are bold, delete it right away!)

A really useful UNIX/CMD command is tree. It shows you the directory structure contained within any folder. It works on both MacOS and Windows.

This is in ASCII – so you can copy and paste it into your README.md file!

Terminal/CMD
tree myfolder

If it is not pre-installed on your mac, you may need to install it with homebrew:

Terminal
brew install tree

Exercises

  1. Create a script of the work we’ve done so far.
  2. R has great diagnostic plots for linear models. Read about them in the help page for ?plot.lm and incorporate a multi-panel figure by adding two lines of code to the script you’ve already made:
par(mfrow = c(2,2))   # set the plot environment to have two rows and two columns
plot(lmHLSVL)  # or the name of your linear model object          
  1. Save output to a file.
  2. Modify test.R so that a summary of the iris data prints to the console when sourced.
  3. Explore other datasets in R. At the R command prompt type data() to see what is available.