Literate Statistical Programming and Quarto

Introduction to literate statistical programming tools including Quarto Markdown
module 1
week 2
Markdown
Quarto
programming
Author
Affiliation

School of Life Sciences, University of Hawaii

Published

January 26, 2023

Pre-lecture materials

Read ahead

Before class, you can prepare by reading the following materials:

  1. https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/1740-9713.01522
  2. Creating a Website in Quarto quickstart up to and including Render https://quarto.org/docs/websites/
  3. Publishing to GitHub up to and including Render to docs https://quarto.org/docs/publishing/github-pages.html

Acknowledgements

Material for this lecture was borrowed and adapted from

Learning objectives

At the end of this lesson you will:

  • Be able to define literate programming
  • Know how to use R plus a text editor or RStudio to practice literate programming
  • Create a Quarto markdown document

Literate Programming

Before there was widespread concern about “Reproducible Research”, the term literate programming was coined by Donald Knuth, one of the true geniuses of computing, the creator of TeX, and an influential voice on structured programming, among other things. He introduced the idea in a 1984 paper and followed it with a book of the same name in 1992!

Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language.

The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer. The program is also viewed as a hypertext document, rather like the World Wide Web. (Indeed, I used the word WEB for this purpose long before CERN grabbed it!) —Donald Knuth

The basic idea is that:

  • Programs are useless without descriptions.
  • Descriptions should be literate, not comments in code or typical reference manuals.
  • The code in the descriptions should work. Thus it is necessary to extract the real working code from the literary description.

These concepts are baked into the R help pages. You will see working code examples at the bottom of nearly every help page, and those examples are run as part of the checks a package must pass before it is published on CRAN.

Additional Reading

The Data Science Pipeline

The basic issue is that when you read a description of a data analysis, such as in an article or a technical report, what you get for the most part is the report and nothing else.

Of course, everyone knows that behind the scenes there’s a lot that went into this article and that is what I call the data science pipeline.

The Data Science Pipeline

Literate Programming in Practice

One basic idea to make writing reproducible reports easier is what’s known as literate statistical programming. The idea is to think of a report or a publication as a stream of text and code.

  • The text is readable by people and the code is readable by computers.

  • The analysis is described in a series of text and code chunks.

  • Each kind of code chunk will do something like load some data or compute some results.

  • Each text chunk will relay something in a human readable language.

  • The code and text remain together in a single source document. No more separate analysis files and word processing files. When the code is edited, the report is regenerated from the same source the next time it is rendered.

There might also be presentation code that formats tables and figures and there’s article text that explains what’s going on around all this code. This stream of text and code is a literate statistical program or a literate statistical analysis.

Weaving and Tangling

Literate programs by themselves are a bit difficult to work with, but they can be processed in two important ways.

Literate programs can be woven to produce human-readable documents like PDFs or HTML web pages, and they can be tangled to produce machine-readable “documents”, or in other words, machine-readable code.

To use a system like this you need a documentation language that is human readable and a programming language that is machine readable (or can be compiled/interpreted into something that is machine readable).

Sweave

One of the original literate programming systems in R designed to do this is Sweave, written by Friedrich Leisch. Sweave enables users to combine R code with LaTeX, a document preparation system. Sweave revolutionized coding and has become part of the R base code. Leisch is on the R Core Development Team and the Bioconductor Project.

Sweave files end in .Rnw and have R code woven through the document:

<<plot1, height=4, width=5, eval=FALSE>>=
data(airquality)
plot(airquality$Ozone ~ airquality$Wind)
@

Once you have created your .Rnw file, Sweave will process the file, executing the R chunks and replacing them with output as appropriate before creating the PDF document.
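
Tangling is the other half of the picture. As a minimal sketch of the idea (not how Sweave itself is implemented), the shell commands below write a small Sweave-style file and use awk to keep only the lines between the `<<...>>=` and `@` markers. The file names demo.Rnw and demo.R are made up for illustration.

```shell
# Write a tiny literate document: one line of text, one R chunk, one line of text.
# The quoted 'EOF' stops the shell from expanding $Ozone and $Wind.
cat > demo.Rnw <<'EOF'
Ozone tends to fall as wind speed rises.
<<plot1, eval=FALSE>>=
data(airquality)
plot(airquality$Ozone ~ airquality$Wind)
@
That is the whole analysis.
EOF

# "Tangle": print only the lines that sit inside a code chunk.
awk '/^<<.*>>=$/ {in_chunk = 1; next}   # chunk header: start copying, skip the header itself
     /^@$/       {in_chunk = 0}         # chunk terminator: stop copying
     in_chunk' demo.Rnw > demo.R

# demo.R now holds just the two lines of R code.
cat demo.R
```

In R itself you would not do this by hand: Stangle() extracts the code from .Rnw files, and knitr::purl() does the same for knitr documents.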

Sweave’s main limitation is that it requires knowledge of LaTeX:

  • LaTeX is very powerful for laying out mathematical equations and for fine-tuned control of formatting, but it is not widely used as a documentation language outside of mathematics and related technical fields.
  • Therefore, there is a steep learning curve for most new users.
  • Sweave also lacks features that many people find useful, such as caching, multiple plots per page, and mixing programming languages.

Instead, folks have moved towards knitr, which offers everything Sweave does and extends it to much simpler Markdown documents.

rmarkdown

Another choice for literate programming is to build documents in the Markdown language. A Markdown file is a plain text file that is typically given the extension .md. The rmarkdown R package takes an R Markdown file (.Rmd) and weaves together the text and R code chunks (Figure 1), producing any of a large number of user-specified output formats.

Figure 1: R markdown translates text and code to many different formats

An R chunk surrounded by text looks like this:

```{r plot1, fig.height=4, fig.width=5, eval=FALSE, echo=TRUE}
data(airquality)
plot(airquality$Ozone ~ airquality$Wind)
```
Tip

The best resource for learning about R Markdown is this book by Yihui Xie, J. J. Allaire, and Garrett Grolemund:

The R Markdown Cookbook by Yihui Xie, Christophe Dervieux, and Emily Riederer is really good too:

The authors of the second book describe their motivation as:

“However, we have received comments from our readers and publisher that it would be beneficial to provide more practical and relatively short examples to show the interesting and useful usage of R Markdown, because it can be daunting to find out how to achieve a certain task from the aforementioned reference book (put another way, that book is too dry to read). As a result, this cookbook was born.”

Because this lecture is built in a .qmd file (which is very similar to an .Rmd file), let’s demonstrate how this works. I am going to change eval=FALSE to eval=TRUE.

data(airquality)
plot(airquality$Ozone ~ airquality$Wind)

Questions
  1. Why do we not see the back ticks ``` anymore in the code chunk above that made the plot?
  2. What do you think we should do if we want to have the code executed, but we want to hide the code that made it?

Before we leave this section: there is quite a bit of terminology behind the magic of rmarkdown, and it can be confusing, so let’s break it down:

  • Pandoc. Pandoc is a command line tool with no GUI that converts documents from a number of different markup formats to many other formats (such as .docx, .pdf, etc.). It is completely independent from R (but does come bundled with RStudio). If you don’t have RStudio installed, you will have to install pandoc yourself.
  • Markdown (markup language). Markdown is a lightweight markup language with plain text formatting syntax designed so that it can be converted to HTML and many other formats. A markdown file is a plain text file that is typically given the extension .md. It is completely independent from R.
  • R Markdown (markup language). R Markdown is an extension of the markdown syntax for weaving together text with R code. R Markdown files are plain text files that typically have the file extension .Rmd.
  • rmarkdown (R package). The R package rmarkdown is a library that uses pandoc to process and convert text and R code written in .Rmd files into a number of different formats. Its core function is rmarkdown::render(). Note: this package only deals with the markdown language. If the input file is e.g. .Rhtml or .Rnw, then you need to use knitr prior to calling pandoc (see below).
Tip

Check out the R Markdown Quick Tour for more:

Artwork by Allison Horst on RMarkdown

knitr

One alternative to Sweave that has come up in recent times is knitr.

  • The knitr package for R takes a lot of these ideas of literate programming and updates and improves upon them.
  • knitr still uses R as its programming language, but it allows you to mix other programming languages in.
  • You can also use a variety of documentation languages now, such as LaTeX, markdown and HTML.
  • knitr was developed by Yihui Xie while he was a graduate student at Iowa State and it has become a very popular package for writing literate statistical programs.

knitr takes a plain text document with embedded code, executes the code and ‘knits’ the results back into the document.

For example, it converts

  • An R Markdown (.Rmd) file into a standard markdown file (.md)
  • An .Rnw (Sweave) file into .tex format
  • An .Rhtml file into .html

The core function is knitr::knit(). By default it looks at the input document and tries to guess what type it is (e.g. Rnw, Rmd, etc.).

This core function performs three roles:

  • A source parser, which looks at the input document and detects which parts are code that the user wants to be evaluated.
  • A code evaluator, which evaluates this code
  • An output renderer, which writes the results of evaluation back to the document in a format which is interpretable by the raw output type. For instance, if the input file is an .Rmd, the output renderer marks up the output of code evaluation in .md format.
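
To see how the three roles fit together, here is a toy version written as a shell loop. It is only a sketch of the idea and has nothing to do with knitr's actual implementation: the chunks hold shell code rather than R, the Sweave-style `<<...>>=` and `@` markers are borrowed for simplicity, and the file names toy.txt and toy.out are made up.

```shell
# Toy literate document: plain text plus one chunk of shell code.
cat > toy.txt <<'EOF'
The answer is computed below.
<<answer>>=
echo $((6 * 7))
@
Done.
EOF

chunk=$(mktemp)   # scratch file that holds the code of the current chunk
in_chunk=0

while IFS= read -r line; do
  case "$line" in
    '<<'*'>>=')                       # parser: a code chunk starts
      in_chunk=1
      : > "$chunk"
      ;;
    '@')                              # parser: the chunk ends
      in_chunk=0
      sed 's/^/    /' "$chunk"                            # renderer: show the code, indented
      sh "$chunk" < /dev/null | sed 's/^/    #> /'        # evaluator: run it; renderer: mark up the output
      ;;
    *)
      if [ "$in_chunk" -eq 1 ]; then
        printf '%s\n' "$line" >> "$chunk"                 # parser: collect chunk lines
      else
        printf '%s\n' "$line"                             # text passes through unchanged
      fi
      ;;
  esac
done < toy.txt > toy.out

rm -f "$chunk"
cat toy.out
```

The text lines come through untouched, while the chunk is replaced by its code followed by its result (42) on a line marked with `#>`.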

Converting a Rmd file to many outputs using knitr and pandoc

[Source]

As seen in the figure above, pandoc then takes over and converts the .md file into many other file formats, such as .html.
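
You can try the pandoc step on its own from the command line. The sketch below assumes pandoc is on your PATH (it simply reports if it is not) and uses a made-up file, demo.md, to stand in for the markdown that knitr would produce.

```shell
# A small Markdown file standing in for knitr's .md output.
cat > demo.md <<'EOF'
# Air quality

Ozone tends to fall as **wind speed** rises.
EOF

# pandoc chooses the output format from the extension given to -o.
if command -v pandoc >/dev/null 2>&1; then
  pandoc demo.md -o demo.html
  pandoc demo.md -o demo.docx
else
  echo "pandoc not found on PATH"
fi
```

This is the step that rmarkdown::render() runs for you after knitr has finished.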

So in summary:

“R Markdown stands on the shoulders of knitr and Pandoc. The former executes the computer code embedded in Markdown, and converts R Markdown to Markdown. The latter renders Markdown to the output format you want (such as PDF, HTML, Word, and so on).”

[Source]

In comes Quarto

The folks who developed R Markdown have moved on to a new tool called Quarto. Quarto contains many of the features of R Markdown but, importantly, is now separate from RStudio. It is intentionally developed as a cross-platform, cross-language publishing system built around Markdown. It works with R, Python, Julia, and Observable, and it is rapidly catching on with the statistical coding community.

Quarto can render output to many document formats, including html, pdf, docx, and md. It can also lay out websites, presentations, or even books. It supports interactive apps such as Shiny, among other things.

We are going to learn by using Quarto. Let’s start by getting a simple webpage up. Conceptually, there are three tasks:

  1. Create your quarto website on your computer
  2. Make your website folder into a GitHub repo
  3. Publish your website via GitHub

Create your website locally with Quarto

In this section, I am adding a bit more explanation to the Quarto quickstart guide up to and including Render. If something is not clear, please consult https://quarto.org/docs/websites/

There are three main quarto commands we will use:

  • quarto create-project: Make a website project template
  • quarto preview: Take a look at what the website will look like
  • quarto render: Render your qmd to html

Make your website directory and template

Create your website (here called mysite) using the following command. It will make a directory of the same name and put the website contents within it.

Terminal
quarto create-project mysite --type website

You should now see the following files in your mysite directory (Figure 2):

Figure 2: Website files from the Terminal view

This is the bare-bones version of your website. Check that it works by looking at a preview. You are still in the directory that contains mysite, so pass the directory name to the command:

Terminal
quarto preview mysite

This should open up a browser window showing a temporary version of the site that quarto rendered from your website files.

Tip
  • quarto preview will refresh the preview every time you save index.qmd (or any other website file), so it’s a good idea to keep the preview open as you make edits and saves.
  • Check every edit. It is easier to debug in small steps.
  • Terminate quarto preview with Control-c.

Render your website to html

Use quarto to render your content to html, the format used by browsers. First navigate into your website directory then render:

Terminal
cd mysite
quarto render

Take a look at the mysite contents after rendering. You should see a new directory, _site (Figure 3). The html was rendered and put in there (go ahead, open up the files and check it out):

Figure 3: Website files after rendering

Personalize your content

What is really nice is that you can personalize your website by simply editing the quarto markdown and yaml files.

Web content goes in .qmd

Using any text editor, edit the index.qmd to personalize your website.

The first section of your index.qmd is the header. You can change the title and add additional header information, including any cover images and website templates.

For example, this is what I have in my course website index.qmd header. Note that my cover image is in a folder called images at the top level of my website directory. If you want to try this out, substitute or remove the image line and change the twitter/github handles.

index.qmd
---
title: "Welcome to Introduction to Data Science in R for Biologists!"
image: images/mycoolimage.png
about:
  template: jolla
  links:
    - icon: twitter
      text: Twitter
      href: https://twitter.com/mbutler808
    - icon: github
      text: Github
      href: https://github.com/mbutler808
---

You should edit the body of your website as well. You simply edit the text.

The quarto markdown page has great examples showing how to format your content. Take a look at how to specify header sizes, lists, figures and tables.

Try editing the about.qmd file as well. You will notice that this is another tab in your website. You can add more tabs by adding .qmd files.

With each addition, be sure to quarto preview your changes to make sure it works. When you are satisfied with your website, quarto render to render to html.

Tip
  • When editing markdown, take care to note spaces and indents as they are interpreted for formatting.
  • Indentations are really important for formatting lists.
  • For example in a hyperlink, there is no space between the square brackets and parentheses. [This is a cool link](http://mycoollink.com)

Website-wide settings go in _quarto.yml

All Quarto projects include a _quarto.yml configuration file that sets the global options that apply across the entire website.

YAML started off as “Yet Another Markup Language” 😜. It is clean, clear, and widely used. You can edit your YAML to add options or change the format of your website. Take a look at your _quarto.yml.

Here is an example for a simple website. title: is the parameter to set the website’s title. navbar: sets the menu, in this case with the items aligned to the left of the top navigation bar. By default tabs will be named based on the names of the .qmd files, but you can set them manually. There are many themes you can choose from too, check them out. For something different try cyborg.

_quarto.yml
project:
  type: website

website:
  title: "today"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - about.qmd

format:
  html:
    theme: minty
    css: styles.css
    toc: true

Again, after saving your edits, quarto preview to see the effects. When you are satisfied with your website, quarto render to render to html.

Terminal
quarto render

Publishing your website to GitHub

You can publish your website for free on GitHub, which is a very cool feature. In this section I am adding a bit more explanation to the Quarto quickstart guide up to and including Render to docs https://quarto.org/docs/publishing/github-pages.html. I describe the most important steps below:

  1. Render your html to a docs directory
  2. Suppress GitHub jekyll html processing by creating a .nojekyll file
  3. Make your website directory into a repo, and link it to a GitHub repo
  4. Edit the GitHub repo settings to publish your website

Render your html to docs

Edit the _quarto.yml file at the top level of your website so that rendered output is sent to docs:

_quarto.yml
project:
  type: website
  output-dir: docs

The next time you quarto render it will create docs and all of its contents.

Suppress GitHub jekyll html processing

GitHub uses software called jekyll to render html from markdown. Since we’re using quarto, we want to suppress that. Create an empty file named .nojekyll at the top level of your website directory to suppress the default jekyll processing.

Mac/Linux
Terminal
touch .nojekyll
Windows
Terminal
copy NUL .nojekyll

Set up a GitHub repo for your website

  1. Turn your website directory into a git repo:
Terminal
git init
git add .
git commit -m "first commit"
  2. Create a GitHub repo by the same name

For example, mine might be github.com/mbutler808/mysite.

  3. Link your local repo and GitHub repo together

If you forgot how to do this, go back here

  4. Check your GitHub repo. Are your files there?
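
As a reminder for the linking step above, here is a minimal sketch. It uses a scratch directory so that it can be run anywhere without touching a real project; in practice you would run the git remote command inside your existing mysite repo. The URL follows the example repo name given above, so substitute your own user name and repo.

```shell
# Scratch repo for the demonstration; skip these three lines in your real project.
demo=$(mktemp -d)
cd "$demo"
git init -q

# Point the local repo at the GitHub repo.
git remote add origin https://github.com/mbutler808/mysite.git

# Confirm the link: origin should be listed for both fetch and push.
git remote -v

# Once the GitHub repo exists and you have made a commit, name the branch main and push:
#   git branch -M main
#   git push -u origin main
```

After the first push with -u, a plain git push origin main is enough for later updates.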

GitHub settings to serve your webpage

Almost there! A couple more steps.

From your GitHub repo, click on Settings in the top menu, and Pages on the left menu.

Your website should deploy from a branch. Under Select branch choose main and under Select folder choose docs.

After clicking save, GitHub will trigger a deployment of your website. After a few minutes, your URL will appear near the top at Your site is live at...:

Congratulations! ⚡️ Your website is now live 🎉🎊😍

Now make more changes!

The Quarto workflow is:
  • Edit the content in .qmd
  • From the Command line:
    • quarto preview to check that edits are correct
    • quarto render to render .qmd to .html
    • git add .
    • git commit -m "message"
    • git push origin main
  • Check your website (this may take a beat)
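
If you find yourself typing the same sequence often, the render-and-push part of the workflow can be wrapped in a small shell function. This is only a convenience sketch: publish_site is a made-up name, and the function assumes you are at the top level of your website directory with quarto and git installed and the GitHub remote already linked.

```shell
# Hypothetical helper: render first, then stage, commit, and push.
# The && chain stops at the first step that fails, so nothing is
# pushed unless the render succeeded.
publish_site() {
  if [ -z "$1" ]; then
    echo "usage: publish_site \"commit message\"" >&2
    return 1
  fi
  quarto render &&
    git add . &&
    git commit -m "$1" &&
    git push origin main
}
```

Call it with a commit message, for example publish_site "update about page". Keep using quarto preview by hand while you edit, since the preview runs until you stop it.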

For fun

You can have fun with emoji! Guangchuang Yu wrote the package emojifont (this is the same person who wrote the widely used ggtree package) and now you can bring your emoji out of your phone and into your quarto documents! Install the R package emojifont:

install.packages("emojifont")

Then anywhere you want an emoji in the markdown file, you just type:

`r emojifont::emoji('palm_tree')`

🌴

Or if you want several, just line them up:

`r emojifont::emoji('balloon')``r emojifont::emoji('tada')``r emojifont::emoji('smiley')`

🎈🎉😃

There is a handy cheat sheet of emoji names here https://gist.github.com/rxaviers/7360908

Final tips

Tip
  • Always always quarto render before you push up your changes to GitHub!
  • If your changes are not appearing, try quarto preview and check that your changes appear in the preview. Then quarto render before you use git to add, commit, and push
  • Note: It can take a few minutes to render on GitHub before your changes appear on your website

Please see Stephanie Hicksʻ lecture for more literate programming examples and tips.

Post-lecture materials

Final Questions

Here are some post-lecture questions to help you think about the material discussed.

Questions

  1. What is literate programming?

  2. What was the first literate statistical programming tool to weave together a statistical language (R) with a markup language (LaTeX)?

  3. What is knitr and how is it different from other literate statistical programming tools?

  4. Where can you find a list of other commands that help make your code writing more efficient when using Quarto?

Additional Resources