Learning objectives

At the end of this lesson you will:

Understand the special features of lists
Be able to access list elements and write to lists
Be able to construct a for loop for repeated computation
Have gained another skill in modular programming

Overview

We’ve been introduced to lists, but here we will gain a better understanding of some of their special features and how to use them to write more powerful code. Lists and counted loops (for loops) work really well together when you want to scale up to repeated computation.

Lists are commonly returned from functions because a function can return only one object, and any collection of objects can be bundled into a single list. Functions (and any other R element) can be used together with for loops to improve modularity and readability.
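
As a minimal sketch of returning several results in one object, the hypothetical function summarize_sample() below (not part of the lesson's own code) bundles a mean, a range, and a sample size into one named list:

```r
# Hypothetical helper: returns several summaries bundled into one named list
summarize_sample <- function(x) {
  list(mean = mean(x), range = range(x), n = length(x))
}

result <- summarize_sample(c(2, 4, 6, 8))
result$mean    # 5
result$range   # 2 8
result$n       # 4
```

The caller receives a single object and picks out the pieces it needs by name.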

R also has special functions that operate along lists, called apply() functions, which we will learn about in the next lesson.

Lists

Lists in R are vectors like any other vector, but more flexible in that elements of a list can have different data types. This has at least three consequences.

First, the operations that work on the structure of a vector, such as indexing, length(), names(), and c(), also work on a list.
Second, objects of any type can be organized together into a list, which is very convenient for things like model fits, where you may want to store the model formula, the data, the coefficients, any likelihood values, and any other relevant information together in one data object.
Third, you can use lists as containers for containers, which can be nested indefinitely.
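
These three points can be sketched with a small made-up record. The name fit_record and all of its values are illustrative only, not output from a real model:

```r
# A list can mix data types and can contain other lists
fit_record <- list(
  formula = "y ~ x",                                     # character
  coefficients = c(0.5, 1.2),                            # numeric vector
  converged = TRUE,                                      # logical
  details = list(iterations = 12L, method = "example")   # nested list
)

length(fit_record)          # 4 top-level elements
names(fit_record)           # "formula" "coefficients" "converged" "details"
str(fit_record)             # compact view of the nested structure
fit_record$details$method   # "example"
```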

The elements of lists can be named, either upon creation, or using the names() function. Naming list elements is always a good idea because it gives you another way of accessing their elements:

applicant <- list(fullname="Mickey Mouse", address="123 Main St.",  state="CA")
applicant

$fullname
[1] "Mickey Mouse"

$address
[1] "123 Main St."

$state
[1] "CA"

names(applicant) <- c("fullname", "address", "state")
applicant

$fullname
[1] "Mickey Mouse"

$address
[1] "123 Main St."

$state
[1] "CA"

We can also use all of the standard functions that work on vectors, such as the combine function:

applicant <- c(applicant, list(scores=matrix(1:10, nrow=2)))
applicant

$fullname
[1] "Mickey Mouse"

$address
[1] "123 Main St."

$state
[1] "CA"

$scores
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

If we had multiple applicants, we could put them all together in a list of lists.
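
A minimal sketch of such a list of lists follows. The second applicant is made up for illustration:

```r
# Two applicant records (the second one is invented for this example)
applicant1 <- list(fullname = "Mickey Mouse", address = "123 Main St.", state = "CA")
applicant2 <- list(fullname = "Minnie Mouse", address = "456 Oak Ave.", state = "FL")

# A list of lists: each element is itself a complete applicant record
applicant_list <- list(applicant1, applicant2)

length(applicant_list)          # 2
applicant_list[[2]]$fullname    # "Minnie Mouse"
```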

Accessing list elements

A lot of people get tripped up working with lists, but the same rules apply to lists as to other objects.

There are just a couple of additional things:

The double bracket, and
The hierarchy of objects.

Once you understand these, it's simply a matter of applying the rules.

List elements can be accessed with the usual operators for vectors:

$ If the list is named
[ ] By number or name of the list element with single brackets. Returns a list. Can use a vector of indices or names.
[[ ]] By number or name with double brackets. Returns the element inside the list slot. Must be a single index or name.

By name: this is why it's a good idea to name list elements.

applicant$fullname

[1] "Mickey Mouse"

applicant[1]   ## returns a list of length one

$fullname
[1] "Mickey Mouse"

applicant[[1]]  ## returns the object within applicant[1]

[1] "Mickey Mouse"

Single brackets return lists. We can select multiple elements within single brackets:

applicant[1:2]

$fullname
[1] "Mickey Mouse"

$address
[1] "123 Main St."

applicant[c("fullname", "address")]

$fullname
[1] "Mickey Mouse"

$address
[1] "123 Main St."

Double brackets return the element within the list slot. But we can only select one:

applicant[[1]]

[1] "Mickey Mouse"

applicant[["fullname"]]

[1] "Mickey Mouse"

applicant[[1:2]]  ## cannot subset [[]] with more than one index

Error in applicant[[1:2]] : subscript out of bounds
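
The error above arises because a vector inside double brackets is not read as "elements 1 and 2". For a list, R treats it as recursive indexing, so x[[c(i, j)]] means x[[i]][[j]]. A small sketch using a made-up nested list:

```r
# Made-up nested list for illustration
nested <- list(first = list("a", "b"), second = list("c", "d"))

nested[[c(1, 2)]]     # same as nested[[1]][[2]]: "b"
nested[[1]][[2]]      # "b"

# To pull out several elements at once, use single brackets instead
nested[1:2]           # a list holding both top-level elements
```

In the applicant example, applicant[[1]] is a character vector of length one, so asking for its second element fails.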

Exclusion index (drops the state slot):

applicant[-3]

$fullname
[1] "Mickey Mouse"

$address
[1] "123 Main St."

$scores
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

Accessing elements inside an object within a list: Here we want to access elements of a matrix which is in a list.

applicant[4]

$scores
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

applicant[[4]][2,1]  # Take the scores matrix, and grab row 2, column 1.

[1] 2

applicant[[4]][,3]  # Take the scores matrix, and grab all of column 3.

[1] 5 6
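
The same elements can be reached by name rather than by position, which is often easier to read. A short sketch that rebuilds the applicant list from the examples above:

```r
# Rebuild the applicant list used in the examples above
applicant <- list(fullname = "Mickey Mouse", address = "123 Main St.", state = "CA",
                  scores = matrix(1:10, nrow = 2))

applicant$scores[2, 1]        # row 2, column 1: 2
applicant[["scores"]][, 3]    # all of column 3: 5 6
```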

For loops

Because of their flexibility, lists are useful containers for the output of loops or other repeated operations on data. What is a loop, you may ask? It is a block of code that you want to execute repeatedly. For example, you may have a large number of datasets on which you want to perform the same set of operations.

The easiest type of loop to understand is the for loop. It is a counted loop: it repeats a fixed number of times. You may be familiar with for loops (or for-next loops) from other programming languages. In R, the for loop operates over a vector, running once for each element of the vector. The syntax is:

for (var in seq) expr

Here var is a variable that takes on each value of the vector seq in turn, and expr is the block of code evaluated for each value. The loop body is evaluated once for each element of seq. If expr needs to span more than one line, enclose it in {} (even when it's only one line, the braces are often nice for readability).

for (i in 1:3) { 
   print(paste("This is a for loop", i))
}

[1] "This is a for loop 1"
[1] "This is a for loop 2"
[1] "This is a for loop 3"

It is traditional to use i, j, or k as the variable, as a reminder that it's a counting index, but it is often convenient to use meaningful names that make the code easier to understand. For example, in the context of our earlier example, it might be helpful to iterate over each applicant in our applicant list:

for (applicant in applicant_list) expr
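
A runnable sketch of this pattern, assuming a small applicant_list with made-up records. It shows both looping over the elements directly and looping over positions with seq_along():

```r
applicant_list <- list(
  list(fullname = "Mickey Mouse", state = "CA"),
  list(fullname = "Minnie Mouse", state = "FL")   # made-up second record
)

# Iterate over the elements themselves with a meaningful variable name
for (applicant in applicant_list) {
  print(paste(applicant$fullname, "lives in", applicant$state))
}

# Iterate over positions; seq_along() adapts to the length of the list
for (i in seq_along(applicant_list)) {
  print(paste("Applicant", i, "is", applicant_list[[i]]$fullname))
}
```

Looping over elements reads more naturally, while looping over positions is handy when the index is also needed, for example to store results in a matching slot.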

Saving loop output to lists

Often we want to save the result or output of the code to a list. But we don't want to create a new list with each iteration of the loop; we just want to fill a list element or add on to the list. To do this, we need to create the list outside of the loop and then modify it inside the loop.

One strategy is to fill the list element by element using the counter i. Note that we don't have to tell R how long the list is when we create it: we can just make an empty list, and R will keep adding to mylist:

mylist <- vector("list")   ## creates a null (empty) list
mylist

list()

for (i in 1:4) {
   mylist[i] <- list(data.frame(x=rnorm(3), y=rnorm(3)))  ## why does this have to be a list object?
}
mylist

[[1]]
           x          y
1 -1.5891006  1.0447945
2 -0.9289017  0.7709087
3  0.3724301 -0.4045960

[[2]]
           x          y
1  0.1793252 -0.1167686
2  0.8126721  0.4296348
3 -1.6203481 -0.8756523

[[3]]
           x          y
1 -0.9860967 -0.3711503
2  0.8975715 -0.4703146
3 -1.0199470  1.2916043

[[4]]
          x          y
1  1.359747 -0.9601006
2  1.236366 -0.6644286
3 -1.495295 -0.3078295
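
An alternative sketch, not the lesson's own approach: when the number of iterations is known in advance, the list can be created at its final length and each slot filled using double brackets. The names n and prealloc are illustrative.

```r
n <- 4
prealloc <- vector("list", length = n)   # a list of n empty (NULL) slots

for (i in seq_len(n)) {
  # [[i]] places the data frame itself into slot i
  prealloc[[i]] <- data.frame(x = rnorm(3), y = rnorm(3))
}

length(prealloc)        # 4
class(prealloc[[1]])    # "data.frame"
```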

This code does the same thing, but uses the c() function to add on to mylist (what happens when you add on to a null list?):

mylist <- vector("list")   ## creates a null (empty) list
for (i in 1:4) {
   mylist <- c(mylist, list(data.frame(x=rnorm(3), y=rnorm(3))))
}
mylist

[[1]]
           x          y
1 -1.4285965 -0.6116601
2  1.4485142 -0.3044112
3 -0.1127297 -0.2419247

[[2]]
          x          y
1 0.2545869 -0.7306861
2 0.1032878  1.1283951
3 2.0754615 -0.1501652

[[3]]
           x          y
1 -0.3170322 -0.3139040
2 -1.0130806 -0.4275128
3  0.2439555  0.1655069

[[4]]
          x         y
1 0.8218255 0.9689341
2 0.5789347 0.1230913
3 1.8513586 1.7753243

Reshaping lists

You often want to reshape list output in scientific programming. For example, you may fit models many times on many permutations of your data and then want to flatten the resulting list into a dataframe. When you know that your output is regular, it is often convenient to use the unlist() function. unlist() also works on dataframes because dataframes are lists of vectors, all of the same length.

lm.out <- lm( mylist[[1]]$x  ~ mylist[[1]]$y )  ## calculate a linear regression on dataframe 1 x as a function of y
aov.out <- anova(lm.out)   ## run anova, save to aov.out
aov.out

Analysis of Variance Table

Response: mylist[[1]]$x
              Df Sum Sq Mean Sq F value Pr(>F)
mylist[[1]]$y  1 2.2984  2.2984   1.242 0.4656
Residuals      1 1.8506  1.8506

unlist(aov.out)

      Df1       Df2   Sum Sq1   Sum Sq2  Mean Sq1 
1.0000000 1.0000000 2.2983557 1.8505623 2.2983557 
 Mean Sq2  F value1  F value2   Pr(>F)1   Pr(>F)2 
1.8505623 1.2419769        NA 0.4655777        NA
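
To see where names such as Df1 and Df2 come from, here is a small sketch of unlist() on a made-up dataframe. Each column is flattened in turn, and the element names are built from the column name plus the row position:

```r
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))   # made-up values

flat <- unlist(df)
flat          # a named numeric vector of length 6
names(flat)   # "x1" "x2" "x3" "y1" "y2" "y3"
```

If the columns had different types, unlist() would coerce everything to a single common type, so it is most useful when the output is regular.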

Exercises

Take mylist above and name its elements (the dataframes).
Write another for loop to return the maximum value of x and y in each dataframe. How can you make the code flexible to make it work if mylist has a different length?
Write a for loop to loop over mylist. Within this loop, for each dataset compute an anova on x ~ y, unlist the anova output, and add as a row to a final dataframe.