March 23, 2023
At the end of this lesson you will:
We’ve been introduced to lists, but here we will gain a better understanding of some of their special features and how to use them to write more powerful code. Lists and counted loops (for loops) work really well together when you want to scale up to repeated computation.
Lists are commonly returned from functions because functions can only return one object. Any collection of objects can be put together into a single list. Functions (and any other R element) can be used together with for loops to improve modularity and readability.
R also has special functions that operate along lists, called apply() functions, which we will learn about in the next lesson.
Lists in R are vectors like any other vector, but more flexible in that elements of a list can have different data types. This has at least three consequences.
The elements of lists can be named, either upon creation or using the names() function. Naming list elements is always a good idea because it gives you another way of accessing their elements:
applicant <- list(fullname="Mickey Mouse", address="123 Main St.", state="CA")
applicant
$fullname
[1] "Mickey Mouse"
$address
[1] "123 Main St."
$state
[1] "CA"
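The example above names the elements at creation. As a minimal sketch of the other route, names can be attached afterwards with names(). The applicant2 object and its values are hypothetical and used only for illustration:

```r
# Create an unnamed list (hypothetical data), then name its elements afterwards
applicant2 <- list("Minnie Mouse", "456 Main St.", "CA")
names(applicant2)                                        # NULL: no names yet

# Assign a character vector of names, one per element
names(applicant2) <- c("fullname", "address", "state")
names(applicant2)                                        # "fullname" "address" "state"

# Named elements can now be accessed by name
applicant2$fullname
```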
We can also use all of the standard functions that work on vectors, such as the combine function:
$fullname
[1] "Mickey Mouse"
$address
[1] "123 Main St."
$state
[1] "CA"
$scores
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
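The call that produced the output above is not shown. A sketch that is consistent with it, assuming the scores matrix was built with matrix(1:10, nrow = 2), might look like this:

```r
applicant <- list(fullname = "Mickey Mouse", address = "123 Main St.", state = "CA")

# Assumption: the scores matrix was created like this (only its printed form is shown above)
scores <- matrix(1:10, nrow = 2)

# Wrap the matrix in list() so that c() appends it as ONE new element;
# without list(), c() would append each of the ten numbers as its own element
applicant <- c(applicant, list(scores = scores))

length(applicant)   # 4
names(applicant)    # "fullname" "address" "state" "scores"
```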
If we had multiple applicants, we could put them all together in a list of lists.
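As a small illustration of a list of lists, the sketch below builds two applicant records and stores them in one outer list. The second applicant and the name applicant_list are made up for this example:

```r
mickey <- list(fullname = "Mickey Mouse", address = "123 Main St.", state = "CA")
minnie <- list(fullname = "Minnie Mouse", address = "456 Main St.", state = "CA")  # hypothetical

# Each element of the outer list is itself a list
applicant_list <- list(mickey = mickey, minnie = minnie)

length(applicant_list)            # 2
applicant_list$minnie$fullname    # "Minnie Mouse"
applicant_list[[1]][["state"]]    # "CA"
```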
There are just a couple of additional things:
Once you understand that, it's simply applying the rules.
List elements can be accessed with the usual operators for vectors:
By name. This is why it's a good idea to name list elements.
applicant$fullname
[1] "Mickey Mouse"
applicant[1] ## returns a list of length one
$fullname
[1] "Mickey Mouse"
applicant[[1]] ## returns the object within applicant[1]
[1] "Mickey Mouse"
Single brackets return lists. We can select multiple elements within single brackets:
applicant[1:2]
$fullname
[1] "Mickey Mouse"
$address
[1] "123 Main St."
applicant[c("fullname", "address")]
$fullname
[1] "Mickey Mouse"
$address
[1] "123 Main St."
Double brackets return the element within the list slot. But we can only select one:
applicant[[1]]
[1] "Mickey Mouse"
applicant[["fullname"]]
[1] "Mickey Mouse"
applicant[[1:2]] ## [[ ]] selects a single element; a vector index is applied recursively, so this fails
Error in applicant[[1:2]] : subscript out of bounds
Exclusion index (drops the state slot):
applicant[-3]
$fullname
[1] "Mickey Mouse"
$address
[1] "123 Main St."
$scores
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
Accessing elements inside an object within a list: Here we want to access elements of a matrix which is in a list.
applicant[4]
$scores
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
applicant[[4]][2,1] # Take the scores matrix, and grab row 2, column 1.
[1] 2
applicant[[4]][,3] # Take the scores matrix, and grab all of column 3.
[1] 5 6
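The same matrix elements can also be reached by name instead of by position. The sketch below rebuilds applicant so that it runs on its own, assuming the scores matrix is matrix(1:10, nrow = 2), which matches the printed values above:

```r
applicant <- list(fullname = "Mickey Mouse", address = "123 Main St.", state = "CA",
                  scores = matrix(1:10, nrow = 2))   # assumed construction of scores

# $ and [[ ]] both return the matrix itself, which can then be indexed as usual
applicant$scores[2, 1]        # row 2, column 1
applicant[["scores"]][, 3]    # all of column 3
```

Accessing by name keeps working if the order of the list elements changes, whereas applicant[[4]] depends on scores staying in the fourth slot.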
Because of the flexibility of lists, they are useful containers for the output of loops or other repeated operations on data. What is a loop, you may ask? It is a set of code that you want to execute repeatedly. For example, you may have a large number of datasets that you want to perform the same set of operations on.
The easiest type of loop to understand is the for loop. It is a counted loop, meaning it is repeated a fixed number of times. You may be familiar with for loops (or for-next loops) from other computing languages. In R the for loop operates over a vector, running once for each element of the vector. The syntax is:
for (var in seq) expr
Where var is a variable that takes on each value of the vector seq in turn, and expr is the block of code that is evaluated. The loop is evaluated once for each value of seq. If we need expr to span more than one line, we can do this by enclosing it in {} (even if it's only one line, braces are often nice for readability).
[1] "This is a for loop 1"
[1] "This is a for loop 2"
[1] "This is a for loop 3"
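The loop that produced the three lines above is not shown. A minimal sketch that prints the same text, assuming the message was assembled with paste(), is:

```r
# i takes the values 1, 2, 3 in turn; the body runs once per value
for (i in 1:3) {
  print(paste("This is a for loop", i))
}
```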
It is traditional to use i, j, or k as the variable to remember that it's a counting index, but it is often convenient to use names that are meaningful to understand the code. For example, in the context of our earlier example, it might be helpful to iterate over each applicant in our applicant list:
for (applicant in applicant_list) expr
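To make that concrete, here is a hedged sketch with a small applicant_list. Its contents are hypothetical; the lesson only names the object. When a for loop runs over a list, the loop variable holds each element itself, as if it had been extracted with [[ ]]:

```r
# Hypothetical list of applicant records
applicant_list <- list(
  list(fullname = "Mickey Mouse", state = "CA"),
  list(fullname = "Minnie Mouse", state = "CA")
)

# applicant is one inner list per iteration, so $ works directly on it
for (applicant in applicant_list) {
  print(applicant$fullname)
}
```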
Often we want to save the result or output of the code to a list. But we don't want to create a new list with each iteration of the loop; we just want to fill a list element or add on to the list. To do this, we create the list outside of the loop and then modify it inside the loop.
One strategy is to fill the list element by element using the counter i (note that we don't have to tell R how long the list is when we create it; we can just make an empty list, and R will keep adding to mylist):
mylist <- vector("list") ## creates a null (empty) list
mylist
for (i in 1:4) {
mylist[i] <- list(data.frame(x=rnorm(3), y=rnorm(3))) ## why does this have to be a list object?
}
mylist
[[1]]
x y
1 -1.5891006 1.0447945
2 -0.9289017 0.7709087
3 0.3724301 -0.4045960
[[2]]
x y
1 0.1793252 -0.1167686
2 0.8126721 0.4296348
3 -1.6203481 -0.8756523
[[3]]
x y
1 -0.9860967 -0.3711503
2 0.8975715 -0.4703146
3 -1.0199470 1.2916043
[[4]]
x y
1 1.359747 -0.9601006
2 1.236366 -0.6644286
3 -1.495295 -0.3078295
This code does the same thing, but uses the c() function to add on to mylist (what happens when you add on to a null list?):
mylist <- vector("list") ## creates a null (empty) list
for (i in 1:4) {
mylist <- c(mylist, list(data.frame(x=rnorm(3), y=rnorm(3))))
}
mylist
[[1]]
x y
1 -1.4285965 -0.6116601
2 1.4485142 -0.3044112
3 -0.1127297 -0.2419247
[[2]]
x y
1 0.2545869 -0.7306861
2 0.1032878 1.1283951
3 2.0754615 -0.1501652
[[3]]
x y
1 -0.3170322 -0.3139040
2 -1.0130806 -0.4275128
3 0.2439555 0.1655069
[[4]]
x y
1 0.8218255 0.9689341
2 0.5789347 0.1230913
3 1.8513586 1.7753243
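Both versions above grow the list as the loop runs. When the number of iterations is known in advance, a common alternative is to create a list of the right length first and fill it with double brackets. This is a sketch of that variation, not part of the original lesson code, and n is a name introduced here:

```r
n <- 4                              # number of data frames to generate (assumed)
mylist <- vector("list", n)         # a list of n empty (NULL) slots

# seq_len(n) gives 1, 2, ..., n and is safe even when n is 0
for (i in seq_len(n)) {
  # [[i]] assigns the data frame itself into slot i
  mylist[[i]] <- data.frame(x = rnorm(3), y = rnorm(3))
}

length(mylist)   # 4
```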
You often want to reshape list output in scientific programming. For example, you may fit models many times on many permutations of your data, and then want to flatten the resulting list into a dataframe. When you know that your output is regular, it is often convenient to use the unlist() function. unlist() also works on dataframes because dataframes are lists of vectors that all have the same length.
lm.out <- lm( mylist[[1]]$x ~ mylist[[1]]$y ) ## calculate a linear regression on dataframe 1 x as a function of y
aov.out <- anova(lm.out) ## run anova, save to aov.out
aov.out
Analysis of Variance Table
Response: mylist[[1]]$x
Df Sum Sq Mean Sq F value Pr(>F)
mylist[[1]]$y 1 2.2984 2.2984 1.242 0.4656
Residuals 1 1.8506 1.8506
unlist(aov.out)
Df1 Df2 Sum Sq1 Sum Sq2 Mean Sq1
1.0000000 1.0000000 2.2983557 1.8505623 2.2983557
Mean Sq2 F value1 F value2 Pr(>F)1 Pr(>F)2
1.8505623 1.2419769 NA 0.4655777 NA
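To see how unlist() builds the names in its result, here is a small sketch on made-up objects (scores_list and df are hypothetical). Each value gets the name of its list element, with a position number appended when that element holds more than one value:

```r
scores_list <- list(a = 1:2, b = 3)     # hypothetical regular list
flat <- unlist(scores_list)
names(flat)                             # "a1" "a2" "b"

# A data frame is a list of equal-length columns, so unlist() flattens it column by column
df <- data.frame(x = c(1, 2), y = c(3, 4))
unlist(df)                              # values 1 2 3 4 with names x1 x2 y1 y2
```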
Write a for loop to return the maximum value of x and y in each dataframe. How can you make the code flexible so that it works if mylist has a different length?
Write a for loop to loop over mylist. Within this loop, for each dataset compute an anova on x ~ y, unlist the anova output, and add it as a row to a final dataframe.