4.3 Data Structures

Imagine a grocery list, shopping list, or to-do list. That list consists of a set of items in a specified order, and the list also has a length. Why do you think it’s useful to organize these items into a list, rather than in some other fashion? Can you think of why it might be useful to store data in a list?

Often, you will need to work with many related pieces of data, for example:

A sequence of measurements through time
A grid of values
A set of phone numbers

In these circumstances, it makes sense to organize the data into a data structure. R provides multiple data structures, each of which is appropriate in different situations. By far the most popular data structure in R is the data frame, but in order to talk about data frames, we must talk about some simpler data structures first.

4.3.1 Vectors

A vector is just an ordered set of elements (in other words, data), all of which have the same data type. Vectors can be created for the logical, numeric (double or integer), or character data types. Here’s an example of a vector:

x <- c(1, 2, 3)  # this is a vector of numeric types
print(x)

[1] 1 2 3

Note that to create a vector, we use the c function, where c stands for combine. This makes sense, because we are combining three numeric objects into a numeric vector. We can determine the length of any vector like so:

length(x)

[1] 3

The class function will tell us what type of data is stored in a vector (which makes sense, because all elements of the vector have the same data type).

class(x)

[1] "numeric"

Here’s how to create logical or character vectors:

y <- c(TRUE, TRUE, FALSE, TRUE)
z <- c("to", "be", "or", "not", "to", "be")

class(y)

[1] "logical"

length(y)

[1] 4

class(z)

[1] "character"

length(z)

[1] 6

We stated above that all elements of a vector must have the same data type, so what do you think will happen if you try to create a vector using elements of different data types? Here are some possibilities. Can you think of another one?

R will produce an error
R will combine the elements somehow, but the result won’t be a vector
Something else?

Whatever happens, humans were behind the decision of how R should behave in this situation. If you were in charge of making this decision, what would make the most sense?

Let’s try to create a vector of mixed type and see what happens. Run the following commands in R and think about the output:

m <- c(TRUE, "Hello", 5)

class(m)

print(m)

What changes did R make when creating the vector?

What’s happening in the above code is an example of type conversion, which we will talk more about later. For now, remember that every element in an R vector is the same type.

You can create empty vectors as placeholders, by indicating the data type and how many elements there are:

empty <- numeric(10)   # this creates a numeric vector of length 10

This is the first instance of us using a name which is longer than a single character! This new vector is called empty.

Let’s print the contents of the vector:

print(empty)

 [1] 0 0 0 0 0 0 0 0 0 0

Even though we didn’t tell R what data to put in the vector, it put a 0 in each element. This is the default value for a new numeric vector.

Here’s how you can create new vectors of other types:

empty_int <- integer(45)   # create integer vector with 45 elements
empty_cha <- character(2)  # create character vector with 2 elements
empty_log <- logical(1000)    # create logical vector with 1000 elements!!

We saw that the default value for a numeric vector is 0. Use the code above to create empty integer, character, and logical vectors, then print them out to see what default values R has given to each element. Do these make sense?

What happens if we create a vector of length 1? It turns out this is the same as just creating a single instance of that data type. Observe how the following are the same.

a <- numeric(1)  # create vector of length 1 (default value is 0, right?)
b <- 0           # create single numeric with value 0
a == b           # compare a and b to see if they are the same.

[1] TRUE

It turns out, you can create a vector of length 0, which contains 0 elements. This may sound odd, but can happen sometimes! However, you cannot create a vector of negative length (e.g. logical(-1) won’t work), and a fractional length doesn’t make sense either (e.g. character(12.7) does not give you 12.7 elements; R simply drops the fraction).

4.3.1.1 Accessing and Changing Elements

After you’ve created a vector, how do you put your data in it? Here’s how you can change the value of a specific element:

a <- c(1, 2, 3)  # create numeric vector of length 3
a[2] <- 4        # change the value of the second element of a to 4
a                # print the result

[1] 1 4 3

See how the second element of a has changed? So you can access a specific element using square brackets: [ and ]. In fact, if you want to know the value of the third element (without changing anything), just use:

a[3]    # access the third element

[1] 3

What do you think will be the result of the following code (hint: the result will either be TRUE or FALSE)?

vec <- c(4, 5, 6) # Create a vector

vec[3] == 6 # Remember what == does?

Once you make a guess, try it in R and see if you were correct.

This video gives an introduction to vectors.

https://youtu.be/-BlN6_ZMpKE

4.3.1.2 Working with vectors

You can do many of the same things with vectors that you can do with single instances of each data type. Recall that you can add a number to a numeric object:

a <- 3   # create a numeric object
a + 4    # add a number to the object.

[1] 7

The same thing is possible with numeric vectors:

a <- c(1, 2, 3)   # create a numeric vector
a + 4             # add a number to EACH ELEMENT of the vector!

[1] 5 6 7

This type of behavior is called elementwise behavior. That is, the operation is performed on each element separately. Here are some other elementwise operations:

a - 3

[1] -2 -1  0

a * 1.5

[1] 1.5 3.0 4.5

a ^ 2

[1] 1 4 9

a == 2

[1] FALSE  TRUE FALSE

R has some functions which summarize the values in a vector. One such function is the sum function, which adds the values of each element in the vector:

print(a)  # print the elements of a as a reminder
sum(a)    # add all the elements of a together.

[1] 1 2 3
[1] 6

Other examples of summary functions include max, min, mean, and sd. We’ll talk about these and other summary functions later.
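
As a quick preview, here is a minimal sketch of those summary functions applied to a small vector. The vector heights and its values are made up for illustration.

```r
heights <- c(150, 160, 170)  # hypothetical measurements

max(heights)   # largest value: 170
min(heights)   # smallest value: 150
mean(heights)  # average: 160
sd(heights)    # sample standard deviation: 10
```

Each of these functions takes a whole vector and returns a single number, unlike the elementwise operations above, which return a vector of the same length.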

Some operations work on two vectors, as long as they are the same length:

b <- c(1, 0, 1)
a + b

[1] 2 2 4

b * a

[1] 1 0 3

a ^ b

[1] 1 1 3

You can even compare two vectors, and the result will be a logical vector:

z <- a > b  # compare a and b, element by element, assign the result to z
z           # print the value of z

[1] FALSE  TRUE  TRUE

The first logical value is the result of a[1] > b[1], the second logical value is the result of a[2] > b[2], etc. What operations can we perform on logical vectors? Here are some examples:

z == TRUE  # which elements are TRUE?

[1] FALSE  TRUE  TRUE

This just produces z again (Do you see why?). Here’s how to get the logical “opposite” of z:

z == FALSE

[1]  TRUE FALSE FALSE

Or, as we saw before, we can use !, which operates on each element of z:

!z

[1]  TRUE FALSE FALSE

Remember how logical objects can be treated as numeric objects (either a 0 or 1)? We can use this with the sum function to determine how many elements are TRUE:

sum(z)

[1] 2

Here’s another example of using the sum function on a logical vector:

sum(a == b)  # how many elements do a and b have in common?

[1] 1

So there is one position at which a and b hold the same value.

Logical vectors can also be used to access all elements of a vector for which a certain condition is true. We’ll see how to do this later on.
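
As a brief preview, the sketch below shows the basic idea: a logical vector placed inside the square brackets keeps only the elements where it is TRUE. The names temps and is_warm are illustrative and do not appear elsewhere in this chapter.

```r
temps <- c(12, 25, 31, 18)   # hypothetical temperatures

is_warm <- temps > 20        # logical vector: FALSE TRUE TRUE FALSE
temps[is_warm]               # keep only the elements where is_warm is TRUE: 25 31
sum(is_warm)                 # how many elements satisfy the condition: 2
```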

Let’s create some character vectors and explore a few things we can do with them:

a <- c("I", "have", "to", "have", "a", "donkey")
b <- c("You", "want", "to", "sell", "a", "donkey")

First, we can do elementwise comparison (assuming equal length), just as we did for numeric vectors:

a == b

[1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

To search for specific character strings in a character vector, you can use the grep function:

grep("have", a)  # search the vector a for the phrase "have"

[1] 2 4

This result shows that the phrase “have” occurs in elements 2 and 4 of a! What if we search for a phrase that doesn’t occur?

grep("raddish", a)

integer(0)

The result is an integer vector of length 0, meaning there are no elements that match the phrase!
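
Two related details may be worth seeing in code. The sketch below uses a copy of the vector called words (an illustrative name). It shows grepl, which returns a logical vector instead of positions, and it shows that grep matches any element that contains the pattern, not only elements that equal it.

```r
words <- c("I", "have", "to", "have", "a", "donkey")

grepl("have", words)                 # FALSE TRUE FALSE TRUE FALSE FALSE
grep("a", words)                     # 2 4 5: "have" contains an "a" too
length(grep("radish", words)) == 0   # TRUE, because nothing matched
```

Checking whether the length of the result is 0 is one simple way to test for "no match" in your own code.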

This video continues the discussion of vectors.

https://youtu.be/NgmVhLpuM5k

4.3.1.3 Vectors of different types

What if we try to perform operations between vectors of different types? This will work in some cases, but not others. Here are a few examples:

a <- c(1, 2, 3)
b <- c("I", "am", "sam")
c <- c(TRUE, TRUE, FALSE)

a + b  # can you add a numeric vector to a character vector?

Error in a + b: non-numeric argument to binary operator

a + c  # can you add a numeric vector to a logical vector?

[1] 2 3 3

We see that you can’t add a numeric vector to a character vector, but you can add a numeric vector to a logical vector. Why is this?

Predict whether the following are possible:

Can you multiply a character vector with a numeric vector?
Can you multiply a logical vector with a numeric vector?

Check whether you are correct by creating some vectors in R and attempting to multiply them together. Can you make sense of the answer? If you run into errors, you can include error=TRUE in your code chunk options like this:

```{r, error=TRUE}
a + b  # this line produces an error, but the document will still knit
```

This will allow RStudio to still knit the document, even though the code block generated errors.

4.3.1.4 Special Numeric Vectors

There are a few special ways of creating a numeric vector which can be very useful, so we’ll mention them here. The first way creates a sequence of all integers between a starting and ending point:

d <- 1:5  # create sequence starting at 1 and ending at 5
d

[1] 1 2 3 4 5

Here’s a longer example:

d <- 1:100  # create sequence starting at 1 and ending at 100
d

  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100

In this example, the R output can’t be shown on a single line, so it must be placed on multiple lines. Notice that each line has a different number in brackets: [1], [19], [37] etc. This number indicates which element of the vector is the start of that line. So we finally have an explanation for the [1] which is displayed with all R output. It’s simply indicating that this is the first element of the output. This also reflects the fact stated earlier that a single value in R is just a vector of length 1!

When you’re working with large data sets, it’s often helpful to see just the first few results instead of printing the entire thing. You can use head() to print just the first six elements of a vector (or the first six rows of a data frame).
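
Here is a minimal sketch of head in action, along with its companion tail, which shows the end of a vector instead. The name big is just an illustrative choice.

```r
big <- 1:100     # a long vector

head(big)        # first six elements: 1 2 3 4 5 6
head(big, 3)     # ask for a different number of elements: 1 2 3
tail(big, 2)     # last two elements: 99 100
```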

Another way to create a numeric vector is using the seq function, which allows you to specify the interval between each vector element. For example:

e <- seq(2, 100, 2)
e

 [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
[20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
[39]  78  80  82  84  86  88  90  92  94  96  98 100

Or you can also specify how long you want the vector to be, and seq will determine the appropriate interval to make the elements evenly spaced.

seq(1, 10, length.out=3)

[1]  1.0  5.5 10.0

seq(1, 10, length.out=5)

[1]  1.00  3.25  5.50  7.75 10.00
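
The arguments to seq can also be given by name, which some readers find easier to follow. The sketch below assumes nothing beyond base R. It shows the named form, a decreasing sequence, and seq_len, a related helper that counts from 1 up to a given length.

```r
seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)                # a negative interval counts down: 10 7 4 1
seq_len(4)                         # 1 2 3 4
```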

4.3.1.5 Another Data Type: Factor

In the previous section, we avoided talking about the factor data type, because we need the concept of vectors to appreciate its purpose, but now we are equipped to talk about it. Consider the following example of a character vector:

cha_vec <- c("cheese", "crackers", "cheese", "crackers", "cheese", "crackers", "cheese")

There are seven elements in this vector (length(cha_vec) is 7), but there are only two unique elements, “cheese” and “crackers”. Imagine having to write down this vector on a piece of paper, and the space it would take. Now imagine writing down instead:

1, 2, 1, 2, 1, 2, 1

1 = “cheese”

2 = “crackers”

This second method writes down numbers instead of character strings, but also keeps a record of which numbers correspond to which character strings. The total amount of space taken up on the piece of paper is smaller for the second method, and the amount of space saved would be even larger if the character vector were longer and had more repeated elements.

This is the essence of what a factor data type is: A character vector stored more efficiently on the computer. For a factor vector, R stores an integer vector (which often takes less space than a character vector), and also maintains a “lookup table” which keeps track of which integers correspond with which character strings.

To illustrate, let’s create a factor variable:

# Create a new factor variable from our existing character vector:
fac_vec <- factor(cha_vec)

Notice how we started with a character vector and used the factor function to create a factor from it. If we print the new vector,

fac_vec

[1] cheese   crackers cheese   crackers cheese   crackers cheese  
Levels: cheese crackers

it displays the elements as we would expect, but also includes another line of output giving Levels. This shows that there are only two unique character strings, which are called factor levels. Since R is using integers “behind the scenes” to store the vector, we can see those integers by using the as.integer function:

as.integer(fac_vec)

[1] 1 2 1 2 1 2 1

This is another example of type conversion, which we will discuss soon.

In some situations, numbers may get treated as characters, like so:

x <- c("4", "5", "6")

This may pose an issue if this character vector gets converted to a factor, because the “behind the scenes” integers may not agree with the Levels, which represent the original data. This can easily happen when reading in data from a file on your computer, if you’re not careful. We’ll talk more about this later.
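
To make the issue concrete, here is a small sketch using made-up values (the names num_chr and num_fac are illustrative). Converting the factor straight to integer returns the "behind the scenes" codes, while converting to character first recovers the original numbers.

```r
num_chr <- c("10", "20", "10", "30")   # numbers stored as character strings
num_fac <- factor(num_chr)             # levels are "10", "20", "30"

as.integer(num_fac)                    # 1 2 1 3: the internal codes, not the data
as.numeric(as.character(num_fac))      # 10 20 10 30: the original values
```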

There are a few neat things you can do with factor vectors. By changing the levels, you can quickly change all occurrences of a string at once. For example:

print(fac_vec)
levels(fac_vec) <- c("peas", "carrots")  # Change the levels of fac_vec
fac_vec

[1] cheese   crackers cheese   crackers cheese   crackers cheese  
Levels: cheese crackers
[1] peas    carrots peas    carrots peas    carrots peas   
Levels: peas carrots

There is more to be said about factors, but this is all we will explore at this point.

In newer versions of R, each distinct string is stored only once in a shared cache behind the scenes, so a character vector with many repeated strings no longer takes much more space in the computer’s memory than a factor. However, R still treats the two types differently, so it’s important to remember that they are different!

This video discusses coercion, sequences, and factors.

https://youtu.be/iusiO1dRQdY

4.3.1.6 Combining Vectors

Given two vectors, it’s easy to combine them into one vector:

a <- c(1, 2, 3)
b <- c(4, 5, 6, 7)
c(a, b)  # Combine vectors a and b

[1] 1 2 3 4 5 6 7

The combine function (c) is smart enough to recognize that a and b are vectors, and performs concatenation to create the resultant longer vector. You can also use the combine function to add a single element to the end of a vector:

a <- c("CEO", "CFO")  # Initialize 
a <- c(a, "CTO")      # Redefine a by combining a with a new element
a

[1] "CEO" "CFO" "CTO"

In R, there may sometimes be more than one way to do the same thing, and one of the ways might be much faster or take much less computer memory to do. In other words, two sets of R commands can be correct, but one may perform better than the other. Writing “performant” (high performance) code is an advanced topic that we will not discuss much in this introductory course. You’ve just seen one way to add an element to the end of a vector, but if you do this a lot (perhaps in a for loop, which we’ll talk about later), it can be very slow. In this situation you’re better off creating the whole vector at once and updating each element as needed.
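
The sketch below contrasts the two approaches on a tiny example. It uses a for loop, which is covered later, so treat it as a preview. The names n, squares_grow, and squares_pre are illustrative.

```r
n <- 5

# Approach 1: grow the vector one element at a time (slow for large n)
squares_grow <- c()
for (i in 1:n) {
  squares_grow <- c(squares_grow, i^2)
}

# Approach 2: create the whole vector first, then update each element
squares_pre <- numeric(n)
for (i in 1:n) {
  squares_pre[i] <- i^2
}

identical(squares_grow, squares_pre)   # both give 1 4 9 16 25
```

With only five elements the difference in speed is not noticeable. It starts to matter when the vector has many thousands of elements.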

What if you try to combine vectors of different types?

a <- c(1, 2, 3)
b <- c("four", "five")
c(a, b)

[1] "1"    "2"    "3"    "four" "five"

Again, we see that the c function has converted all elements to be character strings, and the resultant vector is a character vector. Since we’ve seen type conversion arise a few times now, it’s appropriate to talk more explicitly about how it works. We’ll do that in the next section.

4.3.1.7 Type Conversion

There may be times when you’d like to convert from one type of data into another. An example would be the character string "1", which R does not view as a number. Therefore, the following does not work:

"1" + "2"  # R can't add two character strings

Error in "1" + "2": non-numeric argument to binary operator

To remedy issues like this, R provides functions in order to convert from one data type into another:

as.character: converts to character
as.numeric: converts to numeric
as.logical: converts to logical
as.factor: converts to factor

Using these functions, R will “do its best” to convert whatever you start with into the desired data type, but it’s not always possible to make the conversion. Below are a few examples which do and don’t work well.

Converting from a numeric to a character vector is always possible:

x <- c(3, 2, 1)

y <- as.character(x)   # Here's how to convert to a character vector
print(x)
print(y)

[1] 3 2 1
[1] "3" "2" "1"

However, converting from a character vector to a numeric only works if the characters represent numbers. Any element that won’t convert will be given the value NA:

w <- c("1", "12.3", "-5", "22")   # This character vector can be converted to numeric
as.numeric(w)

[1]  1.0 12.3 -5.0 22.0

v <- c("frank", "went", "to", "mars")   # This character vector can't be converted to numeric
as.numeric(v)

Warning: NAs introduced by coercion

[1] NA NA NA NA

None of the elements can be converted into a number, so R prints a warning message, and the result is an NA in each element, which stands for “not available”. NA indicates that a value is missing, and can arise in many different ways, which we will not explain here.

NA values have interesting behavior in R. Generally, anything that “touches” an NA becomes an NA. You can try out these commands for yourself to see what happens:

NA * 0

NA - NA

c(NA, 1, 2)

If only part of a vector can be converted, then the result will contain some converted values and some NA’s:

u <- c("1.2", "chicken", "33")
as.numeric(u)

Warning: NAs introduced by coercion

[1]  1.2   NA 33.0
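
After a conversion like this, it is often useful to find out which elements failed. The sketch below is one way to do that with is.na, and it also shows the na.rm argument that many summary functions accept. The names raw and vals are illustrative, and suppressWarnings is used only to keep the example output quiet.

```r
raw  <- c("1.2", "chicken", "33")
vals <- suppressWarnings(as.numeric(raw))  # same conversion, warning hidden

is.na(vals)               # FALSE TRUE FALSE: the second element failed
sum(vals)                 # NA, because one element is NA
sum(vals, na.rm = TRUE)   # 34.2, ignoring the NA
```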

What other conversions are possible? Character vectors can also be converted into logical:

s <- c("TRUE", "FALSE", "T", "F", "cat")   # All but the last element can be converted to logical
as.logical(s)

[1]  TRUE FALSE  TRUE FALSE    NA

Based on the examples we’ve seen before, it should make sense that numeric vectors containing 0 or 1 can also be converted into a logical vector:

as.logical(c(1, 0, 1, 0))  # Here we create the vector and convert it in the same line

[1]  TRUE FALSE  TRUE FALSE

Logical vectors can also be converted into character or numeric vectors. Based on what you know, make a prediction about what the following commands will produce:

as.numeric(c(T, F, F, T))
as.character(c(T, F, F, T))

Check your predictions by running the commands in R.

Remember that “solo” objects are just vectors of length 1, so any of these type conversions should work on a single object as well, like so:

as.numeric("99")

[1] 99

Along with the as.* conversion functions, there are companion functions which simply check whether a vector is of a certain type:

is.character: checks if character
is.numeric: checks if numeric
is.logical: checks if logical
is.factor: checks if factor

Here are some examples:

a <- c("1", "2", "3")
is.character(a)

[1] TRUE

is.numeric(a)

[1] FALSE

a <- as.numeric(a)
is.character(a)

[1] FALSE

is.numeric(a)

[1] TRUE

As we’ve seen, type conversion is sometimes performed automatically, specifically when using the combine function (c).

To understand more about this, try typing ?c to bring up the documentation, and have a look at the “Details” section.
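
In short, when c is given mixed types it converts everything to the most flexible type present. The sketch below illustrates the ordering for the types used in this chapter (logical, then integer, then double, then character).

```r
class(c(TRUE, 2L))    # "integer":   the logical becomes an integer
class(c(2L, 3.5))     # "numeric":   the integer becomes a double
class(c(3.5, "a"))    # "character": the number becomes a string

c(TRUE, 2L)           # 1 2
c(3.5, "a")           # "3.5" "a"
```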

This video finishes the discussion of vectors.

https://youtu.be/XKdZzHBRO9o

4.3.2 Matrices

Not all data can be arranged as an ordered set of elements, so R has other data structures besides vectors. Another data structure is the matrix, which can be thought of as a grid of numbers. Here’s an example of creating a matrix:

data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
A <- matrix(data, 3, 3)
A

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Here we’ve made a matrix with three rows and three columns by first creating a vector called data, then passing the data, the number of rows, and the number of columns to the matrix function.

Notice that R fills the matrix one column at a time, from left to right.
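
If you would rather fill the matrix one row at a time, the matrix function has a byrow argument. The sketch below compares the two fill orders. The names by_col and by_row are illustrative.

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: fill by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill by row instead

by_col[1, ]   # first row is 1 3 5
by_row[1, ]   # first row is 1 2 3
```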

Here’s how you access the data within a matrix:

A[1,1]  # Get the first element of the first row

[1] 1

A[2,3]  # Get the third element of the second row

[1] 8

A[1,]  # Get the entire first row

[1] 1 4 7

A[,3]  # Get the entire third column

[1] 7 8 9

Just like with vectors, square brackets must be used to access the elements of a matrix. Don’t use parentheses like this: A(1,2).

diag(A)  # Get the diagonal elements of A

[1] 1 5 9

You can get the shape of a matrix with the dim function:

dim(A)  # How many rows & columns does A have?

[1] 3 3

This gives an integer vector telling us that A has three rows and three columns.

In R, create the matrix A above, and write code to compute the first element of the second row times the third element of the third row.

You can do some simple math with matrices, like this:

A + 1  # Add a number to each element of the matrix

     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10

A * 2  # Multiply each element by a number

     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    4   10   16
[3,]    6   12   18

A ^ 2  # Square each element

     [,1] [,2] [,3]
[1,]    1   16   49
[2,]    4   25   64
[3,]    9   36   81

If you’ve worked with matrices in a math class, you may have talked about some of the following operations. Here we find the transpose of a matrix (the rows become columns and the columns become rows):

t(A)  # Find the transpose

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

# Find the trace:
sum(diag(A))  # Get the diagonal elements of A, then sum them

[1] 15

Here are some things you can do with two matrices:

B <- matrix(1, 3, 3)  # Create a 3x3 matrix of all 1's (notice how we only need one 1?)

A + B  # Add two matrices together

     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10

A * B  # Multiply the elements of A and B together

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

A %*% B  # Perform matrix multiplication between A and B

     [,1] [,2] [,3]
[1,]   12   12   12
[2,]   15   15   15
[3,]   18   18   18

Notice the difference between the last two examples? Just using * multiplies the matching elements of A and B together, while the new operator %*% performs matrix multiplication, like you may have seen in a linear algebra class.
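
To see where one entry of a matrix product comes from, the sketch below rebuilds the same two matrices under new names (P and Q, chosen here to avoid overwriting A and B) and computes the top-left entry by hand: multiply the first row of P by the first column of Q, element by element, then add up the results.

```r
P <- matrix(1:9, 3, 3)   # same values as A above
Q <- matrix(1, 3, 3)     # same values as B above

PQ <- P %*% Q            # matrix multiplication

sum(P[1, ] * Q[, 1])     # 1*1 + 4*1 + 7*1 = 12
PQ[1, 1]                 # also 12
```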

In R, perform matrix multiplication between A and the transpose of A.

If two matrices don’t have the same shape, you won’t be able to add or multiply their matching elements together:

C <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 3, 4)

A * C

Error in A * C: non-conformable arrays

The error message: non-conformable arrays tells us that A and C have different shapes, so it’s impossible to multiply their matching elements together. But you can still perform matrix multiplication between them:

A %*% C

     [,1] [,2] [,3] [,4]
[1,]   30   66  102  138
[2,]   36   81  126  171
[3,]   42   96  150  204

Any data type (numeric, character, etc.) can be represented as a vector, but the matrix math shown above only works with numeric types.

A matrix is just a special case of a data structure called an array. Matrices have two dimensions (row and column), and arrays can have any number of dimensions (1, 2, 3, 4, 5, etc.). We won’t discuss arrays in this course much.
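
For the curious, here is a minimal sketch of a three-dimensional array built with the array function. The name arr is illustrative. Elements are accessed with one index per dimension, just like the two indices used for a matrix.

```r
arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 "layers"

dim(arr)        # 2 3 4
arr[1, 1, 2]    # first row, first column, second layer: 7
arr[2, 3, 4]    # the very last element: 24
```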

Try running the following code in R, which should produce a warning message:

data <- c(4.5, 6.1, 3.3, 2.0); A <- matrix(data, 2, 3);

Read the warning message and the code carefully, and see if you can figure out the problem. What change would you make to the above code so that it runs without a warning?

Remember everything inside a vector must have the same data type. The matrices we’ve seen so far have all held numeric data. Wouldn’t it be nice if there were a way to store objects of different types (without doing type conversion)? This is what lists can do!

It turns out, matrices can work with non-numeric types as well! But like vectors, mixed type matrices are not allowed. For this, you’ll have to use a dataframe, as we discuss later.
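
The sketch below illustrates both points with made-up values: a matrix of character strings works fine, and a matrix built from mixed types is converted to a single type, just as a vector would be. The names letters_mat and mixed are illustrative.

```r
letters_mat <- matrix(c("a", "b", "c", "d"), 2, 2)  # a character matrix
letters_mat[2, 1]                                   # "b"

mixed <- matrix(c(1, "b", TRUE, 4), 2, 2)           # mixed types get converted
is.character(mixed)                                 # TRUE
mixed[1, 2]                                         # "TRUE", now a character string
```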

This video gives an introduction to Matrices.

https://youtu.be/hknL1EbrIB4

4.3.3 Lists

A list is an ordered set of components. This may sound similar to a vector, but the important difference is that with lists there is no requirement that the components have the same data type. Here is an example of a list:

A <- list(42, "chicken", TRUE)
A

[[1]]
[1] 42

[[2]]
[1] "chicken"

[[3]]
[1] TRUE

Here we see each component of the list printed in order, with [[1]], [[2]], and [[3]] indicating the first, second, and third components. To access just one of the components, use double square brackets ([[ and ]]):

# Get the second component of A
A[[2]]

[1] "chicken"

Notice that each component of A is a different data type (numeric, character, logical), which is not a problem for lists. Nothing was converted automatically, as we saw happen with vectors. Here’s how to add a component to an existing list:

A[[4]] <- matrix(c(1, 2, 3, 4, 5, 6), 2, 3)

Notice how we accessed component 4, which didn’t exist yet, and assigned it a value. We actually added a matrix as the fourth component, which is not possible with vectors! Now A has four components:

[[1]]
[1] 42

[[2]]
[1] "chicken"

[[3]]
[1] TRUE

[[4]]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Lists can even contain other lists!
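
Here is a minimal sketch of a list inside a list, using illustrative names (inner_list and outer_list). To reach a component of the inner list, chain the double square brackets.

```r
inner_list <- list("egg", 3)
outer_list <- list(42, inner_list)   # second component is itself a list

length(outer_list)       # 2: the inner list counts as one component
outer_list[[2]][[1]]     # second component, then its first component: "egg"
```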

If you try to assign a list to be one of its own components (e.g. A[[5]] <- A), then R will make a copy of A and assign the copy to be one of the components of A. Thus there is no “self reference”, and no issue with Russell’s Paradox.

So far we’ve seen vectors, lists, matrices, and arrays.

How are they different and how are they similar?

List components can also have names. Here we add a component with a name:

A[["color"]] <- "yellow"
A

[[1]]
[1] 42

[[2]]
[1] "chicken"

[[3]]
[1] TRUE

[[4]]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

$color
[1] "yellow"

Notice how this new component displays differently? Instead of showing [[5]], the component is labeled with a dollar sign, then its name: $color.

We first used the term name for individual variables, but here we see that components of lists can also have names. When we encounter data frames later, we’ll see how each row and column can also have its own name.

You can access components using their name in two ways:

A[["color"]]  # Use double square brackets to access a named element

[1] "yellow"

A$color  # Use dollar sign to access a named element

[1] "yellow"

But the color component is also the fifth component of the list, so we can access it like this as well:

A[[5]]

[1] "yellow"

Here’s a new list created by giving names to each element:

person <- list(name = "Millard Fillmore", occupation = "President", birth_year=1800)
person

$name
[1] "Millard Fillmore"

$occupation
[1] "President"

$birth_year
[1] 1800

Below is some R code:

S1$year <- S2[2,2] + S3[["age"]]

Assuming this code works, what are the data structures of S1, S2, and S3?

purrr is a very useful R package for working with lists.

4.3.3.1 Lists and Vectors

Lists and vectors are different data structures, but in some ways they behave the same. For example, here’s how to find the length of a list:

length(person)  # Same for vectors and lists!

[1] 3

Combine two lists:

c(A, person)  # Same for vectors and lists!

[[1]]
[1] 42

[[2]]
[1] "chicken"

[[3]]
[1] TRUE

[[4]]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

$color
[1] "yellow"

$name
[1] "Millard Fillmore"

$occupation
[1] "President"

$birth_year
[1] 1800

A == "chicken"  # Compare against a character

                        color 
FALSE  TRUE FALSE FALSE FALSE

However, there are some things that vectors can do that lists can’t:

A + 1  # Add a number to each component (won't work)

Error in A + 1: non-numeric argument to binary operator

A == T  # Compare against a logical (won't work)

Error: 'list' object cannot be coerced to type 'logical'

A == 12  # Compare against a numeric (won't work)

Error: 'list' object cannot be coerced to type 'double'

So there are trade-offs when deciding whether a list or a vector is most appropriate.
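
One way around these limitations is to pull out a single component first and then work with it as an ordinary vector. The sketch below uses a fresh list called stuff (an illustrative name) and also uses sapply, a base R function not covered in this section, to report the type of every component.

```r
stuff <- list(42, "chicken", TRUE)

stuff[[1]] + 1         # works, because stuff[[1]] is a numeric: 43
stuff[[3]] == TRUE     # works, because stuff[[3]] is a logical: TRUE

sapply(stuff, class)   # "numeric" "character" "logical"
```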

This video discusses lists.

https://youtu.be/-Y02JkqDlWU

4.3.3.2 Lists of Vectors

One type of list shows up all the time in R: a list of vectors.

vec_1 <- c("Alice", "Bob", "Charlie")
vec_2 <- c(99.4, 87.6, 22.1)
vec_3 <- c("F", "M", "M")
special_list <- list(name = vec_1, grade = vec_2, sex = vec_3)
special_list

$name
[1] "Alice"   "Bob"     "Charlie"

$grade
[1] 99.4 87.6 22.1

$sex
[1] "F" "M" "M"

Here, each vector in the list stores a different piece of information about several people. Here’s another example:

rocks <- list(specimen=c("A", "B", "C"),
              type=c("igneous", "metamorphic", "sedimentary"),
              weight=c(21.2, 56.7, 3.8),
              age=c(120, 10000, 5000000)
              )
rocks

$specimen
[1] "A" "B" "C"

$type
[1] "igneous"     "metamorphic" "sedimentary"

$weight
[1] 21.2 56.7  3.8

$age
[1]     120   10000 5000000

When defining the rocks list, we’ve spread the command across multiple lines for clarity. The commas at the end of some of the lines separate the elements of the list. R will continue reading the next line until it finds the closing parenthesis, ).

So many data sets fit this pattern that R has a special data structure for them called a data frame, which we will discuss in the next section.

Create a matrix, a character vector, and a logical object, then place them all in a new list called “my_list”, with the names “my_matrix”, “my_vector”, and “my_logical”.

4.3.4 Data Frames

At their core, data frames are just lists of vectors, but they have some extra features as well. Here, we’ll re-define the rocks list from the previous section, but this time we’ll create it as a data frame:

rocks <- data.frame(type = c("igneous", "metamorphic", "sedimentary"),
                    weight = c(21.2, 56.7, 3.8),
                    age = c(120, 10000, 5000000))
rocks  # We'll add the specimen names later

type         weight  age
<chr>        <dbl>   <dbl>
igneous      21.2    120
metamorphic  56.7    10000
sedimentary  3.8     5000000

Now when R displays rocks, it arranges the data in rows and columns, similar to how it displays matrices. Unlike matrices, however, the columns don’t all have to be the same data type!

Remember that a data frame is basically a list of vectors, so even though it can contain different types of data (because it is a list), each column is a vector, which means each column must have all elements of the same type.
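
The sketch below checks both claims on a small, made-up data frame called pets (an illustrative name). The stringsAsFactors = FALSE argument is included so the character column stays a character vector in older versions of R as well.

```r
pets <- data.frame(name = c("Rex", "Tom", "Polly"),
                   legs = c(4, 4, 2),
                   stringsAsFactors = FALSE)

is.list(pets)         # TRUE: a data frame is a list underneath
class(pets$name)      # "character"
class(pets$legs)      # "numeric"
length(pets)          # 3 would be the number of rows, but this gives 2 (columns)
```

Notice that length counts the components of the list, which are the columns, not the rows.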

The names of the columns are the names of the components of rocks, and the rows contain the data from each component vector. Because a data frame is a list, we can access a component by its position or name:

rocks[[1]]

[1] "igneous"     "metamorphic" "sedimentary"

rocks$weight

[1] 21.2 56.7  3.8

However, we can also access a data frame as if it were a matrix:

rocks[1,3]  # Get the datum from the first row, third column.

[1] 120

rocks[1,]  # Get the first row; this gives another data frame with a single row.

	type <chr>	weight <dbl>	age <dbl>
1	igneous	21.2	120

rocks[,2]  # Get the second column; this gives a vector.

[1] 21.2 56.7  3.8
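Matrix-style indexing also accepts column names in place of positions, which is often easier to read. The following is a minimal sketch; the variable names one_value, type_col, and two_cols are made up for this example.

```r
# Re-create the rocks data frame so this example stands on its own.
rocks <- data.frame(type = c("igneous", "metamorphic", "sedimentary"),
                    weight = c(21.2, 56.7, 3.8),
                    age = c(120, 10000, 5000000),
                    stringsAsFactors = FALSE)

# Second row of the weight column: a single value.
one_value <- rocks[2, "weight"]
one_value

# The whole type column: a character vector.
type_col <- rocks[, "type"]
type_col

# Two columns at once: the result is still a data frame.
two_cols <- rocks[, c("type", "age")]
two_cols
```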

Here’s how to get the shape of a data frame (number of rows and columns):

dim(rocks)

[1] 3 3
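If you only need one of the two numbers, nrow and ncol return them separately. Since rocks happens to have three rows and three columns, this sketch also takes a two-row subset (called first_two here, a name chosen just for illustration) so that you can tell which number is which.

```r
# Re-create the rocks data frame so this example stands on its own.
rocks <- data.frame(type = c("igneous", "metamorphic", "sedimentary"),
                    weight = c(21.2, 56.7, 3.8),
                    age = c(120, 10000, 5000000),
                    stringsAsFactors = FALSE)

nrow(rocks)  # number of rows
ncol(rocks)  # number of columns

# Keep only the first two rows; dim reports rows first, then columns.
first_two <- rocks[1:2, ]
dim(first_two)
```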

If we start with a list of vectors, we can convert it to a data frame with as.data.frame:

people <- list(name = c("Alice", "Bob", "Charlie"), 
               grade = c(99.4, 87.6, 22.1), 
               sex = c("F", "M", "M"))
as.data.frame(people)

name <chr>	grade <dbl>	sex <chr>
Alice	99.4	F
Bob	87.6	M
Charlie	22.1	M
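The conversion only works when every vector in the list has the same length, because each vector becomes a column and every column needs one entry per row. Here is a small sketch of both cases. The names people_df, uneven, and result are invented for this example, and tryCatch is used only so the failed conversion does not stop the script.

```r
people <- list(name = c("Alice", "Bob", "Charlie"),
               grade = c(99.4, 87.6, 22.1),
               sex = c("F", "M", "M"))
people_df <- as.data.frame(people, stringsAsFactors = FALSE)

is.data.frame(people)     # the original list is not a data frame
is.data.frame(people_df)  # the converted object is
is.list(people_df)        # and it is still a list underneath

# A list whose vectors have different lengths (3 and 2) cannot be converted.
uneven <- list(name = c("Alice", "Bob", "Charlie"),
               grade = c(99.4, 87.6))
result <- tryCatch(as.data.frame(uneven),
                   error = function(e) "conversion failed")
result
```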

R comes preloaded with several data frames, such as mtcars, which contains data from the 1974 Motor Trend magazine for 32 different automobiles (only the first 10 rows and 9 of the 11 columns are displayed below):

mtcars

	mpg <dbl>	cyl <dbl>	disp <dbl>	hp <dbl>	drat <dbl>	wt <dbl>	qsec <dbl>	vs <dbl>	am <dbl>
Mazda RX4	21.0	6	160.0	110	3.90	2.620	16.46	0	1
Mazda RX4 Wag	21.0	6	160.0	110	3.90	2.875	17.02	0	1
Datsun 710	22.8	4	108.0	93	3.85	2.320	18.61	1	1
Hornet 4 Drive	21.4	6	258.0	110	3.08	3.215	19.44	1	0
Hornet Sportabout	18.7	8	360.0	175	3.15	3.440	17.02	0	0
Valiant	18.1	6	225.0	105	2.76	3.460	20.22	1	0
Duster 360	14.3	8	360.0	245	3.21	3.570	15.84	0	0
Merc 240D	24.4	4	146.7	62	3.69	3.190	20.00	1	0
Merc 230	22.8	4	140.8	95	3.92	3.150	22.90	1	0
Merc 280	19.2	6	167.6	123	3.92	3.440	18.30	1	0

A list of included data sets in R can be found by running data().
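Printing a whole data set can fill your screen, so it is often handy to peek at just part of it. One possible approach, sketched below, uses head for the first few rows, dim for the shape, and str for a compact summary of every column. The variable names first_rows and mtcars_shape are only for illustration.

```r
# Show only the first three rows of the built-in mtcars data frame.
first_rows <- head(mtcars, 3)
first_rows

# Number of rows and columns.
mtcars_shape <- dim(mtcars)
mtcars_shape

# Compact summary: one line per column with its type and first few values.
str(mtcars)
```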

Look at the column of car names on the left side of the mtcars data frame. It doesn’t have a column name (like mpg, cyl, etc.), because it’s not actually a column. These are row names, and you can access them like this:

row.names(mtcars)

 [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
 [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
 [7] "Duster 360"          "Merc 240D"           "Merc 230"           
[10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
[13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
[16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
[19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
[22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
[25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
[28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
[31] "Maserati Bora"       "Volvo 142E"

You can also access the column names like this:

names(mtcars)

 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"

These are two examples of attributes, which are extra pieces of information attached to an object. We’ll discuss attributes more later when we discuss R objects. The column names and row names are just vectors, and you can access and modify them as such:

row.names(rocks) <- c("A", "B", "C")
rocks

	type <chr>	weight <dbl>	age <dbl>
A	igneous	21.2	120
B	metamorphic	56.7	10000
C	sedimentary	3.8	5000000

names(rocks)[[1]] <- "rock type"
rocks

	rock type <chr>	weight <dbl>	age <dbl>
A	igneous	21.2	120
B	metamorphic	56.7	10000
C	sedimentary	3.8	5000000

Row and column names are allowed to have spaces in them, but you must be careful how you access them. The code rocks$rock type will not work, because R stops reading the name you are referencing once it encounters a space. To access this column, you must enclose the name in “backticks” ( ` ) like so: rocks$`rock type`.
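Here is a short sketch of the ways to reach a column whose name contains a space. It re-creates rocks and renames the first column so it can run on its own. The variable names by_backtick and by_brackets are made up for this example, and the last step, which uses make.names to turn the names into ones that need no backticks, is just one option you might consider rather than a required step.

```r
# Re-create rocks and give the first column a name with a space in it.
rocks <- data.frame(type = c("igneous", "metamorphic", "sedimentary"),
                    weight = c(21.2, 56.7, 3.8),
                    age = c(120, 10000, 5000000),
                    stringsAsFactors = FALSE)
names(rocks)[1] <- "rock type"

# Two equivalent ways to access the column.
by_backtick <- rocks$`rock type`
by_brackets <- rocks[["rock type"]]
identical(by_backtick, by_brackets)

# Optional: replace awkward names with syntactic ones (the space becomes a dot).
names(rocks) <- make.names(names(rocks))
names(rocks)
```

After the last step the column can be accessed as rocks$rock.type, with no backticks needed.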

Look at the set of available data sets in R, and pick 2 data sets. For each data set, answer the following questions:

What are the column names?
What are the row names?
What is the data type for each column?
How many rows are in the data frame?
How many columns are in the data frame?

This is the last section you should include in Progress Check 2. Knit your output document and submit on Canvas.

This video discusses lists of vectors.

https://youtu.be/9BGRIC1js04