4.2 Data Types

Think of all the things you might be expected to remember. These items can probably be sorted into different types of information, such as phone numbers, passwords, birthdays, historical events, and math theorems. R was designed to handle different types of data as well, though its types are different from the examples just given.

R can store and manipulate different pieces of information, called data, and these data can be of several different types. Here are some examples of different types of data:

a <- 12.34      # a is a number
b <- "Hello"    # b is a string of characters
c <- TRUE       # c is a special type of data that is either true or false

R has special names for these examples, and there are other types of data as well. Below, we’ll talk about each data type, one at a time.

The term “data” is actually plural! A single piece of data is called a “datum”. So to refer to a set of data, you would say “these data”, and to refer to a single piece of data, you would say “this datum”.

4.2.1 Numeric

Many data exist as numbers, and R has a specific data type for storing those numbers, called the numeric data type. Here are some examples:

a <- -11
b <- 13.37
c <- 1/137

Note that whole numbers, decimals, and fractions are all numeric data in R. We can confirm that these all have the same data type using the class function:

class(a)
[1] "numeric"
class(b)
[1] "numeric"
class(c)
[1] "numeric"

So far, we’ve defined the a object a few different times, which is allowed! Every time we define a, R forgets the old value. Therefore we should reuse object names with caution, because it can become difficult to remember what the latest value is! When we discuss loops later, however, we will use code to automatically change the value of an object several times in order to do useful things!
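As a small illustration of reassignment (a minimal sketch; the value 5 is arbitrary), you can even use an object’s current value to compute its replacement:

```r
a <- 5
a <- a + 1   # R computes a + 1 using the old value (5), then stores the result in a
a            # the old value is gone; a now holds 6
```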

When you have numeric objects, you may want to perform math operations on them. R has a number of built-in operators and functions for numeric data. Here are some examples:

print(a + b)  # Add two numeric values
print(b - c)  # Subtract two numeric values 
print(a * b)  # Multiply two numeric values
print(a^3)    # Take the cube of a numeric value
[1] 2.37
[1] 13.3627
[1] -147.07
[1] -1331
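A few more operations you may find useful are sketched below. The values of a and b are redefined so the block runs on its own; abs and sqrt are base R functions, and %% and %/% are R’s remainder and integer-division operators.

```r
a <- -11
b <- 13.37
b / 2     # division
abs(a)    # absolute value: 11
sqrt(16)  # square root: 4
7 %% 3    # remainder after dividing 7 by 3: 1
7 %/% 3   # integer division (how many whole times 3 fits into 7): 2
```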

When performing math on numeric objects, R will obey order of operations, so the following two examples will give different results:

a + b * c    # R will perform the multiplication before the addition
[1] -10.90241
(a + b) * c  # R will perform the addition first, then the multiplication 
[1] 0.01729927
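Exponents are evaluated before multiplication, division, addition, and subtraction, and even before a leading minus sign, which can be surprising. A short sketch with arbitrary values:

```r
2 + 3^2     # 11: the exponent is computed first, then the addition
(2 + 3)^2   # 25: parentheses force the addition to happen first
-2^2        # -4: ^ is applied before the minus sign, so this is -(2^2)
(-2)^2      # 4: squaring negative two
```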

Notice that we’ve added extra spaces in the code to help you understand what’s going on. This is another example of code style, which we’ll talk more about later.

Wait a second, we didn’t use the print function just now, but R still displayed the results of the calculations! What is going on? This is called auto-printing. When you run a command at the top level, whether in the R console or in an R Markdown code chunk (R Markdown is what we used to create this book; yes, this book was created using R! Pretty cool, huh?), and that command produces a result that you don’t assign to anything (using <-), R prints the result automatically. This means we don’t always have to use the print function when we want to display R output. (Auto-printing only applies to commands run directly at the top level, so inside the loops and functions we’ll write later, we will still need print.)

Notice all the digits after the decimal point? R can be very precise when performing computations. However, viewing all of those digits can be distracting and hard to read. You can use the round function to round a number to a given number of decimal places:

a <- 0.123456
round(a, 3)
[1] 0.123

It also turns out that R stores more digits than it shows when it prints (by default, R displays 7 significant digits), though we won’t go into detail on that now.
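The sketch below illustrates both points, assuming R’s default display settings: print shows 7 significant digits unless you ask for more with its digits argument, while round actually creates a new, rounded value rather than just changing how the number is displayed.

```r
x <- 1/3
print(x)               # default display: 7 significant digits
print(x, digits = 15)  # the same stored value, showing more of its digits
y <- round(x, 3)       # round() returns a new value rounded to 3 decimal places
y == x                 # FALSE: the rounded value is not the original value
```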

This video discusses numerics.

https://youtu.be/juscNzIrmJQ

4.2.2 Integer

In general, numeric data in R are treated as if they can be any decimal number (technically, they are a double precision number, if you know what that means; if not, it’s not important right now). However, there is a way to specify that a specific numeric object is an integer, by placing an “L” at the end of it, like so:

x <- 20   # x will be a numeric object
y <- 20L  # y will be an integer object
class(x)
[1] "numeric"
class(y)
[1] "integer"
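The class function reports "numeric" for ordinary numbers, but the typeof function shows how R actually stores them: as "double" values. The sketch below, with arbitrary values, also shows that a whole number is not an integer unless you ask for it, and that dividing integers with / produces a numeric result.

```r
x <- 20
y <- 20L
typeof(x)             # "double": the storage type behind the "numeric" class
typeof(y)             # "integer"
is.integer(x)         # FALSE, even though 20 is a whole number
z <- as.integer(3.9)  # converting to integer drops the decimal part
z                     # 3
class(y / 2L)         # "numeric": / always gives a double, even for integers
class(y + 1L)         # "integer": adding two integers keeps the integer type
```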

A numeric (double) value takes up 8 bytes of a computer’s memory, while an integer takes up only 4, so integers need half the space. If you are working with or storing a lot of whole numbers, it might make sense to declare them as the integer type in R. This will make more sense when we discuss vectors later.
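As a preview (vectors are covered later), the sketch below uses the base functions numeric and integer to create one million zeros of each type and compares their sizes with object.size. The exact byte counts include a small fixed overhead, but the ratio should be very close to 2.

```r
doubles  <- numeric(1e6)   # one million zeros stored as doubles
integers <- integer(1e6)   # one million zeros stored as integers
object.size(doubles)       # about 8 million bytes
object.size(integers)      # about 4 million bytes
ratio <- as.numeric(object.size(doubles)) / as.numeric(object.size(integers))
ratio                      # very close to 2
```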

This video discusses integers.

https://youtu.be/rNkEAPsipCk

4.2.3 Character

Not all data are numbers! R can also store strings of characters using the aptly named character type. A value of this type is often called a character string, or just a string. Here are some examples:

d <- "Hello"         # This string is defined with *double* quotes
e <- 'how are you?'  # This string is defined with *single* quotes!
print(d)
print(e)
[1] "Hello"
[1] "how are you?"

Notice that we can define character strings using either single quotes or double quotes, as long as the opening and closing quotes match. So this is not valid:

# Note the mismatched single/double quotes:
f <- "this does not work' 
Error: <text>:2:6: unexpected INCOMPLETE_STRING
1: # Note the mismatched single/double quotes:
2: f <- "this does not work' 
        ^

So, make sure you are consistent. However, you may see another problem with this: some strings contain quotes in them, like this:

g <- 'This won't work'
Error: <text>:1:16: unexpected symbol
1: g <- 'This won't
                   ^

Since single quotes are being used to define the string, they can’t also be used inside it, because R will “think” the string ends at the second '. One option is to change the defining quotes to double quotes, so that the single quote is safely included in the string:

g <- "I'm happy that this works!"
print(g)
[1] "I'm happy that this works!"

Another option is to use a backslash when using quotes inside the string, so that R “knows” the quote is part of the string and not ending the definition of the string:

g <- 'I\'ve found another way that works!'
print(g)
[1] "I've found another way that works!"

Notice that when we define g, we place a \' anywhere in the string where we want a ' to appear, but when it is printed, R shows just a '. Notice also that we didn’t have to change the defining quotes to double quotes in this case. The backslash is called the escape character: it tells R to treat the character that follows it specially. For a quote (or another backslash), this means the character is taken literally instead of ending the string; for a few letters, such as the n in \n, it produces a special character such as a newline.

Since the backslash has a special meaning itself, if you want a backslash in your string, you need to escape it with another backslash, like so: g <- "here is a backslash: \\". You will see both backslashes when using the print function, which displays a string the way you would type it in R code, but if you use the cat function, which writes out the text itself, all escape sequences are “processed”, and you will see just a single backslash.

Try the same thing with the newline character, \n!
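Here is a sketch of both experiments, using print, cat, and nchar; the strings are just examples. Note that cat does not add a newline at the end of its output.

```r
g <- "here is a backslash: \\"  # \\ stores a single backslash character
print(g)   # shows the string as R code, with the escape: "here is a backslash: \\"
cat(g)     # writes out the text itself: here is a backslash: \
nchar(g)   # 22: the escape pair \\ counts as just one character
h <- "first line\nsecond line"
print(h)   # the \n is shown as an escape sequence
cat(h)     # the \n starts a new line when the text is written out
```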

To see a list of special characters, try typing ?Quotes into the R console.

Here is an important string to know about:

h <- ""  # This string is empty!

h is a character string with no characters, called an empty string.
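A quick way to confirm that a string is empty is to count its characters with nchar; a sketch:

```r
h <- ""
nchar(h)    # 0: an empty string contains no characters
class(h)    # "character": it is still a character string
nchar(" ")  # 1: a space is a character, so this string is not empty
```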

You can perform math on numeric data, so what can you do with strings? The answer is quite a lot, using some functions that R provides. Here are some of them:

nchar(g)  # This prints out the number of characters in a string
[1] 34
substr(g, 6, 10)  # This extracts just part of a string, using the start and stop positions you provide
[1] "found"
strsplit(g, " ")  # This splits the string up using a specified "delimiter" string, a single space in this case 
[[1]]
[1] "I've"    "found"   "another" "way"     "that"    "works!" 

When you split a string, the result is a list containing a vector of character strings. This is an example of how data can be organized in a structured way. We’ll talk more about these so-called data structures in the next section.

paste("hello", "world")  # This combines multiple strings together into one string!
[1] "hello world"
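To see exactly how substr counts positions, here is a sketch using the same g as above: positions start at 1, and both the start and stop positions are included in the result.

```r
g <- "I've found another way that works!"
nchar(g)                 # 34: spaces and punctuation count as characters
substr(g, 1, 4)          # "I've": characters 1 through 4, both ends included
substr(g, 29, nchar(g))  # "works!": nchar(g) gives the position of the last character
```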

Remember that you can learn more about a function using ?. Type ?paste into R and read the documentation carefully. Can you determine what the “sep” argument does? What do you think would happen if we ran the code paste("hello", "world", sep = "-")?

There are other ways of manipulating strings, but we’ll return to this later.

This video discusses characters.

https://youtu.be/1JgmnulM_4g

4.2.4 Logical

Numeric objects can be any number, and character objects can be any string of characters, but logical objects can take only two values: TRUE or FALSE (plus NA, which R uses to mark a missing value).
Logical data types are also known as “boolean” data types. Here we define some logical objects:

a <- TRUE
b <- FALSE
c <- T
d <- F
print(a)
[1] TRUE
print(b)
[1] FALSE
print(c)
[1] TRUE
print(d)
[1] FALSE
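One caution is worth knowing: TRUE and FALSE are reserved words, but T and F are ordinary objects that R presets to TRUE and FALSE, so they can be overwritten (trying TRUE <- 0, by contrast, gives an error). For this reason, writing out TRUE and FALSE in full is generally the safer habit. A sketch:

```r
T <- 0      # allowed, because T is an ordinary object name
isTRUE(T)   # FALSE: T no longer holds TRUE
rm(T)       # delete our T, so R's built-in T (which is TRUE) is visible again
isTRUE(T)   # TRUE
```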

So you can see that we can define a logical object using the full name or just the first letter. Here’s how to get the “opposite” of a logical object, using the ! (“not”) operator:

!a
[1] FALSE

Logical data are the simplest type, but there are actually some clever things you can do with them. You can test whether simple mathematical expressions are true or false.

# Create x and y
x <- 3
y <- 4
# Check: is x less than y? (should give TRUE)
x < y
[1] TRUE

The third command checks whether the value of x is less than the value of y. The result of this comparison is a logical value, in this case TRUE. Here are other ways of making comparisons:

x <= y  # Check if x is less than or equal to y
[1] TRUE
x == y  # Check if x is equal to y (note how you need two equals signs)
[1] FALSE
x >= y  # Check if x is greater than or equal to y
[1] FALSE
x > y   # Check if x is greater than y
[1] FALSE
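Two more comparisons are sketched below, with the same x and y as above. The != operator asks whether two values are not equal, and the block also shows why == needs two equals signs: a single = is an assignment, just like <-.

```r
x <- 3
y <- 4
x != y     # TRUE: != checks whether x is not equal to y
!(x == y)  # TRUE: the same question, asked with ! and ==
x = y      # a single = assigns y's value to x; it does not compare anything
x          # x now holds 4
```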

Comparisons can be made using strings as well:

x <- "Hello"
y <- "hello"
x == y
[1] FALSE

Remember that R is case sensitive, and two strings must be exactly the same to be considered equal.

Of course any object (like x) will be equal to itself:

x == x
[1] TRUE

Surprisingly, logicals can be treated as numerics, where TRUE is treated as 1 and FALSE is treated as 0. Here are some examples:

TRUE + TRUE  # TRUE is treated as 1
[1] 2
FALSE * 7  # FALSE is treated as 0
[1] 0
(2 < 3) + (1 == 2)  # What's going on here? 
[1] 1

The last example deserves some thought. Start with each expression in parentheses, and decide whether it will evaluate to true or false. Then remember how logicals are treated as numbers, and determine what happens when you add them together.
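You can also make these conversions explicit with the base functions as.numeric and as.logical. A sketch; note that when converting numbers to logicals, any nonzero number becomes TRUE:

```r
as.numeric(TRUE)   # 1
as.numeric(FALSE)  # 0
as.logical(0)      # FALSE: zero converts to FALSE
as.logical(3)      # TRUE: any nonzero number converts to TRUE
```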

Numeric, integer, character, and logical data types are probably the most important data types to know in R, but there are others that were not covered here. These include:

  • complex
  • factor
  • raw

At least one of these (factor) will be covered later, but you can find more information about the other types here.

In the R console, type the following R commands and observe the result:

x <- "5"
y <- 5
z <- (x == y)

  1. What data type is x? (check with R using the class function)
  2. What data type is y?
  3. What data type is z?
  4. What is the value of z, and why does this make sense?

Now that we’ve discussed different types of data, we’ll see how they can be structured together in meaningful ways.

What about dates? R actually has three built-in classes for dates and times: Date, POSIXct, and POSIXlt. This can be confusing at first, but packages like lubridate make it easy to work with dates in R.

This video discusses logicals.

https://youtu.be/GH9AZcexokU