4.2 Data Types
Think of all the things you might be expected to remember. These items can probably be grouped into different types of information, such as phone numbers, passwords, birthdays, historical events, and math theorems. R was designed to handle different types of data as well, though the types are different from the examples just given.
R can store and manipulate different pieces of information, called data, and these data can be of several different types. Here are some examples of different types of data:
a <- 12.34 # a is a number
b <- "Hello" # b is a string of characters
c <- TRUE # c is a special type of data that is either true or false
R has special names for these examples, and there are other types of data as well. Below, we’ll talk about each data type, one at a time.
The term “data” is actually plural! A single piece of data is called a “datum”. So to refer to a set of data, you would say “these data”, and to refer to a single piece of data, you would say “this datum”.
4.2.1 Numeric
Many data exist as numbers, and R has a specific data type for storing those numbers, called the numeric data type. Here are some examples:
Note that integers, decimals, and fractions are all examples of numeric data in R.
We can prove that these are all the same data type using the class function:
[1] "numeric"
[1] "numeric"
[1] "numeric"
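The three results above could come from a sketch like the following. The specific values assigned to a are assumptions, chosen to match the note about integers, decimals, and fractions; they are not taken from the original examples.

```r
# Assumed example values: a whole number, a decimal, and a fraction
a <- 12
print(class(a))

a <- 12.34
print(class(a))

a <- 1/3
print(class(a))
```

Each call to class reports "numeric", even for the whole number 12.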
So far, we’ve defined the a object a few different times, which is allowed! Every time we define a, R forgets the old value. Therefore we should reuse object names with caution, because it can become difficult to remember what the latest value is! When we discuss loops later, however, we will use code to automatically change the value of an object several times in order to do useful things!
When you have numeric objects, you may want to perform math operations on them. R has a number of built-in operators and functions for working with numeric data. Here are some examples:
print(a + b) # Add two numeric values
print(b - c) # Subtract two numeric values
print(a * b) # Multiply two numeric values
print(a^3) # Take the cube of a numeric value
[1] 2.37
[1] 13.3627
[1] -147.07
[1] -1331
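The definitions of a, b, and c are not shown alongside the code above. One set of values that reproduces the printed results, inferred here from the output rather than taken from the text, is sketched below.

```r
# Inferred values (an assumption): chosen so that the four results above come out as printed
a <- -11
b <- 13.37
c <- 1/137

print(a + b)  # Add two numeric values
print(b - c)  # Subtract two numeric values
print(a * b)  # Multiply two numeric values
print(a^3)    # Take the cube of a numeric value
```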
When performing math on numeric objects, R will obey order of operations, so the following two examples will give different results:
[1] -10.90241
[1] 0.01729927
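Two expressions that differ only in their parentheses illustrate this. The values assigned to a, b, and c below are assumptions inferred from the printed output; with them, the two expressions give the two results shown above.

```r
# Assumed values for a, b, and c
a <- -11
b <- 13.37
c <- 1/137

a + b * c      # multiplication happens first, then addition
(a + b) * c    # parentheses force the addition to happen first
```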
Notice that we’ve added extra spaces in the code to help you understand what’s going on. This is another example of code style, which we’ll talk more about later.
Wait a second, we didn’t use the print function just now, but R still displayed the results of the calculations! What is going on? When a command produces a result and you don’t assign that result to anything (using <-), R prints the result automatically. This happens at the R console and also in R Markdown, which is what we used to create this book (yes, this book was created using R! Pretty cool, huh?). This means we don’t always have to use the print function when we want to display R output.
Notice all the decimal places?
R can be very precise when performing computations.
However, viewing all of the digits stored by R can be distracting and hard to read.
You can show just some of the digits by using the round function:
[1] 0.123
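A minimal sketch of round follows. The number being rounded is a made-up value, since the one used in the original example is not shown.

```r
x <- 0.1234321                # hypothetical value with many decimal places
print(round(x, digits = 3))   # keep three digits after the decimal point
```

The digits argument controls how many decimal places are kept.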
It also turns out that R stores more digits than what it shows when it prints, though we won’t go into detail on that now.
4.2.2 Integer
In general, numeric data in R are treated as if they can be any decimal number (technically, they are a double precision number, if you know what that means; if not, it’s not important right now). However, there is a way to specify that a specific numeric object is an integer, by placing an “L” at the end of it, like so:
[1] "numeric"
[1] "integer"
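The two results above correspond to a check along these lines. The object names and the value 12 are assumptions for illustration.

```r
y <- 12     # no suffix: stored as numeric (double precision)
z <- 12L    # the L suffix asks R to store the value as an integer

print(class(y))
print(class(z))
```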
Integers take about half the space in a computer’s memory (4 bytes each, rather than 8 for a double precision number), so if you are working with or storing a lot of whole numbers, it might make sense to declare them as integer type in R. This will make more sense when we discuss vectors later.
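One way to see the difference in memory use is to compare two long vectors with object.size. Vectors are covered later; here they are just a convenient way to hold many numbers. This is only a sketch: the exact byte counts depend on the R build, so none are shown.

```r
n <- 100000
doubles  <- numeric(n)   # n zeros stored as double precision numbers
integers <- integer(n)   # n zeros stored as integers

print(object.size(doubles))
print(object.size(integers))
```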
4.2.3 Character
Not all data are numbers! R can also store strings of characters, using the aptly named character type (sometimes called a character string or just a string). Here are some examples:
d <- "Hello" # This string is defined with *double* quotes
e <- 'how are you?' # This string is defined with *single* quotes!
print(d)
print(e)
[1] "Hello"
[1] "how are you?"
Notice how we can define character strings using single quotes or double quotes, as long as we are consistent. So this is not valid:
Error in parse(text = input): :2:6: unexpected INCOMPLETE_STRING
1: # Note the mismatched single/double quotes:
2: f <- "this does not work'
^
So, make sure you are consistent. However, you may see another problem with this: some strings contain quotes in them, like this:
Error in parse(text = input): :1:16: unexpected symbol
1: g <- 'This won't
^
Since single quotes are being used to define the string, they can’t be used in the string itself, because R will “think” the string is ending at the second '.
One option is to change the defining quotes to be double quotes; then the single quote will be safely included in the string:
[1] "I'm happy that this works!"
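A sketch of this first option, using the string from the output above (the object name g is an assumption):

```r
g <- "I'm happy that this works!"   # double quotes outside, single quote inside
print(g)
```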
Another option is to use a backslash when using quotes inside the string, so that R “knows” the quote is part of the string and not ending the definition of the string:
[1] "I've found another way that works!"
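A sketch of this second option, reconstructed from the output above and the description that follows:

```r
g <- 'I\'ve found another way that works!'   # \' puts a single quote inside a single-quoted string
print(g)
```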
Notice that when we define g we place a \' anywhere in the string where we want a ' to be, but when printed out, we see that R has interpreted it as just a '.
Notice also that we didn’t have to change the defining quotes to be double quotes in this case.
The backslash is called the escape character. It tells R to treat the character that follows it differently from usual: an escaped quote, for example, is taken literally instead of ending the string.
Since the backslash also has a special meaning itself, if you want a backslash in your string, you need to put another backslash in front of it, which functions as an escape character, like so: g <- "here is a backslash: \\". You will see both backslashes when using the print function (which is meant for any data type), but if you use the cat function (which writes out the contents of the string itself), all escape characters will be “processed”, and you will see just a single backslash. Try the same thing with the newline character, \n!
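The difference between print and cat for the backslash can be sketched as follows. The object name s is hypothetical, used here so that g keeps its earlier value.

```r
s <- "here is a backslash: \\"   # two backslashes in the code, one in the stored string

print(s)       # shows the string as you would type it, with both backslashes
cat(s, "\n")   # writes the string's contents, with a single backslash
```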
To see a list of special characters, try typing ?Quotes into the R console.
Here is an important string to know about:
h is a character string with no characters, called an empty string.
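A sketch of how an empty string such as h can be defined. The nchar call, which counts the characters in a string, is included only to confirm that nothing is stored.

```r
h <- ""           # two quotes with nothing between them
print(h)
print(nchar(h))   # an empty string has zero characters
```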
You can perform math on numeric data, so what can you do with strings? The answer is quite a lot, using some functions that R provides. Here are some of them:
[1] 34
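The count above is what nchar gives for the string g defined earlier. That nchar was the function used is an assumption based on the output; g is redefined here so the sketch runs on its own.

```r
g <- 'I\'ve found another way that works!'
nchar(g)   # This counts the number of characters in a string
```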
substr(g, 6, 10) # This extracts just part of a string, using the start and stop positions you provide
[1] "found"
strsplit(g, " ") # This splits the string up using a specified "delimiter" string, a single space in this case
[[1]]
[1] "I've" "found" "another" "way" "that" "works!"
When you split a string, this produces a list containing a vector of character strings. This is an example of how data can be organized in a structured way. We’ll talk more about so-called data structures in the next section.
[1] "hello world"
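The result above matches what paste produces when it joins two strings. The exact call and the object name greeting are assumptions.

```r
greeting <- paste("hello", "world")   # joins the strings, separated by a single space by default
print(greeting)
```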
Remember that you can learn more about a function using ?. Type ?paste into R and read the documentation carefully. Can you determine what the “sep” argument does? What do you think would happen if we ran the code paste("hello", "world", sep="-")?
There are other ways of manipulating strings, but we’ll return to this later.
4.2.4 Logical
Numeric objects can be any number, character objects can be any string of characters, but logical objects can take only two values: TRUE or FALSE.
Logical data types are also known as “boolean” data types.
Here we define some Logical objects:
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
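The four results above could come from definitions like these. The object names are hypothetical.

```r
i <- TRUE    # full name
j <- FALSE   # full name
k <- T       # first letter only
l <- F       # first letter only

print(i)
print(j)
print(k)
print(l)
```

One caution: T and F are ordinary objects that can be overwritten, whereas TRUE and FALSE cannot, so the full names are the safer habit.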
So you can see that we can define a logical object using the full name or just the first letter. Here’s how to get the “opposite” of a logical object:
[1] FALSE
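The “opposite” is obtained with the ! operator, which is read as “not”. A minimal sketch, with a hypothetical object name:

```r
i <- TRUE
print(!i)   # "not TRUE" is FALSE
```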
Logical data are the simplest type, but there are actually some clever things you can do with them. You can test whether simple mathematical expressions are true or false.
[1] TRUE
The third command is a way to check if the value of x is less than the value of y.
The result of this comparison is a logical, in this case, TRUE.
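A sketch of three such commands follows. The values 5 and 7 are assumptions; any pair with x smaller than y gives the same result.

```r
x <- 5    # assumed value
y <- 7    # assumed value
x < y     # is x less than y?
```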
Here are other ways of making comparisons:
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
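The other comparison operators can be sketched as follows. The values of x and y, and the choice and order of operators, are assumptions picked to be consistent with the four results above.

```r
x <- 5    # assumed value
y <- 7    # assumed value

x <= y    # less than or equal to
x > y     # greater than
x >= y    # greater than or equal to
x == y    # equal to (note the double equals sign)
```

R also has !=, which tests whether two values are not equal.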
Comparisons can be made using strings as well:
[1] FALSE
Remember that R is case sensitive, and two strings must be exactly the same to be considered equal.
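For instance, two strings that differ only in capitalization are not equal. The strings and the object name same are made up for this sketch.

```r
same <- ("hello" == "Hello")   # differs only in the case of the first letter
print(same)
```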
Of course any object (like x) will be equal to itself:
[1] TRUE
Surprisingly, logicals can be treated as numerics, where TRUE is treated as 1 and FALSE is treated as 0.
Here are some examples:
[1] 2
[1] 0
[1] 1
The last example deserves some thought. Start with each expression in parentheses, and decide whether it will evaluate to true or false. Then remember how logicals are treated as numbers, and determine what happens when you add them together.
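Expressions of the following kind give the three results above. The exact expressions, and the values of x and y, are assumptions chosen to match the output.

```r
x <- 5    # assumed value
y <- 7    # assumed value

TRUE + TRUE          # 1 + 1
TRUE * FALSE         # 1 * 0
(x < y) + (x == y)   # the first comparison is TRUE (1), the second is FALSE (0)
```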
Numeric, integer, character, and logical data types are probably the most important data types to know in R, but there are others that were not covered here. These include:
- complex
- factor
- raw
At least one of these (factor) will be covered later; the others are described in R’s help pages (try ?complex and ?raw).
In the R console, type the following R commands and observe the result:
x <- "5"
y <- 5
z <- (x == y)
- What data type is x? (check with R using the class function)
- What data type is y?
- What data type is z?
- What is the value of z, and why does this make sense?
Now that we’ve discussed different types of data, we’ll see how they can be structured together in meaningful ways.
What about dates? R actually has three built-in date classes. This can be confusing at first, but packages like lubridate make it easy to work with dates in R.