Examples of Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (number of rows, number of columns).

m <- matrix(nrow = 2, ncol = 3)          # create an empty matrix with 2 rows and 3 columns
m
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
dim(m)               # get dimensions
## [1] 2 3
attributes(m)        # get the attributes of m
## $dim
## [1] 2 3
m <- matrix(1:6, nrow = 2, ncol = 3)    # create matrix with values
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
m[1,2]              # get the element at first row and second column
## [1] 3
m <- 1:10           # start from a plain vector
m
##  [1]  1  2  3  4  5  6  7  8  9 10
dim(m) <- c(2, 5)   # create a matrix directly from a vector by adding a dimension attribute
m
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
x <- 1:3
y <- 10:12
cbind(x,y)         # create matrix by column binding
##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
rbind(x,y)         # create matrix by row binding
##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12
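
matrix() fills its values column by column by default, which is why 1:6 above puts 1 and 2 in the first column. A minimal sketch of the row-wise alternative, using the byrow argument of matrix() (the variable names by_col and by_row are only illustrative):

```r
# Default: values fill down each column first (column-major order)
by_col <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE fills across each row instead
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

by_col[1, ]     # first row of the column-filled matrix: 1 3 5
by_row[1, ]     # first row of the row-filled matrix: 1 2 3

# t() transposes a matrix, swapping its dimensions
dim(t(by_row))  # 3 2
```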

Examples of Factors

Factors are used to represent categorical data and can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label. Factors are important in statistical modeling and are treated specially by modeling functions like lm() and glm(). Using factors with labels is better than using integers because factors are self-describing: a variable with the values “Male” and “Female” is easier to interpret than one with the values 1 and 2.

x <- factor(c("yes", "yes", "no", "yes", "no"))   
x
## [1] yes yes no  yes no 
## Levels: no yes
table(x)
## x
##  no yes 
##   2   3
unclass(x)          # See the underlying representation of factor 
## [1] 2 2 1 2 1
## attr(,"levels")
## [1] "no"  "yes"
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))     # The order of the levels of a factor can be set using the levels argument to factor()
x
## [1] yes yes no  yes no 
## Levels: yes no
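
The examples above are all unordered factors. As a rough sketch of the ordered case, the ordered = TRUE argument of factor() makes the level order meaningful for comparisons (the sizes variable and its values are made up for illustration):

```r
# Hypothetical size labels; the level order is set explicitly
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

levels(sizes)       # "small" "medium" "large"
as.integer(sizes)   # underlying integer codes: 1 3 2 1
sizes < "large"     # comparisons respect the level order: TRUE FALSE TRUE TRUE
```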

Examples of Missing values

Missing values are denoted by NA, or by NaN for undefined mathematical operations. is.na() tests whether the elements of an object are NA, and is.nan() tests for NaN. A NaN value is also NA, but the converse is not true.

x <- c(1, 2, NA, 10, 3)    ## Create a vector with NAs in it
is.na(x)
## [1] FALSE FALSE  TRUE FALSE FALSE
is.nan(x)
## [1] FALSE FALSE FALSE FALSE FALSE
x <- c(1, 2, NaN, NA, 4)   ## Now create a vector with both NA and NaN values
is.na(x)
## [1] FALSE FALSE  TRUE  TRUE FALSE
is.nan(x)
## [1] FALSE FALSE  TRUE FALSE FALSE
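
To make the NA/NaN relationship concrete, here is a small sketch showing where a NaN comes from and how missing values affect a summary function. It uses only base R; the variable name undefined is illustrative.

```r
undefined <- 0 / 0        # an undefined operation yields NaN
is.nan(undefined)         # TRUE
is.na(undefined)          # TRUE: NaN also counts as missing

is.nan(NA)                # FALSE: NA is not NaN

# Many summary functions return NA unless told to skip missing values
x <- c(1, 2, NA, 10, 3)
mean(x)                   # NA
mean(x, na.rm = TRUE)     # 4
```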

Examples of Data Frames

Data frames are used to store tabular data in R. They are represented as a special type of list in which every element has to have the same length. Each element of the list can be thought of as a column, and the length of each element is the number of rows. Data frames have a special attribute called row.names, which holds a label for each row of the data frame.

x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))       # create a data frame
x
##   foo   bar
## 1   1  TRUE
## 2   2  TRUE
## 3   3 FALSE
## 4   4 FALSE
nrow(x)                   
## [1] 4
ncol(x)
## [1] 2
data.matrix(x)                # convert data frame to a matrix
##      foo bar
## [1,]   1   1
## [2,]   2   1
## [3,]   3   0
## [4,]   4   0
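
The list structure and the row.names attribute described above can be inspected directly. The following is a minimal sketch; the data frame name df and the replacement row labels are illustrative, not part of the original example.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

is.list(df)       # TRUE: a data frame is a list of columns
length(df)        # 2: one list element per column
row.names(df)     # default row names: "1" "2" "3" "4"

# Row names can be replaced with more descriptive labels (made-up values)
row.names(df) <- c("r1", "r2", "r3", "r4")
df["r3", "foo"]   # look up a cell by row and column name: 3
```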

Examples of names

R objects can have names, which is very useful for writing readable code and self-describing objects.

x <- 1:3
names(x)
## NULL
names(x) <- c("New York", "Seattle", "Los Angeles")     # set the names for vector x
x
##    New York     Seattle Los Angeles 
##           1           2           3
x <- list("Los Angeles" = 1, Boston = 2, London = 3)    # lists can also have names
x
## $`Los Angeles`
## [1] 1
## 
## $Boston
## [1] 2
## 
## $London
## [1] 3
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))        # Matrices can have both column and row names.
m
##   c d
## a 1 3
## b 2 4
colnames(m) <- c("h", "f")                           # set column names 
rownames(m) <- c("x", "z")                           # set row names
m
##   h f
## x 1 3
## z 2 4
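
Once an object has names, they can be used in place of positions when extracting elements. A short sketch, reusing the city labels and matrix names from above (the variable name cities is illustrative):

```r
cities <- c("New York" = 1, "Seattle" = 2, "Los Angeles" = 3)
cities["Seattle"]            # named element, returned with its name
unname(cities["Seattle"])    # just the value: 2

m <- matrix(1:4, nrow = 2, ncol = 2,
            dimnames = list(c("x", "z"), c("h", "f")))
m["z", "f"]                  # row "z", column "f": 4
m[, "h"]                     # whole column "h", keeping the row names
```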

Examples of subsetting operations

There are three operators that can be used to extract subsets of R objects.
• The [ operator always returns an object of the same class as the original. It can be used to select multiple elements of an object.
• The [[ operator is used to extract elements of a list or a data frame. It can only be used to extract a single element, and the class of the returned object will not necessarily be a list or data frame.
• The $ operator is used to extract elements of a list or data frame by literal name. Its semantics are similar to those of [[.

x <- c("a", "b", "c", "c", "d", "a") 
x[1] ## Extract the first element
## [1] "a"
x[2] ## Extract the second element 
## [1] "b"
x[1:4] ##  extract multiple elements
## [1] "a" "b" "c" "c"
x[c(1, 3, 4)]
## [1] "a" "c" "c"
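u <- x > "a"   ## logical vector marking the elements greater than "a"
x[u]    ## extract elements of a vector that satisfy a given condition
## [1] "b" "c" "c" "d"
x[x > "a"]   ## same result without the intermediate variable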
## [1] "b" "c" "c" "d"
x <- matrix(1:6, 2, 3) 
x
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
x[1, 2]   ## get row 1 column 2 element in matrix x
## [1] 3
x[2, 1]
## [1] 2
x[1, ] ## Extract the first row 
## [1] 1 3 5
x[, 2] ## Extract the second column 
## [1] 3 4
x[1, 2, drop = FALSE]  ## keep the result as a 1 x 1 matrix instead of dropping to a vector
##      [,1]
## [1,]    3
x[1, ]
## [1] 1 3 5
x[1, , drop = FALSE] 
##      [,1] [,2] [,3]
## [1,]    1    3    5
x <- list(foo = 1:4, bar = 0.6) 
x
## $foo
## [1] 1 2 3 4
## 
## $bar
## [1] 0.6
x[[1]]      ## get the first element of the list using [[
## [1] 1 2 3 4
x[["bar"]]  ## get the element bar
## [1] 0.6
x$bar 
## [1] 0.6
x <- list(foo = 1:4, bar = 0.6, baz = "hello")  # create a list
name <- "foo"
x[[name]]   ## computed index for "foo" 
## [1] 1 2 3 4
x$foo    ## get the element with name foo
## [1] 1 2 3 4
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81)) # create a nested list
x[[c(1, 3)]] ## Get the 3rd element of the 1st element
## [1] 14
x[[1]][[3]]  ## same as above
## [1] 14
x[[c(2, 1)]] ## 1st element of the 2nd element
## [1] 3.14
x <- list(aardvark = 1:5)  ## create a new list
x$a                        ## partial matching of a list element name
## [1] 1 2 3 4 5
x[["a"]]                   ## by default, exact matching of a list element name
## NULL
x[["a", exact = FALSE]]    ## partial matching of a list element name
## [1] 1 2 3 4 5
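
The difference in return class between the three operators is easiest to see by calling class() on the results. A minimal sketch using the same foo/bar list as above:

```r
x <- list(foo = 1:4, bar = 0.6)

class(x["foo"])              # "list": [ keeps the container
class(x[["foo"]])            # "integer": [[ returns the element itself
class(x$foo)                 # "integer": $ behaves like [[ with a literal name

length(x[c("foo", "bar")])   # 2: [ can select several elements at once
```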

Examples of removing NA values

A common task in data analysis is removing missing values (NAs).

x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
print(bad)
## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE
x[!bad]    ## removing the NA values
## [1] 1 2 4 5
## create two vectors with missing values, then keep only the positions where neither vector is missing
x <- c(1, NA, 3, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")
good <- complete.cases(x, y)
good
## [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE
x[good]    # good cases in x
## [1] 1 4 5
y[good]    # good cases in y
## [1] "a" "d" "f"
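
The same idea extends to data frames, where complete.cases() checks each row. The sketch below assumes a small made-up data frame (df and its columns are illustrative) and also shows na.omit(), which drops incomplete rows in one step.

```r
# Small illustrative data frame with missing values in both columns
df <- data.frame(x = c(1, NA, 3, 4, NA, 5),
                 y = c("a", "b", NA, "d", NA, "f"))

good <- complete.cases(df)   # TRUE for rows with no missing values
df[good, ]                   # keep only the complete rows

clean <- na.omit(df)         # same rows, selected in one step
nrow(clean)                  # 3
```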

Examples of vectorized operations

Many operations in R are vectorized, meaning that an operation is applied to all elements of an object at once, element by element, without an explicit loop. This allows you to write code that is efficient, concise, and easier to read than in non-vectorized languages.

x <- 1:4 
y <- 6:9
z <- x + y 
z
## [1]  7  9 11 13
x >= 2
## [1] FALSE  TRUE  TRUE  TRUE
x-y
## [1] -5 -5 -5 -5
x*y
## [1]  6 14 24 36
x <- matrix(1:4, 2, 2)
x
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
y <- matrix(rep(10, 4), 2, 2) 
y
##      [,1] [,2]
## [1,]   10   10
## [2,]   10   10
## element-wise multiplication
x*y
##      [,1] [,2]
## [1,]   10   30
## [2,]   20   40
## element-wise division
x/y
##      [,1] [,2]
## [1,]  0.1  0.3
## [2,]  0.2  0.4
## true matrix multiplication
x %*% y
##      [,1] [,2]
## [1,]   40   40
## [2,]   60   60
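
For comparison, here is a sketch of what the vectorized addition replaces: an explicit loop that produces the same result one element at a time. The names z_loop and z_vec are illustrative.

```r
x <- 1:4
y <- 6:9

# Explicit loop: add the vectors one element at a time
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}

# Vectorized form: one expression, same result
z_vec <- x + y
all(z_loop == z_vec)   # TRUE

# A length-one value is recycled across the whole vector
x * 10                 # 10 20 30 40
```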