R matrices, factors, missing values, and operations on other objects

Examples of Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (number of rows, number of columns).

m <- matrix(nrow = 2, ncol = 3)          # create a matrix with 2 rows and 3 columns
m

##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA

dim(m)               # get the dimensions

## [1] 2 3

attributes(m)        # get the attributes of m

## $dim
## [1] 2 3

m <- matrix(1:6, nrow = 2, ncol = 3)    # create matrix with values
m

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

m[1, 2]             # get the element in the first row and second column

## [1] 3

m <- 1:10           # start with a plain vector
m

##  [1]  1  2  3  4  5  6  7  8  9 10

dim(m) <- c(2, 5)   # turn the vector into a matrix by adding a dimension attribute
m

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

x <- 1:3
y <- 10:12
cbind(x,y)         # create matrix by column binding

##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12

rbind(x,y)         # create matrix by row binding

##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12
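
The outputs above show that matrix() fills its values column by column. As a small sketch of the alternative, the byrow argument of matrix() fills row by row instead. The variable names by_col and by_row are only illustrative.

```r
# Column-wise fill (the default) versus row-wise fill
by_col <- matrix(1:6, nrow = 2, ncol = 3)
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

by_col[1, ]      # 1 3 5
by_row[1, ]      # 1 2 3

# t() transposes a matrix, so the dimensions swap
dim(t(by_col))   # 3 2
```

Either layout holds the same six values; only the order in which they are placed into rows and columns differs.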

Examples of Factors

Factors are used to represent categorical data and can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label. Factors are important in statistical modeling and are treated specially by modeling functions like lm() and glm(). Using factors with labels is better than using integers because factors are self-describing. Having a variable that has values “Male” and “Female” is better than a variable that has values 1 and 2.

x <- factor(c("yes", "yes", "no", "yes", "no"))   
x

## [1] yes yes no  yes no 
## Levels: no yes

table(x)

## x
##  no yes 
##   2   3

unclass(x)          # See the underlying representation of factor

## [1] 2 2 1 2 1
## attr(,"levels")
## [1] "no"  "yes"

x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))     # the order of the levels can be set with the levels argument to factor()
x

## [1] yes yes no  yes no 
## Levels: yes no
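
The text mentions that factors can be ordered, which the examples above do not show. The following is a minimal sketch using the ordered argument of factor(); the vector sizes and its three-point scale are hypothetical data made up for illustration.

```r
# Hypothetical responses on a three-point scale
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"),
                ordered = TRUE)

levels(sizes)       # "low" "medium" "high"
as.integer(sizes)   # 1 3 2 1, the underlying integer codes
sizes < "high"      # TRUE FALSE TRUE TRUE
```

With an unordered factor the last comparison would not be meaningful; R returns NA with a warning in that case.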

Examples of Missing values

Missing values are denoted by NA, or by NaN for undefined mathematical operations. is.na() tests whether objects are NA, and is.nan() tests for NaN. A NaN value is also NA, but the converse is not true.

x <- c(1, 2, NA, 10, 3)    ## Create a vector with NAs in it
is.na(x)

## [1] FALSE FALSE  TRUE FALSE FALSE

is.nan(x)

## [1] FALSE FALSE FALSE FALSE FALSE

x <- c(1, 2, NaN, NA, 4)   ## Now create a vector with both NA and NaN values
is.na(x)

## [1] FALSE FALSE  TRUE  TRUE FALSE

is.nan(x)

## [1] FALSE FALSE  TRUE FALSE FALSE
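
As a short sketch of where these values come from and how they behave in calculations: 0 / 0 is one undefined operation that yields NaN, and the na.rm argument of sum() is one common way to skip missing values.

```r
0 / 0                  # NaN: an undefined mathematical operation
is.na(0 / 0)           # TRUE, because NaN also counts as NA
is.nan(NA)             # FALSE, because NA is not NaN

x <- c(1, 2, NA, 10, 3)
sum(x)                 # NA: missing values propagate through arithmetic
sum(x, na.rm = TRUE)   # 16
```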

Examples of Data Frames

Data frames are used to store tabular data in R. A data frame is represented as a special type of list in which every element has the same length. Each element of the list can be thought of as a column, and the length of each element is the number of rows. Data frames have a special attribute called row.names, which holds information about each row of the data frame.

x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))       # create a data frame
x

##   foo   bar
## 1   1  TRUE
## 2   2  TRUE
## 3   3 FALSE
## 4   4 FALSE

nrow(x)

## [1] 4

ncol(x)

## [1] 2

data.matrix(x)                # convert data frame to a matrix

##      foo bar
## [1,]   1   1
## [2,]   2   1
## [3,]   3   0
## [4,]   4   0
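
The list-like nature of a data frame and its row.names attribute can be checked directly. This is a small sketch; the name df and the row labels "a" to "d" are hypothetical choices for illustration.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

is.list(df)         # TRUE: a data frame is a list of columns
length(df)          # 2, the number of columns
sapply(df, class)   # foo is "integer", bar is "logical"
row.names(df)       # "1" "2" "3" "4"

row.names(df) <- c("a", "b", "c", "d")   # hypothetical row labels
df["b", "foo"]      # 2
```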

Examples of names

R objects can have names, which is very useful for writing readable code and self-describing objects.

x <- 1:3
names(x)

## NULL

names(x) <- c("New York", "Seattle", "Los Angeles")     # set the names for vector x
x

##    New York     Seattle Los Angeles 
##           1           2           3

x <- list("Los Angeles" = 1, Boston = 2, London = 3)    # lists can also have names
x

## $`Los Angeles`
## [1] 1
## 
## $Boston
## [1] 2
## 
## $London
## [1] 3

m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))        # Matrices can have both column and row names.
m

##   c d
## a 1 3
## b 2 4

colnames(m) <- c("h", "f")                           # set column names 
rownames(m) <- c("x", "z")                           # set row names
m

##   h f
## x 1 3
## z 2 4
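
Once names are set, they can be used in place of numeric positions when indexing. The sketch below rebuilds the named vector and matrix from the examples above in a single step each, which is an equivalent but slightly different construction.

```r
x <- c("New York" = 1, Seattle = 2, "Los Angeles" = 3)
x["Seattle"]           # named element with value 2
unname(x["Seattle"])   # 2, with the name stripped

m <- matrix(1:4, nrow = 2, ncol = 2,
            dimnames = list(c("x", "z"), c("h", "f")))
m["z", "f"]            # 4
m[, "h"]               # named vector: x = 1, z = 2
```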

Examples of subsetting operations

There are three operators that can be used to extract subsets of R objects.

• The [ operator always returns an object of the same class as the original. It can be used to select multiple elements of an object.
• The [[ operator is used to extract elements of a list or a data frame. It can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame.
• The $ operator is used to extract elements of a list or data frame by literal name. Its semantics are similar to that of [[.

x <- c("a", "b", "c", "c", "d", "a") 
x[1] ## Extract the first element

## [1] "a"

x[2] ## Extract the second element

## [1] "b"

x[1:4] ##  extract multiple elements

## [1] "a" "b" "c" "c"

x[c(1, 3, 4)]

## [1] "a" "c" "c"

u <- x > "a"
x[u]    ## extract elements of a vector that satisfy a given condition.

## [1] "b" "c" "c" "d"

x[x>"a"]

## [1] "b" "c" "c" "d"

x <- matrix(1:6, 2, 3) 
x

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

x[1, 2]   ## get row 1 column 2 element in matrix x

## [1] 3

x[2, 1]

## [1] 2

x[1, ] ## Extract the first row

## [1] 1 3 5

x[, 2] ## Extract the second column

## [1] 3 4

x[1, 2, drop = FALSE]  ## keep the result as a 1 x 1 matrix instead of dropping to a vector

##      [,1]
## [1,]    3

x[1, ]

## [1] 1 3 5

x[1, , drop = FALSE]

##      [,1] [,2] [,3]
## [1,]    1    3    5

x <- list(foo = 1:4, bar = 0.6) 
x

## $foo
## [1] 1 2 3 4
## 
## $bar
## [1] 0.6

x[[1]]      ## use [[ ]] to get the first element of the list

## [1] 1 2 3 4

x[["bar"]]  ## get the element bar

## [1] 0.6

x$bar

## [1] 0.6

x <- list(foo = 1:4, bar = 0.6, baz = "hello")  # create a list
name <- "foo"
x[[name]]   ## computed index for "foo"

## [1] 1 2 3 4

x$foo    ## get the element with name foo

## [1] 1 2 3 4

x <- list(a = list(10, 12, 14), b = c(3.14, 2.81)) # create a nested list
x[[c(1, 3)]] ## Get the 3rd element of the 1st element

## [1] 14

x[[1]][[3]]  ## same as above

## [1] 14

x[[c(2, 1)]] ## 1st element of the 2nd element

## [1] 3.14

x <- list(aardvark = 1:5)  ## create a new list
x$a                        ## partial matching of a list element name

## [1] 1 2 3 4 5

x[["a"]]                   ## by default, exact matching of a list element name

## NULL

x[["a", exact = FALSE]]    ## partial matching of a list element name

## [1] 1 2 3 4 5
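
The difference between [ and [[ described at the start of this section is easiest to see by checking the class of the result. This is a minimal sketch that reuses the list and data frame from earlier examples; the name df is only illustrative.

```r
x <- list(foo = 1:4, bar = 0.6)

class(x["foo"])      # "list": [ keeps the container
class(x[["foo"]])    # "integer": [[ extracts the element itself
class(x$foo)         # "integer"

x[c("foo", "bar")]   # [ can select several elements at once

df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
class(df["foo"])     # "data.frame"
class(df[["foo"]])   # "integer"
```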

Examples of removing NA values

A common task in data analysis is removing missing values (NAs).

x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
print(bad)

## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE

x[!bad]    ## removing the NA values

## [1] 1 2 4 5

## create two vectors with missing values, then keep only the positions where neither vector is missing
x <- c(1, NA, 3, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")
good <- complete.cases(x, y)
good

## [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE

x[good]    # good cases in x

## [1] 1 4 5

y[good]    # good cases in y

## [1] "a" "d" "f"
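
The same idea carries over to data frames, where complete.cases() marks the rows that have no missing value in any column. The sketch below puts the two vectors above into a hypothetical data frame called df; na.omit() is shown as a base R alternative.

```r
df <- data.frame(x = c(1, NA, 3, 4, NA, 5),
                 y = c("a", "b", NA, "d", NA, "f"))

df[complete.cases(df), ]   # keeps rows 1, 4 and 6
na.omit(df)                # same rows, plus an "na.action" attribute
mean(df$x, na.rm = TRUE)   # 3.25, the mean of 1, 3, 4 and 5
```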

Examples of vectorized operations

Many operations in R are vectorized, meaning that they apply to every element of a vector or matrix at once, without an explicit loop. This allows you to write code that is efficient, concise, and easier to read than in non-vectorized languages.

x <- 1:4 
y <- 6:9
z <- x + y 
z

## [1]  7  9 11 13

x >= 2

## [1] FALSE  TRUE  TRUE  TRUE

x-y

## [1] -5 -5 -5 -5

x*y

## [1]  6 14 24 36

x <- matrix(1:4, 2, 2)
x

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

y <- matrix(rep(10, 4), 2, 2) 
y

##      [,1] [,2]
## [1,]   10   10
## [2,]   10   10

## element-wise multiplication
x*y

##      [,1] [,2]
## [1,]   10   30
## [2,]   20   40

## element-wise division
x/y

##      [,1] [,2]
## [1,]  0.1  0.3
## [2,]  0.2  0.4

## true matrix multiplication
x %*% y

##      [,1] [,2]
## [1,]   40   40
## [2,]   60   60
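
To make the contrast with non-vectorized code concrete, the sketch below writes the addition of two vectors as an explicit loop and compares it with the vectorized form. It also shows recycling, where a shorter vector is reused to match the longer one. The name z_loop is only illustrative.

```r
x <- 1:4
y <- 6:9

# Explicit loop equivalent of z <- x + y
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}
identical(as.integer(z_loop), x + y)   # TRUE

# Recycling: the shorter operand is repeated as needed
x * 10          # 10 20 30 40
x + c(0, 100)   # 1 102 3 104
```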