Simple introduction to R

R input

This part shows examples of entering single and multiple values in R. The c() function combines values into a vector.

x <- 1    ## numeric (stored as a double), nothing printed
print(x) ## explicit printing

## [1] 1

x           ## auto-printing occurs

## [1] 1

y <- 1.5 + 2   ## numeric
print(y)

## [1] 3.5

x <- c(0.5, 0.6)                  ## numeric, real number
x <- c(1,3)                      ## numeric (use c(1L, 3L) for integer)
x <- c(TRUE, FALSE)              ## logical
x <- c(T, F)                       ## logical
x <- c("a", "b", "c")          ##  character 
x <- c(1+0i, 2+4i)             ##  complex
x[0]                # index 0 selects nothing: an empty vector of the same type as x

## complex(0)

class(x)          # show the class of variable x

## [1] "complex"

x[1]                 # print the first element of x

## [1] 1+0i

y<- 1:6           # sequential numbers
y

## [1] 1 2 3 4 5 6

x<-"a"            # a single character string
print(x)

## [1] "a"
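
The difference between numeric and integer values is easy to miss, because both print the same way. The following is a minimal sketch, assuming only base R, of how to check which one you have. The variable names a and b are chosen here for illustration.

```r
a <- 1      # a plain literal is stored as a double
b <- 1L     # the L suffix requests an integer
class(a)    # "numeric"
class(b)    # "integer"
typeof(a)   # "double"
typeof(1:6) # "integer": the colon operator yields integers
is.integer(c(1, 3))    # FALSE
is.integer(c(1L, 3L))  # TRUE
```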

R vector operation

This part shows what happens when a vector mixes values of different types, and how to convert a vector from one class to another.

x <- c("abc",10) # mixing character and numeric: every element is coerced to character
print(x)

## [1] "abc" "10"

as.numeric(x)    # convert x to numeric class

## Warning: NAs introduced by coercion

## [1] NA 10

as.integer(x)     # convert x to integer class

## Warning: NAs introduced by coercion

## [1] NA 10

as.logical(x)      # convert x to logical class

## [1] NA NA

as.character(x)  # convert x to character class

## [1] "abc" "10"

as.numeric(x[2]) + 3 # convert the second element to numeric, then add 3

## [1] 13
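
The coercion shown above follows a fixed order: logical, then integer, then numeric, then character. The sketch below, assuming only base R, illustrates that order with three small vectors. The names v1, v2, v3 and n are illustrative.

```r
# c() converts all elements to the most flexible type present
v1 <- c(TRUE, 2L)      # logical + integer   -> integer
v2 <- c(2L, 3.5)       # integer + double    -> numeric
v3 <- c(3.5, "a")      # double  + character -> character
class(v1)
class(v2)
class(v3)

# suppressWarnings() hides the coercion warning when NA is the expected result
n <- suppressWarnings(as.numeric(c("abc", "10")))
is.na(n)               # TRUE FALSE
```

Checking the result with is.na() after a conversion is one way to find the elements that could not be converted.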

R list operation

Lists are a special type of vector that can contain elements of different classes. Unlike c(), list() does not coerce its elements, so the number 10 below stays numeric.

x <- list("abc",10) # create a list; each element keeps its own class
result <- x[[2]]+3  # [[2]] extracts the second element itself
print(result)

## [1] 13
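
Single and double brackets behave differently on a list, which is a common source of errors. The sketch below, assuming only base R, contrasts the two. The list item and its element names label and value are hypothetical, chosen for illustration.

```r
x <- list("abc", 10)
class(x[2])       # "list": single brackets return a sub-list
class(x[[2]])     # "numeric": double brackets return the element itself
sapply(x, class)  # class of each element: "character" "numeric"

# elements can also be named and then accessed with $
item <- list(label = "abc", value = 10)
item$value + 3    # 13
```

Note that x[2] + 3 would fail, because arithmetic is not defined for a list.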

R control structures

This part shows examples of two R control structures: if-else statements and for loops. Control structures allow you to control the flow of execution of a series of R expressions. They let you put some “logic” into your R code, so that it can respond to inputs or to features of the data and execute different R expressions accordingly, rather than always executing the same code.

# if-else statement
x <- 10
if (x > 3) {
  y <- 10
} else {
  y <- 0
}
print(x)

## [1] 10

print(y)

## [1] 10

# for loop
for(i in 1:10)
{
  print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

x <- c("a","b","c","d")
# print each element in vector x
for (i in 1:4) {
  print(x[i])
}

## [1] "a"
## [1] "b"
## [1] "c"
## [1] "d"
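
The two structures above are often combined. The following is a minimal sketch, assuming only base R, of a for loop that applies an if-else test to each element of a vector. The data in values, the threshold 3, and the labels "high" and "low" are hypothetical.

```r
values <- c(2, 5, 1, 8)              # hypothetical input data
labels <- character(length(values))  # pre-allocate the result vector

# seq_along() generates 1, 2, ..., length(values) and is safe for empty vectors
for (i in seq_along(values)) {
  if (values[i] > 3) {
    labels[i] <- "high"
  } else {
    labels[i] <- "low"
  }
}
print(labels)
```

Using seq_along(values) instead of a hard-coded range such as 1:4 keeps the loop correct when the length of the vector changes.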

Reading data from file

This example reads data from a CSV file into R.

# load all rows
data <- read.csv("http://cs.plu.edu/~caora/Rdata/grapeJuice.csv", header = TRUE)
# load only the first 5 rows
initial <- read.csv("http://cs.plu.edu/~caora/Rdata/grapeJuice.csv", header = TRUE, nrows=5)
# rename the columns in the data
names(initial) <- c("name1","name2","name3","name4","name5")
# display the column named name1
initial$name1

## [1] 222 201 247 169 317

# show the first six rows, then summary statistics for each column
head(data)

##   sales price ad_type price_apple price_cookies
## 1   222  9.83       0        7.36          8.80
## 2   201  9.72       1        7.43          9.62
## 3   247 10.15       1        7.66          8.90
## 4   169 10.04       0        7.57         10.26
## 5   317  8.38       1        7.33          9.54
## 6   227  9.74       0        7.51          9.49

summary(data)

##      sales           price           ad_type     price_apple   
##  Min.   :131.0   Min.   : 8.200   Min.   :0.0   Min.   :7.300  
##  1st Qu.:182.5   1st Qu.: 9.585   1st Qu.:0.0   1st Qu.:7.438  
##  Median :204.5   Median : 9.855   Median :0.5   Median :7.580  
##  Mean   :216.7   Mean   : 9.738   Mean   :0.5   Mean   :7.659  
##  3rd Qu.:244.2   3rd Qu.:10.268   3rd Qu.:1.0   3rd Qu.:7.805  
##  Max.   :335.0   Max.   :10.490   Max.   :1.0   Max.   :8.290  
##  price_cookies   
##  Min.   : 8.790  
##  1st Qu.: 9.190  
##  Median : 9.515  
##  Mean   : 9.622  
##  3rd Qu.:10.140  
##  Max.   :10.580
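
Because the file above is loaded from a URL that may not always be reachable, the same calls can be tried on inline text. The sketch below assumes the six rows printed by head(data) above and passes them to read.csv() through its text argument. The names csv_text and sample_rows are illustrative and not part of the original example.

```r
# the six rows shown by head(data), written as inline CSV text
csv_text <- "sales,price,ad_type,price_apple,price_cookies
222,9.83,0,7.36,8.80
201,9.72,1,7.43,9.62
247,10.15,1,7.66,8.90
169,10.04,0,7.57,10.26
317,8.38,1,7.33,9.54
227,9.74,0,7.51,9.49"

sample_rows <- read.csv(text = csv_text, header = TRUE)
str(sample_rows)     # column names and types
nrow(sample_rows)    # 6
sample_rows$sales    # 222 201 247 169 317 227
```

Running str() right after loading is a quick way to confirm that each column was read with the expected type.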

Draw figures to visualize the data

par(mfrow = c(1,2))  # set a 1-by-2 plot layout
boxplot(data$sales,horizontal = TRUE, xlab="sales") # boxplot to check for outliers
hist(data$sales,main="",xlab="sales",prob=TRUE) # histogram on the density scale to explore the shape of the distribution
lines(density(data$sales),lty="dashed",lwd=2.5,col="red") # overlay a kernel density estimate

Analyze the effectiveness for sales between two types of ads

The marketing team wants to find out which of two ad types is more effective for sales: one with a natural production theme (ad_type 0), the other with a family health caring theme (ad_type 1).

# split the dataset into two subsets by ad_type
sales_ad_nature <- subset(data,ad_type==0)
sales_ad_family <- subset(data,ad_type==1)
# calculate the mean of sales for each ad_type
mean(sales_ad_nature$sales)

## [1] 186.6667

mean(sales_ad_family$sales)

## [1] 246.6667

# Welch two-sample t-test: do the mean sales of the two ad types differ?
t.test(sales_ad_nature$sales,sales_ad_family$sales)

## 
##  Welch Two Sample t-test
## 
## data:  sales_ad_nature$sales and sales_ad_family$sales
## t = -3.7515, df = 25.257, p-value = 0.0009233
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -92.92234 -27.07766
## sample estimates:
## mean of x mean of y 
##  186.6667  246.6667
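
The output above reports a p-value of 0.0009233, well below 0.05, so the difference in mean sales between the two ad types is statistically significant at the 5 percent level. The object returned by t.test() also lets you read these numbers in code. The sketch below assumes only the six rows printed by head(data) earlier, so its results will differ from the full-data output above. The names sample_rows and res are illustrative.

```r
# six rows taken from the head(data) output; not the full dataset
sample_rows <- data.frame(
  sales   = c(222, 201, 247, 169, 317, 227),
  ad_type = c(0, 1, 1, 0, 1, 0)
)

# the formula interface splits sales by ad_type without calling subset()
res <- t.test(sales ~ ad_type, data = sample_rows)

res$p.value    # p-value of the test
res$conf.int   # 95 percent confidence interval for the difference in means
res$estimate   # mean of each group
```

The formula form sales ~ ad_type is equivalent to passing the two subsets as separate vectors, as done above.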

 #set the 1 by 2 layout plot window
 par(mfrow = c(1,2))

 # histogram to explore the data distribution shapes
 hist(sales_ad_nature$sales,main="",xlab="sales with nature production theme ad",prob=T)
 lines(density(sales_ad_nature$sales),lty="dashed",lwd=2.5,col="red")

 hist(sales_ad_family$sales,main="",xlab="sales with family health caring theme ad",prob=T)
 lines(density(sales_ad_family$sales),lty="dashed",lwd=2.5,col="red")

 # more plotting examples
 # scatter plot of the two groups (add type="o", col="blue" to join the points with lines)
 plot(sales_ad_family$sales, sales_ad_nature$sales)
 # Bar plot
 barplot(sales_ad_family$sales)

# pie charts
 testData <- c(100,20,300,100,1)
 pie(testData, col=rainbow(length(testData)),labels=c("Mon","Tue","Wed","Thu","Fri"))

Use the optimize function to find the maximum profit

Suppose you want to maximize profit rather than sales quantity, and you find that the relationship between sales and price is: Sales = 772.64 - 51.24 * price. Assuming each juice costs 5, the profit is: Y = (price - 5) * Sales = -51.24 * price * price + 1028.84 * price - 3863.2. The optimize function searches the price interval from 0 to 20 for the value that maximizes Y.

f <- function(x) {
    profit = -51.24*x*x + 1028.84 * x - 3863.2
    return(profit)
}
optimize(f,lower=0,upper=20,maximum=TRUE)

## $maximum
## [1] 10.03942
## 
## $objective
## [1] 1301.28
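
Because the profit function is a quadratic, the result can be checked by hand: a parabola a*x^2 + b*x + c with negative a peaks at x = -b / (2*a). The following is a minimal sketch, assuming only base R, that compares this analytic answer with the numerical one. The names a, b, c0 and best_price are illustrative.

```r
# coefficients of the profit function from the text
a  <- -51.24
b  <- 1028.84
c0 <- -3863.2
f <- function(x) a * x^2 + b * x + c0

best_price <- -b / (2 * a)   # vertex of the parabola
best_price                   # about 10.03942
f(best_price)                # about 1301.28

# the numerical search should agree with the analytic answer
res <- optimize(f, lower = 0, upper = 20, maximum = TRUE)
abs(res$maximum - best_price) < 1e-3
```

For functions without a closed-form maximum, optimize remains the practical choice; the analytic check is only available here because the profit curve is a parabola.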