This part shows examples of entering single or multiple values in R. The c() function combines values into a vector.
x <- 1 ## numeric (double), nothing printed; write 1L for an integer
print(x) ## explicit printing
## [1] 1
x ## auto-printing occurs
## [1] 1
y <- 1.5 + 2 ## numeric
print(y)
## [1] 3.5
y
## [1] 3.5
x <- c(0.5, 0.6) ## numeric, real number
x <- c(1L, 3L) ## integer (the L suffix is required; c(1, 3) would be numeric)
x <- c(TRUE, FALSE) ## logical
x <- c(T, F) ## logical; T and F are shorthand that can be overwritten, so TRUE and FALSE are safer
x <- c("a", "b", "c") ## character
x <- c(1+0i, 2+4i) ## complex
x[0] # index 0 selects no elements: returns an empty vector of the same type as x
## complex(0)
class(x) # show the class of variable x
## [1] "complex"
x[1] # print the first element of x
## [1] 1+0i
y<- 1:6 # sequential numbers
y
## [1] 1 2 3 4 5 6
x<-"a" # one character
print(x)
## [1] "a"
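The storage type of a value can be checked directly. The short sketch below (variable names are only illustrative) uses typeof() and is.integer() to show that a bare number literal is stored as a double, while the L suffix or the colon operator gives integers.

```r
# Illustrative names; typeof() reports the internal storage type.
a <- 1      # a bare number literal is a double
b <- 1L     # the L suffix requests an integer
s <- 1:6    # the colon operator produces integers
typeof(a)
## [1] "double"
typeof(b)
## [1] "integer"
typeof(s)
## [1] "integer"
class(a)    # "numeric" is the class reported for a double
## [1] "numeric"
is.integer(b)
## [1] TRUE
```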
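This part shows examples of type coercion in R vectors, both implicit (when classes are mixed inside c()) and explicit (with the as.*() functions).
x <- c("abc",10) # mixing character and numeric: every element is coerced to character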
print(x)
## [1] "abc" "10"
as.numeric(x) # convert x to numeric class
## Warning: NAs introduced by coercion
## [1] NA 10
as.integer(x) # convert x to integer class
## Warning: NAs introduced by coercion
## [1] NA 10
as.logical(x) # convert x to logical class
## [1] NA NA
as.character(x) # convert x to character class
## [1] "abc" "10"
as.numeric(x[2]) + 3 # convert the second element to numeric, then add 3
## [1] 13
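The coercion seen above follows a fixed ordering: when c() receives mixed types, the result takes the most flexible one (logical, then integer, then double, then character). The sketch below illustrates this with made-up vectors and also shows one way to locate values that failed an explicit conversion; suppressWarnings() is used only to keep the example quiet.

```r
# Implicit coercion inside c(): the most flexible type wins.
v1 <- c(TRUE, 2L)     # logical + integer   -> integer
v2 <- c(2L, 1.5)      # integer + double    -> double
v3 <- c(1.5, "a")     # double  + character -> character
typeof(v1)
## [1] "integer"
typeof(v2)
## [1] "double"
typeof(v3)
## [1] "character"

# Explicit coercion that cannot succeed yields NA (with a warning);
# is.na() shows which elements failed.
z <- c("abc", "10")
n <- suppressWarnings(as.numeric(z))
is.na(n)
## [1]  TRUE FALSE
```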
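Lists are a special type of vector that can contain elements of different classes.
x <- list("abc",10) # create a list; each element keeps its own class
result <- x[[2]]+3 # [[2]] extracts the second element, which is still numeric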
print(result)
## [1] 13
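The difference between single and double brackets matters for lists. As a minimal sketch (the names sub_list and element are illustrative), single brackets return a shorter list, while double brackets return the element itself, which is why the arithmetic above works.

```r
x <- list("abc", 10)
sub_list <- x[2]     # single brackets keep the list wrapper
element  <- x[[2]]   # double brackets extract the element itself
class(sub_list)
## [1] "list"
class(element)
## [1] "numeric"
element + 3
## [1] 13
# sub_list + 3 would stop with an error: arithmetic is not defined for a list
```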
This part shows examples of R control structures: if-else statements and for loops. Control structures in R allow you to control the flow of execution of a series of R expressions. Basically, they let you put some “logic” into your R code, rather than always executing the same code every time: you can respond to inputs or to features of the data and execute different R expressions accordingly.
# if-else statements
x <- 10
if (x > 3) {
  y <- 10
} else {
  y <- 0
}
print(x)
## [1] 10
print(y)
## [1] 10
# for loop
for (i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
x <- c("a", "b", "c", "d")
# print each element of vector x; seq_along(x) yields 1, 2, ..., length(x)
for (i in seq_along(x)) {
  print(x[i])
}
## [1] "a"
## [1] "b"
## [1] "c"
## [1] "d"
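Two related idioms are worth a brief sketch (the vectors here are made up for illustration). ifelse() applies an if-else test to every element of a vector at once, and seq_along() is a safer loop index than 1:length(x) because it yields nothing for an empty vector.

```r
# ifelse() evaluates the condition element by element
v <- c(1, 5, 10)
flags <- ifelse(v > 3, 10, 0)
flags
## [1]  0 10 10

# With an empty vector, seq_along() gives zero iterations,
# whereas 1:length(empty) would be 1:0, i.e. the two values 1 and 0.
empty <- character(0)
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
count
## [1] 0
length(1:length(empty))
## [1] 2
```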
This is an example of reading data from a file in R.
# load all data
data <- read.csv("http://cs.plu.edu/~caora/Rdata/grapeJuice.csv", header = TRUE)
# load only the first 5 rows
initial <- read.csv("http://cs.plu.edu/~caora/Rdata/grapeJuice.csv", header = TRUE, nrows = 5)
# rename the columns in the data
names(initial) <- c("name1","name2","name3","name4","name5")
# display the column named name1 (originally sales)
initial$name1
## [1] 222 201 247 169 317
# show the first six rows and a per-column summary of the full data
head(data)
##   sales price ad_type price_apple price_cookies
## 1   222  9.83       0        7.36          8.80
## 2   201  9.72       1        7.43          9.62
## 3   247 10.15       1        7.66          8.90
## 4   169 10.04       0        7.57         10.26
## 5   317  8.38       1        7.33          9.54
## 6   227  9.74       0        7.51          9.49
summary(data)
##      sales           price           ad_type     price_apple
##  Min.   :131.0   Min.   : 8.200   Min.   :0.0   Min.   :7.300
##  1st Qu.:182.5   1st Qu.: 9.585   1st Qu.:0.0   1st Qu.:7.438
##  Median :204.5   Median : 9.855   Median :0.5   Median :7.580
##  Mean   :216.7   Mean   : 9.738   Mean   :0.5   Mean   :7.659
##  3rd Qu.:244.2   3rd Qu.:10.268   3rd Qu.:1.0   3rd Qu.:7.805
##  Max.   :335.0   Max.   :10.490   Max.   :1.0   Max.   :8.290
##  price_cookies
##  Min.   : 8.790
##  1st Qu.: 9.190
##  Median : 9.515
##  Mean   : 9.622
##  3rd Qu.:10.140
##  Max.   :10.580
par(mfrow = c(1, 2)) # set up a 1-by-2 plotting layout
boxplot(data$sales, horizontal = TRUE, xlab = "sales") # boxplot to check for outliers
hist(data$sales, main = "", xlab = "sales", prob = TRUE) # histogram (density scale) to explore the shape of the distribution
lines(density(data$sales), lty = "dashed", lwd = 2.5, col = "red") # overlay a kernel density estimate
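When the remote file is not reachable, the same read.csv() calls can be tried on inline text. The sketch below rebuilds the six rows shown by head(data) as a CSV string; that the original file is comma-separated with exactly this header line is an assumption, and the names csv_text, juice and first3 are illustrative.

```r
# Six rows copied from the head(data) output above (assumed CSV layout).
csv_text <- "sales,price,ad_type,price_apple,price_cookies
222,9.83,0,7.36,8.80
201,9.72,1,7.43,9.62
247,10.15,1,7.66,8.90
169,10.04,0,7.57,10.26
317,8.38,1,7.33,9.54
227,9.74,0,7.51,9.49"

# text = lets read.csv() parse a string instead of a file or URL
juice <- read.csv(text = csv_text, header = TRUE)
dim(juice)
## [1] 6 5

# nrows limits how many data rows are read
first3 <- read.csv(text = csv_text, header = TRUE, nrows = 3)
nrow(first3)
## [1] 3

# class of each column as inferred by read.csv()
sapply(juice, class)
```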
The marketing team wants to find out which of two ad types is more effective for sales: one has a natural production theme (ad_type 0 in the data), the other a family health care theme (ad_type 1).
# split the dataset into two subsets by ad_type
sales_ad_nature <- subset(data, ad_type == 0)
sales_ad_family <- subset(data, ad_type == 1)
# calculate the mean sales for each ad type
mean(sales_ad_nature$sales)
## [1] 186.6667
mean(sales_ad_family$sales)
## [1] 246.6667
# run a Welch two-sample t test comparing mean sales between the two ad types
t.test(sales_ad_nature$sales,sales_ad_family$sales)
##
## Welch Two Sample t-test
##
## data: sales_ad_nature$sales and sales_ad_family$sales
## t = -3.7515, df = 25.257, p-value = 0.0009233
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -92.92234 -27.07766
## sample estimates:
## mean of x mean of y
## 186.6667 246.6667
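In the output above, the p-value (0.0009233) is well below 0.05 and the 95 percent confidence interval does not contain 0, so the difference in mean sales between the two ad types is statistically significant at the 5% level. The individual numbers can also be read from the object that t.test() returns. The sketch below uses only the six rows visible in head(data), so its values differ from the full-data result; the names nature, family and res are illustrative.

```r
# Sales from the six head(data) rows, grouped by ad_type (small illustrative sample).
nature <- c(222, 169, 227)   # ad_type == 0
family <- c(201, 247, 317)   # ad_type == 1

res <- t.test(nature, family)   # Welch test by default (var.equal = FALSE)
res$estimate    # the two group means
res$p.value     # p-value of the test
res$conf.int    # confidence interval for the difference in means

# A simple decision rule at the 5% level
significant <- res$p.value < 0.05
```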
#set the 1 by 2 layout plot window
par(mfrow = c(1,2))
# histogram to explore the data distribution shapes
hist(sales_ad_nature$sales,main="",xlab="sales with nature production theme ad",prob=T)
lines(density(sales_ad_nature$sales),lty="dashed",lwd=2.5,col="red")
hist(sales_ad_family$sales,main="",xlab="sales with family health caring theme ad",prob=T)
lines(density(sales_ad_family$sales),lty="dashed",lwd=2.5,col="red")
# more plotting examples
# scatter plot of family-theme sales against nature-theme sales
# (add type="o", col="blue" to draw connected blue points instead)
plot(sales_ad_family$sales, sales_ad_nature$sales)
# Bar plot
barplot(sales_ad_family$sales)
# pie charts
testData <- c(100,20,300,100,1)
pie(testData, col=rainbow(length(testData)),labels=c("Mon","Tue","Wed","Thu","Fri"))
Suppose the goal is higher profit rather than just higher sales volume, and the relationship between sales and price has been found to be: Sales = 772.64 - 51.24 * price. If each unit of juice costs 5, the profit is: Y = (price - 5) * Sales = -51.24 * price * price + 1028.84 * price - 3863.2. The optimize() call searches for the price between 0 and 20 that maximizes Y.
# profit as a function of price x
f <- function(x) {
  profit <- -51.24 * x * x + 1028.84 * x - 3863.2
  return(profit)
}
optimize(f,lower=0,upper=20,maximum=TRUE)
## $maximum
## [1] 10.03942
##
## $objective
## [1] 1301.28
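Because the profit function is a downward-opening quadratic a * price^2 + b * price + c, its maximum can also be found in closed form at price = -b / (2 * a). The sketch below checks the numerical result this way; the names a, b, c0, best_price and best_profit are illustrative.

```r
# Coefficients of the profit function Y = a * price^2 + b * price + c0
a  <- -51.24
b  <- 1028.84
c0 <- -3863.2

best_price  <- -b / (2 * a)                               # vertex of the parabola
best_profit <- a * best_price^2 + b * best_price + c0     # profit at that price
best_price    # about 10.04, matching $maximum
best_profit   # about 1301.28, matching $objective

# Cross-check against the numerical optimizer
opt <- optimize(function(p) a * p^2 + b * p + c0,
                lower = 0, upper = 20, maximum = TRUE)
```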