Hands-on course on July 1 2015 07-01-2015d1710
- certificate, evaluation form, and course files found here:
- http://azim58.ngrok.io/files/ws-pub/2015/07%20July/07-01-2015d1718/Hands%20on%20R%20course
some notes
07-01-2015d1012
R Statistics course
Ramon Diaz-Uriarte
do not read Gentleman's bioinformatics book, nor Crawley's
R Commander (Rcmdr)
JGR (Java GUI for R)
JGR + Deducer
runif generates random numbers from a uniform distribution
R variables can hold any type of object (the type does not need to be declared explicitly)
rnorm
generates random numbers from a normal distribution (defaults: mean = 0, sd = 1)
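A minimal sketch of both generators; set.seed() and the argument values are illustrative additions, not from the course:

```r
set.seed(42)  # fix the random seed so the draws can be reproduced

# 5 draws from a uniform distribution on [min, max]; defaults are min = 0, max = 1
u <- runif(5, min = 0, max = 10)

# 5 draws from a normal distribution; defaults are mean = 0, sd = 1
z <- rnorm(5, mean = 0, sd = 1)

print(u)
print(z)
```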
rep is for repeat (repeats values)
c is for combine (concatenates values into one vector)
a factor object is good when you have something that is a class label in a statistical analysis
head function shows the first 6 rows (or elements) of an object
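A short illustration of c() and rep(); the values are arbitrary:

```r
# c() combines values into a single vector
x <- c(3, 1, 4)
y <- c(x, 1, 5)            # vectors can themselves be combined
print(y)                   # 3 1 4 1 5

# rep() repeats values
rep("a", times = 3)        # "a" "a" "a"
rep(c(1, 2), times = 3)    # 1 2 1 2 1 2
rep(c(1, 2), each = 2)     # 1 1 2 2
```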
apply function applies a function to the rows or columns of a matrix
apply(X, MARGIN, FUN), where MARGIN is the dimension to work along (1 = rows, 2 = columns)
1:20 defines a range from 1 to 20
functions can be defined on the fly (anonymous functions) while being passed as an argument to another function
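A small sketch combining a 1:20 range, apply(), and a function defined on the fly; the matrix shape is an arbitrary choice:

```r
# Arrange 1:20 into a 4 x 5 matrix (filled column by column)
m <- matrix(1:20, nrow = 4)

# MARGIN = 1 works along rows, MARGIN = 2 along columns
row.sums <- apply(m, 1, sum)

# An anonymous function defined while being passed to apply()
col.ranges <- apply(m, 2, function(col) max(col) - min(col))

print(row.sums)    # 45 50 55 60
print(col.ranges)  # 3 3 3 3 3
```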
there is a thin line between using and programming R
names tells you the names of each component in a list
the $ sign is used to access the internal components of a list
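A minimal sketch of names() and $ on a list; the friend record is made-up data:

```r
# A list can hold components of different types and lengths
friend <- list(name = "Ana", age = 30,
               phones = c("555-1234", "555-9876"))

names(friend)      # "name" "age" "phones"
friend$age         # 30
friend$phones[2]   # "555-9876"
```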
functional composition
library() loads an installed library/package
It's better not to save the workspace when quitting RStudio
? in front of a function name brings up its help page
you can type example() around a function name to see an example run, e.g. example(mean)
he likes to be explicit and name the arguments passed to a function, e.g. rnorm(n = 10, mean = 0, sd = 1)
apropos searches for functions that have certain text in their name
vignettes give you worked examples (PDF user guides)
sos package for searching the help pages (e.g. findFn())
CRAN Task Views
coding style
try not to go beyond column 80
a variable is really a vector in R (a single value is a vector of length 1)
vectors hold elements of the same type: you can't mix numbers and strings (mixing them coerces everything to character)
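A quick check of both points; the values are arbitrary:

```r
x <- 5
is.vector(x)   # TRUE: a single number is a vector ...
length(x)      # ... of length 1

# Mixing types in c() does not fail: everything is coerced to one type
mixed <- c(1, "two", 3)
class(mixed)   # "character"
mixed          # "1" "two" "3"
```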
if a vector has 2 dimensions, then it is a matrix (more than 2: an array)
data frames can have columns of different types
data frames organize things in terms of columns
like a spreadsheet
data frames are rectangular table objects
data frames are a type of list
a list is a general container
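A minimal sketch of a data frame with mixed column types; the data are made up, and stringsAsFactors = FALSE is set so older R versions behave like R 4.x:

```r
# Each column is a vector; different columns may have different types
df <- data.frame(name = c("Ana", "Marta", "Irene"),
                 age = c(23, 35, 12),
                 student = c(FALSE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

sapply(df, class)   # "character" "numeric" "logical"
dim(df)             # 3 3: rectangular, like a spreadsheet
is.list(df)         # TRUE: a data frame is a list of equal-length columns

# A plain list is more general: it can even hold a data frame
box <- list(numbers = 1:3, label = "a", table = df)
```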
getwd() gets the working directory
the working directory can also be changed in RStudio (Session -> Set Working Directory)
or with setwd()
summary of a data frame
str of a data frame (str is for structure)
read.table function (the header argument is VERY important to set and pay attention to: header = TRUE means the first line holds the column names)
dealing with missing values: use NA
read.csv function to read a CSV file
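A self-contained sketch, assuming a small made-up CSV written to a temporary file; both the empty field and the literal NA become missing values:

```r
path <- tempfile(fileext = ".csv")
writeLines(c("name,age",
             "Ana,23",
             "Maria,",
             "Irene,NA"), path)

people <- read.csv(path)   # read.csv assumes header = TRUE

# read.table needs header = TRUE here, otherwise the first line
# would be read as a row of data
people2 <- read.table(path, header = TRUE, sep = ",")

str(people)       # structure: column names and types
summary(people)   # per-column summaries, including the count of NAs

mean(people$age, na.rm = TRUE)   # 23: NAs must be dropped explicitly
```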
You can save specific objects in R (save() and load(), or saveRDS() and readRDS() for a single object)
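A minimal sketch using temporary files; the object names are illustrative:

```r
ages <- c(23, 35, 12)
note <- "keep me"

# save() writes only the named objects, not the whole workspace
rdata.path <- tempfile(fileext = ".RData")
save(ages, note, file = rdata.path)

rm(ages, note)      # remove them from the session
load(rdata.path)    # restores both objects under their original names

# saveRDS()/readRDS() store one object that can be read into any name
rds.path <- tempfile(fileext = ".rds")
saveRDS(ages, rds.path)
ages.copy <- readRDS(rds.path)
```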
ggplot2
source command to run a script
vectors
regular sequences with the seq function
ranges in steps of 1 with the colon operator, like 2:7
rep function (repeat function)
the output of an operation often has the length of the longest object (shorter ones are recycled)
the recycling rule can lead to nasty surprises
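A short sketch of regular sequences and of recycling, including the case that only produces a warning:

```r
seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
seq(1, 10, length.out = 4)         # 1 4 7 10
2:7                                # 2 3 4 5 6 7

# Recycling: the shorter vector is reused to match the longer one
c(1, 2, 3, 4) + c(10, 20)          # 11 22 13 24

# When the longer length is not a multiple of the shorter one,
# R still returns a result and only warns (silenced here)
res <- suppressWarnings(1:3 + 1:2)
res                                # 2 4 4
```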
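he doesn't like using "=" for assignment (he uses <-)
identical function (tests whether two objects are exactly the same)
floating point arithmetic in R (be careful: compare computed decimals with all.equal() rather than ==)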
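A quick illustration of why exact comparison of decimals is risky:

```r
0.1 + 0.2 == 0.3                    # FALSE: these decimals are not exact in binary
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal up to a small tolerance

identical(c(1, 2), c(1, 2))         # TRUE: exactly the same object
identical(1L, 1)                    # FALSE: integer vs double
```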
which gives the indices of the elements of a vector that meet a criterion
to get elements of a vector, pass another vector with the positions you want
You can name each element in a vector in R
ages <- c(Juan = 23, Maria = 35, Irene = 12, Ana = 93)
vectors that have names are not data frames
a named vector is like a lookup table
age
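A sketch of which(), positional indexing, and lookup by name, reusing the ages vector from the notes (x is made up):

```r
x <- c(5, 1, 7, 3)
which(x > 2)            # 1 3 4: positions where the condition is TRUE
x[c(1, 3)]              # 5 7: pass a vector of positions
x[which(x > 2)]         # 5 7 3

ages <- c(Juan = 23, Maria = 35, Irene = 12, Ana = 93)
ages["Maria"]           # lookup by name
ages[c("Ana", "Juan")]  # several names at once
names(ages)[ages > 30]  # "Maria" "Ana"
```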
factors
factors are stored internally as integer codes
as.numeric() on a factor returns those codes, not the labels
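A small sketch of the codes, including the common pitfall with factors built from numbers (data are made up):

```r
treatment <- factor(c("low", "high", "low", "high"))
levels(treatment)        # "high" "low" (sorted alphabetically by default)
as.numeric(treatment)    # 2 1 2 1: the internal integer codes

# A factor built from numbers also stores codes, not the numbers
doses <- factor(c("10", "5", "10"))
as.numeric(doses)                 # 1 2 1 (levels are "10", "5")
as.numeric(as.character(doses))   # 10 5 10: the original values
```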
rbind to bind rows
cbind to bind columns
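you can use rbind to add a row to a matrix (and cbind to add a column)
drop = FALSE to keep dimensions when subsetting (e.g. a single row stays a matrix)
the apply function can be used on a matrix, but not on a vector (for vectors, use sapply or lapply)
obtaining the indices (row and column) of an element of a matrix: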
which(A == 999, arr.ind = TRUE)
     row col
[1,]   2   3
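A self-contained sketch of the matrix operations above; the contents of A are hypothetical, chosen so that which() reproduces the row 2, col 3 result:

```r
A <- matrix(1:6, nrow = 2)     # 2 x 3, filled by column
A <- rbind(A, c(7, 8, 9))      # add a row    -> 3 x 3
A <- cbind(A, 10:12)           # add a column -> 3 x 4

A[1, ]                         # plain vector: the dimension is dropped
A[1, , drop = FALSE]           # still a 1 x 4 matrix

A[2, 3] <- 999
pos <- which(A == 999, arr.ind = TRUE)
pos                            # row 2, col 3
```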
a list is a very general container
S3 and S4 classes let objects have their own methods, allowing a type of object-oriented programming
S4 classes are more formal and sophisticated, whereas S3 is used for things like "print", "plot", and other built-in generics
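A minimal sketch contrasting the two systems; the class names "friend" and "Point" and the generic distFromOrigin are hypothetical:

```r
library(methods)   # S4 tools (attached by default in most sessions)

# S3: a class is just an attribute; print() dispatches to print.<class>
ana <- structure(list(name = "Ana", age = 30), class = "friend")
print.friend <- function(x, ...) {
  cat("Friend:", x$name, "aged", x$age, "\n")
  invisible(x)
}
print(ana)

# S4: classes, generics, and methods are declared formally
setClass("Point", slots = c(x = "numeric", y = "numeric"))
setGeneric("distFromOrigin",
           function(p) standardGeneric("distFromOrigin"))
setMethod("distFromOrigin", "Point",
          function(p) sqrt(p@x^2 + p@y^2))

distFromOrigin(new("Point", x = 3, y = 4))   # 5
```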
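A data frame is a special type of list
transform a data frame into a matrix:
data.matrix(AB)  # numeric matrix; factor and character columns become integer codes
as.matrix(AB)    # character matrix if the columns have mixed types
with() lets you use the columns of a data frame directly by name
attach can add things (e.g. a data frame) to the search path, the list of packages that R searches through when finding a command, but this strategy is not used often anymore.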
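A sketch with a hypothetical AB data frame (the notes do not show its contents):

```r
AB <- data.frame(height = c(1.62, 1.75, 1.80),
                 group = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

as.matrix(AB)     # character matrix: mixed columns are coerced to text
data.matrix(AB)   # numeric matrix: "group" becomes integer codes

# with() evaluates an expression using the columns as variables,
# without attach()
with(AB, mean(height[group == "a"]))   # 1.71
```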
iteration in R
names.of.friends <- c("Ana", "Rebeca", "Marta",
                      "Quique", "Virgilio")
for (friend in names.of.friends) {
  cat("\n I should call", friend, "\n")
}
the apply family (apply, lapply, sapply) can be used instead of for loops, and it is easier to parallelize
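The loop above rewritten with vapply(), a sketch of the apply-family style (the mclapply line is a commented-out illustration for Unix-alikes):

```r
names.of.friends <- c("Ana", "Rebeca", "Marta", "Quique", "Virgilio")

# vapply() applies the function to each element and checks that
# every result is a single character string
messages <- vapply(names.of.friends,
                   function(friend) paste("I should call", friend),
                   character(1), USE.NAMES = FALSE)
cat(messages, sep = "\n")

# Same shape as lapply(), but runs the calls on several cores:
# parallel::mclapply(names.of.friends,
#                    function(f) paste("I should call", f),
#                    mc.cores = 2)
```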
example of defining a function in R:
multByTwo <- function(x) {
  z <- 2 * x
  return(z)
}
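Using the function (the definition is repeated so the block runs on its own); because `*` is vectorized, multByTwo works on whole vectors too:

```r
multByTwo <- function(x) {
  z <- 2 * x
  return(z)
}

multByTwo(5)      # 10
multByTwo(1:3)    # 2 4 6

# The value of the last expression is returned, so return() is optional
multByTwoShort <- function(x) 2 * x
```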
optional arguments in R example (title has a default value, so it can be omitted):
plotAndLm <- function(x, y, title = "A figure") {
  lm1 <- lm(y ~ x)
  cat("\n Printing the summary of x\n")
  print(summary(x))
  cat("\n Printing the summary of y\n")
  print(summary(y))
  cat("\n Printing the summary of the linear regression\n")
  print(summary(lm1))
  plot(y ~ x, main = title)
  abline(lm1)
  return(lm1)
}
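With plotAndLm, calling plotAndLm(x, y) uses the title "A figure". The same rule is sketched below with a hypothetical greet() function that has no plotting side effects:

```r
# greet() is hypothetical, used only to illustrate default arguments
greet <- function(name, greeting = "Hello") {
  paste0(greeting, ", ", name, "!")
}

greet("Ana")                            # "Hello, Ana!" (default used)
greet("Ana", greeting = "Hola")         # "Hola, Ana!"  (default overridden)
greet(greeting = "Hi", name = "Marta")  # named arguments in any order
```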