Hands-on course on July 1 2015 07-01-2015d1710

2015-07-01



some notes

07-01-2015d1012

R Statistics course

Ramon Diaz-Uriarte

do not read Gentleman's bioinfo one, nor Crawley's

RCommander
JGR (Java GUI for R)
JGR + Deducer

runif = random uniform distribution

R variables can hold any type of object (this does not need to be explicitly defined)

rnorm
draws random numbers from a normal distribution: rnorm(n, mean = 0, sd = 1). it has no min or max value; the min and max arguments belong to runif
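A small sketch of the two random number functions mentioned above. The seed, sample size, and parameter values are arbitrary choices for illustration, not values from the course.

```r
set.seed(1)                          # arbitrary seed so the draws are reproducible

# 5 draws from a uniform distribution between 0 and 10
u <- runif(5, min = 0, max = 10)

# 5 draws from a normal distribution with mean 0 and standard deviation 1
z <- rnorm(5, mean = 0, sd = 1)

range(u)   # always inside [0, 10]
z          # unbounded: values can fall anywhere on the real line
```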

rep is for repeat
c is for concatenate (paste together)
factor object is good when you have something that is a class label in a statistical analysis.
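A minimal sketch of rep, c, and factor working together. The variable names and the "control"/"treated" labels are made up for illustration.

```r
# c() combines individual values into one vector
values <- c(1.2, 3.4, 5.6, 7.8)

# rep() repeats its input; each = 2 repeats every element twice in a row
labels <- rep(c("control", "treated"), each = 2)

# factor() turns the labels into a class label for statistical analysis
group <- factor(labels)

levels(group)   # the distinct classes
table(group)    # how many observations fall in each class
```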

head function shows first 6 lines of an object

apply function applies a function over the rows or columns of a matrix or array
apply(object, dimension to work over (1 = rows, 2 = columns), function to apply)

1:20 defines a range from 1 to 20

functions can be defined on the fly while being passed as a parameter into a function
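A short sketch tying together 1:20, apply, and a function defined on the fly. The matrix shape and the "range" computation are arbitrary choices for illustration.

```r
# 1:20 reshaped into 4 rows and 5 columns (filled column by column)
m <- matrix(1:20, nrow = 4)

# apply an existing function over dimension 1 (rows)
row.sums <- apply(m, 1, sum)

# apply a function defined on the fly over dimension 2 (columns)
col.spread <- apply(m, 2, function(x) max(x) - min(x))

row.sums     # 45 50 55 60
col.spread   # 3 3 3 3 3
```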

thin line between using and programming R

names tells you the names of each component in a list

The $ sign is used to access the internal components of a list.
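A minimal sketch of names and $ on a list. The list contents are invented for illustration.

```r
# a list can hold components of different types and lengths
person <- list(name = "Ana", age = 93, hobbies = c("chess", "hiking"))

names(person)     # "name" "age" "hobbies"
person$age        # 93
person$hobbies    # the whole character vector stored under "hobbies"
```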

functional composition

library() loads an installed package

It's better not to save the workspace from RStudio

? in front of function brings up help

you can wrap a function name in example(), e.g. example(mean), to run the examples from its help page

run

he likes to be explicit about using the names of variables being passed to a function

apropos searches functions that have certain text in their name.

vignettes give you several worked examples (pdf user guides)

sos package for searching help

cran task views

coding style

try not to go beyond column 80

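variables are really vectors in r (a single number is a vector of length 1)

vectors hold elements of the same type. you can't mix numbers and strings; if you try, the numbers are converted to strings

if a vector has 2 dimensions, then it is a matrix (more than 2 dimensions makes it an array)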
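A quick sketch of these three points about vectors. The values are arbitrary.

```r
# a single number is already a vector of length 1
x <- 42
length(x)            # 1

# mixing numbers and strings converts everything to strings
mixed <- c(1, "a")
class(mixed)         # "character"

# giving a vector two dimensions turns it into a matrix
v <- 1:6
dim(v) <- c(2, 3)
is.matrix(v)         # TRUE
```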

dataframes can have elements of different types
dataframes organize things in terms of columns
like a spreadsheet
dataframes are rectangular table objects

dataframes are a type of list
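A minimal sketch showing that a data frame is a list of equal-length columns. The column names and values are invented for illustration.

```r
# columns can have different types, but all have the same length
df <- data.frame(name = c("Juan", "Maria"), age = c(23, 35))

is.list(df)     # TRUE: a data frame is a list underneath
names(df)       # the column names, just like names() on a list
df$age          # columns are accessed with $, like list components
length(df)      # 2: the number of columns, not the number of rows
```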

a list is a general container

getwd() is get working directory

the working directory can also be changed (session->set working directory)

setwd()


summary of a data frame

str of a data frame (str is for structure)

read.table function (the header parameter is VERY important to indicate and pay attention to)

dealing with missing values; use an NA
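A small sketch of read.table with a header and a missing value. To stay self-contained it reads from a string through the text argument instead of a file; the data are made up.

```r
raw <- "id height
a 1.70
b NA
c 1.82"

# header = TRUE: the first line holds the column names, not data
dat <- read.table(text = raw, header = TRUE)

is.na(dat$height)                # FALSE TRUE FALSE
mean(dat$height)                 # NA, because one value is missing
mean(dat$height, na.rm = TRUE)   # 1.76, ignoring the missing value
```

With header = FALSE the first line would be read as a data row, so the numeric column would no longer be numeric.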

read.csv function to read a csv file

You can save specific objects in R
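One way to do this, assuming the notes refer to save() and load() (the course may have shown a different function). The object and the temporary file are made up for illustration.

```r
x <- c(1, 2, 3)

# save only the object x, not the whole workspace
f <- tempfile(fileext = ".RData")
save(x, file = f)

rm(x)       # remove it from the session
load(f)     # bring it back under its original name
x           # 1 2 3
```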

ggplot2

source command to run a script

vectors
regular sequences with the seq function

range by 1 like 2:7

rep function (repeat function)

output is often the length of the longest object

recycling rule can lead to nasty surprises
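A short sketch of the recycling rule and the kind of surprise it can cause. The numbers are arbitrary.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)

# b is recycled to 10 20 10 20; the result has the length of the longest vector
a + b        # 11 22 13 24

# when the lengths do not divide evenly, R still recycles and only warns
# (suppressWarnings is used here just to keep the sketch quiet)
odd <- suppressWarnings(c(1, 2, 3) + c(10, 20))
odd          # 11 22 13
```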

he doesn't like using "=" for assignment

identical function

floating point arithmetic in r (be careful)
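A classic illustration of why exact comparison of floating point numbers needs care. all.equal is one common way to compare with a tolerance; the notes do not say which approach the course recommended.

```r
x <- 0.1 + 0.2

x == 0.3                     # FALSE: the two values differ in the last bits
identical(x, 0.3)            # FALSE for the same reason
isTRUE(all.equal(x, 0.3))    # TRUE: equal up to a small tolerance
```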

which gives the indices of the elements of a vector that meet a criterion

to get elements of a vector pass another vector with the positions you want

You can name each element in a vector in R
ages <- c(Juan = 23, Maria = 35, Irene = 12, Ana = 93)
vectors that have names are not dataframes
this is like a lookup table
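A sketch of indexing and lookup using the ages vector from the notes. The cutoff of 30 is an arbitrary choice.

```r
ages <- c(Juan = 23, Maria = 35, Irene = 12, Ana = 93)

ages["Maria"]        # lookup by name: 35
ages[c(1, 4)]        # lookup by position: Juan and Ana
which(ages > 30)     # positions (with names) of elements over 30
ages[ages > 30]      # the elements themselves
```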

age

factors
factors are stored internally as integer codes
as.numeric() on a factor returns those codes, not the original labels
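A small sketch of the trap with as.numeric on a factor whose labels look like numbers. The values are made up.

```r
f <- factor(c("10", "20", "10"))

as.numeric(f)                  # 1 2 1: the internal codes
as.numeric(as.character(f))    # 10 20 10: the labels converted to numbers
```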

rbind to bind rows
cbind to bind columns

you can use rbind to add something to a matrix

drop = FALSE to not drop dimensions
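A minimal sketch of cbind, rbind, and drop = FALSE. The matrix contents are arbitrary.

```r
m  <- cbind(a = 1:3, b = 4:6)    # bind two columns: 3 rows, 2 columns
m2 <- rbind(m, c(10, 20))        # add a row to the matrix: 4 rows, 2 columns

m[1, ]                  # selecting one row drops to a plain vector
m[1, , drop = FALSE]    # drop = FALSE keeps it as a 1 x 2 matrix
```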

the apply function can be used on a matrix, but not on a vector

obtaining the indices of an element of a matrix

which(A==999, arr.ind = TRUE)
row col
[1,] 2 3
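The notes do not show how A was built. The sketch below assumes a small matrix with 999 placed at row 2, column 3, which reproduces the output above.

```r
# hypothetical matrix; only the position of 999 matters
A <- matrix(1:12, nrow = 3)
A[2, 3] <- 999

# arr.ind = TRUE returns row and column instead of a single position
pos <- which(A == 999, arr.ind = TRUE)
pos
```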


a list is a very general container

S3 and S4 classes let objects have their own methods, which allows for a type of object oriented programming
S4 classes are more sophisticated, whereas S3 is what is behind things like "print", "plot" (more built-in things), etc.
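A minimal S3 sketch: a print method for a made-up class. The class name "person" and its fields are hypothetical, chosen only for illustration.

```r
# an S3 object is an ordinary object with a class attribute
p <- structure(list(name = "Ana", age = 93), class = "person")

# print() dispatches to print.person for objects of class "person"
print.person <- function(x, ...) {
    cat("Person:", x$name, "aged", x$age, "\n")
    invisible(x)
}

print(p)
```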

A dataframe is a special type of list

transform data frame into matrix
data.matrix(AB)
as.matrix(AB)

with function to evaluate an expression using the columns of a data frame
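A short sketch of with, next to the equivalent $ form. The notes do not show what AB contains, so the data frame below is an assumption.

```r
# hypothetical contents for AB
AB <- data.frame(A = c(1, 2, 3), B = c(10, 20, 30))

# inside with(), column names can be used directly
total <- with(AB, A + B)

# the same computation spelled out with $
total2 <- AB$A + AB$B

total    # 11 22 33
```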

attach adds a data frame (or other object) to the search path that R looks through when finding a name, but this strategy is not used often anymore.

iteration in R
names.of.friends <- c("Ana", "Rebeca", "Marta",
                      "Quique", "Virgilio")
for (friend in names.of.friends) {
    cat("\n I should call", friend, "\n")
}


apply functions can be used instead of for loops, and they are easier to parallelize
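A small sketch comparing a for loop with sapply, one member of the apply family. The squaring task is arbitrary.

```r
# for loop version: fill a result vector element by element
squares.loop <- numeric(5)
for (i in 1:5) {
    squares.loop[i] <- i^2
}

# sapply version: one call, no index bookkeeping
squares.sapply <- sapply(1:5, function(i) i^2)

squares.sapply   # 1 4 9 16 25
```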

defining a function in R example

multByTwo <- function(x) {
    z <- 2 * x
    return(z)
}

optional arguments in R example

plotAndLm <- function(x, y, title = "A figure") {
    lm1 <- lm(y ~ x)
    cat("\n Printing the summary of x\n")
    print(summary(x))
    cat("\n Printing the summary of y\n")
    print(summary(y))
    cat("\n Printing the summary of the linear regression\n")
    print(summary(lm1))
    plot(y ~ x, main = title)
    abline(lm1)
    return(lm1)
}
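A smaller sketch of how an optional argument behaves when it is left out or supplied by name. The greet function is hypothetical and not from the course.

```r
# greeting has a default value, so callers may omit it
greet <- function(name, greeting = "Hello") {
    paste(greeting, name)
}

greet("Ana")                                # "Hello Ana"
greet(name = "Ana", greeting = "Hola")      # "Hola Ana", arguments passed by name
```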