table() is useful for counting the number of observations in different groups or levels
> color <- c("red", "red", "red", "blond", "blond", "blond", "blond", "brunette", "brunette", "brunette", "brunette")
> style <- c("perm", "mullet", "perm", "mohawk", "mullet", "mullet", "braids", "mullet", "mullet", "perm", "braids")
> table(color, style)
style
color braids mohawk mullet perm
blond 1 1 2 0
brunette 1 0 2 1
red 0 0 1 2
tapply() is very useful for applying a function (e.g., mean, max, length, or one of your own) to the observations in different groups or levels
> length.in.back <- c(2, 5, 1.5, 8, 3, 4.5, 13, 9, 2, 3.5, 3)
> tapply(length.in.back, INDEX = style, FUN = mean)
braids mohawk mullet perm
8.000000 8.000000 4.700000 2.333333
> tapply(length.in.back, INDEX = style, FUN = function(x) { sd(x)/mean(x) }) #calculating the C.V.
braids mohawk mullet perm
0.8838835 NA 0.5709110 0.4460713
subset() can be useful for subsetting your data according to a condition or conditions
When you have factors as predictor variables, it can be very useful to generate a design matrix that specifies which effects (beta parameters) should come into play for each data point. The design matrix is essentially your data, re-formatted as a matrix of zeros and ones specifying the categorical attributes of each data point. In R, design matrices are easily generated using the model.matrix function. For example, say you had a large dataset with a categorical predictor variable with 10 levels, for which it would otherwise be tedious to assemble a design matrix (e.g., 10 different species). The simple R command "x <- model.matrix(~species) will automatically generate the design matrix for modeling the effect of species with an intercept term (use model.matrix(~species-1) if you don't want an intercept term). If you had 2 categorical variables you wanted to include as predictor variables (e.g., species and habitat type), you could generate a design matrix for a model including main and interaction effects with the R command "x <- model.matrix(~species*habtype). For the seed predation example from Bolker's chapter 8, I used model.matrix to specify a model for the effect of species on seed predation rates as follows:
species_ind <- model.matrix(~species-1) # design matrix for species- no intercept
params <- numeric(9)
NLL_sp_species3 <- function(params,numeaten,available) {
ltheta <- params[1]
betas <- c(params[2:9])
-sum(dbetabinom(x=numeaten,size=available,prob=plogis(species_ind %*% betas),
theta=exp(ltheta),log=TRUE))
}
SP.species3 <- optim(fn=NLL_sp_species3,par=c(ltheta=log(0.35),betas=rep(qlogis(0.05),times=8)),
numeaten = taken, available = available, method="BFGS")