Working With Data

table() is useful for counting the number of observations in different groups or levels

> color <- c("red", "red", "red", "blond", "blond", "blond", "blond", "brunette", "brunette", "brunette", "brunette")
> style <- c("perm", "mullet", "perm", "mohawk", "mullet", "mullet", "braids", "mullet", "mullet", "perm", "braids")
> table(color, style)

color      braids mohawk mullet perm
  blond         1      1      2    0
  brunette      1      0      2    1
  red           0      0      1    2

tapply() is very useful for applying a function (e.g., mean, max, length, or one of your own) to the observations in different groups or levels

> <- c(2, 5, 1.5, 8, 3, 4.5, 13, 9, 2, 3.5, 3)
> tapply(, INDEX = style, FUN = mean)

  braids   mohawk   mullet     perm 
8.000000 8.000000 4.700000 2.333333 

> tapply(, INDEX = style, FUN = function(x) { sd(x)/mean(x) }) #calculating the C.V.
   braids    mohawk    mullet      perm 
0.8838835        NA 0.5709110 0.4460713

subset() can be useful for subsetting your data according to a condition or conditions

When you have factors as predictor variables, it can be very useful to generate a design matrix that specifies which effects (beta parameters) should come into play for each data point. The design matrix is essentially your data, re-formatted as a matrix of zeros and ones specifying the categorical attributes of each data point. In R, design matrices are easily generated using the model.matrix function. For example, say you had a large dataset with a categorical predictor variable with 10 levels, for which it would otherwise be tedious to assemble a design matrix (e.g., 10 different species). The simple R command "x <- model.matrix(~species) will automatically generate the design matrix for modeling the effect of species with an intercept term (use model.matrix(~species-1) if you don't want an intercept term). If you had 2 categorical variables you wanted to include as predictor variables (e.g., species and habitat type), you could generate a design matrix for a model including main and interaction effects with the R command "x <- model.matrix(~species*habtype). For the seed predation example from Bolker's chapter 8, I used model.matrix to specify a model for the effect of species on seed predation rates as follows:

species_ind <- model.matrix(~species-1)  # design matrix for species- no intercept

params <- numeric(9)
NLL_sp_species3 <- function(params,numeaten,available) { 
   ltheta <- params[1]
   betas  <- c(params[2:9])  
   -sum(dbetabinom(x=numeaten,size=available,prob=plogis(species_ind %*% betas),

SP.species3 <- optim(fn=NLL_sp_species3,par=c(ltheta=log(0.35),betas=rep(qlogis(0.05),times=8)),
              numeaten = taken, available = available, method="BFGS")
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License