Print

Care with estimated breeding values

Predictions of breeding values from animal models have been used to make a range of inferences on parameters of evolutionary importance, including, selection with respect to genetic variation at the individual level, population trends in mean breeding value over time (microevolution), and genetically-based differences among populations. However, despite the promising name of the technique with which these predictions are made (best linear unbiased prediction, or BLUP), applications of predicted breeding values can be very seriously biased, and are often far from the best option.

One simple example of very serious bias was described by Erik Postma (E. Postma, 2006. Implications of the difference between true and predicted breeding values for the study of natural selection and micro-evolution. Journal of Evolutionary Biology 19: 309-320.). If predictions of breeding values are obtained from a model that does not account for (potential) environmentally-induced trends in phenotype over time, trends in predicted breeding values will mimic the phenotypic trend, even in the complete absence of a genetic trend. We can verify this by simulation:

est.without.cohort<-1:100
est.with.cohort<-1:100

for(rep in 1:100){
  random.effects<-cbind(1:100,rnorm(100,0,1))
  dogs<-as.data.frame(cbind(1:1000,expand.grid(1:200,1:5)[,2],
           ceiling(runif(1000,0,100)),NA,rnorm(1000,0,1)))
  names(dogs)<-cbind("ID","COHORT","LITTER","LITTER.value","ENV")
  dogs$LITTER.value<-random.effects[match(dogs$LITTER,random.effects[,1]),2]
  dogs$PHEN<-dogs$COHORT*0.3+dogs$LITTER.value+dogs$ENV
  dogs$COHORT<-as.factor(dogs$COHORT)
  dogs$LITTER<-as.factor(dogs$LITTER)

  library(asreml)

  model1<-asreml(PHEN~1, random=~LITTER, data=dogs)
  blups1<-cbind(1:100,summary(model1)$coef.random[,1])
  dogs$blups.model1<-blups1[match(as.character(dogs$LITTER),
           as.character(blups1[,1])),2]

  model2<-asreml(PHEN~COHORT, random=~LITTER, data=dogs)
  blups1<-cbind(1:100,summary(model2)$coef.random[,1])
  dogs$blups.model2<-blups1[match(as.character(dogs$LITTER),
          as.character(blups1[,1])),2]


  est.without.cohort[rep]<-summary(lm(blups.model1~
                       as.numeric(COHORT),data=dogs))$coeff[2]
  est.with.cohort[rep]<-summary(lm(blups.model2~
                       as.numeric(COHORT),data=dogs))$coeff[2]

  # this line is because the code for associating blups with their
  # effects is only right when all effects occur in the data
  # this will make the "In cbind(..." warnings ignorable
  if(length(summary(model1)$coef.random[,1])!=100|
       length(summary(model2)$coef.random[,1])!=100) rep<-rep-1
}

> mean(est.without.cohort)
[1] 0.02618011
> sd(est.without.cohort)
[1] 0.02118003
> 
> 
> mean(est.with.cohort)
[1] -7.418099e-06
> sd(est.with.cohort)
[1] 0.02072310
>


Here the pedigree is defined by litters, rather than the inverse of the A matrix typical to animal models, and some phenotype of dogs is being tested for evolution. Even though there is not change in the population, we pick it up if we fail to model temporal variation at the level of the phenotype.

We now see how to get unbiased results from this statistical procedure of regressing predicted breeding values on time to make inferences about microevolution. However, a nasty statistical issue remains, one that often comes up when we do statistics on statistics. If we want to test the statistical significance of the trend in breeding values over time, we could (and have) naively fit regressions of breeding values on time. However, this approach treats every predicted breeding value as (a) known without error and (b) independent. In fact, in the sorts of pedigrees that we deal with in the analysis of data from natural populations, breeding values are predicted with a lot of error, and relatedness (the very factor that we rely on to run our models) means that individual predictions of breeding values are non-independent, both across and within cohorts. Jarrod Hadfield (J.D. Hadfield, A.J. Wilson, D. Garant, B.C. Sheldon, L.E.B. Kruuk. 2010. The Misuse of BLUP in Ecology and Evolution. American Naturalist 175: 116-125.) provided the solution, which is to account for the full uncertainty and non-independence by integrating the regression over the joint posterior distribution of the breeding values. The posterior distribution of a animal model is generally computationally expensive to obtain, and so the simulation approach we implemented above is out of the question. But we can make one iteration of the model, and analyze it with both ways to see the difference:

library(asreml)
  library(MCMCglmm)

  ## simulate the data (this time ten years)
  random.effects<-cbind(1:200,rnorm(200,0,1))
  dogs<-as.data.frame(cbind(1:2000,expand.grid(1:200,1:10)
           [,2],rep(ceiling(runif(200,0,20)),10),NA,rnorm(1000,0,1)))
  names(dogs)<-cbind("ID","COHORT","LITTER","LITTER.value","ENV")
  for(x in 1:2000) dogs$LITTER[x]<-paste(dogs$COHORT[x],dogs$LITTER[x],sep="")

  dogs$LITTER.value<-random.effects[match(as.numeric(as.factor(dogs$LITTER)),
                       random.effects[,1]),2]
  dogs$PHEN<-dogs$COHORT*0.3+dogs$LITTER.value+dogs$ENV
  dogs$COHORT<-as.factor(dogs$COHORT)
  dogs$LITTER<-as.factor(dogs$LITTER)

  ## fit the models
  model2<-asreml(PHEN~COHORT, random=~LITTER, data=dogs)
  model.MCMCglmm<-MCMCglmm(PHEN~COHORT, random=~LITTER, data=dogs,pr=TRUE)

  ## estimate the change in breeding values and evaluate statistical confidence
  ## 1) blup regression
  blups1<-summary(model2)$coef.random[,1]
  dogs$blups.model2<-blups1[
             match(paste("LITTER_",as.character(dogs$LITTER),sep=""),
                      names(blups1))]
  summary(lm(blups.model2~as.numeric(COHORT),data=dogs))

  ## 2) integration over the posterior distribution
  animal.indexes<-match(paste("LITTER.",as.numeric(as.character(dogs$LITTER)),
                  sep=""),names(model.MCMCglmm$Sol[1,][]))  
  post<-1:1000
  for(i in 1:1000){
    post[i]<-summary(lm(model.MCMCglmm$Sol[i,][animal.indexes]~
                           as.numeric(as.character(dogs$COHORT))))$coef[2]
  }
  table(post>0) ## note this is one tailed

  ## get something more comparable to the SE reported above
  se.range<-HPDinterval(as.mcmc(post),0.5)
  mode<-posterior.mode(as.mcmc(post))
  mean(abs(se.range[2]-mode),abs(mode-se.range[1]))

## and we can get the output:

# >   ## estimate the change in breeding values and evaluate statistical confidence
# >   ## 1) blup regression
# >   blups1<-summary(model2)$coef.random[,1]
# >   dogs$blups.model2<-blups1[
# +              match(paste("LITTER_",as.character(dogs$LITTER),sep=""),
# +                       names(blups1))]
# >   summary(lm(blups.model2~as.numeric(COHORT),data=dogs))
# 
# Call:
# lm(formula = blups.model2 ~ as.numeric(COHORT), data = dogs)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -2.31619 -0.56308  0.04386  0.63695  1.79261 
# 
# Coefficients:
#                     Estimate Std. Error t value Pr(>|t|)
# (Intercept)        -0.016829   0.042875  -0.393    0.695
# as.numeric(COHORT)  0.004905   0.006910   0.710    0.478
# 
# Residual standard error: 0.8876 on 1998 degrees of freedom
# Multiple R-squared: 0.0002522,	Adjusted R-squared: -0.0002482 
# F-statistic: 0.504 on 1 and 1998 DF,  p-value: 0.4778 
# 
# > 
# >   ## 2) integration over the posterior distribution
# >   # animal.indexes<-match(paste("LITTER.",
#        as.numeric(as.character(dogs$LITTER)),sep=""),
#        names(model.MCMCglmm$Sol[1,][]))  
# >   post<-1:1000
# >   for(i in 1:1000){
# +     post[i]<-summary(lm(model.MCMCglmm$Sol[i,]
#     [animal.indexes]~as.numeric(as.character(dogs$COHORT))))$coef[2]
# +   }
# >   table(post>0) ## note this is one tailed
# 
# FALSE  TRUE 
#   410   590 
# > 
# >   ## get something more comparable to the SE reported above
# >   se.range<-HPDinterval(as.mcmc(post),0.5)
# >   mode<-posterior.mode(as.mcmc(post))
# >   mean(abs(se.range[2]-mode),abs(mode-se.range[1]))
# [1] 0.01473613



So the test for a trend in breeding values is anti-conservative when uncertainty and non-independence at the level of individual predicted breeding values is not taken into account. Both estimates were non-significant, as they should have been, in this particular run, because we fit the right model and so the slope of the regression of estimated breeding values on cohort was very small in both analyses. But note how the standard error appeared to be less than half what it should have been in the analysis that didn't account for the uncertainty and the non-independence in the 'best' predictions of the breeding values.

Hadfield et al. 2010 also provides detailed discussion of (a) why uses of blups to test whether or not breeding values covary with fitness and whether or not different groups of individuals differ genetically (b) solutions to these problems as well.


Menu

Main Menu