Breeding exercise instructions

Lecture 7: Conventional & Advanced Breeding

Introduction

In this breeding exercise, we will explore a single breeding cycle that combines some of the methods you have learned in previous lectures. These methods include pedigree breeding (PB), single seed descent (SSD), doubled haploid (DH), marker assisted selection (MAS) and genomic selection (GS). The breeding cycle for inbred varieties is described in the figure below:

[Figure: breeding cycle for inbred varieties]

Notes:

HRT: Head Row Trial.

PYT: Preliminary Yield Trial.

AYT: Advanced Yield Trial.

EYT: Elite Yield Trial.

SSD: Single Seed Descent.

DH: Doubled Haploid.

NS: No Selection (i.e. random selection).

PS: Phenotypic Selection.

MAS: Marker Assisted Selection.

GS: Genomic Selection.

 

Here is some information about the fictitious species that we are working with.

This species has 5 chromosomes with genetic lengths of 200, 180, 160, 140 and 120 centiMorgans (cM). This species can be selfed or crossed easily, and the founders/lines/varieties are often inbreds. We are interested in 3 traits (size, color and shape) and we want to breed for higher trait values in all 3. Size and color are quantitative traits (i.e. controlled by many loci), while shape is a qualitative trait (i.e. controlled by few loci).

Somehow, we know that size and color are correlated and controlled by the same 800 randomly chosen loci. On average, these loci are 1 cM apart. The effects of these loci on size and color are drawn randomly from a normal distribution. We also know that shape is controlled by a subset of 10 loci, and the effects of these loci on shape are the opposite of the effects on size. Therefore, this creates a penalty toward size when selecting for higher shape value. Unlike size and color, the effects on shape are the same in magnitude for all 10 loci.
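To make this trait architecture concrete, here is a minimal sketch in base R of how such effects could be drawn. It is an illustration only and not the code used by the exercise script: the variable names, the seed, the unit magnitude for the shape effects, and the use of 0.5 as the correlation between size and color effects (borrowed from the ca value in the BP.eval call later in this handout) are all assumptions.

```r
# Illustrative sketch only; the exercise script may build its traits differently.
set.seed(1)                      # arbitrary seed, for reproducibility
n_loci <- 800                    # loci shared by size and color
ca     <- 0.5                    # assumed correlation between size and color effects

# Two independent standard normal draws, combined so that the
# size and color effects have an expected correlation of ca.
z1 <- rnorm(n_loci)
z2 <- rnorm(n_loci)
eff_size  <- z1
eff_color <- ca * z1 + sqrt(1 - ca^2) * z2

# Shape: a subset of 10 loci, equal magnitude (assumed to be 1),
# with the sign opposite to the size effect at the same locus.
shape_loci <- sort(sample(n_loci, 10))
eff_shape  <- numeric(n_loci)
eff_shape[shape_loci] <- -sign(eff_size[shape_loci])

# Empirical correlation between size and color effects (close to ca).
print(cor(eff_size, eff_color))
# At every shape locus, the shape and size effects have opposite signs.
print(all(eff_shape[shape_loci] * eff_size[shape_loci] < 0))
```

Because each shape effect opposes the size effect at the same locus, selecting an allele that raises shape lowers size, which is the penalty described above.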

Fortunately, genomic resources are available for this species. We have a genotyping array with 8,000 markers randomly distributed across the genome that can be used in GS. Thanks to the many academic researchers who have poured their souls into mapping QTLs for the shape trait, we also have 10 markers that are perfectly linked to the 10 shape loci. These markers can be readily used for MAS.
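The densities quoted above follow from simple arithmetic on the chromosome lengths. The short check below uses only numbers given in this handout; the variable names are made up for illustration.

```r
# Genetic lengths of the 5 chromosomes, in centiMorgans.
chr_len_cM <- c(200, 180, 160, 140, 120)
genome_cM  <- sum(chr_len_cM)        # total genetic length

n_qtl <- 800                         # loci controlling size and color
n_snp <- 8000                        # markers on the genotyping array

cM_per_qtl <- genome_cM / n_qtl      # average spacing between trait loci
snp_per_cM <- n_snp / genome_cM      # average marker density
snp_per_qtl <- n_snp / n_qtl         # markers per trait locus, on average

print(c(genome_cM = genome_cM, cM_per_qtl = cM_per_qtl,
        snp_per_cM = snp_per_cM, snp_per_qtl = snp_per_qtl))
```

The total genetic length is 800 cM, so 800 loci are 1 cM apart on average, and the array carries about 10 markers per cM.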

The life cycle of this species is 12 months, and it can be shortened to 3 months under SSD. This species is amenable to DH as well. Size and color can only be scored after flowering, while shape can be scored 3 months after germination under normal growth conditions.
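As a rough illustration of what the shorter SSD cycle buys, the sketch below compares the time needed for the selfing generations with and without SSD. It assumes four SSD generations (F3 to F6, as listed in the cost table near the end of this handout); the exercise script may count time differently.

```r
# Months per generation, from the species description.
cycle_normal <- 12
cycle_ssd    <- 3

# Assumption: SSD covers four selfing generations (F3, F4, F5, F6).
n_ssd_gen <- 4

months_normal <- n_ssd_gen * cycle_normal
months_ssd    <- n_ssd_gen * cycle_ssd
months_saved  <- months_normal - months_ssd

print(c(normal = months_normal, ssd = months_ssd, saved = months_saved))
```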

This species is fairly small and can be easily managed in greenhouses in the early stages (founders to F6/DH). The breeding program then moves to the field for the later stages, beginning with HRT. The field plot for all trials is normally a 1 m² square, except for HRT, where the plot is 0.2 m². We assume that the residual/error variance per plot in HRT is 5 times larger than in any other trial. In the greenhouse, measurements are often taken on single plants, and we assume that the residual/error variance per plant is 10 times larger than the per-plot variance in non-HRT trials.
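To get a feel for what these variance multipliers mean, the sketch below converts them into a single-measurement heritability, va / (va + vres). It assumes an additive variance of 1 and a per-plot residual variance of 1 (the values used in the default BP.eval call later in this handout) and ignores genotype-by-environment variance. The genetic variance will also change across generations, so treat the numbers as a rough guide only.

```r
# Assumed founder values, matching the defaults used later (va = 1, vres = 1).
va         <- 1
vres_field <- 1

# Residual variance per measurement at each stage, using the stated multipliers.
vres <- c(field_plot       = vres_field,       # PYT, AYT, EYT plots
          HRT_plot         = 5 * vres_field,   # smaller HRT plots
          greenhouse_plant = 10 * vres_field)  # single plants

# Heritability of a single, unreplicated measurement at each stage.
h2 <- va / (va + vres)
print(round(h2, 3))
```

Under these assumptions, a single greenhouse plant is a much noisier guide to genetic value than a standard field plot, which is one reason why the choice of selection method in the early stages matters.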

OK, enough of the species background, let’s get started on the breeding exercise.

Quick setup

We will be using R and a little bit of MS Excel for this breeding exercise. We need the following three R packages:

  • AlphaSimR to simulate populations and breeding programs. For more information, please refer to its user manual. If you are interested, there is a free online course on this package.
  • rrBLUP to perform genomic prediction. For more information, please refer to its user manual.
  • ggplot2 to plot the results. For more information, please refer to its home page.

If you don’t already have these R packages installed, please do so using the following scripts.

install.packages("AlphaSimR")
install.packages("rrBLUP")
install.packages("ggplot2")
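If some of these packages may already be installed, a small check avoids reinstalling them. This is an optional sketch; missing_packages is a helper name made up here, and the install line is left commented out so that nothing is installed until you choose to run it.

```r
# Hypothetical helper: return the packages from 'pkgs' that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("AlphaSimR", "rrBLUP", "ggplot2"))
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```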

Next, we need to set a working directory.

# this is just an example, please adjust accordingly.
setwd("C:/Users/cyang/Desktop/")

Unfortunately, I do not have time to compile the breeding exercise as an R package. For now, please go to https://cjyang-work.github.io/breeding_exercise to download the R functions, scripts and CSV files. Once you have downloaded them, please place them in the working directory that you have just set.
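Before going further, it can help to confirm that the downloaded files are where R expects them. The sketch below checks for the two file names used in the commands later in this handout (breeding_exercise_v2.R and BP.csv); adjust the names if your downloaded files are called something else.

```r
# File names as used later in this handout.
required_files <- c("breeding_exercise_v2.R", "BP.csv")

# TRUE for each file found in the current working directory.
found <- file.exists(required_files)
names(found) <- required_files
print(found)

# Report anything that is missing, together with the directory searched.
if (!all(found)) {
  message("Not found in ", getwd(), ": ",
          paste(required_files[!found], collapse = ", "))
}
```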

Now, load the R script file by running the following:

source("breeding_exercise_v2.R")
## --------------------------------------------------
## --- Breeding exercise version 0.2 (2022-10-30) ---
## -----                                        -----
## --- Questions/Issues? Contact cyang@sruc.ac.uk ---
## --------------------------------------------------

The R script file contains 6 main functions: BP.eval, BP.plot, BP.check, cost.calc, gen.pred and var2, along with ped.label and ped.segment, which also appear in the listing below. These functions use the three previously mentioned packages to run the simulation analyses on the various breeding programs that you will try. To check that you have loaded the R script file correctly, run the following and you should see the functions listed in the R console:

ls()
## [1] "BP.check"    "BP.eval"     "BP.plot"     "cost.calc"   "gen.pred"   
## [6] "ped.label"   "ped.segment" "var2"
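If your workspace already holds other objects, the ls() output can be hard to scan. As an optional alternative, the sketch below tests directly for the six main functions by name. It reports FALSE for anything that has not been loaded and does not raise an error.

```r
# The six main functions named above.
expected_functions <- c("BP.eval", "BP.plot", "BP.check",
                        "cost.calc", "gen.pred", "var2")

# TRUE where a function of that name is currently available.
loaded <- vapply(expected_functions,
                 function(f) exists(f, mode = "function"),
                 logical(1))
print(loaded)

if (!all(loaded)) {
  message("Not loaded: ", paste(expected_functions[!loaded], collapse = ", "))
}
```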

Simulating breeding programs

The BP.eval function has everything we need to simulate the breeding program. There are many parameters (function arguments) that go into BP.eval. To make it easier, we can set most of them in the CSV file that you have just downloaded. The CSV file is best viewed in MS Excel. It currently has 3 rows and 15 columns. Each row specifies a breeding program, and each column specifies the required parameters. For now, we will ignore what these mean, start with the defaults, and return to these later. Please close the CSV file and go back to R.

Please run the following scripts in R. This may take a few minutes.

BP <- read.csv("BP.csv", as.is=TRUE)
out <- BP.eval(va=c(1,1,1), vae=c(0,0,0), vres=c(1,1,1), ca=0.5, ew=c(1,1), BP=BP, nsim=10)
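To see what was passed in as BP without opening Excel, you can inspect the data frame in R. The sketch below assumes that the CSV column names match the 15 parameter names listed in the Practice Time section; that is an assumption about the file, so the comparison simply reports any names it cannot find.

```r
# Parameter names as listed in the Practice Time section (assumed to be
# the CSV column names).
expected_cols <- c("program", "nc", "nF2", "method",
                   "sel.F2", "sel.HRT", "sel.PYT", "sel.AYT",
                   "sF2", "sHRT", "sPYT", "sAYT",
                   "rPYT", "rAYT", "rEYT")

if (file.exists("BP.csv")) {
  BP <- read.csv("BP.csv", as.is = TRUE)
  str(BP)                                   # one row per breeding program
  print(setdiff(expected_cols, names(BP)))  # expected names that are absent
} else {
  message("BP.csv not found in ", getwd())
}
```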

Next, let’s compare the differences among the 3 breeding programs. The results are shown in plots. Since the plots are rather big, they are saved in the current working directory as “breeding_exercise_XXXXXX.png”. Note that XXXXXX is the time at which the plots were saved, which prevents earlier results from being overwritten.

BP.plot(out=out, threshold=c(2, 2, 2), nsim=10)
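After several runs the working directory will hold several of these PNG files. The sketch below lists them and picks out the most recently modified one. It relies only on the file name pattern described above and returns nothing if no plot has been saved yet.

```r
# All saved plots in the current working directory.
png_files <- list.files(pattern = "^breeding_exercise_.*\\.png$")
print(png_files)

# Name of the most recently modified plot, if there is one.
if (length(png_files) > 0) {
  info   <- file.info(png_files)
  newest <- png_files[which.max(info$mtime)]
  print(newest)
}
```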

The plots are saved in 3-by-3 panels.

  • Each row of panels corresponds to one trait (size, color and shape).

  • Each column of panels corresponds to one trial (PYT, AYT, EYT) in which the traits were scored.

  • The x-axis is the breeding program.

  • The y-axis is the phenotypic trait value.

  • The blue horizontal line is the trait mean in the founders.

  • The red horizontal line is the threshold for the breeding target.

  • Each point is the trait mean of one simulation, and the associated vertical line is the range.

The plots show that some lines pass the threshold for an individual trait, but they do not tell us whether any line passes all 3 thresholds at once. To check for that, we need to use the following function.

check <- BP.check(out=out, threshold=c(2, 2, 2))
##       Program 1 Program 2 Program 3
## F1         2700      2700      2700
## F2/DH     58800    108800    108800
## HRT       40000     50000     50000
## PYT       80000    100000    100000
## AYT       32000     40000     40000
## EYT1       8000      8000      8000
## EYT2       8000      8000      8000
## Total    229500    317500    317500
## Out of 200 EYT lines x simulations, the following pass the thresholds for size (2), color (2) and shape (2).
## Program 1: 0
## Program 2: 2
## Program 3: 3

Here, we see that Program 1 has none, Program 2 has 2, and Program 3 has 3. Ignore the cost table for now; we will get back to it by the end of the next section.
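To put these counts in context, the sketch below turns them into pass rates and then shows, on made-up numbers, what passing all 3 thresholds means for a single line. The toy values and the use of "greater than or equal to" are assumptions for illustration; BP.check may apply the comparison differently.

```r
# Counts reported above, out of 200 EYT lines x simulations.
passed  <- c("Program 1" = 0, "Program 2" = 2, "Program 3" = 3)
n_total <- 200
nsim    <- 10

pass_rate     <- passed / n_total   # proportion of lines passing everything
lines_per_sim <- n_total / nsim     # EYT lines per simulation
print(pass_rate)
print(lines_per_sim)

# Toy example: three made-up lines scored for the three traits.
toy <- rbind(A = c(2.5, 2.1, 1.0),
             B = c(2.2, 2.4, 2.1),
             C = c(1.9, 3.0, 2.5))
colnames(toy) <- c("size", "color", "shape")
threshold <- c(2, 2, 2)

# A line passes only if every trait reaches its threshold.
pass_all <- apply(toy, 1, function(x) all(x >= threshold))
print(pass_all)
```

In the toy data only line B passes: line A fails on shape and line C fails on size, even though each of them beats the threshold for the other two traits.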

Practice Time

Now that we have gone through the default examples, we can further explore other breeding programs. To help guide your choice of breeding programs, below is a table of descriptions and accepted values for each parameter that goes into BP.eval.

| Parameter | Description | Format | Accepted Values |
|---|---|---|---|
| va | Additive genetic variances (founders) | c(x,x,x) | x \(\geq\) 0 |
| vae | Additive x Environmental variances (founders) | c(x,x,x) | x \(\geq\) 0 |
| vres | Residual variances (per plot) | c(x,x,x) | x \(\geq\) 0 |
| ca | Additive genetic correlation between size and color | x | -1 \(\leq\) x \(\leq\) 1 |
| ew | Economic weight for selection on size and color | c(x,x) | any |
| nsim | Number of simulations | x | x \(>\) 1 |
| BP | Additional parameters | data.frame | See next table |

Here are the parameters required for BP.

| Parameter | Description | Format | Accepted Values |
|---|---|---|---|
| program | Breeding program | x | 1,2,3, … |
| nc | Number of random crosses among founders | x | 100 \(\leq\) x \(\leq\) 10000 |
| nF2 | Number of F2 progeny (or DH) per F1 family | x | 10 \(\leq\) x \(\leq\) 1000 |
| method | Inbreeding method | x | SSD, DH |
| sel.F2 | Selection method in F2/DH | x | NS, PS, MAS |
| sel.HRT | Selection method in HRT | x | NS, PS, GS |
| sel.PYT | Selection method in PYT | x | NS, PS, GS |
| sel.AYT | Selection method in AYT | x | NS, PS, GS |
| sF2 | Number of selected F2/DH per family | x | 1 \(\leq\) x \(\leq\) nF2 |
| sHRT | Number of selected lines per family in HRT | x | 1 \(\leq\) x \(\leq\) sF2 |
| sPYT | Number of selected lines per family in PYT | x | 1 \(\leq\) x \(\leq\) sHRT |
| sAYT | Number of selected lines in AYT | x | 1 \(\leq\) x \(\leq\) nc \(\times\) sPYT |
| rPYT | Number of replications (environments) in PYT | x | x \(\geq\) 1 |
| rAYT | Number of replications (environments) in AYT | x | x \(\geq\) 1 |
| rEYT | Number of replications (environments) in EYT | x | x \(\geq\) 1 |

Please note that a maximum of one GS is allowed. For example, if sel.HRT=GS, then sel.PYT and sel.AYT can no longer have GS.
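The accepted values in the two tables above, together with the one-GS rule, can be checked before starting a long simulation. The function below is a hypothetical helper written for this handout (check_program is not part of the exercise script). It covers only a subset of the rules and returns a character vector of the problems it finds.

```r
# Hypothetical helper: check one breeding program (a list or a one-row
# data frame) against some of the rules in the parameter tables.
check_program <- function(p) {
  problems <- character(0)
  add <- function(msg) problems <<- c(problems, msg)

  # GS may be used in at most one of HRT, PYT and AYT.
  n_gs <- sum(c(p$sel.HRT, p$sel.PYT, p$sel.AYT) == "GS")
  if (n_gs > 1) add("GS is used in more than one stage")

  if (!p$method %in% c("SSD", "DH"))       add("method must be SSD or DH")
  if (!p$sel.F2 %in% c("NS", "PS", "MAS")) add("sel.F2 must be NS, PS or MAS")
  if (p$nc < 100 || p$nc > 10000)          add("nc must be between 100 and 10000")
  if (p$nF2 < 10 || p$nF2 > 1000)          add("nF2 must be between 10 and 1000")

  # Each selection step can keep no more than the previous step supplied.
  if (p$sF2 < 1 || p$sF2 > p$nF2)            add("sF2 must be between 1 and nF2")
  if (p$sHRT < 1 || p$sHRT > p$sF2)          add("sHRT must be between 1 and sF2")
  if (p$sPYT < 1 || p$sPYT > p$sHRT)         add("sPYT must be between 1 and sHRT")
  if (p$sAYT < 1 || p$sAYT > p$nc * p$sPYT)  add("sAYT must be between 1 and nc x sPYT")

  problems
}

# Example values are made up for illustration.
ok  <- list(nc = 100, nF2 = 10, method = "SSD", sel.F2 = "PS",
            sel.HRT = "GS", sel.PYT = "PS", sel.AYT = "PS",
            sF2 = 5, sHRT = 2, sPYT = 1, sAYT = 20)
bad <- modifyList(ok, list(sel.AYT = "GS", sHRT = 6))

print(check_program(ok))   # no problems expected
print(check_program(bad))  # two GS stages, and sHRT larger than sF2
```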

For our practice, let’s try to create the following breeding programs.

  1. Use SSD and PS in all possible generations.

  2. Use DH and PS in all possible generations.

  3. Select only in the F2/DH.

  4. Use GS in HRT.

  5. Use GS in AYT.
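One possible way to set up these five programs is to build the BP data frame directly in R instead of editing the CSV file. The sketch below is a starting point under stated assumptions: all numeric values are made up (chosen only to fall within the accepted ranges), programs 3 to 5 are assumed to use SSD, and PS is assumed wherever the list does not say otherwise. Whether BP.eval accepts a data frame built this way has not been checked here, so the final call is left commented out.

```r
# Start from a common template: SSD with PS at every stage (program 1).
BP_practice <- data.frame(
  program = 1:5,
  nc = 100, nF2 = 10,                 # made-up sizes within the accepted ranges
  method = "SSD",
  sel.F2 = "PS", sel.HRT = "PS", sel.PYT = "PS", sel.AYT = "PS",
  sF2 = 5, sHRT = 2, sPYT = 1, sAYT = 20,
  rPYT = 2, rAYT = 4, rEYT = 8,       # made-up replication numbers
  stringsAsFactors = FALSE
)

# Program 2: DH instead of SSD, still PS everywhere.
BP_practice$method[2] <- "DH"

# Program 3: select only in the F2/DH, no selection afterwards.
BP_practice[3, c("sel.HRT", "sel.PYT", "sel.AYT")] <- "NS"

# Program 4: GS in HRT. Program 5: GS in AYT.
BP_practice$sel.HRT[4] <- "GS"
BP_practice$sel.AYT[5] <- "GS"

print(BP_practice)

# Then, with the exercise script loaded:
# out_practice <- BP.eval(va=c(1,1,1), vae=c(0,0,0), vres=c(1,1,1),
#                         ca=0.5, ew=c(1,1), BP=BP_practice, nsim=10)
```

Keeping everything except the feature of interest identical across programs makes it easier to attribute differences in the plots to that feature.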

How do these compare? Does the better breeding program cost more? Is there any modification that we can do to reduce the cost and maintain the performance? What happens if we start changing the trait variances and other parameters?

Here is a table of cost associated with the available tasks in the breeding programs.

| Generation | Task | Cost |
|---|---|---|
| Founder | Make crosses in the greenhouse | 100/month (greenhouse) + 1/plant (grow) + 5/cross (cross) |
| F1 | Self to make F2 | 100/month (greenhouse) + 1/plant (grow) + 1/plant (self) |
| F2 | Self to make F3, select on shape | 100/month (greenhouse) + 1/plant (grow) + 1/plant (self) + 5/plant (PS) + 10/plant (MAS) |
| F3/F4/F5/F6 | SSD | 100/month (greenhouse) + 1/plant (grow) + 1/plant (self) |
| F1 | Make DH | 40/progeny |
| DH | Select on shape | 100/month (greenhouse) + 1/plant (grow) + 1/plant (self) + 5/plant (PS) + 10/plant (MAS) |
| HRT | Trial, select on size/color | 40/plot (field) + 10/plot (phenotype), 20/line (genotype) |
| PYT | Trial, select on size/color | 40/plot (field) + 10/plot (phenotype), 20/line (genotype) |
| AYT | Trial, select on size/color | 40/plot (field) + 10/plot (phenotype), 20/line (genotype) |
| EYT | Trial | 40/plot (field) + 10/plot (phenotype) |
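To see how these unit costs add up, here is a rough sketch of two of the calculations. The function names are made up for illustration and this is not the cost.calc function from the exercise script. It assumes that the genotyping cost is charged once per genotyped line and only when lines are genotyped, and that the greenhouse fee is a flat charge per month.

```r
# Hypothetical helper: cost of one field trial stage.
# Each plot costs 40 (field) + 10 (phenotype); each genotyped line costs 20.
trial_cost <- function(n_plots, n_genotyped = 0) {
  n_plots * (40 + 10) + n_genotyped * 20
}

# Hypothetical helper: cost of one greenhouse selfing generation.
# Flat 100 per month, plus 1 (grow) + 1 (self) per plant.
greenhouse_selfing_cost <- function(n_plants, n_months) {
  n_months * 100 + n_plants * (1 + 1)
}

# Made-up examples.
print(trial_cost(160))                    # 160 plots, no genotyping
print(trial_cost(160, n_genotyped = 500)) # same plots, 500 lines genotyped
print(greenhouse_selfing_cost(1000, 3))   # 1000 plants over a 3-month SSD cycle
```

Sketches like this make it easier to see which parameters drive the total: plot numbers scale with the number of lines and replications, while genotyping scales only with the number of lines.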

The End

Good luck and have fun. Please email me if you have any questions or bugs to report. Thank you. :)

 

 

Updated on October 30, 2022