Breeding exercise instructions
Lecture 7: Conventional & Advanced Breeding
2022-10-30
Introduction
In this breeding exercise, we will explore a single breeding cycle that combines some of the methods you have learned in previous lectures. These methods include pedigree breeding (PB), single seed descent (SSD), doubled haploid (DH), marker assisted selection (MAS) and genomic selection (GS). The breeding cycle for inbred varieties is described in the figure below:
Notes:
HRT: Head Row Trial.
PYT: Preliminary Yield Trial.
AYT: Advanced Yield Trial.
EYT: Elite Yield Trial.
SSD: Single Seed Descent.
DH: Doubled Haploid.
NS: Random Selection.
PS: Phenotypic Selection.
MAS: Marker Assisted Selection.
GS: Genomic Selection.
Here is some information about the fictitious species that we are working with.
This species has 5 chromosomes with genetic lengths of 200, 180, 160, 140 and 120 centiMorgans (cM). This species can be selfed or crossed easily, and the founders/lines/varieties are often inbreds. We are interested in 3 traits (size, color and shape) and we want to breed for higher trait values in all 3. Size and color are quantitative traits (i.e. controlled by many loci), while shape is a qualitative trait (i.e. controlled by few loci).
Somehow, we know that size and color are correlated and controlled by the same 800 randomly chosen loci. On average, these loci are 1 cM apart. The effects of these loci on size and color are drawn randomly from a normal distribution. We also know that shape is controlled by a subset of 10 loci, and the effects of these loci on shape are the opposite of the effects on size. Therefore, this creates a penalty toward size when selecting for higher shape value. Unlike size and color, the effects on shape are the same in magnitude for all 10 loci.
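As a rough illustration of the trait architecture described above, the following base-R sketch draws correlated size and color effects and derives shape effects with the opposite sign to size. The correlation of 0.5, the unit effect variance, the unit magnitude of the shape effects and all variable names are assumptions made for illustration; the exercise functions may construct the traits differently.

```r
set.seed(1)

# Genome described in the text: 5 chromosomes, lengths in cM
chr_len_cM <- c(200, 180, 160, 140, 120)
n_qtl <- 800
total_cM <- sum(chr_len_cM)        # 800 cM in total
cM_per_locus <- total_cM / n_qtl   # 1 cM between loci on average

# Correlated effects for size and color (assumed correlation: 0.5)
rho <- 0.5
sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
z <- matrix(rnorm(2 * n_qtl), ncol = 2)
effects <- z %*% chol(sigma)       # columns now have covariance sigma
colnames(effects) <- c("size", "color")

# Shape: 10 of the loci, equal magnitude, sign opposite to the size effect
shape_loci <- sample(n_qtl, 10)
shape_eff <- numeric(n_qtl)
shape_eff[shape_loci] <- -sign(effects[shape_loci, "size"])

cor(effects[, "size"], effects[, "color"])  # close to 0.5
```

Because every shape effect has the opposite sign to the size effect at the same locus, fixing the favorable shape allele at these 10 loci always fixes an unfavorable size allele, which is the penalty described above.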
Fortunately, genomic resources are available for this species. We have a genotyping array with 8,000 markers randomly distributed across the genome that can be used in GS. Thanks to many other academic researchers who have poured their souls into mapping QTLs for the shape trait, we have all 10 markers that are perfectly linked to the shape loci. These markers can be readily used for MAS.
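To give a feel for how genome-wide markers are used in GS, here is a minimal base-R sketch of ridge-regression marker prediction on toy data. It is not the code used by gen.pred: the data are simulated, only 100 markers are used instead of 8,000, and the shrinkage parameter lambda is fixed at the ratio of residual to marker-effect variance rather than estimated by REML as rrBLUP does. All names are hypothetical.

```r
set.seed(42)

n_train <- 300
n_test <- 100
n_markers <- 100
n_lines <- n_train + n_test

# Inbred lines: each marker is homozygous, coded -1 or 1
geno <- matrix(sample(c(-1, 1), n_lines * n_markers, replace = TRUE),
               nrow = n_lines)

# Simulated marker effects and phenotypes (genetic variance ~1, residual 1)
true_eff <- rnorm(n_markers, sd = sqrt(1 / n_markers))
true_gv <- as.vector(geno %*% true_eff)
pheno <- true_gv + rnorm(n_lines, sd = 1)

# Ridge regression: u = (Z'Z + lambda * I)^-1 Z'(y - mean(y))
ridge_fit <- function(Z, y, lambda) {
  mu <- mean(y)
  u <- solve(crossprod(Z) + lambda * diag(ncol(Z)), crossprod(Z, y - mu))
  list(mu = mu, u = as.vector(u))
}

train <- seq_len(n_train)
test <- n_train + seq_len(n_test)

# lambda = residual variance / marker-effect variance = 1 / (1 / n_markers)
fit <- ridge_fit(geno[train, ], pheno[train], lambda = n_markers)

# Predict breeding values of lines that were genotyped but not phenotyped
gebv <- fit$mu + as.vector(geno[test, ] %*% fit$u)
accuracy <- cor(gebv, true_gv[test])
accuracy
```

The point of the sketch is that the test lines are ranked from their genotypes alone, which is what allows GS to replace or supplement phenotyping in a trial.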
The life cycle of this species is 12 months, and it can be shortened to 3 months under SSD. This species is amenable to DH as well. Size and color can only be scored after flowering, while shape can be scored 3 months after germination under normal growth conditions.
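The time saved by SSD can be worked out directly. The sketch below assumes that inbreeding runs from F2 to F6, that this takes four selfing generations, and that each generation takes one full cycle; the variable names are hypothetical.

```r
months_normal <- 12   # normal life cycle
months_ssd <- 3       # shortened life cycle under SSD
n_selfing <- 4        # F2 -> F3 -> F4 -> F5 -> F6

time_normal <- n_selfing * months_normal   # 48 months
time_ssd <- n_selfing * months_ssd         # 12 months
months_saved <- time_normal - time_ssd     # 36 months

c(normal = time_normal, ssd = time_ssd, saved = months_saved)
```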
This species is fairly small and can be easily managed in greenhouses in the early stages (founders to F6/DH). The breeding program then moves to the field in the later stages, beginning with HRT. The field plot for all trials is normally a 1 m² square, except for HRT where the plot is 0.2 m². We assume that the residual/error variance per plot in HRT is 5 times larger than in any other trial. In the greenhouse setting, measurements are often taken on single plants. We assume that the residual/error variance per plant is 10 times larger than the per-plot variance in non-HRT trials.
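These variance multipliers translate directly into how reliable a measurement is at each stage. The following sketch computes a simple heritability for each setting, assuming an additive genetic variance of 1 and a per-plot residual variance of 1 (illustrative values), no genotype-by-environment variance, and a hypothetical helper named line_mean_h2.

```r
# Heritability of a line mean over `reps` plots, where the residual
# variance per plot is `multiplier` times the standard plot variance
line_mean_h2 <- function(va, vres, multiplier = 1, reps = 1) {
  va / (va + multiplier * vres / reps)
}

va <- 1
vres <- 1

h2 <- c(
  field_plot       = line_mean_h2(va, vres),                   # 1 / 2
  HRT_plot         = line_mean_h2(va, vres, multiplier = 5),   # 1 / 6
  greenhouse_plant = line_mean_h2(va, vres, multiplier = 10)   # 1 / 11
)
round(h2, 3)

# Replication reduces the residual contribution, e.g. 4 field plots:
line_mean_h2(va, vres, reps = 4)   # 0.8
```

Under these assumptions, phenotypic selection on single greenhouse plants is far less accurate than selection in replicated field trials.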
OK, enough of the species background, let’s get started on the breeding exercise.
Quick setup
We will be using R and a little bit of MS Excel for this breeding exercise. We need the following three R packages:
AlphaSimR: to simulate populations and breeding programs. For more information, please refer to its user manual. If you are interested, there is a free online course on this package.
rrBLUP: to perform genomic prediction. For more information, please refer to its user manual.
ggplot2: to plot the results. For more information, please refer to its home page.
If you don’t already have these R packages installed, please do so using the following scripts.
install.packages("AlphaSimR")
install.packages("rrBLUP")
install.packages("ggplot2")
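If you are not sure which of these packages are already installed, a small check such as the following can help. This is an optional sketch; missing_pkgs is a hypothetical helper name, and the install line is left commented out so that nothing is installed without your say-so.

```r
required <- c("AlphaSimR", "rrBLUP", "ggplot2")

# Return the names of packages that cannot be loaded
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(required)
to_install

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```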
Next, we need to set a working directory.
# this is just an example, please adjust accordingly.
setwd("C:/Users/cyang/Desktop/")
Unfortunately, I do not have time to compile the breeding exercise as an R package. For now, please go to https://cjyang-work.github.io/breeding_exercise to download the R functions, scripts and CSV files. Once you have downloaded them, please place them in the working directory that you have just set.
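Before moving on, you may want to confirm that the downloaded files are where R expects them. The sketch below checks for the two file names used later in this exercise (breeding_exercise_v2.R and BP.csv); missing_files is a hypothetical helper name.

```r
# Return the files from `files` that are not present in `dir`
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

needed <- c("breeding_exercise_v2.R", "BP.csv")
missing_files(needed)   # character(0) means both files were found
```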
Now, load the R script file by running the following:
source("breeding_exercise_v2.R")
## --------------------------------------------------
## --- Breeding exercise version 0.2 (2022-10-30) ---
## ----- -----
## --- Questions/Issues? Contact cyang@sruc.ac.uk ---
## --------------------------------------------------
The R script file contains 6 main functions: BP.eval, BP.plot, BP.check, cost.calc, gen.pred and var2. These functions use the three previously mentioned packages to perform the simulation analyses on the various breeding programs that you will try. To check that you have loaded the R script file correctly, run the following; you should see these functions, along with ped.label and ped.segment, listed in the R console:
ls()
## [1] "BP.check" "BP.eval" "BP.plot" "cost.calc" "gen.pred"
## [6] "ped.label" "ped.segment" "var2"
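Instead of reading the list by eye, the same check can be done programmatically. This is a small optional sketch; not_loaded is a hypothetical helper name, and it assumes the script was sourced into the global environment (the default for source).

```r
expected <- c("BP.eval", "BP.plot", "BP.check", "cost.calc", "gen.pred", "var2")

# Return the names in `fn_names` that are not functions defined in `env`
not_loaded <- function(fn_names, env = globalenv()) {
  is_fn <- vapply(fn_names, exists, logical(1),
                  envir = env, mode = "function", inherits = FALSE)
  fn_names[!is_fn]
}

not_loaded(expected)   # character(0) means everything was loaded
```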
Simulating breeding programs
The BP.eval function has everything we need to simulate the breeding program. There are many parameters (function arguments) that go into BP.eval. To make it easier, we can set most of them in the CSV file that you have just downloaded. The CSV file is best viewed in MS Excel. It currently has 3 rows and 15 columns. Each row specifies a breeding program, and each column specifies one of the required parameters. For now, we will ignore what these mean, start with the defaults, and return to them later. Please close the CSV file and go back to R.
Please run the following scripts in R. This may take a few minutes.
BP <- read.csv("BP.csv", as.is=TRUE)
out <- BP.eval(va=c(1,1,1), vae=c(0,0,0), vres=c(1,1,1), ca=0.5, ew=c(1,1), BP=BP, nsim=10)
Next, let’s compare the differences among the 3 breeding programs. The results are shown in plots. Since the plots are rather big, they are saved in the current working directory as “breeding_exercise_XXXXXX.png”. Note that XXXXXX refers to the time when the plots are saved to prevent overwriting the results.
BP.plot(out=out, threshold=c(2, 2, 2), nsim=10)
The plots are saved in 3-by-3 panels.
Each row of panels corresponds to a trait (size, color and shape).
Each column of panels corresponds to a trial (PYT, AYT, EYT) in which the traits were scored.
X-axis is the breeding program.
Y-axis is the phenotypic trait values.
Blue horizontal line is the trait mean in the founders.
Red horizontal line is the threshold for breeding target.
Point is the trait mean of each simulation, and the associated vertical line is the range.
The plots show that some results pass individual thresholds, but they do not tell us whether any line passes all 3 thresholds at once. To check for that, we need to use the following function.
check <- BP.check(out=out, threshold=c(2, 2, 2))
##       Program 1 Program 2 Program 3
## F1         2700      2700      2700
## F2/DH     58800    108800    108800
## HRT       40000     50000     50000
## PYT       80000    100000    100000
## AYT       32000     40000     40000
## EYT1       8000      8000      8000
## EYT2       8000      8000      8000
## Total    229500    317500    317500
## Out of 200 EYT lines x simulations, the following pass the thresholds for size (2), color (2) and shape (2).
## Program 1: 0
## Program 2: 2
## Program 3: 3
Here, we see that Program 1 has none, Program 2 has 2, and Program 3 has 3. Ignore the cost table for now; we will return to it at the end of the next section.
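The idea behind this check is that a line only counts if it meets the threshold for every trait at once. The sketch below shows that logic on a toy matrix. It does not use the real structure of out, which is not described here, and it assumes that a value equal to the threshold counts as a pass; count_passing and toy are hypothetical names.

```r
# Count rows (lines) that meet or exceed the threshold for every trait
count_passing <- function(values, threshold) {
  stopifnot(is.matrix(values), ncol(values) == length(threshold))
  meets <- sweep(values, 2, threshold, FUN = ">=")
  sum(rowSums(meets) == length(threshold))
}

# Four toy lines scored for size, color and shape
toy <- rbind(
  c(2.5, 2.1, 2.0),   # passes all three
  c(3.0, 1.9, 2.0),   # fails on color
  c(2.2, 2.4, 4.0),   # passes all three
  c(1.0, 1.0, 0.0)    # fails on everything
)
colnames(toy) <- c("size", "color", "shape")

count_passing(toy, threshold = c(2, 2, 2))   # 2
```

Note that the second line has the best size of all four yet still fails, which is why looking at one panel of the plot at a time can be misleading.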
Practice Time
Now that we have gone through the default examples, we can further explore other breeding programs. To help guide your choice of breeding programs, below is a table of descriptions and accepted values for each parameter that goes into BP.eval.
Parameter | Description | Format | Accepted Values |
---|---|---|---|
va | Additive genetic variances (founders) | c(x,x,x) | x \(\geq\) 0 |
vae | Additive x Environmental variances (founders) | c(x,x,x) | x \(\geq\) 0 |
vres | Residual variances (per plot) | c(x,x,x) | x \(\geq\) 0 |
ca | Additive genetic correlation between size and color | x | -1 \(\leq\) x \(\leq\) 1 |
ew | Economic weight for selection on size and color | c(x,x) | any |
nsim | Number of simulations | x | x \(>\) 1 |
BP | Additional parameters | data.frame | See next table |
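To see what the economic weights do, consider a simple linear selection index in which each line is scored as ew[1] x size + ew[2] x color and the top-scoring lines are kept. This is an assumption about how ew is applied, made only for illustration; BP.eval may combine the traits differently. The helper name select_by_index and the toy values are hypothetical.

```r
# Rank lines on a weighted sum of traits and return the indices of the best
select_by_index <- function(traits, ew, n_select) {
  index <- as.vector(traits %*% ew)
  order(index, decreasing = TRUE)[seq_len(n_select)]
}

toy <- cbind(
  size  = c(1.0, 2.0, 0.5, 1.5),
  color = c(2.0, 0.5, 0.5, 1.8)
)

# Equal weights: index values are 3.0, 2.5, 1.0, 3.3
select_by_index(toy, ew = c(1, 1), n_select = 2)   # lines 4 and 1

# Weight on size only: color is ignored
select_by_index(toy, ew = c(1, 0), n_select = 2)   # lines 2 and 4
```

Changing the weights changes which lines are kept, so ew is one of the levers to try when one trait lags behind its threshold.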
Here are the parameters required for BP.
Parameter | Description | Format | Accepted Values |
---|---|---|---|
program | Breeding program | x | 1,2,3, … |
nc | Number of random crosses among founders | x | 100 \(\leq\) x \(\leq\) 10000 |
nF2 | Number of F2 progeny (or DH) per F1 family | x | 10 \(\leq\) x \(\leq\) 1000 |
method | Inbreeding method | x | SSD, DH |
sel.F2 | Selection method in F2/DH | x | NS, PS, MAS |
sel.HRT | Selection method in HRT | x | NS, PS, GS |
sel.PYT | Selection method in PYT | x | NS, PS, GS |
sel.AYT | Selection method in AYT | x | NS, PS, GS |
sF2 | Number of selected F2/DH per family | x | 1 \(\leq\) x \(\leq\) nF2 |
sHRT | Number of selected lines per family in HRT | x | 1 \(\leq\) x \(\leq\) sF2 |
sPYT | Number of selected lines per family in PYT | x | 1 \(\leq\) x \(\leq\) sHRT |
sAYT | Number of selected lines in AYT | x | 1 \(\leq\) x \(\leq\) nc \(\times\) sPYT |
rPYT | Number of replications (environments) in PYT | x | x \(\geq\) 1 |
rAYT | Number of replications (environments) in AYT | x | x \(\geq\) 1 |
rEYT | Number of replications (environments) in EYT | x | x \(\geq\) 1 |
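The selection parameters in this table determine how many lines are carried at each stage. The following sketch computes these numbers under the assumptions that there is one F1 family per cross and that every selected line is entered into the next trial; stage_sizes is a hypothetical helper and the input values are illustrative, not the defaults in the CSV file.

```r
# Number of entries at each stage, given the BP parameters
stage_sizes <- function(nc, nF2, sF2, sHRT, sPYT, sAYT) {
  c(
    F1_families = nc,
    F2_or_DH    = nc * nF2,    # progeny per family
    HRT         = nc * sF2,    # selected F2/DH per family
    PYT         = nc * sHRT,   # selected per family in HRT
    AYT         = nc * sPYT,   # selected per family in PYT
    EYT         = sAYT         # selected across families in AYT
  )
}

stage_sizes(nc = 100, nF2 = 50, sF2 = 10, sHRT = 4, sPYT = 2, sAYT = 20)
```

With these illustrative values the program narrows from 5,000 F2/DH individuals to 20 EYT lines, and the number of field plots at each trial is the number of entries multiplied by the number of replications.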
Please note that GS can be used in at most one trial per breeding program. For example, if sel.HRT=GS, then neither sel.PYT nor sel.AYT can be set to GS.
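Since it is easy to break one of these rules when editing the CSV file, a check like the one below can be run before calling BP.eval. It is a sketch that encodes only the accepted values listed in the table above; validate_BP is a hypothetical helper, and the example values are illustrative rather than the defaults in the CSV file.

```r
# Return a character vector of problems found in BP (empty if none)
validate_BP <- function(BP) {
  problems <- character(0)
  add <- function(ok, msg) if (!all(ok)) problems <<- c(problems, msg)

  trial_sel <- BP[, c("sel.HRT", "sel.PYT", "sel.AYT"), drop = FALSE]

  add(BP$nc >= 100 & BP$nc <= 10000, "nc must be between 100 and 10000")
  add(BP$nF2 >= 10 & BP$nF2 <= 1000, "nF2 must be between 10 and 1000")
  add(BP$method %in% c("SSD", "DH"), "method must be SSD or DH")
  add(BP$sel.F2 %in% c("NS", "PS", "MAS"), "sel.F2 must be NS, PS or MAS")
  add(as.matrix(trial_sel) %in% c("NS", "PS", "GS"),
      "sel.HRT, sel.PYT and sel.AYT must be NS, PS or GS")
  add(rowSums(trial_sel == "GS") <= 1,
      "GS can be used in at most one trial per program")
  add(BP$sF2 >= 1 & BP$sF2 <= BP$nF2, "sF2 must be between 1 and nF2")
  add(BP$sHRT >= 1 & BP$sHRT <= BP$sF2, "sHRT must be between 1 and sF2")
  add(BP$sPYT >= 1 & BP$sPYT <= BP$sHRT, "sPYT must be between 1 and sHRT")
  add(BP$sAYT >= 1 & BP$sAYT <= BP$nc * BP$sPYT,
      "sAYT must be between 1 and nc x sPYT")
  add(BP$rPYT >= 1 & BP$rAYT >= 1 & BP$rEYT >= 1,
      "rPYT, rAYT and rEYT must be at least 1")
  problems
}

# Illustrative programs: the second one wrongly uses GS in two trials
BP_example <- data.frame(
  program = 1:2, nc = c(100, 100), nF2 = c(50, 50),
  method = c("SSD", "DH"), sel.F2 = c("PS", "MAS"),
  sel.HRT = c("PS", "GS"), sel.PYT = c("PS", "GS"), sel.AYT = c("PS", "NS"),
  sF2 = c(10, 10), sHRT = c(4, 4), sPYT = c(2, 2), sAYT = c(20, 20),
  rPYT = c(2, 2), rAYT = c(4, 4), rEYT = c(8, 8),
  stringsAsFactors = FALSE
)

validate_BP(BP_example)         # reports the GS problem
validate_BP(BP_example[1, ])    # character(0): program 1 is fine
```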
For our practice, let’s try to create the following breeding programs.
Use SSD and PS in all possible generations.
Use DH and PS in all possible generations.
Select only in the F2/DH.
Use GS in HRT.
Use GS in AYT.
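If you prefer to build these programs in R rather than in MS Excel, one possible approach is to start from a baseline row and change only the relevant columns. In the sketch below, the numeric values in base are illustrative and not the defaults from the CSV file, with_changes is a hypothetical helper, and the way each practice program is translated into parameters (for example, PS rather than MAS in the F2/DH for the third program, and NS wherever nothing is specified) is only one of several reasonable readings.

```r
# Baseline: SSD with no selection anywhere (illustrative numbers)
base <- data.frame(
  program = 1, nc = 100, nF2 = 50, method = "SSD",
  sel.F2 = "NS", sel.HRT = "NS", sel.PYT = "NS", sel.AYT = "NS",
  sF2 = 10, sHRT = 4, sPYT = 2, sAYT = 20,
  rPYT = 2, rAYT = 4, rEYT = 8,
  stringsAsFactors = FALSE
)

# Copy a row and overwrite the named columns
with_changes <- function(row, ...) {
  changes <- list(...)
  stopifnot(all(names(changes) %in% names(row)))
  for (nm in names(changes)) row[[nm]] <- changes[[nm]]
  row
}

practice <- rbind(
  with_changes(base, program = 1, method = "SSD", sel.F2 = "PS",
               sel.HRT = "PS", sel.PYT = "PS", sel.AYT = "PS"),
  with_changes(base, program = 2, method = "DH", sel.F2 = "PS",
               sel.HRT = "PS", sel.PYT = "PS", sel.AYT = "PS"),
  with_changes(base, program = 3, sel.F2 = "PS"),
  with_changes(base, program = 4, sel.HRT = "GS"),
  with_changes(base, program = 5, sel.AYT = "GS")
)
practice

# Uncomment to save the programs for later use:
# write.csv(practice, "BP_practice.csv", row.names = FALSE)
```

The resulting data frame has the same 15 columns as the parameter table above, so it can be passed as the BP argument in the same way as the data frame read from the CSV file.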
How do these compare? Does the better breeding program cost more? Is there any modification that we can do to reduce the cost and maintain the performance? What happens if we start changing the trait variances and other parameters?
Here is a table of the costs associated with the available tasks in the breeding programs.
Generation | Task | Cost |
---|---|---|
Founder | Make crosses in the greenhouse | 100/month (greenhouse) + 1/plant (grow) + 5/cross (cross) |
F1 | Self to make F2 | 100/month (greenhouse) + 1/plant (grow) + 1/plant (self) |
F2 | Self to make F3, select on shape | 100/month (greenhouse) + 1/plant (grow) + 1/plant (self) + 5/plant (PS) + 10/plant (MAS) |
F3/F4/F5/F6 | SSD | 100/month (greenhouse) + 1/plant (grow) + 1/plant (self) |
F1 | Make DH | 40/progeny |
DH | Select on shape | 100/month (greenhouse) + 1/plant (grow) + 1/plant (self) + 5/plant (PS) + 10/plant (MAS) |
HRT | Trial, select on size/color | 40/plot (field) + 10/plot (phenotype), 20/line (genotype) |
PYT | Trial, select on size/color | 40/plot (field) + 10/plot (phenotype), 20/line (genotype) |
AYT | Trial, select on size/color | 40/plot (field) + 10/plot (phenotype), 20/line (genotype) |
EYT | Trial | 40/plot (field) + 10/plot (phenotype) |
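To make the cost table concrete, the sketch below turns its unit costs into two small calculators. These are not the cost.calc function from the exercise. They assume that greenhouse rent is charged per month regardless of the number of plants, that PS and MAS charges apply only when that selection method is used, and that each line occupies one plot per replication. The function names and example numbers are hypothetical.

```r
# Greenhouse stage: rent per month plus per-plant charges
greenhouse_cost <- function(n_plants, months, self = TRUE,
                            ps = FALSE, mas = FALSE) {
  per_plant <- 1 + 1 * self + 5 * ps + 10 * mas   # grow + optional tasks
  100 * months + n_plants * per_plant
}

# Field trial: per-plot field and phenotyping costs, plus optional genotyping
trial_cost <- function(n_lines, n_reps, genotype = FALSE) {
  n_plots <- n_lines * n_reps
  n_plots * (40 + 10) + if (genotype) 20 * n_lines else 0
}

# One SSD generation of 1000 plants over 3 months
greenhouse_cost(1000, months = 3)                # 300 + 2000 = 2300

# 1000 F2 plants grown for 12 months and selected with MAS
greenhouse_cost(1000, months = 12, mas = TRUE)   # 1200 + 12000 = 13200

# 100 lines in 2 replications, without and with genotyping for GS
trial_cost(100, n_reps = 2)                      # 10000
trial_cost(100, n_reps = 2, genotype = TRUE)     # 12000
```

In this example genotyping adds 20% to the cost of the trial, so whether GS pays off depends on how much extra gain it delivers.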
The End
Good luck and have fun. Please email me at cyang@sruc.ac.uk if you have any questions or bugs to report. Thank you. :)