GBS data and SNP detectionThe sequencing produced an average of 1,700,000 reads persample (range 203,996 to 5,897,687 of good barcoded reads); from these reads atotal of 1,804,668 tags across all 640 plant samples were identified. The SNPanalysis in Tassel5, using the red clover genome assembly(De Vega, et al., 2015) as a reference, resulted in264,927 SNP across all 640 plants.
After substantial clean-up of the data (seeSNP discovery in the methods section), a panel of 12,577 robust polymorphic SNPwere produced, and 10 samples were lost due to poor sequencing results. Ofthese 12,577 SNP, 8,118 were present in the seven chromosomes, and 4,459 werelocated in the unmapped contigs of the assembly. For information the SNP wereidentified as both transition and transversion SNP (Table 4).
Overall the, SNPmarkers showed reasonable levels of heterozygosity; however a small proportionhad greater than expected levels of homozygosity (Figure 2). This is mostlikely due to the GBS analysis, where there is a slight tendency tounderestimate heterozygosity. It is a matter of sequencing depth, and poorlysequenced variants can lead to under estimation of true heterozygosity.
Thedata represented in the rest of this paper concentrates on the 8,118 SNPpresent in the seven chromosomes.Phylogenetic relationships of the ecotype panelThe 8,118 SNP data were grouped according to UPGMA in thepackage Cluster (Figure 3a, Table 5). The change in slope angle in Figure 3aindicated four groups to the data. These groups were Asia (including Iran),Mainland Europe, UK, and the Iberian Peninsula. Using these groupings, a relationshiptree was produced in the same package (Figure 3b). The relationship tree didnot show any distinction between the varieties and the ecotypes; Milvus, Brittaand AberRuby were allocated to group 2 (Mainland Europe), and Broadway andCrossway to group 4 (Iberian Peninsula). However, there was a secondary changein slope angle at k=2, dividing the population into Asia and Europe. There weretwo anomalies in the data classification that of the Italian ecotypes Aa4445and Aa4441 in the clusters represented by Asia and UK respectively (Table 5).
A principal component analysis of the data was coloured byaccession as defined by the relationship tree. The PCA plot (Figure 3c) clearlyshowed that the Asian accessions were separate to the European ones, and that theIberian and UK populations were distinct from Mainland Europe; with the twoanomalous Italian accessions coloured to their proposed groups. The PCAinferred that accession Aa4445 was not part of the Asian cluster, and is geneticallyrelated to Mainland European accessions (red spots in European cluster).STRUCTURE analysis assumed nine groups of ancestry to thepopulation, as inferred by ?K(Figure 3f, Table 6). STRUCURE also described a large secondary peak at K=2, againsubdividing the population into European and Asian accessions. The nine grouppopulation (Figure 2e) could also be classified in general into four sets, ofAsia (group 3), Iberia (group 2 and 8), the UK (group 5), and more ambiguouslyMainland Europe (group 1, 4, 6, 7, 9). Group one consisted in the main ofCroatian and Bosnian accession; group four of Italian and Slovakian accessions;interestingly group seven contained only one accession that of Italy (Aa4445),and accession Aa4441 again was predicted to have shared ancestry with the UKpopulation; group nine was a mix of accessions from Hungary, Poland, The CzechRepublic and Slovakia. These four groups appeared to be related by geographicdistribution, the collection sites all being fairly close together.
However,group six, which contained three of the varieties (Milvus, Britta and AberRuby),consisted of accessions from Northern Europe, Eastern Europe and Italy. Therewere four oddly placed accessions; these were Aa4441 from Italy in group five(UK), and Aa4480 and Aa4487 from Spain and Aa4406 from Pakistan in group six(Europe 3). There were four accessions, from Bosnia (Aa4280), Bulgaria (Aa355,Aa4358) and The Czech Republic (Aa4304), which showed mixed populations ofindividuals. Individuals from these accessions covered groups 1, 4, 6 and 9. AMOVA was used to test the hypothesis that the varietieswere indistinguishable, in terms of genetic diversity and differentiation, fromthe ecotypes. The AMOVA (Table 7) supports this hypothesis, and thedomestication process has not significantly altered the genetic diversity ofthe varieties in comparison to the ecotypes, and that they areindistinguishable from each other.
However,the AMOVA also described the individual accessions to be relatively welldifferentiated as shown by the ? score = 0.14 and ?2 =345.55, and this accounted for 13.66%of the variation seen in the population. Therefore, the AMOVA showed that thereis more variation within the different accessions rather than between them.
This is actually typical of obligate outbreeding populations and was notsurprising. Genetic diversity of the ecotype panelPopulation genetics parametersTable 8 shows the basic statistics produced during theanalysis in Pegas. We evaluated thepossibility of a bottleneck during domestication by comparing observed toexpected heterozygosity and using the FIS values with a nullhypothesis of random mating within each accession. The varieties total showedno decrease in the HO as compared to the HE (HO=0.364, HE=0.
279), this indicated no bottleneck during domestication,and the FIS supported no inbreeding depression as caused by smallpopulations during the breeding process. This was also the case for theanalysis of Milvus, Britta and AberRuby and Broadway and Crossway, whenassessed separately.A breakdownanalysis of the varieties and the ecotypes (Table 8) revealed that thevarieties were more related to each other than were the ecotypes (FST= 0.039 v 0.076 Varieties v Ecotypes total), and there were similar levels of geneticdiversity in both the varieties and ecotypes total (DST = 0.011 v0.02).
The analysis also showed that there was very low levels of geneticdiversity within the samples of Milvus, Britta and AberRuby (DST=0.007)and that they were related populations (FST = 0.022, DEST= 0.
014), this was also the case for Broadway and Crossway (DST=0.004,FST = 0.013, DEST = 0.012) although this was expected asthey are related varieties. The four groups, asdefined in UPGMA, (Table 8) revealed that the ecotypes from Asia were the leastrelated to each other (FST = 0.093, DEST = 0.045) and hadthe highest levels of genetic diversity (DST = 0.
029). Cluster two(Europe) was made up of related individuals (FST = 0.047, DEST= 0.019), that had low levels of genetic diversity (DST = 0.013).Cluster three (UK) mirrored the European analysis in terms of relatedness andgene diversity. Cluster four (Iberian Peninsula) proved to be relatively welldifferentiated (FST = 0.
06, DEST = 0.029), however withlow levels of gene diversity within the samples (DST = 0.029). STRUCTUREanalysis inferred two Iberian populations, this may account for the level ofdifferentiation in these accessions.
Regions under selection: via outlier detectionThree methods were used to detect outlier markers; basic Ridentified 869 loci, BayeScan identified 548 loci and Sam?ada identified 1,021loci. In basic R, outlier SNP of interest included position Chr7_5219303(Figure 4a). This SNP is within gene number 38211, a 7 transmembrane MLO familyprotein that is involved in mildew resistance. There is evidence in the fieldto suggest that the Iranian accessions do have a degree of mildew resistanceover other ecotype accessions. However, this has not been verified underexperimental conditions.
In BayeScan the 584 loci (Figure 4b.) were found in genemodels that have been identified amongst others as transcription factorsregulating growth responses, senescence, seed dormancy, flowering and diseaseresistance. Transcription factors are key regulators of gene expression and mayhave a direct effect on a plants ability to respond to its environment. Theseare therefore important factors when considering selective advantage.
In Sam?ada1,021 SNP loci significantly correlated to the geography of the collectionsites, after Bonferroni correction for both the Wald and G Score. The majorityof the SNP (85.6% of the 1,020 outliers) were deemed to be under the selectiveinfluence of longitude to some extent (Table 9). Regression analysis was performed using the principalcomponents against the geographic co-ordinates.
This analysis used all of theSNP data not just the outliers, and a very high strength of the regression line(r = 0.93) between PC1 and longitude of the collection sites was found (Figure5), the other two components of latitude and altitude were not so stronglycorrelated (data not shown). This is contributable evidence to the outliersbeing under the influence of longitude as a selective force. There were 56common SNP to all three analyses (Figure 4c., Table 10); these SNP arepotentially adaptive loci and are involved in many diverse functions such asRNA silencing, photosynthesis, floral development, plant defence and biotic andabiotic stress. Haplotype reconstructionHaplotypes were examined from the SNP that were found to beoutliers, potentially under selective pressure.
There were too many haplotypevariates for phastPHASE to identify common haplotypes within the ecotype panel,therefore the outlier SNP were analysed by PCA to distinguish haplotypepatterns. The selection of SNP on chromosome 7, that were proposed to beinvolved in mildew resistance were analysed by PCA and graphed. As a control arandom selection of 100 SNP, from chromosome 7 were also graphed. The PCAanalysis was able to separate the haplotypes of accession Aa3507, from the restof the accessions (Figure 6a.), the control SNP showed a clustering pattern thatreflected the population structure PCA (Figure 6.
b). These graphs could indicatefurther evidence that the haplotypes for the Iranian accessions are differentto those from the rest of the ecotype panel (for these selective SNP). Thisprocess was repeated for the 56 common SNP, however due to the strongassociation of longitude no informative clustering was produced. Finally the 548SNP identified from BayeScan were also graphed by PCA, but again no informativeclustering was seen. In both these cases the pattern of SNP reflected thepopulation structure PCA graph.
Linkage disequilibriumThe decay plots and heatmaps (Figure 7) show r2values plotted against physical distance on each of the chromosomes for theentire population. There was a significant peak in the heatmap for chromosomeseven, and further analysis in Synbreed using the varieties and ecotypesseparately, revealed that this peak was evident in the ecotypes only (figure 8).This region is close to SNP Chr7_5219303 that was proposed to be underselection for mildew resistance in the Iranian accessions Aa3506 and Aa3507.The critical r2 for each chromosome wascalculated for the varieties and the ecotypes separately.
For the varietiesthis value ranged from 0.026 – 0.032 per chromosome (Table 11), however in theecotypes this value was much lower at 0.007 – 0.008 (Table 11). In thevarieties some 0.1% (418 of 3219126) of pairs were completely linked with an r2= 1 (Table 12); there were no completely linked pairs in the ecotypes. Theaverage global LD for the varieties was 0.
028 with 80% of the markers pairsbeing in LD. The average global LD for the ecotypes was 0.008 again with 80% ofthe markers pairs being in LD. Phenotypic assessmentDuring the experiment, there was considerable loss of plantmaterial (Table 13). This loss of material was due to poor winter survival andthe inability of the ecotypes to survive the cutting regime.
The plants weregiven a lax cut at the end of each year to over winter. During the second year,two harvests were collected. The number of plants that survived into year three(2017) was little over 30% and so the experiment was scrapped and no furtherdata were collected. This meant that phenotype measurements were collected inone year only; however, two sets of measurements were collected where possible.
Flowering dateFrom the 514 plants that survived into year two, floweringdates were collected. Of these plants, 415 flowered within the time constraintand a further 99 had flowered by day 100, the day of harvest. There was a largerange of flowering across the ecotype panel, with the average being at day 70(Figure 9). All of the plants in three accessions (Britta, Denmark and Finland)did not flower before the cut off day 81; however, they had flowered by day100. These are all Scandinavian in origin, and Britta was developed as a lateflowering variety to maximise vegetative growth (Lundin, et al.
, 1974). As the data were notnormally distributed, it was log10 transformed to improvehomogeneity, and subjected to a GWAS analysis in GAPIT. GAPIT produced akinship matrix and this was used along with the PCA components as covariateswithin the GWAS analysis to account for relatedness and population structure(Figure 10a).
The analysis produced one SNP of interest above the FDR (Figure10b.). This SNP was identified as chr3_22736556, which lies within gene 10558according to the genome assembly, this was a basic leucine zipper transcriptionfactor. The sequence was BLAST searched at NCBI (Figure 11) and was found to besynonymous to the VEG2 gene in pea, which is involved in flowering time andinflorescence development (Sussmilch, et al.
, 2015). An investigation into theeffects of the alleles at this locus was undertaken, and it was found that theminor allele had the greatest effect reducing the time to flowering by 11 days.The average flowering date for homozygous major allele was 73.3 days, forheterozygotes was 66.6 days and for homozygous minor allele was 62.
4 days.Britta, Denmark and Finland genotypes were all homozygous for the major allelefor this SNP. Vegetative analysisIn year two of the experiment two sets of measurements weretaken for plant height and width, and stem number. Box plots were drawn torepresent the spread of the data per country. To investigate the degree ofprostrateness in the panel a width:height ratio was calculated (Figure 12). Thedata described Broadway and Crossway varieties to be quite prostrate, but notas prostrate as the Portuguese accession. The UK and Spanish accessions showedthe next highest average prostrateness. The accessions from Spain and Portugal,contain ecotypes that had the lowest stem heights recorded.
The average numberof stems per plant per country was recorded in Figure 13. Broadway and Crosswayvarieties had the greatest number of stems per plant; this probably reflectsthe plants ability to regrow after cutting. These are average values from twocuttings in year two.