Journal title: Proceedings of the National Academy of Sciences
Print ISSN: 0027-8424
Electronic ISSN: 1091-6490
Publication year: 2022
Volume: 119
Issue: 9
DOI: 10.1073/pnas.2201078119
Language: English
Publisher: The National Academy of Sciences of the United States of America
Abstract: At least with regard to horticultural crops, we are currently experiencing a step change in crop breeding targets. Historically, breeders focused on high-yielding, resilient varieties; however, this has led to considerable dissatisfaction with modern varieties of fruits and vegetables (1). The recent increasing willingness of consumers to pay a premium for quality is, however, driving a renaissance of breeding for quality traits. That said, flavor is a highly complex composite trait made up of the interactions between the chemical composition of the crop as well as the taste, olfaction, and psychology of the consumer (2, 3). In recent years, flavor has been assessed by costly consumer sensory panels or by breeders themselves in the field. Both approaches have disadvantages. Field evaluation, while allowing the evaluation of many varieties in a day, is highly subjective and error prone. Although population-based sensory panels are well established and accurate, they are difficult to scale to large breeding programs. These limitations in flavor phenotyping are elegantly addressed via the employment of metabolomic selection-based machine learning in the report by Colantonio et al. (4).
Machine learning has been gaining traction as a means of analyzing data from high-throughput phenotyping applications, enabling researchers to identify meaningful patterns in relevant plant data. For example, it has proven useful with two-dimensional light imaging as a proxy for plant biomass, reflectance ratios as proxies for yield, hyperspectral reflectance as a proxy for leaf chlorophyll and nitrogen content, and canopy temperatures as proxies for the drought response (reviewed in ref. 5). It is also already being used in breeding, particularly in the form of genomic selection, in which genome-wide marker data are used to predict the genetic value of an unobserved candidate in a breeding population by estimating the effects of all markers (6). A recent refinement of this approach, genome optimization via virtual simulation, simulates a virtual genome encompassing the most abundant advantageous alleles in a genetic pool, thereby helping plot the optimal route for breeding (7).
In their study, Colantonio et al. (4) used a combination of metabolomic profiles and consumer taste panel information to train machine learning models such that they can predict how flavorsome a fruit will be from its chemical composition. To do so, they took targeted metabolomic profiling and consumer panel ratings from previous studies in tomato and blueberry (1, 3, 8) and used a suite of 18 different statistical and machine learning models to predict various taste sensations, including liking, sweet, sour, and taste intensity. The data used encompassed sugars, acids, volatiles, and the taste sensations mentioned above (as well as umami in the case of tomato). As a first approach, the metabolites were partitioned according to compound class. Interestingly, when the results of the consumer tests were assessed following this partitioning, it was apparent that the proportion of variance of each trait that was explained by the sugars and acids varied across the flavor attributes as well as between species. For instance, while sugars and acids predominantly explain blueberry sweetness, the volatiles were the main contributors to this trait in tomato. Colantonio et al. (4) next applied the 18 methods to predict sensory traits from the metabolite levels; the highest prediction accuracies were observed for XGBoost, gradient boosting machines, and random forest models, with the XGBoost model recording accuracies of 0.62 to 0.87 across all traits and in both species. Using these approaches, sweetness, flavor intensity, and sourness were the most predictable traits in tomato, and sourness and sweetness were the most predictable traits in blueberry.
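The workflow described above, training a model on metabolite levels and scoring how well it predicts panel ratings for held-out samples, can be sketched in a few lines. This is only an illustrative sketch under several assumptions: it swaps in plain ridge regression for the 18 methods actually compared (it is not XGBoost), it assumes that accuracy is measured as the correlation between predicted and observed ratings (the text does not state the metric), and the data, the three "metabolite" columns, and the helper names (`fit_ridge`, `cv_accuracy`) are all synthetic or hypothetical.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_ridge(X, y, lam):
    """Ridge regression on centered data; returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(A, b)
    return w, my - sum(wj * mj for wj, mj in zip(w, mx))

def cv_accuracy(X, y, k=5, lam=1.0):
    """k-fold cross-validated correlation of predicted vs. observed ratings."""
    preds = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        w, b0 = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            preds[i] = b0 + sum(wj * xj for wj, xj in zip(w, X[i]))
    return pearson(preds, y)

# Synthetic stand-in data: 60 samples, 3 hypothetical metabolite columns
# (a "sugar", an "acid", and an uninformative "volatile").
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [2.0 * s - 1.0 * a + rng.gauss(0, 0.3) for s, a, v in X]
accuracy = cv_accuracy(X, y)
print(round(accuracy, 2))
```

Every rating is predicted by a model that never saw that sample during fitting, so the resulting score reflects out-of-sample performance, which is what matters when the model is meant to replace a sensory panel for new varieties.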
While the above-described studies demonstrated the utility of metabolite data for the prediction of taste, metabolomics data are far less easily obtained (both experimentally and with regard to access from databases) than genomics data. That said, using available data encompassing whole-genome sequencing data, chemical profiles, and sensory panel data for 70 varieties of tomato (1) allowed Colantonio et al. (4) to compare the prediction potential of metabolomic and genomic selection. In order to do so, they used the genomic best linear unbiased prediction method (9) to predict the consumer sensory ratings either from a subset of almost 80,000 single-nucleotide polymorphisms or from metabolomic information from the same 70 varieties. Metabolomic selection was found to greatly outperform genomic selection in the prediction of all traits, thereby highlighting the potential of this approach to support breeding (Fig. 1). Indeed, relatively high accuracies for certain traits could be obtained when using the metabolome data from as few as 50 individuals. Finally, it was demonstrated that BayesA and gradient boosting machines were able to identify which sugars, acids, or volatiles enhanced or suppressed consumer sensory perceptions of flavor. These analyses revealed that, for example in tomato, glucose and fructose are the most important sensory perception enhancers, while the volatiles 1-penten-3-one and 2-phenylethanol as well as E-2-pentenal and 4-carene were also important for sweetness. A different (although sometimes overlapping) set of metabolites influenced these sensory perceptions in blueberry.
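The best linear unbiased prediction step mentioned above can be written in its kernel form: a relationship matrix K is built from a feature matrix, and ratings for unphenotyped varieties are predicted as K_test,train (K_train,train + lam I)^-1 (y_train - mean). The sketch below is a minimal illustration, not the authors' pipeline. It assumes that the metabolomic variant simply swaps the marker matrix for a metabolite matrix when building K, and it uses a fixed ridge parameter `lam` in place of the variance-component ratio that a mixed-model fit would estimate. All data are synthetic and the function names are hypothetical.

```python
import random

def standardize(F):
    """Center and scale each column of a feature matrix (list of rows)."""
    n, p = len(F), len(F[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in F]
        m = sum(col) / n
        sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - m) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]

def relationship_matrix(F):
    """K = Z Z^T / p, where Z is the standardized feature matrix."""
    Z = standardize(F)
    p = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in Z] for zi in Z]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def blup_predict(K, y, train, test, lam=1.0):
    """Predict test ratings as K_ts,tr (K_tr,tr + lam I)^-1 (y_tr - mean)."""
    mu = sum(y[i] for i in train) / len(train)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(A, [y[i] - mu for i in train])
    return [mu + sum(K[t][i] * a for i, a in zip(train, alpha)) for t in test]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Synthetic stand-in: 60 varieties, 10 hypothetical metabolite columns,
# with the rating driven by the first three columns plus noise.
rng = random.Random(1)
M = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(60)]
y = [row[0] + row[1] + row[2] + rng.gauss(0, 0.5) for row in M]
train, test = list(range(40)), list(range(40, 60))
K = relationship_matrix(M)
pred = blup_predict(K, y, train, test)
accuracy = pearson(pred, [y[i] for i in test])
print(round(accuracy, 2))
```

Because `relationship_matrix` is agnostic to what the columns represent, passing a marker matrix (for example, genotypes coded 0/1/2) instead of metabolite levels would give the genomic version of the same predictor. The synthetic example makes no claim about which data type performs better.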
Fig. 1. Metabolic-based prediction model in tomato and blueberry. The weighted correlation network analysis of tomato and blueberry metabolites and their clusters is based on biochemical classification. Eighteen statistical and machine learning methods were used to predict fruit flavor based on its chemistry.
As described by the authors, the use of metabolomic selection is an excellent complement to a molecular breeding program since this enables quantitative trait loci (QTL) mapping or genome-wide association studies. As such, the flavor-related metabolites identified by metabolomic selection could be used to identify the causal genes (or at least the genetic loci) that influence their abundance and to create markers for molecular breeding. Indeed, a legion of studies has been published across a wide range of crop species in which metabolic QTL or gene associations have been assessed (10, 11). That said, while many of these have additionally assessed yield-associated traits, studies linking this information to the perception of taste or, alternatively, to health benefits following consumption remain relatively scarce. Given that metabolite traits often display large variability and low heritability and are subject to complex interaction effects (12, 13), success in modifying compositional traits involved in either taste or nutrition relies heavily on identifying the correct metabolite targets. The integration of metabolomic selection with marker-assisted selection provides a powerful route to ensure that this identification is correct. While linear regression, random forest models, and partial least squares regression methods have been reported to have variable success in predicting flavor (12, 14, 15), the machine learning models employed by Colantonio et al. (4) displayed superior performance over these methods, irrespective of the fruit or trait to which they were applied.
The idea of metabolomics-assisted breeding has long been postulated (16); however, for this purpose, its cost was often deemed an insurmountable barrier. Taken alongside the rapid development of metabolomics technologies and, in particular, their increased coverage (17), the results of Colantonio et al. (4) suggest that when coupled with machine learning, this may no longer be the case. Computation will be a major driver in this process as well as in the prospective use of this approach to improve dietary-based aspects of human health. It can be anticipated that these advances will pave the way to knowledge-informed breeding of both tastier and more nutritious food.