Abstract: Confronting climate models with observationally based reference datasets is widespread and integral to model development. These comparisons yield skill metrics that quantify the mismatch between simulated and reference values, but they also involve analyst choices, or meta-parameters, in how the analysis is structured. Here, we systematically vary five such meta-parameters (reference dataset, spatial resolution, regridding approach, land mask, and time period) in evaluating evapotranspiration (ET) from eight CMIP5 models, using a factorial design that yields 68 700 intercomparisons. The results show that while model–data comparisons can provide some feedback on overall model performance, model ranks are ambiguous, and for every model both inferred skill and rank are highly sensitive to the choice of meta-parameters. This suggests that model skill and rank are best represented probabilistically rather than as scalar values. In this case study, the choice of reference dataset has a dominant influence on inferred model skill, larger even than the choice of model itself. This is primarily due to large differences among the reference datasets, indicating that further work toward a community-accepted standard ET reference dataset is crucial to reduce ambiguity in model skill.
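The probabilistic view of rank can be illustrated with a minimal sketch. Under stated assumptions, it enumerates every combination of five meta-parameters, ranks the models within each combination, and reports the empirical probability of each model holding each rank. The model names, factor levels, and the `synthetic_skill` placeholder are all hypothetical: the abstract does not list the actual levels or the skill metric, and the scores here are random stand-ins, not results from the study.

```python
import itertools
import random
from collections import Counter, defaultdict

# Hypothetical models and factor levels (assumptions, not the study's actual choices).
MODELS = ["model_A", "model_B", "model_C"]
META_PARAMETERS = {
    "reference_dataset": ["ref_1", "ref_2"],
    "resolution": ["coarse", "fine"],
    "regridding": ["bilinear", "conservative"],
    "land_mask": ["mask_1", "mask_2"],
    "time_period": ["period_1", "period_2"],
}


def synthetic_skill(model, combo, rng):
    """Hypothetical placeholder skill score (higher is better).

    A real analysis would compare simulated ET with the reference dataset
    selected in `combo`; here a random number stands in for that metric.
    """
    return rng.random()


def rank_distribution(models, meta_parameters, seed=0):
    """Return P(rank | model) over a full factorial design, plus the number of combinations."""
    rng = random.Random(seed)
    names = list(meta_parameters)
    counts = defaultdict(Counter)  # model -> Counter mapping rank -> occurrences
    n_combos = 0
    # Full factorial design: every combination of meta-parameter levels.
    for levels in itertools.product(*(meta_parameters[k] for k in names)):
        combo = dict(zip(names, levels))
        scores = {m: synthetic_skill(m, combo, rng) for m in models}
        # Rank 1 is the best score within this single combination.
        ordered = sorted(models, key=lambda m: scores[m], reverse=True)
        for rank, m in enumerate(ordered, start=1):
            counts[m][rank] += 1
        n_combos += 1
    # Convert rank counts to empirical probabilities for each model.
    probs = {
        m: {r: counts[m][r] / n_combos for r in range(1, len(models) + 1)}
        for m in models
    }
    return probs, n_combos


probs, n = rank_distribution(MODELS, META_PARAMETERS)
print(n)  # 32 combinations (2**5)
for model, dist in probs.items():
    print(model, dist)
```

Reporting each model's rank as a distribution, rather than a single number, makes the sensitivity to meta-parameters explicit: a model whose probability mass is spread across several ranks has an ambiguous rank.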