Abstract: We construct a statistical model that predicts the wind vector over very complex terrain characterized by low wind speeds and changeable wind directions. Such predictions are necessary inputs for atmospheric dispersion modelling of hypothetical radioactive pollution events in the short-term or medium-term future, which helps to better protect the local population. The statistical model takes the predictions of a numerical weather prediction model as some of its inputs, so the two together form a hybrid model. The statistical model is realized as a nonlinear autoregressive exogenous (NARX) model whose dynamics are described by a Gaussian process model. It relies on training data, and more training data are available than the computing system is able to process. One way to work around this limitation is to use a randomly selected subset of the available historical measurements as the training data. However, a better choice of training data may result in a model that performs better. We develop and test a smart training set selection method that selects the training data points based on the Euclidean distances between them. The resulting model improvement is insignificant and inconsistent. We explore the reasons for the underperformance of the method. We conclude that our example does not offer much opportunity for training set selection methods to achieve better results than random selection.
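The contrast between random selection and distance-based selection of training points can be illustrated with a minimal sketch. The abstract does not spell out the selection rule, so the greedy farthest-point (max-min) criterion used here, the function names `random_subset` and `farthest_point_subset`, and the synthetic regressor matrix are all assumptions made for illustration, not the method evaluated in this work.

```python
import numpy as np


def random_subset(n_points, m, rng):
    """Baseline: indices of m points drawn uniformly without replacement."""
    return rng.choice(n_points, size=m, replace=False)


def farthest_point_subset(X, m, start=0):
    """Greedy max-min selection (an assumed, illustrative rule).

    Repeatedly adds the point whose Euclidean distance to its nearest
    already-selected point is largest, so the subset spreads over the
    regressor space instead of following the data density.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= number of points")
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    min_dist[start] = -np.inf  # never pick the same point twice
    for _ in range(m - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(X - X[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
        min_dist[nxt] = -np.inf
    return np.array(selected)


# Synthetic stand-in for NARX regressor vectors (hypothetical data).
rng = np.random.default_rng(0)
X_all = rng.normal(size=(2000, 4))

idx_random = random_subset(X_all.shape[0], 100, rng)
idx_spread = farthest_point_subset(X_all, 100)
X_train_random = X_all[idx_random]
X_train_spread = X_all[idx_spread]
```

Either index set would then be used to pick the rows on which the Gaussian process model is trained. The max-min rule favours sparsely populated regions of the regressor space, whereas random selection reproduces the density of the historical data.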