Article information

Title: Construction of a parametric model of competitive access in relational databases by using a random forest method
Authors: Dmitry Gromey; Eugene Lebedenko; Dmitry Nikolaev et al.
Journal: Eastern-European Journal of Enterprise Technologies
Print ISSN: 1729-3774
Online ISSN: 1729-4061
Year of publication: 2019
Volume: 3
Issue: 2
Pages: 15-24
DOI: 10.15587/1729-4061.2019.170071
Language: English
Publisher: PC Technology Center
Abstract: We have considered the task of modeling request execution time in autonomous relational databases with competing queries. The shortcomings of existing approaches have been specified: they ignore the cost of the share of successive operations in cooperative access to data in a memory hierarchy. We have examined the issue of applying a relative cost to the components of the operations in a query plan instead of calculating the predicted computation time. A technique has been proposed for the formal construction of precedents for a training sample, together with an approach to building a regression model. The developed modification of the random forest machine learning method is used to calculate request execution time from the request texts and the timestamps of their start and the duration of their execution. The constructed parametric model of competitive access to data is needed to obtain accurate estimates of request execution time when parallel computations are used. Models with such characteristics are needed to solve problems of automated management of a physical data schema and to build self-identifiable DBMS. The key differences from existing approaches are the use of request execution time as the target value and accounting for the values of predicates and the mutual influence of requests executed in parallel. To confirm the results obtained, a simulation model based on the widely known TPC-C benchmark has been used. The loss function, chosen with the regression nature of the model in mind, was the ratio of the sum of the absolute differences between the actual and obtained time to the actual time. The check itself was carried out on a reference sample generated for an increasing length of training on held-out data. In the course of this study we have shown that the random forest machine learning method can be applied to processing statistical data on the execution of SQL queries. The result obtained shows that the approach is promising and makes it possible to derive parametric models of competitive request processing.
Keywords: autonomous systems of database management; self-identifiable databases; random forest; competitive access; parallel computing in relational database management systems.
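
Note on the loss function: the abstract describes the loss as the ratio of the sum of absolute differences between the actual and obtained time to the actual time. The sketch below is one possible reading of that sentence, not the authors' implementation. It assumes the denominator is the total actual time over the whole sample; the function name `relative_absolute_error` and the timing values are hypothetical and chosen only for illustration.

```python
def relative_absolute_error(actual, predicted):
    """Return sum(|actual_i - predicted_i|) / sum(actual_i).

    Assumption: the denominator is the total actual execution time
    over the sample; the abstract does not spell this out.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total_actual = sum(actual)
    if total_actual <= 0:
        raise ValueError("total actual time must be positive")
    abs_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return abs_error / total_actual


# Hypothetical execution times in milliseconds (not taken from the paper).
actual_ms = [120.0, 80.0, 200.0]
predicted_ms = [100.0, 90.0, 210.0]

loss = relative_absolute_error(actual_ms, predicted_ms)
print(loss)
```

A value of 0 would mean that every predicted time matches the measured one; larger values mean the predictions deviate by a larger fraction of the total measured time.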