出版社:The Japanese Society for Artificial Intelligence
摘要:We study a regression tree algorithm tailored to casualty insurance pure premium estimation problems. Casualty insurance premium is mainly determined by the expected amount that the insurance companies have to pay for the contract. Therefore, casualy insurance companies have to estimate the expected insurance amount on the basis of insurance risk factors. This problem is formulated as a regression problem, i.e. estimation of conditional mean E[Y|x], where Y is insurance amounts and x is risk factors. In this paper, we aim to implement the regression problem in regression tree framework. The difficulty of the problem lies in the fact that the distribution of insurance amount P(Y|x) is highly skewed and exhibits a long-tail toward positive direction. Conventional least-square-error regression tree algorithm is notoriously unstable under such long-tailed error distribution. On the other hand, several types of robust regression trees, such as least-absolute-error regression tree, are neither appropriate in this situation because they yields significant bias to conditional mean E[Y|x]. In this paper, we propose a two-stage tree fitting algorithm. In the first stage, the algorithm constructs a quantile tree, a kind of robust regression tree, which is stable but biased to conditional mean E[Y|x]. In the second stage, the algorithm corrects the bias using least-square error regression tree. We discuss the theoretical background of the algorithm and empirically investigate the performances. We applied the proposed algorithm to a car insurance data set of 318,564 records provided from a north-american insurance company and obtained significantly better results than conventional regression tree algorithm.
关键词:casualty insurance premium estimation ; long-tail distribution ; regression tree ; robust estimation