Abstract: Never before in history has data grown at such high volume, variety, and velocity. This growth provides multiple sources of information from which people can discover useful, important, and valuable nuggets, but it also makes such nuggets harder to find in almost every field. Healthcare in particular is known for its domain and ontological complexity and for the variety of its clinical and medical data, which differ in data standards and data quality and are high-dimensional. To use the data at hand effectively to improve healthcare outcomes and processes, this paper presents the Risk Factor Detection and Disease Prediction (RFD-DP) model. The model applies statistics, data mining, and MapReduce techniques to high-dimensional clinical data to detect risk factors and generate a predictor for a specified disease, hypertension. The experimental results indicate that the proposed model outperforms traditional feature selection and classification methods in terms of accuracy, F-score, and AUC. The proposed model is therefore promising for application in healthcare systems.