Summary

The COVID-19 pandemic has caused devastating economic and social disruption. This has prompted a nationwide call for models that predict hospitalization and severe illness in patients with COVID-19 to inform the distribution of limited healthcare resources. To address this challenge, we propose MedML, a machine learning model that predicts hospitalization and severity in the pediatric population using electronic health records. MedML extracts the most predictive features from over 6 million medical concepts based on medical knowledge and propensity scores, and incorporates the inter-feature relationships in medical knowledge graphs via graph neural networks. We evaluate MedML on the National COVID Cohort Collaborative (N3C) dataset. MedML achieves up to 7% higher AUROC and 14% higher AUPRC than the best baseline machine learning models. MedML is a new machine learning framework that incorporates clinical domain knowledge and is more predictive and explainable than current data-driven methods.

Highlights

• MedML extracts the most predictive features from over 6 million medical concepts
• MedML fuses medical knowledge into machine learning models via graph neural networks
• MedML outperforms other methods on pediatric COVID-19 predictions using the N3C dataset

Respiratory medicine; Pediatrics; Artificial intelligence; Artificial intelligence applications