Abstract: The goal of this study was to evaluate the maturity of current Deep Learning classification techniques for application in a real maternal-fetal clinical environment. A large dataset of routinely acquired maternal-fetal screening ultrasound images (which will be made publicly available) was collected from two hospitals by several operators using several ultrasound machines. All images were manually labeled by an expert maternal-fetal clinician. Images were divided into 6 classes: four of the most widely used fetal anatomical planes (Abdomen, Brain, Femur and Thorax), the mother’s cervix (widely used for prematurity screening) and a general category covering any other, less common image plane. Fetal brain images were further categorized into the 3 most common fetal brain planes (Trans-thalamic, Trans-cerebellum, Trans-ventricular) to assess fine-grained categorization performance. The final dataset comprises over 12,400 images from 1,792 patients, making it the largest ultrasound dataset to date. We then evaluated a wide variety of state-of-the-art deep Convolutional Neural Networks on this dataset and analyzed the results in depth, comparing the computational models to research technicians, who currently perform the task daily. Results indicate for the first time that computational models perform similarly to humans when classifying common planes in human fetal examination. However, the dataset leaves the door open for future research to further improve results, especially on fine-grained plane categorization.