Abstract: The development of a materials synthesis route is usually based on heuristics and experience. A possible new approach is to use data-driven methods to learn the patterns of synthesis from past experience and apply them to predict the syntheses of novel materials. However, this route is impeded by the lack of a large-scale database of synthesis formulations. In this work, we applied advanced machine learning and natural language processing techniques to construct a dataset of 35,675 solution-based synthesis procedures extracted from the scientific literature. Each procedure contains essential synthesis information, including the precursors and target materials, their quantities, and the synthesis actions and corresponding attributes. Every procedure is also augmented with the reaction formula. Through this work, we are making freely available the first large dataset of solution-based inorganic materials synthesis procedures.
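
To make the per-procedure content concrete, the following is a minimal sketch of how one record might be represented in Python. It is an illustration only: the class names, field names, and the `precursor_formulas` helper are hypothetical and are not the dataset's actual schema, and the example record is made up for demonstration rather than taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Material:
    """A precursor or target material (hypothetical field names)."""
    formula: str
    quantity: Optional[str] = None  # free-text amount; the format is an assumption


@dataclass
class Action:
    """A synthesis action with its attributes (hypothetical field names)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class SynthesisProcedure:
    """One solution-based synthesis procedure, as described in the abstract."""
    precursors: List[Material]
    targets: List[Material]
    actions: List[Action]
    reaction: str  # reaction formula stored as plain text (an assumption)


def precursor_formulas(proc: SynthesisProcedure) -> List[str]:
    """Return the chemical formulas of all precursors, in order."""
    return [m.formula for m in proc.precursors]


# Made-up example record, not an entry from the dataset.
example = SynthesisProcedure(
    precursors=[Material("AgNO3", "0.1 M"), Material("NaCl", "0.1 M")],
    targets=[Material("AgCl")],
    actions=[Action("mix", {"temperature": "25 C"}), Action("filter")],
    reaction="AgNO3 + NaCl -> AgCl + NaNO3",
)
```

Calling `precursor_formulas(example)` on this illustrative record returns the two precursor formulas in the order given.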