But it Looks so Real! Challenges in Training Models with Synthetic Data for International Safeguards

Zoe Gastelum - Sandia National Laboratories
Timothy Shead - Sandia National Laboratories
Matthew R. Marshall - Sandia National Laboratories

Because real international nuclear safeguards data is generally unavailable for data science research and development, many researchers in this domain have turned to synthetic and simulated datasets to develop and prove modeling concepts. In recent work, Sandia National Laboratories developed a large, synthetic, machine learning-validated dataset of images of containers used to transport and store natural and low-enriched uranium hexafluoride, specifically 30B and 48-type cylinders. The dataset also includes synthetic images of distractor objects such as 55-gallon drums and propane tanks. The purpose of these synthetic images is to address the need for safeguards-relevant data to support computer vision research. In our validation process, we faced the canonical challenge of generalizing models trained on synthetic data to make predictions on real-world data. In this paper, we describe the challenges and observations from our research on training models with synthetic images to make predictions on real-world images. We also present our priorities for future research using our large, publicly available synthetic image dataset, directions that have the potential to advance the state of synthetic-to-real research and development.