In its Safeguards role, the IAEA collects data from various sources: State declarations, results of inspectors’ verification activities, open source information, and commercial satellite imagery. These data are used to draw credible conclusions on the peaceful use of nuclear material by States. The Nuclear Fuel Cycle (NFC) Information analysis section of the Department of Safeguards has a unique role in data evaluation and develops statistical analysis and data visualization tools. This work is all the more necessary as the number of facilities and the complexity of nuclear fuel cycles increase. However, because safeguards data are confidential, real data cannot be shared with research partners. To address this gap, we have developed a framework called the Nuclear Solar System, which is designed to generate synthetic, obfuscated, but realistic data. This can facilitate data sharing with research partners without providing access to original or sensitive data. We use a classification and regression tree (CART) model trained on historic data to generate new synthetic data across the NFC. Confidentiality is maintained both by removing identifying features from the model training data and by optimizing the CART model parameters to prevent real data from entering the synthetic set. Early results show that the synthetic data match historic data distributions and trends, making them realistic and usable for method development and research while fully maintaining data confidentiality.
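To make the CART-based synthesis idea more concrete, the following is a minimal Python sketch of one common way such a generator can work. It is not the Nuclear Solar System implementation. Everything in it is an assumption made for illustration: the two-variable toy records, the function names `build_tree`, `find_leaf`, and `synthesize`, the `min_leaf` and `max_depth` parameters, and the choice to draw new values from a Gaussian fitted to each leaf rather than to copy real values. The minimum leaf size stands in for the kind of model parameter that could be tuned so that no synthetic value is tied to a single real record.

```python
import random
import statistics


def sse(values):
    """Sum of squared errors around the mean (regression-tree impurity)."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, min_leaf, depth=0, max_depth=4):
    """Fit a one-predictor regression tree on (x, y) rows.

    Returns nested dicts; each leaf stores the real y values that fell into it.
    No leaf ever holds fewer than `min_leaf` real records.
    """
    if min_leaf < 2:
        raise ValueError("min_leaf must be at least 2")
    rows = sorted(rows)
    best = None  # (cost, split index)
    if depth < max_depth:
        for i in range(min_leaf, len(rows) - min_leaf + 1):
            if rows[i - 1][0] == rows[i][0]:
                continue  # cannot split between identical x values
            cost = sse([y for _, y in rows[:i]]) + sse([y for _, y in rows[i:]])
            if best is None or cost < best[0]:
                best = (cost, i)
    if best is None:
        return {"leaf": [y for _, y in rows]}
    i = best[1]
    return {
        "threshold": (rows[i - 1][0] + rows[i][0]) / 2,
        "left": build_tree(rows[:i], min_leaf, depth + 1, max_depth),
        "right": build_tree(rows[i:], min_leaf, depth + 1, max_depth),
    }


def find_leaf(tree, x):
    """Walk down the tree and return the real y values in the leaf for x."""
    while "leaf" not in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["leaf"]


def synthesize(real_rows, n, min_leaf=10, seed=0):
    """Generate n synthetic (x, y) rows without copying any real value."""
    rng = random.Random(seed)
    tree = build_tree(real_rows, min_leaf)
    xs = [x for x, _ in real_rows]
    x_mean, x_sd = statistics.fmean(xs), statistics.stdev(xs)
    synthetic = []
    for _ in range(n):
        # Assumption: the first variable is drawn from a fitted Gaussian.
        x = rng.gauss(x_mean, x_sd)
        # The second variable is drawn from a Gaussian fitted to the leaf,
        # so it follows the local trend but is not a real observation.
        donors = find_leaf(tree, x)
        y = rng.gauss(statistics.fmean(donors), statistics.stdev(donors))
        synthetic.append((x, y))
    return synthetic


if __name__ == "__main__":
    # Toy stand-in for confidential records (purely invented numbers).
    toy_rng = random.Random(42)
    real = [(x, 2.0 * x + toy_rng.gauss(0, 5))
            for x in (toy_rng.uniform(0, 100) for _ in range(200))]
    for row in synthesize(real, 5):
        print(row)
```

In this sketch, raising `min_leaf` pools more real records into each leaf, which lowers the risk that a synthetic value reveals an individual record but also smooths away fine structure; how that trade-off is actually tuned in the framework is not stated in the abstract.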
Year
2022
Abstract