Building Performance Evaluation Framework of Foundation Models for Nonproliferation

Alexei N Skurikhin - Los Alamos National Laboratory
Garrison Flynn - Los Alamos National Laboratory
Michael A. Geyer - Los Alamos National Laboratory
Giri R. Gopalan - Los Alamos National Laboratory
Natalie E. Klein - Los Alamos National Laboratory
Juston S. Moore - Los Alamos National Laboratory
Mark G. Myshatyn - Los Alamos National Laboratory
Nidhi K. Parikh - Los Alamos National Laboratory
Rosalyn C. Rael - Los Alamos National Laboratory
Selma L. Wanna - Los Alamos National Laboratory
Emily Casleton - Los Alamos National Laboratory
Recent progress in AI has culminated in foundation models (FMs) that can facilitate the development of innovative approaches for nuclear verification and geographic profiling of activities of interest. FMs are large-scale deep neural network models (e.g., transformer models) that are trained on very large general datasets and can then be tuned to a wide range of downstream tasks with relatively little additional task-specific training. FMs have already had a major impact in natural language processing and are increasingly used in computer vision for tasks such as image-to-text mapping, image retrieval, and image tagging. However, while FMs are powerful, their adaptation to the domain of nuclear nonproliferation comes with potential limitations due to inadequate quality and variety of domain data, as well as possible bias in the data used to train the original FM. Test and evaluation (T&E) of FMs, including quantification of uncertainty, is crucial for nonproliferation applications, which pose unique challenges: not all modalities are available at all times, information is unequally distributed across modalities, and annotated data are unequally distributed across modalities. The paper will present an overview of T&E approaches for FMs, along with issues such as computational complexity, scalability, and deployability. The paper will also discuss T&E of FMs for computer vision on downstream tasks such as land-use, scene, and image classification, object detection, localization, and segmentation, which are essential for the characterization of objects and activities of interest. Finally, we will consider an application of transformer models to scene classification using satellite imagery and compare transformers to convolutional neural networks using T&E metrics.
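As an illustration of the kind of T&E metrics such a comparison might use, the following is a minimal Python sketch that computes classification accuracy and expected calibration error (ECE), a common summary of how well predicted confidences match observed accuracy. The abstract does not specify which metrics the paper adopts, so the metric choice, the function names, and the toy inputs are all assumptions made for illustration only.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(labels) != len(preds) or len(labels) == 0:
        raise ValueError("labels and preds must be non-empty and equal length")
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def expected_calibration_error(
    labels: Sequence[int],
    preds: Sequence[int],
    confidences: Sequence[float],
    n_bins: int = 10,
) -> float:
    """Weighted mean gap between confidence and accuracy over equal-width bins."""
    n = len(labels)
    if n == 0 or len(preds) != n or len(confidences) != n:
        raise ValueError("inputs must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for y, p, c in zip(labels, preds, confidences):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, float(y == p)))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        mean_acc = sum(a for _, a in members) / len(members)
        ece += (len(members) / n) * abs(mean_acc - mean_conf)
    return ece


# Hypothetical toy predictions for one scene classifier (not real results).
toy_labels = [0, 1, 1, 0]
toy_preds = [0, 1, 0, 0]
toy_confidences = [0.95, 0.85, 0.75, 0.65]

toy_accuracy = accuracy(toy_labels, toy_preds)
toy_ece = expected_calibration_error(toy_labels, toy_preds, toy_confidences)
```

Under these assumptions, the same two functions could be applied to the held-out predictions of a transformer and of a convolutional network to compare them on both correctness and calibration.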