Data Augmentations for Nuclear Feature Extraction in
Semi-Supervised Contrastive Machine Learning

Jordan Stomps - University of Wisconsin–Madison
Paul Wilson - University of Wisconsin–Madison
Ken Dayman - Oak Ridge National Laboratory
Michael Willis - Oak Ridge National Laboratory
James Ghawaly - Oak Ridge National Laboratory
Dan Archer - Oak Ridge National Laboratory
Persistent radiation monitoring can be a powerful tool for detecting movements of nuclear material in a variety of use cases and nuclear nonproliferation scenarios. Existing gamma-ray detection systems can collect large volumes of data that could be used to train machine learning algorithms for anomaly detection or classification. However, the domain expertise and computational cost required to label sufficient radiation data (i.e., to identify constituent nuclides) for machine learning may be prohibitive. Semi-supervised machine learning alleviates the cost of labeling by learning from a limited set of labeled data together with a larger unlabeled corpus. One method, contrastive learning, learns patterns in a self-supervised manner: a set of data augmentations perturbs each sample in ways that should not alter its inferred label, and training enforces maximum agreement between the resulting pairs of augmented samples. Whereas contrastive learning is traditionally conducted on images, where valid transformations are well developed and intuitively understood, this work endeavors to design and apply valid data augmentations for nuclear radiation data based on the underlying physics. That is, appropriate transformations should reflect realistic radiation detector physics and preserve the classification information in radiation signatures. A non-exhaustive set of six transformations is presented: channel resampling, masking, nuclear interactions, and perturbations of the signal-to-background ratio, detector resolution, and gain. These augmentations are tailored for physical measurements, rather than just simulated data. The augmentations are demonstrated on radiation measurements from sodium iodide detectors deployed around the High-Flux Isotope Reactor and the Radiochemical Engineering Development Center at Oak Ridge National Laboratory.
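As a rough illustration of what such augmentations could look like, the sketch below implements three of the six named transformations (channel resampling, masking, and gain shift) for a one-dimensional gamma-ray spectrum stored as counts per channel. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Poisson resampling model, the contiguous-block mask, the `max_fraction` parameter, and the cumulative-count rebinning used for the gain shift are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def resample_counts(spectrum, rng):
    """Channel resampling (assumed model): redraw each channel from a
    Poisson distribution whose mean is the observed count."""
    return rng.poisson(spectrum).astype(float)


def mask_channels(spectrum, rng, max_fraction=0.2):
    """Masking (assumed form): zero out one contiguous block of channels.
    `max_fraction` is a hypothetical cap on the block width."""
    n = spectrum.size
    width = int(rng.integers(1, max(2, int(max_fraction * n))))
    start = int(rng.integers(0, n - width + 1))
    out = spectrum.astype(float).copy()
    out[start:start + width] = 0.0
    return out


def gain_shift(spectrum, factor):
    """Gain shift (assumed form): stretch (factor > 1) or compress
    (factor < 1) the channel axis, rebinning onto the original channels
    by linearly interpolating the cumulative counts."""
    n = spectrum.size
    edges = np.arange(n + 1, dtype=float)
    cumulative = np.concatenate(([0.0], np.cumsum(spectrum, dtype=float)))
    # A new bin edge at position e corresponds to position e / factor on
    # the original channel axis; counts pushed past the last channel are lost.
    shifted = np.interp(edges / factor, edges, cumulative)
    return np.diff(shifted)


# Usage: build two augmented "views" of one synthetic spectrum.
rng = np.random.default_rng(0)
spectrum = rng.poisson(50.0, size=1024).astype(float)
view_a = gain_shift(resample_counts(spectrum, rng), factor=1.02)
view_b = mask_channels(resample_counts(spectrum, rng), rng)
```

Each function returns a spectrum of the same length as its input, so the transformations can be chained to produce the pairs of views that contrastive training compares. The remaining augmentations (nuclear interactions, signal-to-background ratio, and detector resolution) depend on physics details that are not given here and are omitted from the sketch.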
These transformations are intended to be used in a contrastive learning framework that is trained to identify anomalous spectra and then fine-tuned using a set of manually characterized samples. The intended result is a model that reduces the burden of labeling training data while still making use of all collected measurements, demonstrating the value of unlabeled data.
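The "maximum agreement between pairs" objective can be made concrete with a contrastive loss. The abstract does not name a specific loss, so the sketch below assumes the normalized temperature-scaled cross-entropy (NT-Xent) loss commonly used in image-based contrastive learning. The function name, the `temperature` value, and the random embeddings in the usage example are illustrative assumptions, not details from this work.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss (assumed choice of objective).

    z_a[i] and z_b[i] are embeddings of two augmented views of the same
    spectrum i; every other embedding in the batch acts as a negative.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length rows
    sim = (z @ z.T) / temperature                     # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    # Row i's positive partner is the other view of the same spectrum.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_norm - sim[np.arange(2 * n), positives]))


# Usage: views that agree should yield a lower loss than unrelated views.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
loss_agree = nt_xent_loss(z_a, z_a + 0.01 * rng.normal(size=(8, 16)))
loss_random = nt_xent_loss(z_a, rng.normal(size=(8, 16)))
```

In a full pipeline the embeddings would come from an encoder network applied to augmented spectra, and the loss would be minimized with an automatic-differentiation framework; NumPy is used here only to show the computation.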