Application of machine learning with data encoding techniques to predict stratum-level DP

Lohith Annadevula - University of Massachusetts Lowell
S.K. Aghara - University of Massachusetts Lowell
The International Atomic Energy Agency (IAEA) employs well-established statistical methods to assess the effectiveness of its inspection plans on a multi-defect stratum by evaluating the defect detection probability (DP). DP is defined as the probability of identifying at least one defect when a defective stratum is subjected to a specific inspection plan. So far, deterministic methods based on statistical distributions and a stochastic method based on pseudo-random number generators have been developed to compute DP in finite time. The stochastic method is applicable to any inspection scenario and can generate DP results with a user-specified standard error. Initial attempts were made to train machine learning (ML) models to predict DP from the stochastic DP results and their corresponding inspection parameters. Inspection parameters such as item types, instrument types, and identification probabilities vary in length depending on the applied diversion strategy and inspection plan. These variable-length parameters pose a major challenge in developing ML models, which require a fixed number of input parameters for training and prediction. The paper explores two ways to convert variable-length parameters into a fixed number of parameters: zero-padding and encoding techniques. Zero-padding caps the variable-length parameters at a fixed length and fills missing entries with zeros, which restricts the resulting models to the few inspection scenarios that fit within that length. Encoding techniques, by contrast, do not limit model applicability; instead, they apply certain operations to the variable-length parameters to generate encoded data with a fixed number of parameters, which are then used to train the ML models. The paper discusses the zero-padding scheme and two different data encoding techniques and compares the performance of ML models trained with each. The R² scores of the zero-padded and encoded models are evaluated on unseen instances of the test dataset. The comparison shows the superior generalization power of encoded models over zero-padded models in predicting DP.
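
The difference between the two conversion approaches can be illustrated with a minimal Python sketch. The abstract does not specify the two encoding techniques used in the paper, so the summary-statistic encoding below is only an assumed stand-in chosen for illustration; the function names `zero_pad` and `summary_encode` and the sample identification probabilities are likewise hypothetical.

```python
import numpy as np


def zero_pad(values, max_len):
    """Pad a variable-length parameter list with zeros up to max_len.

    Scenarios longer than max_len cannot be represented, which is the
    applicability limit of zero-padding described in the abstract.
    """
    if len(values) > max_len:
        raise ValueError("scenario exceeds the fixed length the model supports")
    padded = np.zeros(max_len, dtype=float)
    padded[: len(values)] = values
    return padded


def summary_encode(values):
    """Hypothetical encoding (an assumption, not the paper's technique):
    map a list of any length to four fixed features
    (count, mean, minimum, maximum)."""
    arr = np.asarray(values, dtype=float)
    return np.array([arr.size, arr.mean(), arr.min(), arr.max()])


# Made-up identification probabilities for two inspection scenarios
short_scenario = [0.9, 0.5]
long_scenario = [0.9, 0.5, 0.2, 0.1, 0.05]

print(zero_pad(short_scenario, max_len=4))  # fits within the fixed length
print(summary_encode(short_scenario))       # four features
print(summary_encode(long_scenario))        # still four features
```

In this sketch, `zero_pad(long_scenario, max_len=4)` raises a `ValueError` because the scenario has five entries, whereas `summary_encode` returns a four-element vector for both scenarios. Either fixed-length vector could then be passed to a regression model as input.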