Alumnus Jake Tibbets researched and authored this article with UC Berkeley’s Department of Nuclear Engineering.
Distributed multisensor networks record multiple data streams that can serve as inputs to machine learning models designed to classify proliferation-relevant operations at nuclear reactors. This work demonstrates methods for assessing how much each node (a single multisensor) and each region (a group of proximate multisensors) contributes to machine learning model performance in a reactor monitoring scenario. These assessments, in turn, provide insight into model behavior, a critical requirement for data-driven applications in nuclear security. Using data collected at the High Flux Isotope Reactor at Oak Ridge National Laboratory by a network of Merlyn multisensors, two models were trained to classify the reactor’s operational state: a hidden Markov model (HMM), which is simpler and more transparent, and a feed-forward neural network, which is less inherently interpretable. Traditional wrapper methods for feature importance were extended to identify nodes and regions in the multisensor network with strong positive and negative impacts on the classification problem, and these spatial-importance algorithms were evaluated on both classifiers. Feature selection informed by these results then improved classification accuracy over the baseline models, from 0.583 to 0.839 for the HMM and from 0.811 ± 0.005 to 0.884 ± 0.004 for the feed-forward neural network. Although node and region importance differed somewhat across classifiers and wrapper methods, the nodes near the facility’s cooling tower were consistently identified as important, a conclusion further supported by studies of feature importance in decision trees. Node and region importance methods are model-agnostic, inform feature selection for improved model performance, and can provide insight into opaque classification models in the nuclear security domain.
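The abstract does not spell out how the wrapper methods were extended to nodes and regions, so the following is only a minimal sketch under stated assumptions. It uses leave-one-group-out (drop-column) importance, one common wrapper approach: retrain without a group's feature columns and report baseline accuracy minus the reduced accuracy, so a negative score flags a node or region whose removal helps. A nearest-centroid classifier stands in for the HMM or neural network, and all function names, the node-to-column mapping, the one-pass selection heuristic, and the toy data are hypothetical.

```python
"""Hypothetical sketch: leave-one-group-out (drop-column) wrapper importance
for multisensor nodes and regions. Not the authors' implementation."""
import math
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
# An evaluator retrains a model on the given feature columns and returns accuracy.
Evaluator = Callable[[Sequence[int]], float]


def make_nearest_centroid_evaluator(X_train: Matrix, y_train: List[int],
                                    X_test: Matrix, y_test: List[int]) -> Evaluator:
    """Stand-in model (assumption): a nearest-centroid classifier, used here
    in place of the HMM or feed-forward network."""
    def evaluate(cols: Sequence[int]) -> float:
        if not cols:
            return 0.0  # convention for this sketch: no features, no skill
        centroids: Dict[int, List[float]] = {}
        for label in sorted(set(y_train)):
            rows = [x for x, y in zip(X_train, y_train) if y == label]
            centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
        correct = 0
        for x, y in zip(X_test, y_test):
            point = [x[c] for c in cols]
            # Predict the label whose centroid is closest in Euclidean distance.
            pred = min(centroids, key=lambda lab: math.dist(point, centroids[lab]))
            correct += int(pred == y)
        return correct / len(y_test)
    return evaluate


def group_importance(evaluate: Evaluator, groups: Dict[str, List[int]],
                     n_features: int) -> Dict[str, float]:
    """Importance of each group (one node's columns, or a region's columns) as
    baseline accuracy minus accuracy after retraining without that group.
    Positive: the group helps. Negative: removing it improves accuracy."""
    all_cols = list(range(n_features))
    baseline = evaluate(all_cols)
    scores: Dict[str, float] = {}
    for name, cols in groups.items():
        removed = set(cols)
        scores[name] = baseline - evaluate([c for c in all_cols if c not in removed])
    return scores


def drop_harmful_groups(evaluate: Evaluator, groups: Dict[str, List[int]],
                        n_features: int) -> Tuple[List[int], float]:
    """One-pass selection heuristic (an assumption, not a standard named
    method): drop every group with negative importance, then re-evaluate."""
    scores = group_importance(evaluate, groups, n_features)
    harmful = {c for name, s in scores.items() if s < 0 for c in groups[name]}
    kept = [c for c in range(n_features) if c not in harmful]
    return kept, evaluate(kept)


if __name__ == "__main__":
    # Toy data: 3 single-feature nodes. Node "A" tracks the label, node "B"
    # is misleading on the test set, node "C" is constant.
    X_train = [[0, 0, 1], [0, 0, 1], [1, 1, 1], [1, 1, 1]]
    y_train = [0, 0, 1, 1]
    X_test = [[0, 3, 1], [1, -2, 1], [0, 0.6, 1], [1, 0.4, 1]]
    y_test = [0, 1, 0, 1]
    evaluate = make_nearest_centroid_evaluator(X_train, y_train, X_test, y_test)
    nodes = {"A": [0], "B": [1], "C": [2]}
    print(group_importance(evaluate, nodes, 3))    # {'A': 0.5, 'B': -0.5, 'C': 0.0}
    print(group_importance(evaluate, {"A+C": [0, 2], "B+C": [1, 2]}, 3))
    print(drop_harmful_groups(evaluate, nodes, 3))  # ([0, 2], 1.0)
```

In a real network each node would likely contribute several columns (one per sensing modality), and a region's column list would be the union of its nodes' columns. Drop-column importance retrains once per group; permutation importance is a cheaper, commonly used alternative that avoids retraining.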