STAR: Software Toolkit for Analysis Research

Year
1993
Author(s)
Justin E. Doak - Los Alamos National Laboratory
R. Whiteson - Los Alamos National Laboratory
B. Hoffbauer - Los Alamos National Laboratory
J. Doak - Los Alamos National Laboratory
J. Prommel - Los Alamos National Laboratory
T.R. Thomas - Los Alamos National Laboratory
This work is supported by the U.S. Department of Energy, Office of Arms Control and Nonproliferation.
Abstract
Analyzing vast quantities of data from diverse information sources is an increasingly important element of nonproliferation and arms control analysis. Much of the work in this area has relied on human analysts to assimilate, integrate, and interpret complex information gathered from various sources. With the advent of fast computers, we now have the capability to automate this process, thereby shifting this burden away from humans. In addition, huge data storage capabilities now make it possible to build large integrated databases comprising many terabytes of information spanning a variety of subjects. We are currently designing a Software Toolkit for Analysis Research (STAR) to address these issues. The goal of STAR is to produce a research tool that facilitates the development and interchange of algorithms for locating phenomena of interest to nonproliferation and arms control experts. One major component deals with the preparation of information. The ability to manage raw data and effectively transform it into a meaningful form is a prerequisite for analysis by any methodology. The information to be analyzed can be unstructured text (e.g., journal articles), structured data, signals, or images. Data can be numerical and/or character, stored in raw data files, databases, or byte streams, or compressed into bits, in formats ranging from fixed-length, to character-delimited, to a count followed by content. The data can be analyzed in real-time or batch mode. Once the data are preprocessed, different analysis techniques can be applied. Some are built using expert knowledge; others are trained using data collected over a period of time. Currently, we are considering three classes of analyzers for use in our software toolkit: 1) traditional machine learning techniques, 2) purely statistical systems, and 3) expert systems.
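The abstract names three record layouts that the preparation component must handle: fixed-length fields, character-delimited fields, and a count followed by content. The Python sketch below is illustrative only and is not STAR's implementation. It shows one way each layout can be decoded into a common list-of-fields form before analysis. The function names `parse_fixed`, `parse_delimited`, and `parse_counted` are hypothetical. The sketch also assumes the count is a single byte, which is a simplification chosen for this example.

```python
from typing import List


def parse_fixed(record: str, widths: List[int]) -> List[str]:
    """Hypothetical helper: split a fixed-length record into fields of the given widths."""
    fields, pos = [], 0
    for w in widths:
        # Take the next w characters and drop the padding around the value.
        fields.append(record[pos:pos + w].strip())
        pos += w
    return fields


def parse_delimited(record: str, delimiter: str = ",") -> List[str]:
    """Hypothetical helper: split a character-delimited record on the delimiter."""
    return record.split(delimiter)


def parse_counted(data: bytes) -> List[bytes]:
    """Hypothetical helper: decode a stream of count-prefixed items.

    Assumes each item is a one-byte count n followed by exactly n bytes of content.
    """
    items, pos = [], 0
    while pos < len(data):
        n = data[pos]  # indexing bytes yields an int in Python 3
        pos += 1
        if pos + n > len(data):
            raise ValueError("truncated item: count exceeds remaining data")
        items.append(data[pos:pos + n])
        pos += n
    return items


if __name__ == "__main__":
    print(parse_fixed("AB  123 xy", [4, 4, 2]))  # ['AB', '123', 'xy']
    print(parse_delimited("a|b|c", "|"))         # ['a', 'b', 'c']
    print(parse_counted(b"\x03abc\x02de"))       # [b'abc', b'de']
```

The count-prefixed form does not need to reserve a delimiter character, so its content may contain any byte value. The trade-off is that a corrupted count desynchronizes every item that follows it. This is why the sketch rejects a count that runs past the end of the data.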