The national security establishment is racing to adopt artificial intelligence in nearly every aspect of operations, from processing payroll to fusing disparate battlefield information into a cohesive whole. One example is the Pentagon’s Joint All Domain Command and Control effort, which aims to network otherwise separated operational “nodes” to one another in warfare to optimize and streamline attacks.
However, training AI systems to recognize the things they are meant to recognize requires vast, even seemingly limitless volumes of annotated data.
As promising as AI is, an AI system is only as effective as its training data.
At the moment, there seem to be few barriers to AI and its promise for the future, yet an actual AI system is only as effective as its database. While many advanced systems are working on real-time analytics and various kinds of accelerated machine learning, an AI system might produce glitches or errors should it encounter something it does not recognize or for which it has no informational basis to perform an analysis. Trusting the validity of algorithmic determinations can therefore present complexities because, simply put, an AI database or set of interpretive algorithms needs to be trained. Oftentimes, an AI system can discern meaning or make determinations based on context, such as understanding the difference between a “ball” at which people dance and a “ball” that people kick by analyzing the surrounding words. However, in order to perform this function, the system itself needs to have learned all of the words and how they relate to one another.
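The “ball” example can be illustrated with a deliberately tiny Python sketch. It is not how any production system or vendor platform works; the sense names, the word lists, and the function `disambiguate_ball` are all hypothetical, chosen only to show that a context-based determination depends entirely on what has been labeled beforehand.

```python
# Hypothetical, hand-labeled "training data": context words tagged by sense.
# Real systems learn these relationships from far larger labeled corpora.
SENSE_CONTEXT = {
    "dance": {"dance", "danced", "gown", "waltz", "orchestra", "formal"},
    "sport": {"kick", "kicked", "goal", "field", "game", "player"},
}


def disambiguate_ball(sentence):
    """Pick the sense whose labeled context words overlap most with the sentence.

    Returns None when no labeled context word appears, i.e. when the
    system has no informational basis for a determination.
    """
    # Lowercase each word and strip surrounding punctuation.
    words = {w.strip(".,!?;:\"'").lower() for w in sentence.split()}
    # Score each sense by how many of its labeled context words are present.
    scores = {sense: len(words & ctx) for sense, ctx in SENSE_CONTEXT.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate_ball("She wore a gown to the ball and danced a waltz."))
print(disambiguate_ball("He kicked the ball into the goal."))
print(disambiguate_ball("The ball was red."))
```

The third sentence contains none of the labeled context words, so the sketch returns `None` rather than guessing, which mirrors the point that a system cannot interpret what it has never been trained on.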
“AI is only as good as the labeled data that trains those AI models,” said Manu Sharma, CEO and co-founder of Labelbox, a training data platform now supporting the intelligence community (IC).
Therefore, in order to optimize AI functionality, or simply enable it, raw data needs to be “converted into algorithmically consumable data,” a process that usually requires human intervention, Sharma explains.
Accordingly, new information arriving into an AI-enabled system needs to be ingested, learned, identified, and accurately categorized, a process data labelers describe in terms of an “iteration loop.” This loop needs to generate data that is labeled, identified, and integrated into a larger system so that computer analytics can perform their required functions.
“Labeling data isn’t a one-off, brute force affair; it requires a continuous iteration loop to deliver mission-critical AI into production and keep it there. Human labor alone won’t suffice. That’s why agencies such as NASA, the Department of Energy, and the Department of Defense’s Joint Artificial Intelligence Center are using a versatile and complete training data platform to speed iteration cycles and keep annotation quality high,” Sharma says.
This kind of cataloging is necessary for an AI-enabled system to determine things like context, or arrive at an accurate conclusion by virtue of assessing and interpreting a variety of otherwise disconnected variables both individually and collectively in relation to one another.
In-Q-Tel, the intelligence community’s investment entity, saw this as a fundamental roadblock to U.S. AI competitiveness and has invested in training data platforms to enable teams across the government to collaborate on their training data and drive faster iteration cycles.
“Core to the enablement of AI based machine learning algorithms for our Intelligence Community and National Security partners is the need to accurately and cost-effectively label vast amounts of training data,” commented George Hoyem, Managing Director at In-Q-Tel.
In order to bring this effort to fruition, the IC and In-Q-Tel are working with a training data platform called Labelbox, which is engineering and refining an effective and continuous data labeling iteration loop.
“Labelbox offers our partners a state-of-the-art data annotation and data labeling platform for our IC partners to quickly and cost effectively label their AI training data,” Hoyem added.
“Large teams of people are needed to hand label data, creating bounding boxes around objects in images or drawing lines or points between words in text. As with any human endeavor at scale, ensuring quality and consistency is a challenge,” Sharma explained.
Removing this challenge is what Labelbox is seeking to accomplish.
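The consistency problem Sharma describes for bounding boxes can be made concrete with a short sketch. One common way to compare two annotators’ boxes around the same object is intersection over union (IoU). The article does not say which checks Labelbox or its government users apply, so the function names and the 0.5 review threshold below are illustrative assumptions only.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def needs_review(box_a, box_b, threshold=0.5):
    """Flag a pair of labels for human review when agreement falls below the threshold.

    The 0.5 default is an assumed value for illustration, not a documented standard.
    """
    return iou(box_a, box_b) < threshold


# Two hypothetical annotators label the same vehicle in one frame.
annotator_1 = (10, 10, 50, 50)
annotator_2 = (12, 12, 52, 52)   # nearly the same box
annotator_3 = (40, 40, 90, 90)   # a very different box

print(needs_review(annotator_1, annotator_2))  # close agreement, not flagged
print(needs_review(annotator_1, annotator_3))  # poor agreement, flagged
```

A check of this kind lets a small number of reviewers concentrate on the labels where annotators disagree instead of re-inspecting every image by hand.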
AI is already massively shortening the time human decision-makers need to sift through or organize hours of raw video feeds to find moments of tactical significance, greatly streamlining the Processing, Exploitation, and Dissemination process. Instead of needing to go through hours of raw data, AI-enabled computer algorithms can instantly find, identify, and transmit those items, objects, or occurrences most relevant to humans performing command and control. Automation, therefore, or certain kinds of computer-enabled processing, saves time, reduces human error, and increases organizational efficiency.
Much of this seems to involve applying a trusted and somewhat established process to the task of data labeling, as AI-empowered computer automation is evolving quickly and informs the Pentagon’s Joint All Domain Command and Control (JADC2) data-sharing effort. The core concept of JADC2 is to gather otherwise disparate pools of sensor data, analyze them in relation to one another, and share them across a network of combat platforms or “nodes” to instantly complete a sensor-to-shooter pairing and get ahead of or inside an enemy’s decision-making cycle.
For example, an F-35’s Mission Data Files consist of a data library of known threats specific to geographical regions throughout the world. Once a targeting sensor discerns an image or rendering of a possible threat, it bounces the image off of the compiled Mission Data Files to instantly identify what the threat is. The system itself can continuously learn and expand by virtue of ingesting and comparing new data against its library to make determinations, yet it is only as effective as the scope, depth, and accuracy of its existing training data. This training data consists of “labeled” data, and the labeling is being expedited through processes developed by In-Q-Tel and its partners such as Labelbox.
Kris Osborn is the defense editor for the National Interest. Osborn previously served at the Pentagon as a Highly Qualified Expert with the Office of the Assistant Secretary of the Army (Acquisition, Logistics & Technology). Osborn has also worked as an anchor and on-air military specialist at national TV networks. He has appeared as a guest military expert on Fox News, MSNBC, The Military Channel, and The History Channel. He also has a master’s degree in Comparative Literature from Columbia University.