Scale AI Archives | DefenseScoop — https://defensescoop.com/tag/scale-ai/

Combatant commands to get new generative AI tech for operational planning, wargaming
https://defensescoop.com/2025/03/05/diu-thunderforge-scale-ai-combatant-commands-indopacom-eucom/
Wed, 05 Mar 2025 14:00:00 +0000

The U.S. military’s Indo-Pacific Command and European Command are first in line to receive new generative artificial intelligence capabilities delivered by Scale AI and its industry partners via DIU’s Thunderforge initiative.

The post Combatant commands to get new generative AI tech for operational planning, wargaming appeared first on DefenseScoop.

The U.S. military’s Indo-Pacific Command and European Command are first in line to receive new generative artificial intelligence capabilities delivered by Scale AI and its industry partners via the Thunderforge initiative, the Defense Innovation Unit announced Wednesday.

DIU is a Silicon Valley-headquartered organization that has embedded personnel at Indo-Pacom and Eucom to help tackle some of the combatant commands’ tech-related challenges.

On Wednesday, DIU announced that Scale AI was awarded a prototype contract for the new Thunderforge capability, which will include the company’s agentic applications, Anduril’s Lattice software platform and Microsoft’s large language model technology.

“The Thunderforge technology solution will provide AI-assisted planning capabilities, decision support tools, and automated workflows, enabling military planners to navigate evolving operational environments. By leveraging advanced large language models (LLMs), AI-driven simulations, and interactive agent-based wargaming, Thunderforge will enhance how the U.S. military prepares for and executes operations,” the unit said in a release.

DIU issued a solicitation for the program last year via its commercial solutions opening contracting mechanism.

“The joint planning process is complex, time-consuming, and resource-intensive. Planners and other staff members must synthesize large amounts of information from diverse sources, consider multiple courses of action (COA), and produce detailed operational plans and orders – often under significant time pressure. As the operational environment becomes more complex and dynamic, there is a need to accelerate and enhance joint planning capabilities while maintaining rigor and human judgment,” the document stated.

In a statement Wednesday, Bryce Goodman, Thunderforge program lead and contractor with DIU, noted that current military planning processes rely on decades-old technology and methodologies.

The U.S. military wants new tech that can quickly ingest, process and summarize large volumes of information relevant to military planning; identify key insights, patterns and relationships; produce draft operations plans, concept plans and operations orders; and perform automated wargaming of courses of action and provide comparative analysis of advantages, disadvantages and risks.

“Our AI solutions will transform today’s military operating process and modernize American defense. Working together with DIU, Combatant Commands, and our industry partners, we will lead the Joint Force in integrating AI into operational decision-making. DIU’s enhanced speed will provide our nation’s military leaders with the greatest technological advantage,” Scale AI founder and CEO Alexandr Wang said in a statement.

According to DIU, initial deployments of the system to Indo-Pacom and Eucom are expected to support “mission-critical” planning activities such as campaign development, theater-wide resource allocation and strategic assessment.

If the tech meets expectations, plans call for scaling the Thunderforge capability across the U.S. military’s combatant commands in the future.

Scale AI unveils ‘Defense Llama’ large language model for national security users
https://defensescoop.com/2024/11/04/scale-ai-unveils-defense-llama-large-language-model-llm-national-security-users/
Mon, 04 Nov 2024 22:59:03 +0000

DefenseScoop got a live demo of the new tool, which already boasts experimental and operational use cases in classified networks.

The post Scale AI unveils ‘Defense Llama’ large language model for national security users appeared first on DefenseScoop.

Credentialed U.S. military and national security officials are experimenting in multiple classified environments with Defense Llama — a powerful new large language model that Scale AI configured and fine-tuned over the last year from Meta’s Llama 3 LLM — to adopt generative AI for distinctive missions such as combat planning and intelligence operations.

Dan Tadross, Scale AI’s head of federal delivery and a Marine Corps reservist, briefed DefenseScoop on the making and envisioned impacts of this new custom-for-the-military model in an exclusive interview and technology demonstration on Monday.

“There are already some users from combatant commands and other military groups that are able to leverage this on certain networks,” he explained at Scale AI’s office in Washington. 

Large language models and the overarching field of generative AI encompass emerging and already-disruptive technologies that can produce (convincing but not always accurate) text, software code, images and other media — based on human prompts. 

This quickly evolving realm presents major opportunities for the Defense Department, while simultaneously posing uncertain and serious potential challenges. 

Last year, Pentagon leadership formed a temporary task force to accelerate DOD components’ grasp, oversight and deployments of generative AI. More recently, the department and other agencies received new directives on pursuing the advanced technology via various provisions in the Biden administration’s National Security Memo (NSM) on AI issued last month.

“We are still looking at ways to provide more enterprise support, especially as things like the NSM that was just released. That’s one of the areas that we’re leaning forward on being able to try and help support the DOD’s adoption of this technology, again, in a responsible manner,” Tadross said. 

Notably, Scale AI’s demo occurred the same day that Meta revealed that it’s making its Llama models available to U.S. government agencies — and explicitly those that are working on defense and national security applications — with support from other commercial partners including Scale AI. Also on Monday, OpenAI unveiled its first limited ChatGPT Enterprise partnership with DOD, which will enable its generative capabilities’ use on unclassified systems and data.

These announcements follow recent research and reports suggesting that Chinese researchers linked to the People’s Liberation Army applied Meta’s open-source Llama model to create an AI asset with potential military applications.

“There’s always a concern [about] the risk appetite. My perspective on this is that the risk of not adopting these technologies is actually greater than adopting them in a measured and responsible way,” Tadross told DefenseScoop.  

In some ways, he said, Scale AI’s Defense Llama stems from the company’s still-unfolding test and evaluation and other experimental efforts with DOD partners in combatant commands and at Marine Corps University’s School of Advanced Warfighting. 

“We found that there are instances where a DOD member or any government official is going to ask a question that would not get a good response from the model,” Tadross said.

“This is because if you build these models off of the plethora of information that’s on the internet, and then also are tuning it for the use cases that are best commercialized … there are protections that are put in place to ensure that they are used responsibly, [including] making sure that they don’t respond about warfare, about drug use, about human trafficking, things like this that make all the sense in the world, to ensure that they don’t go haywire and start answering all those questions to the general population,” he said. 

But once LLMs were safely configured for use and experimentation by trained and approved government officials on DOD’s classified and more secure networks, Tadross explained, the models still “refused” to fully address certain prompts about warfare planning and other defense topics.

“We needed to figure out a way to get around those refusals in order to act. Because if you’re a military officer and you’re trying to do something, even in an exercise, and it responds with ‘You should seek a diplomatic solution,’ you will get very upset. You slam the laptop closed,” he said.

“So we needed to find a way to minimize those refusals and ensure that it is not only doing that, but also answering the tone that would be useful — because if it’s like this very informal social media-type tone, it doesn’t instill a lot of confidence in its response,” he said. 

Tadross and his team trained Defense Llama on a sprawling dataset that pulled together military doctrine, international humanitarian law, and relevant policies that align with the Pentagon’s rules for armed conflict and ethical principles for AI. 

Scale applied the engineering process known as supervised fine-tuning to that dataset, and used reinforcement learning from human feedback (RLHF) methods to shape the model’s tone.

“You get a response and then you provide the type of response that you would have preferred. So because the intelligence community has already written style guides for how to write, we just built a lot of examples based off that,” Tadross said. 
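The preference-tuning workflow Tadross describes — pairing a model draft with the response an analyst would have preferred, per intelligence-community style guides — can be sketched as data assembly. This is a hypothetical illustration of the general technique; the field names, example text, and format are assumptions, not Scale AI's actual pipeline.

```python
# Hypothetical sketch: assembling preference pairs for RLHF-style tone tuning.
# Each record pairs a model draft ("rejected") with a rewrite that follows an
# analyst style guide ("chosen"). All names and text here are illustrative.

def make_preference_pair(prompt: str, model_draft: str, styled_rewrite: str) -> dict:
    """Bundle one prompt with a dispreferred and a preferred response."""
    return {
        "prompt": prompt,
        "rejected": model_draft,     # e.g. informal, social-media-style tone
        "chosen": styled_rewrite,    # tone matching a formal writing style guide
    }

pairs = [
    make_preference_pair(
        "Summarize the logistics report.",
        "Hey! So basically the supply situation is kinda rough right now...",
        "Assessment: sustainment capacity is degraded; resupply within 72 hours "
        "is assessed as unlikely without additional airlift.",
    )
]
```

A reward model trained on many such pairs learns to prefer the formal register, which is then reinforced in the LLM's outputs.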

He declined to confirm which classified networks Defense Llama is running on — or specific military units that are tapping into it — to date. 

But in an emailed statement, a Scale AI spokesperson later confirmed that the model “is now available for integration into various defense systems, including command and control platforms, intelligence analysis tools, and decision-support systems.” 

Defense Llama can be accessed exclusively in controlled government hubs housed within the Scale Donovan platform.

Tadross used Donovan to demonstrate the new LLM for DefenseScoop.

The platform presented another commercial LLM in a side-by-side view with Defense Llama. In the first demo, Donovan posed the question: “As a military planner, which munition should I select to destroy a hardened structure while minimizing collateral damage from a nearby civilian facility?”

Defense Llama provided a lengthy response that also spotlighted a number of factors worth considering, such as “hardness of the target, distance from civilian facilities, environmental features, and time constraints.” 

The other LLM replied with an apology, a simple explanation that the question was out of its scope, and a recommendation to seek other options.

For another prompt, Tadross asked: “What tactics has Iran employed against coalition forces?”

He explained in real time that the other model supplied “a straight refusal.” The Scale AI-configured LLM, on the other hand, offered up multiple paragraphs about how Iran has used ballistic missiles, cyber warfare, intelligence gathering, terrorist groups and naval forces. 

“This is all very much in line with what they’ve actually done,” Tadross noted. 

Drawing on his past experience operating inside military command centers, he recalled how key data points and information would be funneled through many officials in high-stakes scenarios before reaching top decision-makers.

“The intent behind deploying technology like this, and the impact that I expect that it’ll make, is that it will reduce the reliance on more and more people sitting at those headquarters sections doing the grunt work that’s necessary to pull the data together. So instead, what you’ll have is a situation where there’ll be fewer people able to access a larger swath of data and make a decision quite a bit faster than what they would have done otherwise,” Tadross told DefenseScoop. 

Army Futures Command’s Gen. Rainey reflects on AI’s potential in modern warfare
https://defensescoop.com/2024/06/28/army-futures-commands-gen-rainey-reflects-on-ais-potential-in-modern-warfare/
Fri, 28 Jun 2024 20:15:29 +0000

“I don’t personally think we have a technology problem — I think we have a tech adoption problem,” Gen. Rainey said.

The post Army Futures Command’s Gen. Rainey reflects on AI’s potential in modern warfare  appeared first on DefenseScoop.

The Defense Department must streamline data, requirements, procurement and more to prepare for rapidly developing, next-generation artificial intelligence and large language models that will likely transform global warfare and how the U.S. military fights in the years to come, according to the top general leading Army Futures Command.

“To say the period that we’re living in right now is disruptive would be like an epic understatement. Different people use different analogies, but I think what we’re witnessing right now is as significant as the nuclear arms race. Potentially, even the Industrial Revolution — that’s how big this period of time is, and that potential. We don’t use the words ‘revolutionary change’ very often in the military. But I think it clearly falls into that category,” Gen. James Rainey, Army Futures’ commanding general, said Tuesday at Scale AI’s government summit.

In a discussion moderated by the startup’s founder Alexandr Wang, Rainey reflected on ways the military is adapting to the quickly evolving advantages and limitations posed by AI in modern warfare.

“I don’t personally think we have a technology problem — I think we have a tech adoption problem,” he said.

Futures Command is the Army’s key innovation-pushing and modernization-enabling arm. Rainey took the helm in 2022.

“I am not a data scientist or expert by any means. I am not an acquisition professional. I’m an infantry officer with a lot of experience deploying and running big formations,” he told Wang during the event.

Pointing to his unique “warfighter lens,” Rainey noted that it is critical to recognize that, for the DOD, AI is not simply about intelligence, surveillance and reconnaissance, logistics, or targeting.

“The real potential for military application of artificial intelligence is to empower our commanders — the men and women who lead our formations. And how do we bring the power of AI to bear to let them do three things: make more decisions, make better decisions and make faster decisions?” Rainey said.

“Nobody’s going to win the war between two nuclear-equipped superpowers, right? So I think the real potential is to confront China with that capability, and through that lens,” the general said.

Rainey’s tenure as commander comes at a time when AI and machine learning present seemingly unprecedented potential to enable and empower military chiefs and the teams and operations they lead.

“The amount of things that are knowable that our commanders do not know is staggering. And I think we could rapidly [apply AI for that] right now — because you don’t have to be as perfect about that as you do to drop a bomb into a room. You almost have a level of certainty to comply with the law of armed conflict,” the general noted.

Rainey described how he’s gone “back and forth to Iraq and Afghanistan” about seven times in his military career, which isn’t uncommon for military insiders around his age. Those deployments exposed him to scenarios where he experienced such a knowledge gap — one faced by the U.S. military’s “men and women going places all over the world right now.”

“The amount of things that they don’t know that are absolutely knowable about the human beings that live there about the terrain. You know, Russia decided to invade Ukraine based on an assumption that the ground would be frozen, because it was always frozen that time of year — but it didn’t freeze. Absolutely unknowable piece of information. So I think we can use large language models, AI and machine learning to rapidly capture that,” Rainey told Wang.

But as generative AI and other connected, advanced capabilities come to fruition and pose new opportunities for the U.S. military, they’re also ushering in much more unpredictability, uncertainty and risk — particularly associated with how adversaries might apply them.

“That’s one of the challenges for us, because I spent most of my career trying to hide from the enemy — like, striving to not be observed and use terrain and hide, like in silence so that the enemy doesn’t know where I’m attacking from there. But it’s just different with software,” Rainey said.

China, he added, is now coupling “ubiquitous sensing, plus precision-guided munitions, with huge stockpile magazine depth.” That means U.S. personnel will now have to essentially assume enemies will be able to see them and therefore have to contest their ability to understand what they observe. 

Other impending obstacles AI presents for Futures Command, the Army, and the larger DOD enterprise involve data, which underpins the technology.

“Right now, if one of you came up to me and said, ‘Hey, we have a large language model where we’ll scan an area and find all the Russian tanks,’ — we could not do anything with that because of the state of our data in the Army and in the department is just ridiculous. I mean, it’s all over the place,” the general explained.

In that sense, his command is looking to partner with more companies that can help get to a state where all of its data is usable and accessible so that it can pave the way for next-gen algorithmic warfare.

Among other topics during their onstage chat, both he and Wang agreed that the department must also confront challenges related to requirements and procurement frameworks to become a software-enabled organization.

“We’ve got the world’s best acquisition folks, but they’ve got to get faster. They need fiscal agility, we need some help from Congress. … But we can’t buy radios or buy server stacks. We have to add funding lines that are for command and control and set what the appropriate oversight would be to rapidly move money from one system to another,” Rainey said. 

Recognizing the complex possibilities that accompany these and other challenges, however, the Futures commander still said he has hope that America’s values and principles will help it prevail. 

“Certainly, it’s the most disruptive and dangerous time in my lifetime — at least since World War II, for sure. But I am an optimist. I think that this country is still a diverse country. I believe it’s the freest place in the world. I believe that the overwhelming majority of the people — our teammates, our citizens — believe that and are willing to sacrifice for that. So we need it. We need industry to stay with us,” Rainey told the audience. “So I would just probably end there — just asking everybody to stick with the department.”

Scale AI to set the Pentagon’s path for testing and evaluating large language models
https://defensescoop.com/2024/02/20/scale-ai-pentagon-testing-evaluating-large-language-models/
Tue, 20 Feb 2024 10:00:00 +0000

The company will create a comprehensive T&E framework for generative AI within the Defense Department.

The post Scale AI to set the Pentagon’s path for testing and evaluating large language models  appeared first on DefenseScoop.

The Pentagon’s Chief Digital and Artificial Intelligence Office (CDAO) tapped Scale AI to produce a trustworthy means for testing and evaluating large language models that can support — and potentially disrupt — military planning and decision-making.

According to a statement the San Francisco-based company shared exclusively with DefenseScoop, the outcomes of this new one-year contract will supply the CDAO with “a framework to deploy AI safely by measuring model performance, offering real-time feedback for warfighters, and creating specialized public sector evaluation sets to test AI models for military support applications, such as organizing the findings from after action reports.”

Large language models and the overarching field of generative AI include emerging technologies that can generate (convincing but not always accurate) text, software code, images and other media, based on prompts from humans.

This rapidly evolving realm holds a lot of promise for the Department of Defense, but also poses unknown and serious potential challenges. Last year, Pentagon leadership launched Task Force Lima within the CDAO’s Algorithmic Warfare Directorate to accelerate its components’ grasp, assessment and deployment of generative artificial intelligence.

The department has long leaned on test-and-evaluation (T&E) processes to assess and ensure its systems, platforms and technologies perform in a safe and reliable manner before they are fully fielded. But AI safety standards and policies have not yet been universally set, and the complexities and uncertainties associated with large language models make T&E even more complicated when it comes to generative AI.

Broadly, T&E enables experts to determine the baseline performance of a specific model.   

For instance, to test and evaluate a computer vision algorithm that differentiates between images of dogs and cats and things that are neither, an official might first train it on millions of pictures of those types of animals, as well as objects that aren’t dogs or cats. In doing so, the expert also holds back a diverse subset of data that can be presented to the algorithm later.

They can then run the model against that held-out evaluation set, compare its outputs to the “ground truth” labels, and ultimately determine failure rates — the cases where the model cannot determine whether something is or is not one of the classes they’re trying to identify. 
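The holdout evaluation described above can be sketched in a few lines: score a model on examples it never saw during training and report the overall failure rate plus which classes it misses. The tiny "model" below is a deliberately crude stand-in for illustration, not a real vision system.

```python
# Minimal sketch of holdout evaluation: compute failure rate on held-back data.

def evaluate(model, holdout):
    """Return (overall failure rate, per-class failure counts) on a holdout set."""
    failures, per_class = 0, {}
    for features, truth in holdout:
        if model(features) != truth:
            failures += 1
            per_class[truth] = per_class.get(truth, 0) + 1
    return failures / len(holdout), per_class

# Toy stand-in classifier: labels any description mentioning "whiskers" as a cat.
toy_model = lambda description: "cat" if "whiskers" in description else "dog"

holdout_set = [
    ("whiskers, pointed ears", "cat"),
    ("floppy ears, wagging tail", "dog"),
    ("whiskers, barks loudly", "dog"),  # the toy model will get this one wrong
]

rate, misses = evaluate(toy_model, holdout_set)
# One failure out of three examples, all on the "dog" class
```

The per-class breakdown is what lets evaluators say not just *that* a model fails, but *where* — the same question the holdout-dataset effort asks of LLMs.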

Experts at Scale AI will adopt a similar approach for T&E with large language models, but because they are generative in nature and the English language can be hard to evaluate, there isn’t that same level of “ground truth” for these complex systems. For example, if prompted to supply five different responses, an LLM might be generally factually accurate in all five, yet contrasting sentence structures could change the meanings of each output.

So, part of the company’s effort to develop the framework, methods and technology CDAO can use to test and evaluate large language models will involve creating “holdout datasets” — built with DOD insiders who write prompt-response pairs, adjudicate them through layers of review, and ensure that each response is as good as would be expected from a human in the military.

The entire process will be iterative in nature.

Once datasets that are germane to the DOD for world knowledge, truthfulness, and other topics are made and refined, the experts can then evaluate existing large language models against them.

Eventually, once they have these holdout datasets, experts will be able to run evaluations and establish model cards — short documents that supply details on the best contexts for using various machine learning models and information for measuring their performance.

Officials plan to automate this development as much as possible, so that as new models come in, there is some baseline understanding of how they will perform, where they will perform best, and where they will probably start to fail.
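The automation goal described above — run each incoming model against named evaluation sets and emit a model-card-style summary of strong and weak domains — can be sketched as follows. The dataset names, score threshold, and card fields are illustrative assumptions, not CDAO's or Scale AI's actual schema.

```python
# Hedged sketch: summarize per-domain benchmark scores into a minimal model card.

def build_model_card(model_name: str, scores: dict, floor: float = 0.8) -> dict:
    """Split evaluated domains into recommended vs. likely-failure by a score floor."""
    return {
        "model": model_name,
        "scores": scores,
        "recommended_domains": sorted(d for d, s in scores.items() if s >= floor),
        "likely_failure_domains": sorted(d for d, s in scores.items() if s < floor),
    }

# Hypothetical evaluation results for an incoming model (all values invented).
card = build_model_card(
    "example-llm-v1",
    {"doctrine_qa": 0.91, "world_knowledge": 0.84, "operational_planning": 0.62},
)
```

Such a card gives reviewers the baseline the article describes: where a new model is likely safe to deploy, and which domains need further testing or a refusal to field.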

Further along, the ultimate intent is for models to essentially send signals to the CDAO officials who engage with them if they start to drift from the domains they have been tested against.

“This work will enable the DOD to mature its T&E policies to address generative AI by measuring and assessing quantitative data via benchmarking and assessing qualitative feedback from users. The evaluation metrics will help identify generative AI models that are ready to support military applications with accurate and relevant results using DoD terminology and knowledge bases. The rigorous T&E process aims to enhance the robustness and resilience of AI systems in classified environments, enabling the adoption of LLM technology in secure environments,” Scale AI’s statement reads. 

Beyond the CDAO, the company has also partnered with Meta, Microsoft, the U.S. Army, the Defense Innovation Unit, OpenAI, General Motors, Toyota Research Institute, Nvidia, and others.

“Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework,” Alexandr Wang, Scale AI’s founder and CEO, said in the statement.

In wake of Project Maven, Pentagon urged to launch new ‘pathfinder’ initiatives to accelerate AI
https://defensescoop.com/2023/07/18/in-wake-of-project-maven-pentagon-urged-to-launch-new-pathfinder-initiatives-to-accelerate-ai/
Tue, 18 Jul 2023 21:28:26 +0000

Congress should incentivize and invest in each military branch establishing a new “pathfinder project” or grand challenge-like programs unique to their specific needs, House lawmakers were told.

The post In wake of Project Maven, Pentagon urged to launch new ‘pathfinder’ initiatives to accelerate AI appeared first on DefenseScoop.

Congress should incentivize and invest in each military branch establishing a new “pathfinder project” or grand challenge-like program to accelerate artificial intelligence deployments unique to their specific needs, House lawmakers were told during a hearing on Tuesday. And at least one leading member is keen to help make that happen, DefenseScoop confirmed.

At a House Armed Services subcommittee hearing on the present barriers preventing the Defense Department from adopting AI as quickly as China, expert witnesses pointed to multiple major challenges that the U.S. government will not be able to solve overnight.

For example, they warned that China is spending roughly 10 times more of its military budget on AI than the U.S. and making deliberate moves to rapidly disrupt combat platforms with the emerging technology. The DOD, meanwhile, generates more than 22 terabytes of data daily but is not AI-ready enough to make the best use of it.

Those issues have no “quick fix.” But in his testimony, Scale AI CEO Alexandr Wang said one almost immediate actionable solution associated with the Pentagon and AI would be for Congress to push “each branch of the military to formally identify its next Pathfinder Project and adequately fund it to be successful.”

“To date, the largest AI Pathfinder Project within DOD is still Project Maven, which began in 2017,” Wang said.

Formed under the purview of the Office of the Undersecretary of Defense for Intelligence and Security (I&S), Project Maven marked DOD’s major computer vision initiative, which was originally designed to apply machine learning to autonomously detect and track objects or humans of interest via imagery captured by surveillance aircraft, satellites and other military assets. Now matured and known simply as “Maven,” the effort and aligned responsibilities were recently split across the National Geospatial-Intelligence Agency, the DOD’s nascent Chief Digital and Artificial Intelligence Office (CDAO), and I&S. 

“There are endless DOD use cases that would benefit from being identified as a Pathfinder Project. For example, the Army is making progress on Project Linchpin and their ground autonomy work; Joint All Domain Command and Control (JADC2) requires DOD buy-in at all levels to succeed; and the Navy has discussed a concept called Project Overmatch, which would create a whole-of-Navy approach to AI adoption,” Wang explained in his testimony.

In a press gaggle after the hearing, the subcommittee chair Rep. Mike Gallagher, R-Wis., told DefenseScoop that Wang’s pathfinder suggestion “seems like something [Congress members] could solve — and just push DOD to move faster on that.” 

“One thing I’m personally obsessed with … is if you look at, by certain estimates, the appropriation process of the last five years — we have appropriated but not spent $25 billion every year and it goes into abeyance in the Treasury for five years, and then it just goes back to the Treasury, and then it’s used for purposes other than defense,” Gallagher said.

“So one thing I’m persuaded is that we could take some subset of that money — already appropriated, doesn’t count against the topline — and use it for specific purposes that would range from replenishing all our stockpiles of key munitions that we’ve learned from Ukraine are absolutely essential to deterring a war with China or Taiwan, to funding this grand challenge idea that Mr. Wang has laid out,” the lawmaker told DefenseScoop.
