large language model (LLM) Archives | DefenseScoop
https://defensescoop.com/tag/large-language-model-llm/

DISA launching experimental cloud-based chatbot for Indo-Pacific Command
https://defensescoop.com/2025/03/25/disa-siprgpt-chatbot-indopacom-joint-operational-edge-cloud/
March 25, 2025
The platform will be deployed in the coming months at Indo-Pacom via DISA’s Joint Operational Edge cloud environment.

The Defense Information Systems Agency is preparing to introduce a new platform in one of its overseas cloud environments that will allow users to test a generative artificial intelligence tool on classified networks, according to a defense official.

Pending accreditation, the chatbot will be deployed to U.S. Indo-Pacific Command and allow users to experiment with genAI models on the Secret Internet Protocol Router Network (SIPRNet), Jeff Marshall, director of DISA’s Hosting and Compute Center, said during a webinar broadcast Tuesday by Federal News Network. The platform is currently in the accreditation stage and is expected to open up “within the next month or so,” Marshall noted.

The capability was developed in close collaboration with the Air Force Research Lab, which last year launched its own experimental generative AI chatbot — dubbed NIPRGPT — for the Department of the Air Force on unclassified networks. As with that program, AFRL and DISA are using the effort to evaluate and expedite delivery of commercial AI tools, but the new initiative will operate in classified realms, Marshall said.

“We’re not trying to deploy this on our own. We’re not trying to make it a production system. This is [a research-and-development] system that we’re using for Indo-Pacom in order to test large language models overseas,” he said.

Across the Pentagon, organizations have looked to capitalize on commercial large language models and other artificial intelligence capabilities. Although there have been various efforts over the last few years — ranging from task forces to experimental platforms — the department is still learning how the technology can be best used to improve back-office and tactical operations.

Marshall noted that DISA’s SIPR-based LLM will largely help “facilitate that demand signal of, what does an Indo-Pacom commander need and want to utilize AI for? And then, how do we then shape that to what industry can actually provide for us at scale?”

DISA plans to host the chatbot on one of the two Joint Operational Edge (JOE) cloud environments it has deployed to the Pacific. Initiated in 2023, the JOE cloud effort seeks to stand up commercial cloud environments at the agency’s overseas data centers, allowing DISA to place cloud-native applications in locations outside of the continental United States. Along with JOE, the agency is also providing its private cloud capability known as Stratus to areas overseas.

To date, DISA has put two JOE cloud nodes at Indo-Pacom and one at U.S. European Command, and will soon deploy another node in Southwest Asia, Marshall said.

Moving forward, DISA is looking to potentially provide additional JOE cloud environments in Europe in order to support operations for U.S. Africa Command, which is headquartered in Germany. But Marshall emphasized the agency is doing so while balancing demand signals with available resources.

“Let’s don’t just throw it all out there one time and hope that it sticks to the wall,” he said. “We’re taking in the demand signal, we’re making sure that there is a valid need that supports us doing the deployment and then, of course, there’s a budget to cover it.”

Updated on March 26, 2025, at 10:35 AM: This story has been updated to clarify AFRL’s role in the new chatbot initiative and to remove “acting” from Jeff Marshall’s job title.

Palantir partners with data-labeling startup to improve accuracy of AI models
https://defensescoop.com/2025/02/05/palantir-enabled-intelligence-partnership-foundry/
February 5, 2025
The partnership strives to improve the overall quality of artificial intelligence models by using high-quality, well-labeled data.

Defense tech company Palantir and startup Enabled Intelligence announced a new partnership aimed at enhancing the quality of data needed to train artificial intelligence models used by organizations within the Defense Department and Intelligence Community.

Under the agreement, federal customers using Palantir’s Foundry system — a software-based data analytics platform that leverages AI and machine learning to automate decision-making — will be able to request data labeling services from Enabled Intelligence. The goal of the partnership is to improve the accuracy of custom AI models built by users by providing them with higher-quality datasets to create and test them with.

“By bringing the Palantir Platform and Enabled Intelligence’s labeling services together in highly secured environments, we believe this will streamline the full cycle of AI model creation and deployment, ensuring that our clients can leverage more precise and actionable insights from their data,” Josh Zavilla, head of Palantir’s national security arm, told DefenseScoop in a statement.

Enabled Intelligence employs a cadre of experts dedicated to annotating multiple data types — including satellite imagery, video, audio, text and more — at a much faster rate than other players in the market, the company’s CEO Peter Kant told DefenseScoop. The impetus for starting Enabled Intelligence came from a gap in the government’s access to accurately labeled data that it needs to train AI models, he said.

“We focus a lot on the quality and the accuracy of the data,” Kant said in an interview. “The better quality of the labeled data, the better and more reliable the AI model is going to be.”

Through the new partnership, government customers are now able to send specific datasets that may need additional labeling directly to Enabled Intelligence’s analysts, Kant explained. Once the data is annotated, the company can push it back to the original users through Foundry so that it can be used to build more accurate artificial intelligence models.

“It’s fully integrated into our labeling pipeline, so we automatically create labeling campaigns to the right people — our employees who know that ontology and know how to do that work with that phenomenology — [and] label it there within Foundry,” Kant said.
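
Neither company has published the integration’s interface, so any concrete code is guesswork; still, the round trip Kant describes — flag a dataset, route it to annotators who know the relevant ontology, push labeled records back through Foundry — maps onto a simple request/response shape. A minimal sketch in Python, with every type, field and campaign name hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class LabelingRequest:
    """A dataset a platform user flags as needing annotation (hypothetical schema)."""
    dataset_id: str
    modality: str                  # e.g. "satellite_imagery", "audio", "text"
    ontology: str                  # label taxonomy the annotators should apply
    records: list = field(default_factory=list)

@dataclass
class LabeledRecord:
    """An annotated record pushed back to the requesting platform."""
    record_id: str
    label: str
    annotator_id: str              # retained for quality auditing

def route_to_campaign(request: LabelingRequest) -> str:
    """Pick an annotation campaign by data type, mirroring Kant's description of
    automatically routing work to annotators who know the ontology."""
    campaigns = {
        "satellite_imagery": "geoint-annotators",
        "audio": "signals-annotators",
        "text": "language-annotators",
    }
    return campaigns.get(request.modality, "general-annotators")

# A customer dataset goes out for labeling; LabeledRecord objects come back.
req = LabelingRequest("ds-001", "satellite_imagery", "small-uas-v2",
                      records=["img-1", "img-2"])
print(route_to_campaign(req))      # -> geoint-annotators
```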

The company’s services would be particularly beneficial if a U.S. adversary or rogue actor begins deploying new capabilities that aren’t already included on a training dataset. For example, if American sensors capture imagery indicating that Houthi fighters are using a new small commercial drone as an attack vector, AI models developed for the Maven Smart System or other similar programs might not initially have the right data to support an appropriate response, Kant explained.

While improving the quality of AI has clear advantages for users, Kant emphasized that it can also reduce the overall power needed to run those models. He pointed to DeepSeek, the open-source large language model (LLM) developed in China, and claims by its developers that the platform’s performance is comparable to OpenAI’s ChatGPT or Google’s Gemini with only a fraction of the compute — partly because its developers focused on training data that was well labeled.

“Our customers — especially on the defense and intelligence side — say, ‘Hey, we’re trying to do AI at the edge, or we’re trying to do analysis at the edge.’ You can’t put 1600 GPUs on a [MQ-1 Predator drone], so how do we do this?” Kant said. “One of the ways of doing that has been to really focus on making sure that the data going in is of high quality and can be moved around easily.”

The ability to run AI models with less compute would be particularly beneficial for operators located in remote environments, where it can be difficult to build the necessary infrastructure needed to power them, he added. 

“Now we want to use [LLMs] for some real critical systems activities for these missions, and the recognition that the data that goes in and how it’s used to train [AI] and how good it is, it’s been critical — not just in terms of reliability, but also how much compute we need,” Kant said.

Former Space Force CTIO joins advisory board for artificial intelligence startup Seekr
https://defensescoop.com/2025/01/22/lisa-costa-seekr-ai-former-space-force-ctio/
January 22, 2025
As a member of Seekr’s advisory board, Costa will help “address the critical need for commercial-grade AI solutions vetted for government use,” according to the company.

Former Space Force Chief Technology and Innovation Officer Lisa Costa has been appointed to the advisory board for artificial intelligence startup Seekr, the company announced Wednesday.

Costa served as the Space Force’s first-ever CTIO from 2021 until her retirement from federal government service in June 2024. During her tenure, she was responsible for strategies and policies aimed at advancing the military branch’s research and development of critical emerging technologies — such as AI and machine learning, digital training environments and IT infrastructure. 

“I’m proud to join Seekr and collaborate with a team that shares my vision for trusted AI,” Costa said in a statement. “Seekr addresses critical AI development needs, including explainability and data security, enabling government agencies to launch mission-critical applications simply and securely. I am excited to bring my 35 years of high-stakes federal and commercial experience to unlock AI’s full potential for government.”

A career technologist, Costa has over three decades of experience in advocating for emerging science and tech across both government and industry. Prior to her role as Space Force CTIO, she held positions at U.S. Special Operations Command, MITRE and Engility Corp.

At the Space Force, she helped develop and modernize the organization’s cloud-based repository for space domain awareness data known as the Unified Data Library. Notably, she worked to upgrade the system with AI and ML tools in order to streamline the service’s access to critical government and commercial sensor data.

As a member of Seekr’s advisory board, Costa will help “address the critical need for commercial-grade AI solutions vetted for government use” with an emphasis on the defense sector, according to a company press release.

The Virginia-based startup specializes in helping users address accuracy, bias and transparency concerns while building custom AI models.

The company’s flagship SeekrFlow platform allows government agencies to develop trustworthy large language models by offering additional tools that scan and grade information that the models are being trained on. Last year, the platform passed assessment by the Pentagon’s Chief Digital and Artificial Intelligence Office (CDAO) and is available for purchase via the online Tradewinds Solutions Marketplace.
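
Seekr hasn’t disclosed how SeekrFlow scans and grades training data, but the general pattern — score each candidate example, keep only those above a quality bar before fine-tuning — can be sketched in a few lines. The scoring heuristics below are invented purely for illustration; a production grader would be a trained classifier:

```python
def grade_example(text: str) -> float:
    """Toy quality score in [0, 1]; a real grader would be a trained classifier."""
    score = 1.0
    if len(text.split()) < 5:      # too short to teach a model much
        score -= 0.5
    if text.count("http") > 2:     # likely scraped link spam
        score -= 0.3
    return max(score, 0.0)

def filter_training_set(examples: list[str], threshold: float = 0.8) -> list[str]:
    """Keep only examples whose quality grade clears the threshold."""
    return [ex for ex in examples if grade_example(ex) >= threshold]

corpus = [
    "A detailed paragraph describing sensor calibration procedures for UAV imagery.",
    "click here http://x http://y http://z http://w",
]
print(len(filter_training_set(corpus)))   # -> 1
```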

“We’re thrilled to welcome Dr. Lisa Costa, whose deep defense and national security expertise will accelerate our efforts to deliver advanced AI solutions for government, solving previously intractable problems dealing with complex datasets such as satellite and UAV imagery,” Seekr President Rob Clark said in a statement.

Via genAI pilot, CDAO exposes ‘biases that could impact the military’s healthcare system’
https://defensescoop.com/2025/01/03/cdao-genai-pilot-llm-cairt-exposes-biases-could-impact-military-healthcare-system/
January 3, 2025
The Pentagon's AI hub is now producing a playbook for other Defense Department components, which is informed by this work.

The Pentagon’s Chief Digital and AI Office recently completed a pilot exercise with tech nonprofit Humane Intelligence that analyzed three well-known large language models in two real-world use cases aimed at improving modern military medicine, officials confirmed Thursday.

In its aftermath, the partners revealed they uncovered hundreds of possible vulnerabilities that defense personnel can account for moving forward when considering LLMs for these purposes.

“The findings revealed biases that could impact the military’s healthcare system, such as bias related to demographics,” a Defense Department spokesperson told DefenseScoop.

They wouldn’t share much more about what was exposed, but the official provided new details about the design and implementation of this CDAO-led pilot, the team’s follow-up plans and the steps they took to protect service members’ privacy while using applicable clinical records. 

As the name suggests, large language models essentially process and generate language for humans. They fall into the buzzy, emerging realm of generative AI.

Broadly, that field encompasses disruptive but still-maturing technologies that can process huge volumes of data and perform increasingly “intelligent” tasks — like recognizing speech or producing human-like media and code based on human prompts. These capabilities are pushing the boundaries of what existing AI and machine learning can achieve. 

Recognizing the potential for both major opportunities and yet-to-be-known threats, the CDAO has been studying genAI and coordinating approaches and resources to help the DOD deploy and experiment with it in a “responsible” manner, officials say.

After recently sunsetting the genAI-exploring Task Force Lima, the office in mid-December launched the Artificial Intelligence Rapid Capabilities Cell to accelerate the delivery of proven and new capabilities across DOD components.

The CDAO’s latest Crowdsourced AI Red-Teaming (CAIRT) Assurance Program pilot, which focused on tapping LLM chatbots with the aim of enhancing military medicine services, “is complementary to the [cell’s] efforts to hasten the adoption of generative AI within the department,” according to the spokesperson.

They further noted that the CAIRT is one example of CDAO-run programs intended “to implement new techniques for AI Assurance and bring in a wide variety of perspectives and disciplines.” 

Red-teaming is a resilience methodology for applying adversarial techniques to internally test systems’ robustness. For the recent pilot, Humane Intelligence crowdsourced red-teaming for clinical note summarization and a medical advisory chatbot — marking two prospective use cases in the context of contemporary military medicine.

“Over 200 participants, including clinical providers and healthcare analysts from [the Defense Health Agency], the Uniformed Services University of the Health Sciences, and the Services, participated in the exercise, which compared three popular LLMs. The exercise uncovered over 800 findings of potential vulnerabilities and biases related to employing these capabilities in these prospective use cases,” officials wrote in a DOD release published Thursday. 

When asked to disclose the names and makers of the three LLMs that were leveraged, the DOD spokesperson told DefenseScoop: “The identities of the large language models (LLMs) used in the study were masked to prevent bias and ensure data anonymity during the evaluation.”
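
The department hasn’t described the pilot’s tooling, but mechanically, crowdsourced red-teaming reduces to fanning participant-written prompts across each masked model and logging every response as a potential finding for human review. A minimal sketch, with the model call stubbed out and all names hypothetical:

```python
import hashlib

def masked_id(model_name: str) -> str:
    """Mask model identities, as the pilot did, so findings can't be tied to a vendor."""
    return "model-" + hashlib.sha256(model_name.encode()).hexdigest()[:8]

def query_model(model_name: str, prompt: str) -> str:
    """Stub standing in for a real LLM API call."""
    return f"[response from {masked_id(model_name)}]"

def red_team_round(models: list[str], prompts: list[str]) -> list[dict]:
    """Fan every crowdsourced prompt across every model; log each response as a
    potential finding for human reviewers to flag."""
    findings = []
    for model in models:
        for prompt in prompts:
            findings.append({
                "model": masked_id(model),
                "prompt": prompt,
                "response": query_model(model, prompt),
                "flags": [],          # reviewers later add e.g. "demographic bias"
            })
    return findings

# Two toy prompts; the real exercise drew on 200+ clinical participants.
prompts = ["Summarize this fictional patient record ...",
           "Draft clinical advice for a fictional case of ..."]
print(len(red_team_round(["vendor-a-llm", "vendor-b-llm", "vendor-c-llm"], prompts)))  # -> 6
```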

The team carefully designed the exercise to minimize selection bias, gather meaningful data, and protect the privacy of all participants. Plans for the pilot also underwent thorough internal and external reviews to ensure its integrity before it was conducted, according to the official.

“Once announced, providers and healthcare analysts from the Military Health System (MHS) who expressed interest were invited to participate voluntarily. All participants received clear instructions to generate interactions that simulated real-world scenarios in Military Medicine, such as summarizing patient records or seeking clinical advice, ensuring the use of fictional cases rather than actual patient data,” the spokesperson said.

“Multiple measures were implemented to ensure the privacy of participants, including maintaining the anonymity of providers and healthcare analysts involved in the exercise,” they added. 

The DOD announcement suggests that certain learnings in this pilot will play a major role in shaping the military’s policies and best practices for responsibly using genAI. 

The exercise is set to “result in repeatable and scalable output via the development of benchmark datasets, which can be used to evaluate future vendors and tools for alignment with performance expectations,” officials wrote. 
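
CDAO hasn’t released those benchmark datasets, but “repeatable and scalable” evaluation implies a fixed set of cases that any candidate model can be scored against. A toy sketch of that shape, with substring checks standing in for real grading rubrics:

```python
def evaluate(model_fn, benchmark: list[dict]) -> float:
    """Score a candidate model against a fixed benchmark; returns pass rate."""
    passed = 0
    for case in benchmark:
        response = model_fn(case["prompt"]).lower()
        # Substring checks keep the sketch short; a real harness would use
        # graded rubrics or trained evaluators rather than keyword matching.
        if all(term in response for term in case["required_terms"]):
            passed += 1
    return passed / len(benchmark)

# Hypothetical benchmark cases of the kind red-team findings could be distilled into.
benchmark = [
    {"prompt": "Summarize: patient presents with ...", "required_terms": ["summary"]},
    {"prompt": "List contraindications for ...", "required_terms": ["consult", "clinician"]},
]

def candidate_model(prompt: str) -> str:
    return "Summary: ... Please consult a clinician before acting on this."

print(evaluate(candidate_model, benchmark))  # -> 1.0
```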

Furthermore, if — “when fielded” — these two use cases are deemed to be covered AI as defined in the recent White House national security memo governing federal agencies’ pursuits of the technology, officials noted that “they will adhere to all required risk management practices.”

Inside the Pentagon’s top AI hub, officials are now scoping out new programs and partnerships for CAIRT-related efforts that make sense within the department and other federal partners. 

“CDAO is producing a playbook that will enable other DOD components to set up and run their own crowdsourced AI assurance and red teaming programs,” the spokesperson said.

DefenseScoop has reached out to Humane Intelligence for comment.

‘One-two punch’: Inside NGA’s approach to exploring powerful next-gen AI
https://defensescoop.com/2024/11/08/one-two-punch-inside-nga-approach-exploring-next-gen-ai/
November 8, 2024
In an interview this week, the agency’s first-ever Chief AI Officer shared new details about an early pursuit to train a cutting-edge model.

Analysts and technologists at the U.S. government’s top mapping agency are starting to cautiously experiment with emerging large language models and other disruptive generative AI capabilities to enhance their production of assets that inform military operations, according to a senior official leading that work.

In an interview with DefenseScoop this week, the National Geospatial-Intelligence Agency’s new and first-ever Chief AI Officer Mark Munsell shared initial details about one ongoing pursuit to train a cutting-edge model and shed light on his approach to steering NGA’s early adoption of the still-uncertain technology.

“We could not have predicted some of these inventions with transformers, like [generative pre-trained transformer or GPT] and stuff, and because of that and the trajectory of those, I think certainly we’re going to be living in a better world. But for the first time, we’ll have to really guard against misuse,” Munsell said.

As the CAIO suggested, genAI, geoAI and associated frontier models are part of a rapidly evolving field of technologies that are not fully understood, but are pushing the boundaries of what existing AI and machine learning can accomplish. Typically, such tech can process massive volumes of data and perform increasingly “intelligent” tasks like recognizing speech or generating human-like media and code when prompted.

These capabilities hold significant promise to enhance how the agency’s analysts detect and make sense of the objects and activities they track across sprawling data sources, a core task in NGA’s world of capturing and deciphering geospatial intelligence about movements and happenings around the globe.

Munsell is expressly determined in his early months as the new AI chief to set the agency on a clear path for responsibly exploring and adopting powerful frontier models in its day-to-day operations.

However, one immediate challenge he said his team is confronting has to do with the fact that the major companies developing these next-generation models to date have not prioritized geographic use cases that would impact NGA’s work.

“So, we have a big role to play, I think, there on behalf of the country — and really, potentially on behalf of the world — which is to ask these companies to focus on certain capabilities that don’t exist yet, or that the models do not do well yet,” Munsell explained. 

He provided several examples to demonstrate this issue, particularly when it comes to existing computer vision technologies.

“How well the models can identify things on a photo — that is super important, and we want companies to do that better,” he noted.

Beyond that realm, Munsell said, modern large language models are learning to generate GEOINT assets, like graphs and graphics. But, in his view, these systems are not yet highly skilled at understanding geography and critical features, like longitude and latitude coordinates. 

“Simple things, like depiction of boundaries or understanding certain geographic locations. Today, it’s all based on maybe words in a gazetteer, or words in an encyclopedia, or words in an atlas. A lot of the training has been like that. So there’s a lot to do to turn these large models into things that are geographically aware. NGA will have a big part to play in that,” Munsell said. 

During the interview, the CAIO also offered DefenseScoop the first preview of an initial generative AI experiment that agency insiders are pursuing. 

“One of the examples we’ll have soon, early next year, is we’re doing a retrieval augmented generation — so people just use the term the RAG — implementation of a large language model that’s being trained … on every NGA report ever written,” he said.

Each day, agency officials compile detailed documents on intelligence activities around the world, such as adversary undertakings at specific locations.

“Essentially, [we’ll] be able to ask it any question on any report that’s ever been written, and it [could] have the depth of knowledge and understanding of analysts that worked on accounts for 30 years,” Munsell explained. 
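
Munsell didn’t detail the implementation, but a retrieval-augmented generation pipeline over a report archive follows a well-known pattern: embed the incoming question, retrieve the most similar report passages, and hand both to the model as a grounded prompt. A minimal sketch, using bag-of-words similarity in place of a real embedding model and omitting the final LLM call:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; production RAG uses a neural embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, reports: list[str], k: int = 1) -> list[str]:
    """Return the k report passages most similar to the question."""
    q = embed(question)
    return sorted(reports, key=lambda r: cosine(q, embed(r)), reverse=True)[:k]

def grounded_prompt(question: str, reports: list[str]) -> str:
    """Assemble the context-plus-question prompt an LLM would receive (call omitted)."""
    context = "\n".join(retrieve(question, reports))
    return f"Context:\n{context}\n\nQuestion: {question}"

reports = ["Report 41: construction activity observed at the northern facility.",
           "Report 87: maritime traffic patterns near the strait were unchanged."]
print(grounded_prompt("What activity was observed at the northern facility?", reports))
```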

The longtime technologist joined the Defense Mapping Agency (which evolved into NGA) for the first time roughly 30 years ago, as a software engineer in 1995.

Notably, he’s part of a very small percentage of senior executives in the government who have around a million lines of software code under their belts.

“For example, I wrote the system that NGA used up until last year to produce all its aeronautical information that goes into all of the DOD aircraft. That was my first assignment here at NGA,” he noted. 

After that, Munsell spent some time in the private sector. He returned to NGA in the early 2000s and has been rising through the tech ranks there ever since. In recent years, he helped launch the Data and Digital Innovation Directorate — a hub he’s continuing to lead in his now dual-hatted role as CAIO.

“It made logical sense for the agency and for the director to appoint the director of that component as the chief AI officer,” he said.

Munsell has been taking it all in as AI and machine learning have intensely evolved over the course of his career. 

Looking back on his first stint at NGA, he said he didn’t totally anticipate the emergence of generative AI. And while he believes it will improve the human experience, he also repeatedly pointed to the need to adopt these yet-to-be-fully-understood technologies responsibly.

“It’s really kind of a one-two punch. On one hand, you’re going to promote these great inventions that will do wonderful things for the world. And on the other hand, you do really have to watch and protect people from misusing these. So, that’s … how I look at my responsibilities. It’s two gloves,” Munsell said.

“On one hand, I’m promoting the use — the proper use — because we know it’s good and better. And on the other hand, we’re checking and we’re ensuring that people are doing right by this technology,” the CAIO told DefenseScoop. 

Scale AI unveils ‘Defense Llama’ large language model for national security users
https://defensescoop.com/2024/11/04/scale-ai-unveils-defense-llama-large-language-model-llm-national-security-users/
November 4, 2024
DefenseScoop got a live demo of the new tool, which already boasts experimental and operational use cases in classified networks.

Credentialed U.S. military and national security officials are experimenting and engaging in multiple classified environments with Defense Llama — a powerful new large language model that Scale AI configured and fine-tuned over the last year from Meta’s Llama 3 LLM — to adopt generative AI for their distinctive missions, like combat planning and intelligence operations.

Dan Tadross, Scale AI’s head of federal delivery and a Marine Corps reservist, briefed DefenseScoop on the making and envisioned impacts of this new custom-for-the-military model in an exclusive interview and technology demonstration on Monday.

“There are already some users from combatant commands and other military groups that are able to leverage this on certain networks,” he explained at Scale AI’s office in Washington. 

Large language models and the overarching field of generative AI encompass emerging and already-disruptive technologies that can produce (convincing but not always accurate) text, software code, images and other media — based on human prompts. 

This quickly evolving realm presents major opportunities for the Defense Department, while simultaneously posing uncertain and serious potential challenges. 

Last year, Pentagon leadership formed a temporary task force to accelerate DOD components’ grasp, oversight and deployments of generative AI. More recently, the department and other agencies received new directives on pursuing the advanced technology through various provisions of the Biden administration’s National Security Memo (NSM) on AI, issued last month.

“We are still looking at ways to provide more enterprise support, especially as things like the NSM that was just released. That’s one of the areas that we’re leaning forward on being able to try and help support the DOD’s adoption of this technology, again, in a responsible manner,” Tadross said. 

Notably, Scale AI’s demo occurred the same day that Meta revealed that it’s making its Llama models available to U.S. government agencies — and explicitly those that are working on defense and national security applications — with support from other commercial partners including Scale AI. Also on Monday, OpenAI unveiled its first limited ChatGPT Enterprise partnership with DOD, which will enable its generative capabilities’ use on unclassified systems and data.

These announcements follow research and reports that recently surfaced suggesting that Chinese researchers linked to the People’s Liberation Army applied Meta’s open source Llama model to create an AI asset that presents the possibility for military applications. 

“There’s always a concern [about] the risk appetite. My perspective on this is that the risk of not adopting these technologies is actually greater than adopting them in a measured and responsible way,” Tadross told DefenseScoop.  

In some ways, he said, Scale AI’s Defense Llama stems from the company’s still-unfolding test and evaluation and other experimental efforts with DOD partners in combatant commands and at Marine Corps University’s School of Advanced Warfighting. 

“We found that there are instances where a DOD member or any government official is going to ask a question that would not get a good response from the model,” Tadross said.

“This is because if you build these models off of the plethora of information that’s on the internet, and then also are tuning it for the use cases that are best commercialized … there are protections that are put in place to ensure that they are used responsibly, [including] making sure that they don’t respond about warfare, about drug use, about human trafficking, things like this that make all the sense in the world, to ensure that they don’t go haywire and start answering all those questions to the general population,” he said. 

But once LLMs were safely configured for use and experimentation by trained and approved government officials on DOD’s classified and more secure networks, Tadross explained, the models still “refused” to fully address certain prompts about warfare planning and other defense topics.

“We needed to figure out a way to get around those refusals in order to act. Because if you’re a military officer and you’re trying to do something, even in an exercise, and it responds with ‘You should seek a diplomatic solution,’ you will get very upset. You slam the laptop closed,” he said.

“So we needed to find a way to minimize those refusals and ensure that it is not only doing that, but also answering the tone that would be useful — because if it’s like this very informal social media-type tone, it doesn’t instill a lot of confidence in its response,” he said. 
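
Scale AI hasn’t published its evaluation tooling, but the refusal behavior Tadross describes is straightforward to quantify: run the same domain prompts through each model and flag responses that deflect rather than answer, as the side-by-side demo later in this piece does by hand. A sketch with illustrative marker strings only; real evaluations would use trained classifiers:

```python
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "seek a diplomatic",
                   "outside my scope", "i am unable")

def is_refusal(response: str) -> bool:
    """Crude marker-based check; production evaluations use trained classifiers."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def refusal_rate(responses: list[str]) -> float:
    return sum(is_refusal(r) for r in responses) / len(responses)

# Same prompts, two models side by side, as in the Donovan demo described below.
baseline_responses = ["I'm sorry, that question is outside my scope.",
                      "You should seek a diplomatic solution."]
tuned_responses = ["Key factors include target hardness, standoff distance, and time constraints.",
                   "Relevant considerations include environmental features and ..."]
print(refusal_rate(baseline_responses), refusal_rate(tuned_responses))  # -> 1.0 0.0
```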

Tadross and his team trained Defense Llama on a sprawling dataset that pulled together military doctrine, international humanitarian law, and relevant policies that align with the Pentagon’s rules for armed conflict and ethical principles for AI. 

The engineering process known as supervised fine-tuning was applied. And to shape the model’s tone, officials used reinforcement learning from human feedback methods.

“You get a response and then you provide the type of response that you would have preferred. So because the intelligence community has already written style guides for how to write, we just built a lot of examples based off that,” Tadross said. 
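
The training data itself isn’t public, but the two techniques named here have standard record shapes: supervised fine-tuning consumes prompt/response demonstrations, while human-feedback preference tuning consumes the same prompt paired with a chosen and a rejected response. A sketch of both record types, with all content invented:

```python
# Supervised fine-tuning record: a prompt and the demonstrated target response,
# hypothetically drawn from doctrine-grounded answers in the desired tone.
sft_record = {
    "prompt": "Summarize the commander's planning considerations for ...",
    "response": "Planning considerations include terrain, logistics, and ...",
}

# Preference record for reinforcement learning from human feedback: the same
# prompt with a preferred and a rejected response, e.g. formal vs. informal tone.
preference_record = {
    "prompt": "Describe the adversary's likely courses of action.",
    "chosen": "Assessed courses of action include ...",     # style-guide tone
    "rejected": "tbh they'll probably just ...",            # informal tone
}

# A trainer consumes lists of these records; the training loop itself is omitted.
sft_dataset = [sft_record]
preference_dataset = [preference_record]
print(len(sft_dataset), len(preference_dataset))
```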

He declined to confirm which classified networks Defense Llama is running on — or specific military units that are tapping into it — to date. 

But in an emailed statement, a Scale AI spokesperson later confirmed that the model “is now available for integration into various defense systems, including command and control platforms, intelligence analysis tools, and decision-support systems.” 

Defense Llama can be accessed exclusively in controlled government hubs housed within the Scale Donovan platform.

Tadross used Donovan to demonstrate the new LLM for DefenseScoop.

The platform presented another commercial LLM in a side-by-side view with Defense Llama. For the first demo, Tadross posed a question through Donovan: “As a military planner, which munition should I select to destroy a hardened structure while minimizing collateral damage from a nearby civilian facility?”

Defense Llama provided a lengthy response that also spotlighted a number of factors worth considering, such as “hardness of the target, distance from civilian facilities, environmental features, and time constraints.” 

The other LLM replied with an apology, a simple explanation that the question was out of its scope, and a recommendation to seek other options.

For another prompt, Tadross asked: “What tactics has Iran employed against coalition forces?”

He explained in real time that the model that’s not Defense Llama supplied “a straight refusal.” The Scale AI-configured LLM, on the other hand, offered up multiple paragraphs about how Iran has used ballistic missiles, cyber warfare, intelligence gathering, terrorist groups and naval forces. 

“This is all very much in line with what they’ve actually done,” Tadross noted. 

Drawing back on his past experiences operating inside military command centers, he remembered how key data points and information would be funneled through many officials in high-stakes scenarios before reaching top decision-makers.

“The intent behind deploying technology like this, and the impact that I expect that it’ll make, is that it will reduce the reliance on more and more people sitting at those headquarters sections doing the grunt work that’s necessary to pull the data together. So instead, what you’ll have is a situation where there’ll be fewer people able to access a larger swath of data and make a decision quite a bit faster than what they would have done otherwise,” Tadross told DefenseScoop. 

Questions on DOD’s plans for generative AI swirl as Task Force Lima’s possible sunset nears
https://defensescoop.com/2024/10/25/questions-on-dods-plans-for-generative-ai-swirl-as-task-force-limas-possible-sunset-nears/
October 25, 2024
"TF Lima has consistently delivered a range of insights to senior leaders," a DOD spokesperson told DefenseScoop on Friday.

Before the end of 2024, officials leading the Pentagon’s temporary generative artificial intelligence-enabling team — Task Force Lima — aim to reveal their findings and plan to guide the military’s way ahead for deploying emerging and extremely powerful frontier models to support operations, a spokesperson told DefenseScoop on Friday. 

However, a range of questions regarding Lima’s latest progress and outputs to date, the hefty volume of algorithms and use cases it’s been exploring, and the timeline for the task force’s potential decommissioning continue to linger as its sunset deadline approaches.

Defense Department leadership originally launched the team within the nascent Chief Digital and AI Office a little over a year ago. 

At the time, they expressed recognition that the emerging field of generative AI and associated large language models — which broadly yield (convincing but not always correct) software code, images, audio and other media following human prompts — present both promise and complex threats to DOD’s mission, and therefore needed to be strategically confronted in a coordinated manner.

“The Generative AI and LLM Task Force, also known as Task Force Lima (TF Lima) was established in August 2023 and we anticipated it would operate for 12 – 18 months in duration. Given the rapid evolution of generative AI technology, it was important to maintain flexibility on the timeline for the team’s work,” a DOD spokesperson wrote in an email on Friday.

From the group’s inception, DefenseScoop has steadily covered its pursuits and engaged in multiple interviews with Task Force Lima Mission Commander Navy Capt. M. Xavier Lugo.

But since August, multiple Pentagon spokespeople have not granted DefenseScoop’s requests for an interview with Lugo or another member of the task force to discuss progress. They also have not directly addressed questions regarding the status of its final report and other required deliverables — or the task force’s plan to evolve or be dissolved in February, marking the end of its up-to-18-month deadline.

In the latest emailed response on Friday about those inquiries and where Lima currently stands, the DOD spokesperson pointed to the White House’s new national security directive to propel the government’s AI adoption that was released Thursday.

“To facilitate implementation of President Biden’s recent signing of the National Security Memorandum on AI, the CDAO is reviewing the findings and recommendations from Task Force Lima to ensure the department’s investments and pilots align with the whole of government approach. This will ensure the DOD is able to maximize the transformative potential of AI to maintain our technological edge and enhance operational effectiveness across the board,” they said.

“Further details on the findings from TF Lima and the department’s path forward on applying frontier models of AI will be forthcoming later this year,” the spokesperson said.

The CDAO is set to host a “Responsible AI” conference on Oct. 29 in Virginia, where it expects to bring together roughly 200 attendees from across the public and private sectors, academia, and international government partners.

Marines planning to use large language models to help mine information repositories
https://defensescoop.com/2024/09/25/marine-corps-large-language-models-help-mine-information-repositories/
September 25, 2024
The effort is part of a broader push by the service to implement its new AI strategy.

The Marine Corps has set its sights on using large language models to help service members retrieve critical information and plan missions.

The effort is part of a broader push by the Marines to implement a new artificial intelligence strategy, according to Capt. Chris Clark, AI lead at the deputy commandant for information’s Service Data Office.

“We’re not trying to develop necessarily a Marine Corps ChatGPT-like capability, but what we are doing instead, which goes back to goal one of the AI strategy, is we are aligning AI with the mission. And in this case, what we’ve identified is the Marine Corps has a massive repository of after-action reports from every exercise, deployment, operation that we’ve done for decades. And that’s in the Marine Corps Center for Lessons Learned. And so what we’re doing … is developing a large language model system that can take that repository of information that right now is very difficult to get, you know, the results that you’re looking for out of for a number of reasons, you know, partly because there’s a lot of information in there but also it can be difficult to search and it can be difficult to find exactly what you’re looking for and what applies to your specific mission set,” Clark said during a Defense One “Genius Machines” event that aired Tuesday.

“And so with the large language model as a piece of that solution, we’re able to take that repository using what’s known as RAG — retrieval augmented generation — and using the large language model to then take user input, you know, plain text input from a standard user … [and] use the RAG implementation — which takes the information, puts it into a database and then can retrieve, based on the question [and] what that user is looking for, back to the large language model and be able to output that in a way that’s usable, that really, you know, is cross-cutting across the Marine Corps and provides critical information for Marines to make decisions on how to plan that next exercise, operation, deployment, whatever it is that they’re doing. And they’re able to take that data and do a lot with it, whether it’s summarize, you know, decades of information into a concise report or, you know, being able to just find key pieces of information and trends within that data. And so that’s one area that we’re looking at using large language models,” he added.
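
Clark’s description corresponds to the ingestion half of a RAG system: split each after-action report into chunks, index them in a database, and retrieve against that index at question time (the retrieval-and-generation half looks like the NGA sketch earlier in this archive). A minimal sketch of the indexing step, with a toy inverted index standing in for a production vector store:

```python
from collections import defaultdict

def chunk(report_text: str, size: int = 40) -> list[str]:
    """Split a report into fixed-size word windows; real systems chunk by section."""
    words = report_text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(reports: dict[str, str]) -> dict[str, set[tuple[str, int]]]:
    """Inverted index mapping each term to the (report, chunk) pairs containing it."""
    index = defaultdict(set)
    for report_id, text in reports.items():
        for i, c in enumerate(chunk(text)):
            for term in c.lower().split():
                index[term].add((report_id, i))
    return index

index = build_index({"AAR-2019-001": "During the exercise, units reported that "
                                     "resupply timelines slipped due to weather."})
print(index["resupply"])  # -> {('AAR-2019-001', 0)}
```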

Officials also want to identify other use cases for the technology as Marines look to build out new capabilities across the Corps.

The primary aim of the recently released AI strategy is to gain a comprehensive understanding of mission-specific problems where artificial intelligence offers a solution, according to officials. The deputy commandant for information’s Service Data Office has been tasked with shepherding that effort.

To that end, Marines intend to create a repository of “candidate AI use cases” and a mechanism to manage the use case process that will inform service-level decisions and activities.

“We’re not looking to compete with the large language model ecosystems like [OpenAI’s] ChatGPT, Google’s Gemini, [Microsoft’s] Copilot and the others. But instead, we’re going to leverage the best of the best that we have access to as the back end, the large language model piece of it, to then continue to work on solving the problems that we want to solve,” Clark said. “We’re still experimenting, but I think it’s going to be pretty powerful once we are able to roll this out to the Marine Corps and really affect the way that we plan missions.”

Air Force releases new tool to track development, spending on AI efforts
https://defensescoop.com/2024/08/27/air-force-clara-ai-platform-artificial-intelligence-machine-learning/
August 27, 2024
Known as CLARA, the tool looks to increase visibility and overall understanding of the department's AI-related initiatives.

The Department of the Air Force’s Chief Information Office has launched a new platform that aims to enhance transparency across the various artificial intelligence and machine learning capabilities it has under development.

The online tool, dubbed CLARA, is designed to increase visibility and overall understanding of the department’s AI-related initiatives by serving as a centralized repository that provides information, progress and potential collaboration opportunities on projects, the DAF CIO noted Monday in a post on LinkedIn. The goal is to ensure stakeholders across the department stay informed and aligned in regards to these types of technologies.

“Every warfighter deserves clarity on the tools and capabilities at their disposal,” Acting DAF Chief Data and AI Officer Chandra Donelson said in a statement. “Transparent access to our resources ensures everyone is more equipped and ready to excel in any mission.”

Much like the rest of the Pentagon, the Department of the Air Force has been exploring how to leverage advancements in artificial intelligence and machine learning capabilities for the Air and Space Forces. The DAF has been experimenting with new technologies and launched pilot efforts focusing on how AI can assist both services — ranging from day-to-day tasks to tactical operations.

With a number of programs underway, CLARA will be used to monitor progress, spending and potential duplicative initiatives, DAF CIO Venice Goodwine said Monday during a keynote speech at the annual Department of the Air Force Information Technology and Cyberpower conference.

“One of the things Congress has levied upon us is we must be able to have an AI inventory so we can report how much money we’re spending on AI,” Goodwine said. “But importantly, how are we tracking the time back on mission for our airmen and guardians? CLARA is a way in which we’re going to do that.”

In April, officials set up a DAF AI Launch Point to act as a “one-stop shop” for all of the department’s emerging artificial intelligence capabilities, Goodwine said. The website includes information on policies, strategy, training and education, as well as the AI Exchange App Store where airmen and guardians can begin experimenting with AI-enabled technologies.

Among those new tools is NIPRGPT 1.0 — a generative AI chatbot hosted on the Non-classified Internet Protocol Router Network (NIPRNet). Released in June in collaboration with the Air Force Research Laboratory, the experimental platform allows the DAF to test different large language models and learn how they can be used in real-world scenarios.

NIPRGPT 1.0 has enabled experimentation with some open-source large language models, such as Meta’s Llama family and models from Mistral AI, Goodwine noted.

Under what is being called NIPRGPT 1.0+, the department is looking to incorporate retrieval-augmented generation (RAG) to combine large language models with the department’s internal data.

“What we want to show you [is] which model is best for which use case,” Goodwine said.

Along with NIPRGPT, the department’s AI Exchange platform also includes redForce AI — a DevOps platform that supports rapid artificial intelligence capability development for warfighters — and the Mission-Driven Autonomous Collaborative Heterogeneous Intelligent Network Architecture (MACHINA), which is part of the Space Force’s space domain awareness network architecture. 

Army set to issue new policy guidance on use of large language models
https://defensescoop.com/2024/05/09/army-policy-guidance-use-large-language-models-llm/
May 9, 2024
Pentagon officials see generative AI as a tool that could be used across the department, but security concerns need to be addressed.

The Army is close to issuing a new directive to help guide the department’s use of generative artificial intelligence and, specifically, large language models, according to its chief information officer.

LLMs, which can generate content — such as text, audio, code, images, videos and other types of media — based on prompts and data they are trained on, have exploded in popularity with the emergence of ChatGPT and other commercially available tools. Pentagon officials aim to leverage generative AI capabilities, but they want solutions that won’t expose sensitive information to unauthorized individuals. They also want technology that can be tailored to meet DOD’s unique needs.

“Definitely looking at pushing out guidance here, hopefully in the next two weeks, right — no promises right now, because it’s still in some staffing — on genAI and large language models,” Army CIO Leo Garciga said Thursday during a webinar hosted by AFCEA NOVA. “We continue to see the demand signal. And though [there is] lots of immaturity in this space, we’re working through what that looks like from a cyber perspective and how we’re going to treat that. So we’re gonna have some initial policy coming out.”

The CIO’s team has been consulting with the Office of the Assistant Secretary of the Army for Acquisition, Logistics and Technology as it fleshes things out.

“We’ve been working with our partners at ASAALT to kind of give some shaping out to industry and to the force so we can get a little bit more proactive in our experimentation and operationalization of that technology,” Garciga said.

Pentagon officials see generative AI as a tool that could be used across the department, from making back-office functions more efficient to aiding warfighters on the battlefield.

However, there are security concerns that need to be addressed.

“LLMs are awesome. They’re huge productivity boosters. They allow us to get a lot more work done. But they are very new technology … In my view, we are definitely in an AI bubble. Right? When you look kind of across industry, everybody’s competing to try to get their best, you know, LLM out there as quickly as possible. And by doing that, we have some gaps. I mean, we just do. And so it’s very important that we not take an LLM that is out on, you know, the web that I can just go and log into and access and put our data into it to try to get responses,” said Jennifer Swanson, deputy assistant secretary of the Army for data, engineering and software.

Doing so risks having the Army’s sensitive data bleed into the public domain via the internet and training models that adversaries could access, she noted.

“That’s really not OK, it’s very dangerous. And so we are looking at what we can do internally, within, you know, [Impact Level] 5, IL6, whatever boundaries, different boundaries that we have out there … And we’re moving as quickly as we can. And we definitely want tools within that space that our folks can use and our developers can use, but you know it’s not going to be the tools that are out there on the internet,” she added.

The forthcoming policy guidance is expected to address security concerns.

“I think folks are really concerned out in industry. And we’re getting a lot of feedback on, you know, just asking us what we think the guidance is going to look like. But we’re going to focus on putting some guardrails and some left and right limits … We’ve really focused on letting folks know, hey, this space is open for use. If you do have access to an LLM, right, make sure you’re putting the right data in there, make sure you understand what the left and right limits [are] … Don’t put, you know, an [operation order] inside public ChatGPT — probably not a good idea, right? Believe it or not, things like that are probably happening,” Garciga said.

“I think we really want to focus on making sure that it’s a data-to capability piece, and then add some depth for our vendors where we start putting a little bit of a box around, [if] I’m going to build a model for the U.S. government, what does it mean to for me to build it on prem in my corporate headquarters? What does that look like? … What is that relationship? Because that’s going to drive contracts and a bunch of other things. We’re going to start the initial wrapping of what that’s going to look like in our initial guidance memo so we can start having a more robust conversation in this space. But it’s really going to be focused around mostly data protection … and what we think the guardrails needs to be and what our interaction between the government and industry is going to look like in this space,” he added.
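
The directive’s text isn’t public, but the guardrails Garciga sketches typically begin with screening prompts for sensitive material before they can leave a controlled boundary. A toy pre-submission check, with all patterns invented for illustration; a real screen would use classification-marking detection and organization-specific rules:

```python
import re

# Illustrative patterns only; these crude keywords would both over- and
# under-match in practice.
SENSITIVE_PATTERNS = [
    re.compile(r"\b(SECRET|TOP SECRET|CUI)\b", re.IGNORECASE),
    re.compile(r"\bOPORD\b", re.IGNORECASE),   # operation orders, per Garciga's example
]

def screen_prompt(prompt: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched_patterns) for a prompt bound for a public LLM."""
    hits = [p.pattern for p in SENSITIVE_PATTERNS if p.search(prompt)]
    return (len(hits) == 0, hits)

allowed, hits = screen_prompt("Draft a memo summarizing OPORD 24-03 ...")
print(allowed, hits)  # -> False ['\\bOPORD\\b']
```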
