DAF-MIT AI Accelerator Phantom projects show human expertise remains central to military AI

  • By Capt. Amelia Leonard

For many military artificial intelligence efforts, the most important capability is not the model itself but the expert who knows how to use it. That expertise is found in the engineer reconstructing an anomaly from fragmented records, the operator reviewing an AI-generated draft, the commander interpreting readiness data, and the safety professional drawing on decades of hard-earned lessons. During the DAF-MIT AI Accelerator Phantom program midpoint briefings, held at the MIT Computer Science and Artificial Intelligence Laboratory on March 19, 2026, Phantoms repeatedly showed their fellow Phantoms and the AI Accelerator leadership team that the true AI advantage depends on people who can apply technology with judgment, mission awareness, and accountability.

U.S. Space Force Guardian 1st Lt. Kealy Murphy's Retrieval-Augmented Vehicle Engineering Navigator (RAVEN) system; 1st Lt. Sam Fuller's Test Hazard Analysis Retrieval-Augmented Generation (THA-RAG) program; 1st Lt. Evan Marrone's collaborative document platform; Senior Airman Aaron Quiroga's cyber mission-reporting workflow; and Master Sgt. Jerry Brock's ORBIT framework all support the same premise: the military's mission advantage in AI doesn't come simply from deep coding knowledge or program prototypes. It comes from people who know how to apply AI responsibly in operational contexts.

While some of the most critical military knowledge is informal, fragmented, and vulnerable to personnel rotation, AI can help bridge the gap between technical experts and the current and future force. It allows the military to preserve that knowledge in a form anyone with appropriate access can quickly find and understand. Murphy's RAVEN system is one example. "Space vehicle engineers often operate as on-orbit detectives, reconstructing system behavior from fragmented telemetry, legacy documentation, and institutional knowledge," Murphy said. "This process is time-intensive and vulnerable to knowledge loss, particularly after reach-back support ends. Delays in anomaly resolution increase operational risk and burden an already-limited engineering manpower." During her midpoint brief, Murphy introduced RAVEN, an agentic RAG-large language model system aimed at accelerating anomaly investigation and preserving engineering continuity despite rapid personnel turnover. Veteran engineers often carry years of ad hoc fixes in memory alone; by implementing RAVEN in day-to-day Space Force operations, she predicts a reduction in operator workload while promoting engineering continuity across personnel rotations. Over the next several months, Murphy will continue multi-modal ingestion of key documentation into a working RAVEN knowledge base, optimizing retrieval performance so the system can provide reliable, mission-ready support for on-orbit operations when she returns to her home unit.

Fuller's THA-RAG system tackles a similar operational problem. It is designed to assist flight test professionals in developing THAs, potentially reducing planning time, improving documentation consistency, and preserving institutional safety knowledge across units. THA development is currently a time-intensive, expertise-dependent process critical to flight test safety planning; it is slow, often inconsistent across units, and heavily dependent on institutional knowledge and experience, he said. "Many hazards and mitigation strategies have been documented across decades of flight testing, but locating and synthesizing that information remains a largely manual process," Fuller said. His THA-RAG system generates grounded, citation-backed THA recommendations using an embedded LLM. In both space operations and flight test safety, the technology is useful not because it eliminates the need for expert judgment, but because it helps capture and extend it, he said.

Several project presentations also emphasized keeping humans in the process as reviewers and final approval authorities.

Marrone's proposal for a test document generation program aims to reduce drafting time while preserving reviewer control, evidence traceability, and export quality. According to Marrone, current testing data is spread across notes, spreadsheets, prior reports, and test team memory, which makes test documentation generation slow. Those crafting the documentation spend significant time structuring content, synchronizing style, resolving comments, and reformatting, and often omit relevant information, he said. "This creates risks: delayed reporting, inconsistent section quality, and high rework during review," said Marrone. His proposed solution is a configurable "Test Document Kit" with a template-driven structure, node-level AI drafting, context-grounded question autofill, reviewer comment workflows, and an option to review and issue program patches. Marrone emphasizes the continued need for technical experts in the field, despite the ability to use AI automation. "Step three is the human review," he said. "We read over all of the questions that were answered and say, 'Is this accurate?'" Implementing AI-assisted test document generation shortens report cycle time, improves consistency, and reduces manual rework while keeping humans in control of every accepted change, he said. "Continued progress depends on access to representative test-document data, stakeholder feedback, and compliance/deployment hardening support."

Quiroga's collaborative RAG platform for cyber mission analysis and reporting follows a similar philosophy. Designed for Cyber Protection Teams (CPTs), the platform generates draft plans and reports from operator documents but keeps the user in control through direct editing and accept-or-reject review of proposed changes. The platform will "significantly reduce man-hours during cyber operations, allowing more time for operators to execute the mission, while simultaneously granting CPTs the ability to take on more missions," Quiroga said. Current CPT planning can take three to four months, during which approximately 75 percent of the work is spent analyzing mission requirements, creating reports, and conducting briefings, according to Quiroga. His project supports AI-enabled cyber mission superiority by "allowing cyber operators more time to execute on a network and to defend against potential adversaries," he said.

Both projects reflect a common goal: AI as a viable tool for reducing administrative burden, with humans continuing to oversee and retain final authority over what gets accepted, changed, or sent forward.

This people-centered approach extends beyond individual operators into command decision-making as well. Brock's ORBIT system focuses on workforce analytics for operational readiness in highly specialized Space Force units. Rather than relying only on static headcounts or authorizations, ORBIT is designed to help leaders assess operational health through more dynamic factors such as personnel availability, qualification status, training progression, and experience depth. The system uses a Composite Operational Health Index, a scoring model that evaluates readiness based on availability, qualification status, and experience depth. "The system applies time-series analysis and predictive modeling to identify trends signaling emerging risk, and results are delivered through a leader-focused lens displaying operational health scores, trends, and indicators of personnel-driven mission risk," he said. The goal is to give commanders earlier visibility into readiness risk and better decision support for workforce management. His project reinforces the same core message: people are the capability, and AI is most useful when it helps leaders better understand and support their teams over time. "Mission readiness depends less on assigned strength and more on the availability, qualification, and experience distribution," he said.
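A composite readiness score of the kind ORBIT's index describes can be sketched as a weighted average of normalized factors plus a simple trend check. The weights, factor names, and windowing below are illustrative assumptions, not the program's actual model.

```python
# Illustrative weights over the three factors the article names; a real
# index would calibrate these against historical readiness data.
WEIGHTS = {"availability": 0.4, "qualification": 0.35, "experience": 0.25}

def health_index(factors: dict[str, float]) -> float:
    """Weighted average of normalized readiness factors, each in [0, 1]."""
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(WEIGHTS[k] * factors[k] for k in WEIGHTS)

def trend(history: list[float], window: int = 3) -> float:
    """Moving-average delta as a stand-in for the time-series analysis the
    brief describes: a negative value flags emerging readiness risk."""
    recent = sum(history[-window:]) / window
    prior = sum(history[-2 * window:-window]) / window
    return recent - prior
```

For example, a unit whose index slid from 0.9 to 0.6 over six periods would show a negative trend, surfacing risk before static headcounts change.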

These five midpoint projects point to the Phantoms' broader understanding of military AI adoption: automated systems strengthen reviewer control, improve leader awareness, and give human experts more time to focus on mission-essential judgment. Across space operations, cyber, flight test, and workforce readiness, the Phantoms showed that AI's value is greatest when it supports the people who carry the mission forward.

The DAF-MIT AI Accelerator Phantom program is not only producing promising technical prototypes; it is also developing mission professionals who understand how to apply AI responsibly in real operational contexts.