DOW, ODNI Seek Proposals for AI Evaluation Harness & Benchmark Framework

The Department of War, in coordination with the Office of the Director of National Intelligence, is seeking industry proposals for an evaluation harness and government-defined benchmarks that would enable rigorous, reproducible and vendor-agnostic testing of artificial intelligence systems against criteria specified by the government.

What Features Are Required in the Evaluation Harness?

According to the commercial solutions opening notice published by the Defense Innovation Unit, the War Department is pursuing an evaluation harness that connects to AI models, facilitates evaluation workflows and measures model performance against benchmarks. The harness should support human-in-the-loop, agentic and adversarial evaluations. It should also simulate an integrated environment in which an AI model's performance can be continuously tested and monitored under challenging conditions. Furthermore, the harness should generate evaluation reports and manage benchmark execution.

What Standards Must the New Benchmarks Meet?

Vendors must provide methodologies for creating benchmarks spanning unclassified, secret and top secret workflows. The benchmarks must be resistant to gaming, adaptable as requirements and AI models evolve, and supported by training materials. They should identify the capabilities needed for particular missions, break those capabilities into measurable tasks and create realistic evaluation scenarios. They should also define clear scoring criteria, establish fair performance baselines using open models, and be valid, reliable and capable of distinguishing between different levels of performance.

Why Is the Government Expanding AI Evaluation Capabilities?

The government is pursuing new evaluation systems to address the rapid advancement of AI technologies. The new infrastructure should be able to evaluate newly released AI models against mission-specific benchmarks. In addition, the system should assess human-machine collaboration to determine whether joint operations yield better mission outcomes than either humans or automated systems alone.

The effort, dubbed “Mystic Depot,” follows calls by Pentagon leadership to accelerate the adoption of AI across warfighting and administrative operations, DefenseScoop reported. Interested vendors can submit their responses to the CSO by March 24.
