The Department of War, in coordination with the Office of the Director of National Intelligence, is seeking industry proposals for an evaluation harness and government-defined benchmarks that would enable rigorous, reproducible and vendor-agnostic testing of artificial intelligence systems against criteria specified by the government.
What Features Are Required in the Evaluation Harness?
According to the commercial solutions opening notice published by the Defense Innovation Unit, the War Department is pursuing an evaluation harness that connects to AI models, facilitates evaluation workflows and measures model performance against benchmarks. The harness should support human-in-the-loop, agentic and adversarial evaluations. It should simulate an integrated environment to continuously test and monitor an AI model's performance in challenging settings. Furthermore, the harness should generate evaluation reports and manage benchmark execution.
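The workflow the notice describes — connect to a model, execute benchmark tasks, score the responses and emit a report — can be sketched in miniature. All of the names and interfaces below are illustrative assumptions; the CSO does not prescribe an API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative types only; the actual solicitation does not specify these.

@dataclass
class BenchmarkTask:
    prompt: str
    scorer: Callable[[str], float]  # maps a model response to a score in [0, 1]

@dataclass
class Benchmark:
    name: str
    tasks: list[BenchmarkTask] = field(default_factory=list)

def run_benchmark(model: Callable[[str], str], benchmark: Benchmark) -> dict:
    """Connect a model to a benchmark, run every task, and emit a report."""
    scores = [task.scorer(model(task.prompt)) for task in benchmark.tasks]
    return {
        "benchmark": benchmark.name,
        "num_tasks": len(scores),
        "mean_score": sum(scores) / len(scores) if scores else 0.0,
    }

# Usage with a stand-in "model" (an uppercasing function) and exact-match scorers.
echo_model = lambda prompt: prompt.upper()
bench = Benchmark("demo", [
    BenchmarkTask("hello", lambda r: 1.0 if r == "HELLO" else 0.0),
    BenchmarkTask("world", lambda r: 1.0 if r == "WORLD" else 0.0),
])
report = run_benchmark(echo_model, bench)
print(report)  # {'benchmark': 'demo', 'num_tasks': 2, 'mean_score': 1.0}
```

A production harness would additionally manage benchmark execution at scale and support the human-in-the-loop, agentic and adversarial modes the notice calls for; this sketch only shows the core connect-evaluate-report loop.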
What Standards Must the New Benchmarks Meet?
Vendors must provide methodologies for creating benchmarks across unclassified, secret and top secret workflows that are resistant to gaming, adaptable as requirements and AI models evolve, and supported by training materials. These benchmarks should identify capabilities for particular missions, break those capabilities into measurable tasks and create realistic evaluation scenarios. They should also define clear scoring criteria, establish fair performance baselines using open models and ensure benchmarks are valid, reliable and capable of distinguishing different levels of performance.
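Two of the requirements above — clear scoring criteria and baselines built from open models — amount to comparing a candidate's score against a reference score with an explicit decision rule. The function and margin below are assumptions for illustration, not drawn from the notice.

```python
# Illustrative decision rule; the threshold and labels are assumptions.

def grade_against_baseline(candidate_score: float, baseline_score: float,
                           margin: float = 0.05) -> str:
    """Compare a candidate model's benchmark score to an open-model baseline.

    A benchmark whose scores cannot separate models beyond the margin is not
    "capable of distinguishing different levels of performance."
    """
    if candidate_score >= baseline_score + margin:
        return "above baseline"
    if candidate_score <= baseline_score - margin:
        return "below baseline"
    return "indistinguishable from baseline"

print(grade_against_baseline(0.82, 0.70))  # above baseline
print(grade_against_baseline(0.71, 0.70))  # indistinguishable from baseline
```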
Why Is the Government Expanding AI Evaluation Capabilities?
The government is pursuing new evaluation systems to address the rapid advancement of AI technologies. The new infrastructure should be able to evaluate newly released AI models against mission-specific benchmarks. In addition, the system should assess human-machine collaboration to determine whether joint operations yield better mission outcomes than either humans or automated systems alone.
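The human-machine assessment described above reduces to comparing mission-outcome scores across three configurations. The metric, scores and configuration names below are hypothetical, used only to make the comparison concrete.

```python
# Hypothetical mission-outcome comparison; all values are illustrative.

def best_configuration(outcomes: dict[str, float]) -> str:
    """Return the configuration (human-only, machine-only, or joint)
    with the highest mission-outcome score."""
    return max(outcomes, key=outcomes.get)

# Example where joint human-machine teams outperform either alone.
results = {"human_only": 0.61, "machine_only": 0.58, "human_machine": 0.74}
print(best_configuration(results))  # human_machine
```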
The effort, dubbed “Mystic Depot,” follows calls by Pentagon leadership to accelerate the adoption of AI across warfighting and administrative operations, DefenseScoop reported. Interested vendors can submit their responses to the CSO by March 24.

