Harness Engineering

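When building systems with AI agents, people used to shout "Context! Context!", but lately it feels like the cry is "Harness! Harness!". In a few more months it may have yet another name. Whether you call it context engineering or harness engineering, it is basically a thinking framework for working with AI agents, so it may not amount to much more than wordplay.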

A harness is the set of belts and straps used to control an animal or to hold a person securely in place¹. The nuance is something like reining in AI agents.

I wrote notes on context engineering a while back, so this time the notes are on harness engineering.

Agent = Model + Harness

LangChain's blog declares outright that everything in an AI agent other than the model is the harness.

The Anatomy of an Agent Harness

Learn how agent harnesses transform AI models into autonomous work engines. Explore core components: filesystems, sandboxes, and memory.

www.langchain.com

A harness is every piece of code, configuration, and execution logic that isn’t the model itself.
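The following minimal Python sketch makes "everything that isn't the model" concrete. It assumes a toy model that is just a function from prompt to reply, and a made-up `tool_name: argument` calling convention. `Harness`, `make_agent`, and `fake_model` are hypothetical names, not LangChain APIs. The only point is that the tools, the step limit, the error handling, and the control loop all live outside the model.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # toy model: prompt in, reply out


@dataclass
class Harness:
    """Everything except the model: tools, limits, error handling, and the loop."""
    tools: dict[str, Callable[[str], str]]
    max_steps: int = 5
    log: list[str] = field(default_factory=list)

    def run(self, model: Model, task: str) -> str:
        prompt = task
        for _ in range(self.max_steps):
            reply = model(prompt)
            self.log.append(reply)
            # Assumed convention: "tool_name: argument" is a tool call,
            # any reply without a colon is the final answer.
            name, sep, arg = reply.partition(":")
            if not sep:
                return reply
            tool = self.tools.get(name.strip())
            if tool is None:
                prompt = f"{task}\nError: unknown tool {name.strip()!r}"
                continue
            prompt = f"{task}\nObservation: {tool(arg.strip())}"
        return "stopped: step limit reached"


def make_agent(model: Model, harness: Harness) -> Callable[[str], str]:
    # Agent = Model + Harness
    return lambda task: harness.run(model, task)


def fake_model(prompt: str) -> str:
    # Stand-in for an LLM: call a tool once, then answer from the observation.
    if "Observation:" in prompt:
        return "answer is " + prompt.rsplit("Observation: ", 1)[1]
    return "upper: harness"
```

With `agent = make_agent(fake_model, Harness(tools={"upper": str.upper}))`, calling `agent("shout the word")` runs the tool once and returns `"answer is HARNESS"`. You can swap `fake_model` for a stronger model without touching any harness code, and vice versa.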

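The post defines harness design as follows: start from the behavior you want from the agent (or the corrections you want it to make), then turn whatever is needed to achieve it into a concrete feature set. Building on that article, Thoughtworks' Birgitta Böckeler systematizes harness engineering a bit further.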

Harness engineering for coding agent users

A mental model for building trust in coding agents through feedforward guides, feedback sensors, and iterative harness engineering.

martinfowler.com

From Figure 1 of the blog post: the term "harness" means different things depending on the context

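Harness controls come in two types:
guides (feedforward), which anticipate problems before the agent acts;
sensors (feedback), which detect problems after it acts.
What matters is running a learning loop over both.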
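Here is a rough sketch of the two control types. It assumes guides are plain-text rules prepended to the prompt and sensors are functions that inspect the output afterwards. Feeding each sensor finding back as a new guide is a deliberately simple stand-in for the learning loop. Böckeler's article describes the idea, not this code, and `run_with_harness`, `no_todo`, and `fake_agent` are hypothetical.

```python
from typing import Callable, Optional

Guide = str                              # feedforward: told to the agent before it acts
Sensor = Callable[[str], Optional[str]]  # feedback: inspects output, returns an issue or None


def run_with_harness(agent: Callable[[str], str], task: str,
                     guides: list[Guide], sensors: list[Sensor],
                     max_rounds: int = 3) -> tuple[str, list[Guide]]:
    guides = list(guides)  # copy so the caller's list is not mutated
    output = ""
    for _ in range(max_rounds):
        prompt = "\n".join(guides + [task])                      # feedforward: steer before acting
        output = agent(prompt)
        issues = [m for m in (s(output) for s in sensors) if m]  # feedback: check after acting
        if not issues:
            return output, guides
        # Close the loop: each new issue becomes a guide for the next attempt.
        for msg in issues:
            if f"Avoid: {msg}" not in guides:
                guides.append(f"Avoid: {msg}")
    return output, guides


def no_todo(output: str) -> Optional[str]:
    return "TODO left in output" if "TODO" in output else None


def fake_agent(prompt: str) -> str:
    # Stand-in for a coding agent that only behaves once it has been warned.
    return "def f(): return 1" if "Avoid:" in prompt else "def f(): pass  # TODO"
```

Calling `run_with_harness(fake_agent, "write f", ["Use type hints"], [no_todo])` fails the sensor on the first round and passes on the second. It returns the learned guide list `["Use type hints", "Avoid: TODO left in output"]`, which can seed the next task.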

OpenAI’s Harness Engineering

OpenAI itself has written up, under the name harness engineering, how it redesigned its whole software development process so that all coding could be handed to AI agents.

Harness engineering: leveraging Codex in an agent-first world

By Ryan Lopopolo, Member of the Technical Staff

openai.com

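All code was written by Codex, in an estimated one-tenth of the time it would have taken humans to write it.
Humans focused on breaking large goals down into designs and code, and on building the environment in which agents could complete those blocks of tasks.
Human QA checks became the bottleneck, so the team made app launches and monitoring metrics accessible to the agents, strengthening what they could do on their own.
Rather than one giant document, a concise AGENTS.md plus a structured docs/ directory is the trick for letting agents verify their work.
Anything outside the agent's context is effectively unknown to it (e.g., human discussions in Slack). Everything was turned into artifacts the agent can access, in some cases even reinventing the wheel.
To the agent, architectural constraints (e.g., schema validation, naming conventions) are just one more extension of its capabilities.
Development shifts to exploiting high throughput: keep fixing, because waiting is the enemy.
Over time, agents tend to reimplement designs that already exist, so the team put "golden principles" into the repository and cleans up against them periodically, like garbage collection.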
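The OpenAI post does not include code for its constraints. The sketch below shows, under stated assumptions, what a mechanically enforced naming convention could look like when its messages double as instructions the agent can act on. The snake_case rule and `check_function_names` are hypothetical; only the standard-library `ast` and `re` modules are real.

```python
import ast
import re

# Hypothetical team rule: function names must be snake_case (leading underscores allowed).
SNAKE_CASE = re.compile(r"_{0,2}[a-z][a-z0-9_]*")


def check_function_names(source: str, filename: str = "<string>") -> list[str]:
    """Return violations phrased as instructions an agent can act on."""
    problems = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not SNAKE_CASE.fullmatch(node.name):
            problems.append(
                f"{filename}:{node.lineno}: function {node.name!r} is not snake_case; "
                "rename it and update every call site."
            )
    return problems
```

For a file containing `def fetchData(): ...` on line 4, this returns `app.py:4: function 'fetchData' is not snake_case; rename it and update every call site.` Running a check like this in CI or in the agent's loop turns a convention into something the agent cannot silently ignore.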

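The article itself only uses the word "harness" in "evaluation harnesses". Still, the title Harness Engineering presumably means that designing the mechanisms that control and operate AI agents (the harness) has become the essence of software development. Probably.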

AutoHarness

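AutoHarness is a framework in which the AI agent itself writes the harness, for example one that enforces "follow the rules" in games such as chess. It comes from DeepMind and was accepted at the RSI Workshop at ICLR 2026.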

AutoHarness: improving LLM agents by automatically synthesizing a code harness

Despite significant strides in language models in the last few years, when used as agents, such models often try to perform actions that are not just suboptimal for a given state, but are strictly prohibited by the external environment. For example, in the recent Kaggle GameArena chess competition, 78% of Gemini-2.5-Flash losses were attributed to illegal moves. Often people manually write "harnesses" around LLMs to prevent such failures. In this paper, we demonstrate that Gemini-2.5-Flash can automatically synthesize such a code harness, using a small number of rounds of iterative code refinement given feedback from the (game) environment. The resulting harness prevents all illegal moves in 145 different TextArena games (both 1-player and 2-player), enabling the smaller Gemini-2.5-Flash model to outperform larger models, such as Gemini-2.5-Pro. Pushing our technique to the limit, we can get Gemini-2.5-Flash to generate the entire policy in code, thus eliminating the need to use the LLM at decision making time. The resulting code-policy receives a higher average reward than Gemini-2.5-Pro and GPT-5.2-High on 16 TextArena 1-player games. Our results show that using a smaller model to synthesize a custom code harness (or entire policy) can outperform a much larger model, while also being more cost effective.

arxiv.org
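The abstract's core idea is code that stands between the model and the environment and blocks illegal actions. A hand-written toy can illustrate it. This is not the paper's code: in AutoHarness the rule code is synthesized by Gemini from environment feedback, whereas here `legal_moves` for tic-tac-toe is written by hand. `harnessed_move` and its retry-then-fallback policy are also assumptions for illustration.

```python
import random
from typing import Callable, Optional

Board = list[str]  # toy tic-tac-toe: 9 cells, "" when empty, "X" or "O" otherwise
Proposer = Callable[[Board, Optional[str]], int]  # stand-in for the LLM: (board, feedback) -> cell


def legal_moves(board: Board) -> list[int]:
    # The rule code; in AutoHarness, code like this is what the LLM itself synthesizes.
    return [i for i, cell in enumerate(board) if cell == ""]


def harnessed_move(propose: Proposer, board: Board, retries: int = 2,
                   rng: Optional[random.Random] = None) -> int:
    legal = legal_moves(board)
    if not legal:
        raise ValueError("no legal moves: the game is over")
    feedback = None
    for _ in range(retries + 1):
        move = propose(board, feedback)
        if move in legal:
            return move
        # Tell the model why the move was rejected and let it try again.
        feedback = f"Move {move} is illegal; legal moves are {legal}."
    # Fallback: the harness never lets an illegal move reach the environment.
    return (rng or random.Random()).choice(legal)
```

Suppose a proposer keeps returning an occupied cell. After the retries run out, the fallback still submits a legal move. That is the property the abstract reports for 145 TextArena games: no illegal moves at all.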

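AutoHarness works in three steps:
it manages updates to the action policy as a tree;
it explores that tree with Thompson sampling;
it encodes rule compliance as code.
It is one example showing that, given the rules as a precondition, planning can generate the harness itself.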
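The toy Python sketch below shows what Thompson sampling over a tree of candidates looks like. It is not the paper's algorithm. The toy makes three substitutions:
a "harness" is reduced to the set of rules it enforces;
the hypothetical `refine` stands in for the LLM rewriting code from feedback, and simply adds one missing rule;
`evaluate` stands in for playing episodes.
Each node keeps a Beta posterior over its success rate. Each iteration draws a sample from every posterior, expands the node with the highest draw, and evaluates the new child.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    harness: frozenset          # toy harness: the set of rule ids it enforces
    wins: int = 0               # episodes where the triggered rule was enforced
    losses: int = 0
    children: list = field(default_factory=list)

    def thompson_sample(self, rng: random.Random) -> float:
        # Draw a plausible success rate from the Beta(1 + wins, 1 + losses) posterior.
        return rng.betavariate(1 + self.wins, 1 + self.losses)

    def mean(self) -> float:
        return (1 + self.wins) / (2 + self.wins + self.losses)


def evaluate(node: Node, rules: int, episodes: int, rng: random.Random) -> None:
    # Each toy episode triggers one random rule; the harness succeeds if it enforces it.
    for _ in range(episodes):
        if rng.randrange(rules) in node.harness:
            node.wins += 1
        else:
            node.losses += 1


def refine(harness: frozenset, rules: int, rng: random.Random) -> frozenset:
    # Stand-in for "LLM rewrites the code given environment feedback": add one missing rule.
    missing = [r for r in range(rules) if r not in harness]
    return harness | {rng.choice(missing)} if missing else harness


def tree_search(rules: int = 5, iterations: int = 10, episodes: int = 20,
                seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node(frozenset())
    evaluate(root, rules, episodes, rng)
    nodes = [root]
    for _ in range(iterations):
        parent = max(nodes, key=lambda n: n.thompson_sample(rng))  # Thompson sampling
        child = Node(refine(parent.harness, rules, rng))
        parent.children.append(child)
        nodes.append(child)
        evaluate(child, rules, episodes, rng)
    return max(nodes, key=Node.mean)
```

Sampling instead of always expanding the best-looking node keeps some exploration of weaker branches. That is the usual reason to choose Thompson sampling over a greedy rule.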

まとめ

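Context Engineering (repeated from the earlier note): a method for designing, structuring, and optimizing the information an AI system uses to understand and carry out a task.
Harness Engineering: a method for designing, structuring, and optimizing the tools, constraints, verification, and feedback loops that let an AI agent carry out a task safely and reliably.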

Akshay explicitly lays out a three-layer engineering structure: prompt, context, and harness². His table comparing how each vendor approaches its harness tooling was interesting.

From his post: a quick-reference table of each company's harness implementation²
