White Paper

AgentKit: OpenAI's Production Agent Stack, Unpacked

Jake McCluskey

Source: Introducing AgentKit, OpenAI (OpenAI DevDay, October 6, 2025)

Series: The 10 Agent Whitepapers Every Builder Should Read

TL;DR

AgentKit is OpenAI's answer to "how do I go from prototype to production" for agents. It bundles four things: Agent Builder (a visual canvas for multi-agent workflows), Connector Registry (admin-managed data and tool connections, including MCP), ChatKit (embeddable chat UI that looks like your product), and expanded Evals (datasets, trace grading, automated prompt optimization). It ships with GPT-5 underneath and no markup on model pricing. If you've been hand-rolling agent runtimes on top of the Responses API, AgentKit is OpenAI's opinionated replacement.

1. What it is

AgentKit is less a single product and more a coordinated stack. Each piece targets a different layer of the path to prod.

1.1. Agent Builder: the visual canvas

A drag-and-drop flow editor for multi-agent workflows. Nodes are agents, tools, or control-flow primitives. Edges are data flow. Features:

  • Compose logic (if/else, loops, parallel fan-out, fan-in)
  • Connect tools, pick from built-ins or MCP servers
  • Configure guardrails (topic restrictions, PII filters, response shape)
  • Versioning: commit snapshots, diff versions, roll back
  • Inline evaluations: attach an eval set to any node, run it on save
  • Preview runs: test a workflow before exposing it

Status: beta. Included in API pricing.

1.2. Connector Registry: centralized data and tool governance

The enterprise-grade connector layer. A Global Admin Console lets administrators enable specific data sources and MCP servers across the org's OpenAI products (API, ChatGPT Enterprise, Edu).

Pre-built connectors:

  • Dropbox
  • Google Drive
  • SharePoint
  • Microsoft Teams

Plus third-party MCP servers. Any MCP server (Slack, GitHub, Postgres, Sentry, Notion) can be registered.

Status: beta rollout to API + ChatGPT Enterprise + Edu customers with Global Admin Console.

1.3. ChatKit: embeddable chat UI

A drop-in chat UI you can embed in your own app that looks native to your product. Handles:

  • Streaming responses with tool-call visualization
  • File attachments
  • Theming (match your brand)
  • Auth hand-off
  • Multi-session history

ChatKit replaces the 200 to 500 lines of React plus SSE plumbing every team otherwise writes to render agent turns.

Status: generally available.
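For context, that SSE plumbing usually starts with a hand-rolled frame parser like the sketch below (simplified and illustrative; real code also buffers partial chunks, handles reconnects, and aborts in-flight requests):

```typescript
// Minimal sketch of the SSE parsing every hand-rolled chat UI ends up with.
type SSEEvent = { event: string; data: string };

function parseSSE(raw: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  // SSE frames are separated by a blank line
  for (const frame of raw.split("\n\n")) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

Multiply that by tool-call rendering, attachments, and session state, and the line count adds up fast.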

1.4. Expanded Evals

  • Datasets: curated eval sets with ground-truth labels
  • Trace grading: score an entire agent trace, not just the final answer
  • Automated prompt optimization: the evaluator proposes prompt edits that improve the score
  • Third-party model support: evaluate non-OpenAI models against your traces

Status: generally available.
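Trace grading is the interesting one: the grader sees every step, so a run can lose points even when the final answer is right. A toy sketch of the distinction (the types and scoring weights here are illustrative, not OpenAI's grader):

```typescript
type TraceStep = { tool: string; ok: boolean };
type Trace = { steps: TraceStep[]; finalAnswer: string };

// Final-answer grading looks at one field.
function gradeFinal(trace: Trace, expected: string): number {
  return trace.finalAnswer === expected ? 1 : 0;
}

// Trace grading also penalizes failed tool calls along the way.
function gradeTrace(trace: Trace, expected: string): number {
  const answerScore = gradeFinal(trace, expected);
  const okSteps = trace.steps.filter((s) => s.ok).length;
  const stepScore = trace.steps.length ? okSteps / trace.steps.length : 1;
  return 0.5 * answerScore + 0.5 * stepScore;
}
```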

2. Why it matters

Three structural reasons AgentKit moves the ecosystem:

2.1. "From prototype to prod" was the missing middle

The Responses API gives you the turn engine. The agent SDKs (OpenAI Agents SDK, LangChain, CrewAI) give you the loop. Neither gives you:

  • Admin-controlled connectors across an org
  • Versioned workflows with rollback
  • Inline evals baked into the editor
  • Embeddable UI that isn't a weekend project

AgentKit fills that gap. For anyone who's been cobbling together LangGraph + Chainlit + Airtable for eval storage + Vercel for the UI, that entire stack collapses into one product.

2.2. First-party MCP adoption

OpenAI integrating MCP into Connector Registry is the strongest signal yet that MCP has won the connector-format war. Three of the four frontier labs (Anthropic, OpenAI, Google) now ship MCP natively. Build an MCP server once, use it across every frontier agent platform.

2.3. Automated prompt optimization at workflow scope

Inline evals plus prompt optimization means the system iterates on prompts for you. The builder pattern becomes:

1. Build a workflow
2. Attach a dataset
3. Click "optimize"
4. Review proposed prompt edits
5. Merge the ones that score higher

This is still early. Don't expect it to beat a skilled prompt engineer yet. But the ergonomics are a preview of where the space is going.
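Step 5, "merge the ones that score higher," amounts to a filter-and-rank over scored variants. A minimal sketch (names hypothetical, not AgentKit's API):

```typescript
type Variant = { prompt: string; score: number };

// Keep only variants that beat the baseline by at least `margin`,
// best first — the "merge the ones that score higher" step.
function selectWinners(baseline: number, variants: Variant[], margin = 0): Variant[] {
  return variants
    .filter((v) => v.score > baseline + margin)
    .sort((a, b) => b.score - a.score);
}
```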

3. How to do it

3.1. Build your first workflow in Agent Builder

Typical flow for a customer-support triage agent:

┌───────────────────────────────────────────────────────────────┐
│                                                               │
│   [Intake agent] ── classifies ticket ─▶ [Router]             │
│                                            │                  │
│                      ┌─────────────────────┼─────────┐        │
│                      ▼                     ▼         ▼        │
│                 [Billing]              [Tech]    [Sales]      │
│                      │                     │         │        │
│                      └──────────┬──────────┴─────────┘        │
│                                 ▼                             │
│                         [Responder agent]                     │
│                            (drafts reply)                     │
│                                 │                             │
│                                 ▼                             │
│                        [Human review node]                    │
│                                                               │
└───────────────────────────────────────────────────────────────┘

Each node is a GPT-5 agent with a scoped system prompt plus tool set. Edges carry structured state. You version the whole thing like a git repo.
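The router step above reduces to a switch over the structured state the intake agent emits. An illustrative sketch (the type and node names are hypothetical, not AgentKit's actual schema):

```typescript
// Shape of the structured state an edge might carry
// from the intake agent into the router.
type TicketState = {
  ticketId: string;
  category: "billing" | "tech" | "sales";
  summary: string;
};

// The Router node's logic: pick a downstream agent by category.
function route(state: TicketState): "Billing" | "Tech" | "Sales" {
  switch (state.category) {
    case "billing": return "Billing";
    case "tech": return "Tech";
    case "sales": return "Sales";
  }
}
```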

3.2. Register your MCP servers centrally

Admin console → Connector Registry → Add connector → "MCP server" → paste the stdio/SSE config. Once added, every workflow in the org can enable it per-node with checkboxes. Permissions are admin-controlled, not developer-controlled. A registration payload might look like the following (field names illustrative):

{
  "name": "postgres-readonly",
  "type": "mcp",
  "transport": "stdio",
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-postgres",
           "postgresql://ro_user:$PW@warehouse:5432/analytics"],
  "allowed_roles": ["analyst", "support-lead"]
}

3.3. Embed ChatKit into your app

// React
import { ChatKit } from "@openai/chatkit";

function SupportWidget({ workflowId, userId }) {
  return (
    <ChatKit
      workflowId={workflowId}   // from Agent Builder
      userId={userId}
      theme={{
        primaryColor: "#0070f3",
        font: "Inter, sans-serif",
      }}
      onToolCall={(call) => console.log("tool:", call)}
    />
  );
}

Under the hood, ChatKit handles SSE streaming, tool-call rendering, attachment uploads, and session persistence.

3.4. Attach an eval to any node

In Agent Builder, open a node → Evaluations → Attach dataset. Datasets can be:

  • CSV upload with input,expected_output columns
  • Auto-generated from production traces (pick "sampled traces from last 7 days")
  • Imported from the OpenAI Evals library

On save, the node runs the dataset and reports pass/fail. For trace grading (evaluating the whole workflow, not one node), attach the dataset at the top-level workflow instead.
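A minimal dataset in that CSV shape (values illustrative):

```
input,expected_output
"My card was charged twice","billing"
"App crashes on login","tech"
"Do you offer volume discounts?","sales"
```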

3.5. Automated prompt optimization: use it carefully

Enable "Auto-optimize prompt" on a node. The system:

  1. Runs your current prompt against the dataset
  2. Proposes N variant prompts using GPT-5
  3. Scores variants against the dataset
  4. Shows the top-K with delta vs. baseline

Do not blindly merge the winning variant. The optimizer is aggressive and can over-fit to the dataset, so require human review before merging.
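One guard against that over-fitting (a general technique, not a built-in AgentKit feature): split the dataset and only merge a variant whose win also holds on a held-out slice the optimizer never saw. A sketch:

```typescript
// Split examples into an optimization set and a held-out set.
// A variant that beats baseline on `optimize` but not on `holdout`
// has likely over-fit and should not be merged.
function holdoutSplit<T>(data: T[], holdoutFraction = 0.25): { optimize: T[]; holdout: T[] } {
  const cut = Math.floor(data.length * (1 - holdoutFraction));
  return { optimize: data.slice(0, cut), holdout: data.slice(cut) };
}
```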

3.6. When AgentKit is the right pick

Use AgentKit when:

  • You're standardized on OpenAI's models
  • You need admin controls on data sources across many developers
  • You want embeddable chat UI without building one
  • Your team includes non-developers who can own workflows in a visual builder

Consider alternatives when:

  • You need multi-vendor model routing (use the Claude Agent SDK plus your own router, or LangGraph)
  • You need full control over the runtime (the Agents SDK or raw Responses API)
  • Your workflows are so simple a visual builder adds overhead

4. AgentKit vs. the field

| Capability               | AgentKit          | Claude Agent SDK | Magentic-UI          | LangGraph          |
|--------------------------|-------------------|------------------|----------------------|--------------------|
| Visual workflow builder  | Yes               | No (code-first)  | Partial              | Studio (beta)      |
| Admin connector registry | Yes               | No (per-project) | No                   | No                 |
| Embeddable chat UI       | Yes (ChatKit)     | No               | Built-in             | No                 |
| Inline evals             | Yes               | External         | No                   | LangSmith          |
| Auto prompt optimization | Yes               | No               | No                   | No                 |
| Multi-vendor models      | Partial           | Partial          | Yes                  | Yes                |
| Open source              | No                | Yes (SDK)        | Yes                  | Yes                |
| Ideal for                | Prod OpenAI teams | Code-first teams | Human-in-loop browser| Flexible pipelines |

5. Key takeaways

  • AgentKit is the first "full stack" agent product from a frontier lab. Builder + connectors + UI + evals in one.
  • MCP is now first-party on OpenAI. If you weren't building MCP servers for your internal tools, you should be. They now work everywhere.
  • Inline evals plus auto-optimization is the future. Even if the current version is rough, the ergonomics are where the field is going.
  • Admin-controlled connectors solve a real enterprise pain. Per-developer keys and per-project connector sprawl are a security problem AgentKit addresses directly.
  • No markup on model pricing. The stack is free. You pay for inference.
