White Paper

AgentKit: OpenAI's Production Agent Stack, Unpacked

Jake McCluskey

Source: Introducing AgentKit, OpenAI (OpenAI DevDay, October 6, 2025)

Series: The 10 Agent Whitepapers Every Builder Should Read

TL;DR

AgentKit is OpenAI's answer to "how do I go from prototype to production" for agents. It bundles four things: Agent Builder (a visual canvas for multi-agent workflows), Connector Registry (admin-managed data and tool connections, including MCP), ChatKit (embeddable chat UI that looks like your product), and expanded Evals (datasets, trace grading, automated prompt optimization). It ships with GPT-5 underneath and no markup on model pricing. If you've been hand-rolling agent runtimes on top of the Responses API, AgentKit is OpenAI's opinionated replacement.

1. What it is

AgentKit is less a single product and more a coordinated stack. Each piece targets a different layer of the path to prod.

1.1. Agent Builder: the visual canvas

A drag-and-drop flow editor for multi-agent workflows. Nodes are agents, tools, or control-flow primitives. Edges are data flow. Features:

  • Compose logic (if/else, loops, parallel fan-out, fan-in)
  • Connect tools, pick from built-ins or MCP servers
  • Configure guardrails (topic restrictions, PII filters, response shape)
  • Versioning: commit snapshots, diff versions, roll back
  • Inline evaluations: attach an eval set to any node, run it on save
  • Preview runs: test a workflow before exposing it

Status: beta. Included in API pricing.

1.2. Connector Registry: centralized data and tool governance

The enterprise-grade connector layer. A Global Admin Console lets administrators enable specific data sources and MCP servers across the org's OpenAI products (API, ChatGPT Enterprise, Edu).

Pre-built connectors:

  • Dropbox
  • Google Drive
  • SharePoint
  • Microsoft Teams

Plus third-party MCP servers. Any MCP server (Slack, GitHub, Postgres, Sentry, Notion) can be registered.

Status: beta rollout to API + ChatGPT Enterprise + Edu customers with Global Admin Console.

1.3. ChatKit: embeddable chat UI

A drop-in chat UI you can embed in your own app that looks native to your product. Handles:

  • Streaming responses with tool-call visualization
  • File attachments
  • Theming (match your brand)
  • Auth hand-off
  • Multi-session history

ChatKit replaces the 200 to 500 lines of React plus SSE plumbing every team otherwise writes to render agent turns.

Status: generally available.
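For context, that SSE plumbing usually starts with a hand-rolled frame parser like the sketch below (simplified and illustrative; real code also buffers partial chunks, handles reconnects, and aborts in-flight requests):

```typescript
// Minimal sketch of the SSE parsing every hand-rolled chat UI ends up with.
type SSEEvent = { event: string; data: string };

function parseSSE(raw: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  // SSE frames are separated by a blank line
  for (const frame of raw.split("\n\n")) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

Multiply that by tool-call rendering, attachments, and session state, and the line count adds up fast.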

1.4. Expanded Evals

  • Datasets: curated eval sets with ground-truth labels
  • Trace grading: score an entire agent trace, not just the final answer
  • Automated prompt optimization: the evaluator proposes prompt edits that improve the score
  • Third-party model support: evaluate non-OpenAI models against your traces

Status: generally available.
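Trace grading is the interesting one: the grader sees every step, so a run can lose points even when the final answer is right. A toy sketch of the distinction (the types and scoring weights here are illustrative, not OpenAI's grader):

```typescript
type TraceStep = { tool: string; ok: boolean };
type Trace = { steps: TraceStep[]; finalAnswer: string };

// Final-answer grading looks at one field.
function gradeFinal(trace: Trace, expected: string): number {
  return trace.finalAnswer === expected ? 1 : 0;
}

// Trace grading also penalizes failed tool calls along the way.
function gradeTrace(trace: Trace, expected: string): number {
  const answerScore = gradeFinal(trace, expected);
  const okSteps = trace.steps.filter((s) => s.ok).length;
  const stepScore = trace.steps.length ? okSteps / trace.steps.length : 1;
  return 0.5 * answerScore + 0.5 * stepScore;
}
```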

2. Why it matters

Three structural reasons AgentKit moves the ecosystem:

2.1. "From prototype to prod" was the missing middle

The Responses API gives you the turn engine. The agent SDKs (OpenAI Agents SDK, LangChain, CrewAI) give you the loop. Neither gives you:

  • Admin-controlled connectors across an org
  • Versioned workflows with rollback
  • Inline evals baked into the editor
  • Embeddable UI that isn't a weekend project

AgentKit fills that gap. For anyone who's been cobbling together LangGraph + Chainlit + Airtable for eval storage + Vercel for the UI, that entire stack collapses into one product.

2.2. First-party MCP adoption

OpenAI integrating MCP into Connector Registry is the strongest signal yet that MCP has won the connector-format war. Three of the four frontier labs (Anthropic, OpenAI, Google) now ship MCP natively. Build an MCP server once, use it across every frontier agent platform.

2.3. Automated prompt optimization at workflow scope

Inline evals plus prompt optimization means the system iterates on prompts for you. The builder pattern becomes:

1. Build a workflow
2. Attach a dataset
3. Click "optimize"
4. Review proposed prompt edits
5. Merge the ones that score higher

This is still early. Don't expect it to beat a skilled prompt engineer yet. But the ergonomics are a preview of where the space is going.
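Step 5, "merge the ones that score higher," amounts to a filter-and-rank over scored variants. A minimal sketch (names hypothetical, not AgentKit's API):

```typescript
type Variant = { prompt: string; score: number };

// Keep only variants that beat the baseline by at least `margin`,
// best first — the "merge the ones that score higher" step.
function selectWinners(baseline: number, variants: Variant[], margin = 0): Variant[] {
  return variants
    .filter((v) => v.score > baseline + margin)
    .sort((a, b) => b.score - a.score);
}
```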

3. How to do it

3.1. Build your first workflow in Agent Builder

Typical flow for a customer-support triage agent:

┌───────────────────────────────────────────────────────────────┐
│                                                               │
│   [Intake agent] ── classifies ticket ─▶ [Router]             │
│                                            │                  │
│                      ┌─────────────────────┼─────────┐        │
│                      ▼                     ▼         ▼        │
│                 [Billing]              [Tech]    [Sales]      │
│                      │                     │         │        │
│                      └──────────┬──────────┴─────────┘        │
│                                 ▼                             │
│                         [Responder agent]                     │
│                            (drafts reply)                     │
│                                 │                             │
│                                 ▼                             │
│                        [Human review node]                    │
│                                                               │
└───────────────────────────────────────────────────────────────┘

Each node is a GPT-5 agent with a scoped system prompt plus tool set. Edges carry structured state. You version the whole thing like a git repo.
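The router step above reduces to a switch over the structured state the intake agent emits. An illustrative sketch (the type and node names are hypothetical, not AgentKit's actual schema):

```typescript
// Shape of the structured state an edge might carry
// from the intake agent into the router.
type TicketState = {
  ticketId: string;
  category: "billing" | "tech" | "sales";
  summary: string;
};

// The Router node's logic: pick a downstream agent by category.
function route(state: TicketState): "Billing" | "Tech" | "Sales" {
  switch (state.category) {
    case "billing": return "Billing";
    case "tech": return "Tech";
    case "sales": return "Sales";
  }
}
```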

3.2. Register your MCP servers centrally

Admin console → Connector Registry → Add connector → "MCP server" → paste the stdio/SSE config. Once added, every workflow in the org can enable it per-node with checkboxes. Permissions are admin-controlled, not developer-controlled. A registration payload might look like the following (field names illustrative):

{
  "name": "postgres-readonly",
  "type": "mcp",
  "transport": "stdio",
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-postgres",
           "postgresql://ro_user:$PW@warehouse:5432/analytics"],
  "allowed_roles": ["analyst", "support-lead"]
}

3.3. Embed ChatKit into your app

// React
import { ChatKit } from "@openai/chatkit";

function SupportWidget({ workflowId, userId }) {
  return (
    <ChatKit
      workflowId={workflowId}   // from Agent Builder
      userId={userId}
      theme={{
        primaryColor: "#0070f3",
        font: "Inter, sans-serif",
      }}
      onToolCall={(call) => console.log("tool:", call)}
    />
  );
}

Under the hood, ChatKit handles SSE streaming, tool-call rendering, attachment uploads, and session persistence.

3.4. Attach an eval to any node

In Agent Builder, open a node → Evaluations → Attach dataset. Datasets can be:

  • CSV upload with input,expected_output columns
  • Auto-generated from production traces (pick "sampled traces from last 7 days")
  • Imported from the OpenAI Evals library

On save, the node runs the dataset and reports pass/fail. For trace grading (evaluating the whole workflow, not one node), attach the dataset at the top-level workflow instead.
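A minimal dataset in that CSV shape (values illustrative):

```
input,expected_output
"My card was charged twice","billing"
"App crashes on login","tech"
"Do you offer volume discounts?","sales"
```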

3.5. Automated prompt optimization: use it carefully

Enable "Auto-optimize prompt" on a node. The system:

  1. Runs your current prompt against the dataset
  2. Proposes N variant prompts using GPT-5
  3. Scores variants against the dataset
  4. Shows the top-K with delta vs. baseline

Do not blindly merge the winning variant. The optimizer is aggressive and can over-fit to the dataset, so require human review before merging.
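One guard against that over-fitting (a general technique, not a built-in AgentKit feature): split the dataset and only merge a variant whose win also holds on a held-out slice the optimizer never saw. A sketch:

```typescript
// Split examples into an optimization set and a held-out set.
// A variant that beats baseline on `optimize` but not on `holdout`
// has likely over-fit and should not be merged.
function holdoutSplit<T>(data: T[], holdoutFraction = 0.25): { optimize: T[]; holdout: T[] } {
  const cut = Math.floor(data.length * (1 - holdoutFraction));
  return { optimize: data.slice(0, cut), holdout: data.slice(cut) };
}
```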

3.6. When AgentKit is the right pick

Use AgentKit when:

  • You're standardized on OpenAI's models
  • You need admin controls on data sources across many developers
  • You want embeddable chat UI without building one
  • Your team includes non-developers who can own workflows in a visual builder

Consider alternatives when:

  • You need multi-vendor model routing (use the Claude Agent SDK plus your own router, or LangGraph)
  • You need full control over the runtime (the Agents SDK or raw Responses API)
  • Your workflows are so simple a visual builder adds overhead

4. AgentKit vs. the field

| Capability               | AgentKit          | Claude Agent SDK | Magentic-UI          | LangGraph          |
|--------------------------|-------------------|------------------|----------------------|--------------------|
| Visual workflow builder  | Yes               | No (code-first)  | Partial              | Studio (beta)      |
| Admin connector registry | Yes               | No (per-project) | No                   | No                 |
| Embeddable chat UI       | Yes (ChatKit)     | No               | Built-in             | No                 |
| Inline evals             | Yes               | External         | No                   | LangSmith          |
| Auto prompt optimization | Yes               | No               | No                   | No                 |
| Multi-vendor models      | Partial           | Partial          | Yes                  | Yes                |
| Open source              | No                | Yes (SDK)        | Yes                  | Yes                |
| Ideal for                | Prod OpenAI teams | Code-first teams | Human-in-loop browser| Flexible pipelines |

5. Key takeaways

  • AgentKit is the first "full stack" agent product from a frontier lab. Builder + connectors + UI + evals in one.
  • MCP is now first-party on OpenAI. If you weren't building MCP servers for your internal tools, you should be. They now work everywhere.
  • Inline evals plus auto-optimization is the future. Even if the current version is rough, the ergonomics are where the field is going.
  • Admin-controlled connectors solve a real enterprise pain. Per-developer keys and per-project connector sprawl are a security problem AgentKit addresses directly.
  • No markup on model pricing. The stack is free. You pay for inference.
