AgentKit: OpenAI's Production Agent Stack, Unpacked

Source: Introducing AgentKit, OpenAI (OpenAI DevDay, October 6, 2025)
Series: The 10 Agent Whitepapers Every Builder Should Read
TL;DR
AgentKit is OpenAI's answer to "how do I go from prototype to production" for agents. It bundles four things: Agent Builder (a visual canvas for multi-agent workflows), Connector Registry (admin-managed data and tool connections, including MCP), ChatKit (embeddable chat UI that looks like your product), and expanded Evals (datasets, trace grading, automated prompt optimization). It ships with GPT-5 underneath and no markup on model pricing. If you've been hand-rolling agent runtimes on top of the Responses API, AgentKit is now OpenAI's opinionated answer.
1. What it is
AgentKit is less a single product than a coordinated stack. Each piece targets a different layer of the path to prod.
1.1. Agent Builder: the visual canvas
A drag-and-drop flow editor for multi-agent workflows. Nodes are agents, tools, or control-flow primitives. Edges are data flow. Features:
- Compose logic (if/else, loops, parallel fan-out, fan-in)
- Connect tools, pick from built-ins or MCP servers
- Configure guardrails (topic restrictions, PII filters, response shape)
- Versioning: commit snapshots, diff versions, roll back
- Inline evaluations: attach an eval set to any node, run it on save
- Preview runs: test a workflow before exposing it
Status: beta. Included in API pricing.
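The control-flow primitives map onto familiar async patterns. A minimal sketch of parallel fan-out/fan-in in plain JavaScript — `callAgent` and the node names are hypothetical stand-ins, since Agent Builder wires this up visually rather than in code:

```javascript
// Hypothetical stand-in for invoking one agent node; in Agent Builder
// each of these would be a GPT-5 agent with its own prompt and tools.
async function callAgent(name, input) {
  return { agent: name, output: `${name} handled: ${input}` };
}

// Fan-out: run the three specialist nodes in parallel.
// Fan-in: wait for all results before the next node runs.
async function fanOutFanIn(ticket) {
  const specialists = ["billing", "tech", "sales"];
  const results = await Promise.all(
    specialists.map((name) => callAgent(name, ticket))
  );
  return results.map((r) => r.output);
}
```

An if/else node is just a branch on structured state, and a loop node re-enters an agent until a condition holds; the canvas draws what this code spells out.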
1.2. Connector Registry: centralized data and tool governance
The enterprise-grade connector layer. A Global Admin Console lets administrators enable specific data sources and MCP servers across the org's OpenAI products (API, ChatGPT Enterprise, Edu).
Pre-built connectors:
- Dropbox
- Google Drive
- SharePoint
- Microsoft Teams
Plus third-party MCP servers. Any MCP server (Slack, GitHub, Postgres, Sentry, Notion) can be registered.
Status: beta rollout to API + ChatGPT Enterprise + Edu customers with Global Admin Console.
1.3. ChatKit: embeddable chat UI
A drop-in chat UI you can embed in your own app that looks native to your product. Handles:
- Streaming responses with tool-call visualization
- File attachments
- Theming (match your brand)
- Auth hand-off
- Multi-session history
The thing ChatKit replaces: the 200 to 500 lines of React plus SSE plumbing every team writes to render agent turns.
Status: generally available.
1.4. Expanded Evals
- Datasets: curated eval sets with ground-truth labels
- Trace grading: score an entire agent trace, not just the final answer
- Automated prompt optimization: the evaluator proposes prompt edits that improve the score
- Third-party model support: evaluate non-OpenAI models against your traces
Status: generally available.
2. Why it matters
Three structural reasons AgentKit moves the ecosystem:
2.1. "From prototype to prod" was the missing middle
The Responses API gives you the turn engine. The agent SDKs (OpenAI Agents SDK, LangChain, CrewAI) give you the loop. Neither gives you:
- Admin-controlled connectors across an org
- Versioned workflows with rollback
- Inline evals baked into the editor
- Embeddable UI that isn't a weekend project
AgentKit fills that gap. For anyone who's been cobbling together LangGraph + Chainlit + Airtable for eval storage + Vercel for the UI, that entire stack collapses into one product.
2.2. First-party MCP adoption
OpenAI integrating MCP into Connector Registry is the strongest signal yet that MCP has won the connector-format war. Three of the four frontier labs (Anthropic, OpenAI, Google) now ship MCP natively. Build an MCP server once, use it across every frontier agent platform.
2.3. Automated prompt optimization at workflow scope
Inline evals plus prompt optimization means the system iterates on prompts for you. The builder pattern becomes:
1. Build a workflow
2. Attach a dataset
3. Click "optimize"
4. Review proposed prompt edits
5. Merge the ones that score higher
This is still early. Don't expect it to beat a skilled prompt engineer yet. But the ergonomics are a preview of where the space is going.
3. How to do it
3.1. Build your first workflow in Agent Builder
Typical flow for a customer-support triage agent:
[Intake agent] ── classifies ticket ──▶ [Router]
                                           │
                    ┌──────────────────────┼──────────────────────┐
                    ▼                      ▼                      ▼
                [Billing]               [Tech]                 [Sales]
                    │                      │                      │
                    └──────────────────────┼──────────────────────┘
                                           ▼
                                  [Responder agent]
                                   (drafts reply)
                                           │
                                           ▼
                                  [Human review node]
Each node is a GPT-5 agent with a scoped system prompt plus tool set. Edges carry structured state. You version the whole thing like a git repo.
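The same triage flow, sketched as plain functions to show the structured state that edges carry. Everything here is illustrative — the node names, the state shape, and the keyword rules standing in for GPT-5 classification are not Agent Builder's real schema:

```javascript
// Intake node: classifies the ticket. A real node would call GPT-5;
// keyword rules stand in here for determinism.
function intake(ticket) {
  const category = /invoice|refund|charge/i.test(ticket.text)
    ? "billing"
    : /error|crash|bug/i.test(ticket.text)
    ? "tech"
    : "sales";
  return { ...ticket, category }; // structured state carried on the edge
}

// Router node: dispatches to a specialist based on state.
function route(state) {
  const specialists = {
    billing: (s) => ({ ...s, draft: `Billing: reviewing ${s.id}` }),
    tech: (s) => ({ ...s, draft: `Tech: investigating ${s.id}` }),
    sales: (s) => ({ ...s, draft: `Sales: following up on ${s.id}` }),
  };
  return specialists[state.category](state);
}

// Responder node: finalizes the draft and hands off to human review.
function responder(state) {
  return { ...state, status: "pending_human_review" };
}

const result = responder(route(intake({ id: "T-42", text: "refund please" })));
```

Versioning the workflow then means versioning these prompts, edges, and state shapes as one snapshot, which is what makes diff-and-rollback possible.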
3.2. Register your MCP servers centrally
Admin console → Connector Registry → Add connector → "MCP server" → paste stdio/SSE config. Once added, every workflow in the org can enable it per-node with checkboxes. Permissions are admin-controlled, not developer-controlled.
{
  "name": "postgres-readonly",
  "type": "mcp",
  "transport": "stdio",
  "command": "npx",
  "args": [
    "-y",
    "@modelcontextprotocol/server-postgres",
    "postgresql://ro_user:$PW@warehouse:5432/analytics"
  ],
  "allowed_roles": ["analyst", "support-lead"]
}
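Under that stdio transport, MCP speaks JSON-RPC 2.0: one JSON message per line on the server's stdin/stdout. A simplified sketch of the two message shapes involved — field values here are illustrative, and the full handshake has more steps (see the MCP spec):

```javascript
// Simplified sketch of the JSON-RPC 2.0 messages an MCP client exchanges
// with a stdio server like the one registered above.
function initializeRequest(id) {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26", // illustrative pinned version
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.1.0" },
    },
  };
}

function toolCallRequest(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Each message is serialized as a single line of JSON on the wire.
const wire = JSON.stringify(toolCallRequest(2, "query", { sql: "SELECT 1" }));
```

The point of the registry is that developers never touch this layer: the admin registers the server once, and workflows see only named tools.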
3.3. Embed ChatKit into your app
// React
import { ChatKit } from "@openai/chatkit";

function SupportWidget({ workflowId, userId }) {
  return (
    <ChatKit
      workflowId={workflowId} // from Agent Builder
      userId={userId}
      theme={{
        primaryColor: "#0070f3",
        font: "Inter, sans-serif",
      }}
      onToolCall={(call) => console.log("tool:", call)}
    />
  );
}
Under the hood, ChatKit handles SSE streaming, tool-call rendering, attachment uploads, and session persistence.
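For a sense of the plumbing this absorbs: a minimal sketch of the SSE frame parsing every hand-rolled chat UI ends up writing. The event names ("delta", "tool_call") are illustrative, not ChatKit's actual wire format:

```javascript
// Minimal SSE frame parser of the kind ChatKit makes unnecessary.
// Frames are separated by a blank line; each field is "name: value".
function parseSSE(raw) {
  return raw
    .split("\n\n")
    .filter((frame) => frame.trim().length > 0)
    .map((frame) => {
      const fields = Object.fromEntries(
        frame.split("\n").map((line) => {
          const i = line.indexOf(": ");
          return [line.slice(0, i), line.slice(i + 2)];
        })
      );
      return { event: fields.event, data: JSON.parse(fields.data) };
    });
}

const stream =
  'event: delta\ndata: {"text":"Hello"}\n\n' +
  'event: tool_call\ndata: {"name":"lookup_order"}\n\n';
const events = parseSSE(stream);
```

And this handles none of the hard parts — reconnection, partial frames across network chunks, attachment uploads, or session persistence — which is where the 200-to-500-line estimate comes from.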
3.4. Attach an eval to any node
In Agent Builder, open a node → Evaluations → Attach dataset. Datasets can be:
- CSV upload with input,expected_output columns
- Auto-generated from production traces (pick "sampled traces from last 7 days")
- Imported from the OpenAI Evals library
On save, the node runs the dataset and reports pass/fail. For trace grading (evaluating the whole workflow, not one node), attach the dataset at the top-level workflow instead.
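The per-node loop amounts to scoring node outputs against ground truth. A sketch with an exact-match grader — the node function and dataset rows are stand-ins, not the Evals API:

```javascript
// Hypothetical node under test; a real node would call GPT-5.
function classifyNode(input) {
  return /refund/i.test(input) ? "billing" : "tech";
}

// Rows mirror the input,expected_output CSV columns.
const dataset = [
  { input: "I want a refund", expected_output: "billing" },
  { input: "App crashes on login", expected_output: "tech" },
  { input: "Refund not received", expected_output: "billing" },
];

// Exact-match grading per row; trace grading would instead score the
// full workflow run (tool calls, intermediate state, final answer).
function runEval(node, rows) {
  const results = rows.map((row) => ({
    ...row,
    actual: node(row.input),
    pass: node(row.input) === row.expected_output,
  }));
  const passed = results.filter((r) => r.pass).length;
  return { passed, total: rows.length, score: passed / rows.length };
}

const report = runEval(classifyNode, dataset);
```

Real graders are usually fuzzier than exact match (model-graded rubrics, semantic similarity), but the pass/fail report shape is the same.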
3.5. Automated prompt optimization: use it carefully
Enable "Auto-optimize prompt" on a node. The system:
- Runs your current prompt against the dataset
- Proposes N variant prompts using GPT-5
- Scores variants against the dataset
- Shows the top-K with delta vs. baseline
Do not blindly merge the winning variant. The optimizer is aggressive and can over-fit to the dataset. Human review before merge.
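The propose-score-review loop, sketched with a mock scorer. Variant generation and scoring would run through GPT-5 and your dataset; every name and number here is illustrative:

```javascript
// Mock scores standing in for running each prompt variant against the
// eval dataset; a real run would produce these from graded results.
const mockScores = {
  "baseline prompt": 0.72,
  "variant A": 0.81,
  "variant B": 0.69,
  "variant C": 0.84,
};
const score = (prompt) => mockScores[prompt];

// Score N variants, keep the top-K that beat baseline with their delta,
// and leave the merge decision to a human.
function proposeTopK(baseline, variants, k) {
  const base = score(baseline);
  return variants
    .map((prompt) => ({
      prompt,
      score: score(prompt),
      delta: score(prompt) - base,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .filter((v) => v.delta > 0); // only surface improvements
}

const candidates = proposeTopK(
  "baseline prompt",
  ["variant A", "variant B", "variant C"],
  2
);
// candidates await human review; nothing is merged automatically.
```

The over-fitting risk lives in that scorer: a variant that wins on a small fixed dataset may have memorized its quirks, which is exactly why the final filter is a human, not a threshold.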
3.6. When AgentKit is the right pick
Use AgentKit when:
- You're standardized on OpenAI's models
- You need admin controls on data sources across many developers
- You want embeddable chat UI without building one
- Your team includes non-developers who can own workflows in a visual builder
Consider alternatives when:
- You need multi-vendor model routing (use the Claude Agent SDK plus your own router, or LangGraph)
- You need full control over the runtime (the Agents SDK or raw Responses API)
- Your workflows are so simple a visual builder adds overhead
4. AgentKit vs. the field
| Capability | AgentKit | Claude Agent SDK | Magentic-UI | LangGraph |
|---|---|---|---|---|
| Visual workflow builder | Yes | No (code-first) | Partial | Studio beta |
| Admin connector registry | Yes | No (per-project) | No | No |
| Embeddable chat UI | Yes (ChatKit) | No | Built-in | No |
| Inline evals | Yes | External | No | LangSmith |
| Auto prompt optimization | Yes | No | No | No |
| Multi-vendor models | Partial | Partial | Yes | Yes |
| Open source | No | Yes (SDK) | Yes | Yes |
| Ideal for | Prod OpenAI teams | Code-first teams | Human-in-loop browser | Flexible pipelines |
5. Key takeaways
- AgentKit is the first "full stack" agent product from a frontier lab. Builder + connectors + UI + evals in one.
- MCP is now first-party on OpenAI. If you weren't building MCP servers for your internal tools, you should be. They now work everywhere.
- Inline evals plus auto-optimization is the future. Even if the current version is rough, the ergonomics are where the field is going.
- Admin-controlled connectors solve a real enterprise pain. Per-developer keys and per-project connector sprawl are a security problem AgentKit addresses directly.
- No markup on model pricing. The stack is free. You pay for inference.
Further reading
- GPT-5 System Card, the model AgentKit runs on
- 10 Best MCP Servers, connectors to register first
- Claude Agent SDK Loop, the code-first alternative
- Magentic-UI, Microsoft's human-centered answer