At scale, three structural problems will break your agentic systems. Solving them is not an optional concern; it's an infrastructure requirement.
The Three Core Problems
1. Tool Definition Fragmentation
Service owners publish their own MCP servers. Result: inconsistent implementations, definitions that drift from reality, silent failures.
The structural issue: Tool definitions live outside the deployment pipeline. Developers update endpoints. MCP server definitions lag. Agents call deprecated parameters. The LLM hallucinates a recovery that doesn't exist.
Fix: Make tool definitions derivative artifacts computed from source-of-truth schemas, not independent sources.
2. Security at Machine Velocity
Agents operate at microsecond decision latencies. Humans cannot review every execution. Your agents access production systems, sensitive data, and business-critical workflows.
Requirements:
- Centralized authorization: Every tool call validated against RBAC policies
- PII redaction: Response filtering before agent context window
- Mutation gating: Read-only by default; writes require explicit approval
- Traceability: Full audit trail of every agent execution
Standard API gateways are insufficient: they secure individual requests, but you're securing an entire execution plan generated ad hoc by an LLM.
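To make this concrete, here is a minimal sketch of plan-level enforcement. The `Policy` and `PlanStep` shapes are hypothetical, not a real gateway API; the point is that every step is checked against policy before anything executes.

```python
# Sketch: validating an LLM-generated execution plan step by step.
# All names (Policy, PlanStep) are illustrative, not a real gateway API.
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    tool: str
    params: dict
    mutates: bool  # does this step write to a downstream system?

@dataclass
class Policy:
    allowed_tools: set
    approved_mutations: set = field(default_factory=set)

def validate_plan(steps: list, policy: Policy) -> list:
    """Collect every violation; a plan with any violation never runs."""
    violations = []
    for i, step in enumerate(steps):
        if step.tool not in policy.allowed_tools:
            violations.append(f"step {i}: tool '{step.tool}' not in allowlist")
        if step.mutates and step.tool not in policy.approved_mutations:
            violations.append(f"step {i}: mutation via '{step.tool}' requires approval")
    return violations

policy = Policy(allowed_tools={"kb_search", "zendesk_update_ticket"},
                approved_mutations={"zendesk_update_ticket"})
plan = [PlanStep("kb_search", {"query": "refunds"}, mutates=False),
        PlanStep("payments_refund", {"amount": 100}, mutates=True)]
print(validate_plan(plan, policy))
# step 1 fails both checks: not allowlisted, and an unapproved mutation
```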
3. Discovery and Quality
Thousands of available tools. The agent must solve: which tools exist, what do they do, which combination solves the task?
Bad descriptions → LLM hallucination. The agent sees "PaymentService.process()" and invokes it with non-existent parameters. The model can't distinguish documented APIs from hallucinated ones.
Requirements:
- Schema-derived descriptions: Generated from actual service contracts, not human prose
- Workflow-scoped availability: Not all agents need all tools
- Parameter constraints: Prevent LLM parameter invention
- Quality metrics: Success rate, parameter validity, latency, user satisfaction
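A sketch of the parameter-constraint requirement using a strict JSON Schema, validated with the `jsonschema` Python package; the tool and its fields are hypothetical:

```python
# Sketch: rejecting invented parameters with a strict JSON Schema.
# Requires the 'jsonschema' package; tool name and fields are hypothetical.
from jsonschema import validate, ValidationError

SEARCH_FILES_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "minLength": 1},
        "max_files": {"type": "integer", "minimum": 1, "maximum": 100},
    },
    "required": ["query"],
    "additionalProperties": False,  # hallucinated parameters fail validation
}

def check_call(params: dict) -> bool:
    try:
        validate(instance=params, schema=SEARCH_FILES_SCHEMA)
        return True
    except ValidationError as err:
        print(f"rejected: {err.message}")
        return False

check_call({"query": "retry logic", "max_files": 10})    # True
check_call({"query": "retry logic", "recursive": True})  # False: invented parameter
```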
The Infrastructure Response
Ad-hoc tooling fails. You need a unified control plane.
Protocol-Driven Tool Generation
Stop requiring service owners to write MCP servers.
Instead:
- Introspect service schemas (Protobuf, Thrift, OpenAPI)
- Auto-generate MCP tool definitions from schema
- Generate natural-language descriptions via an LLM tuned for agent understanding
- Version and distribute from central registry
Tool definitions stay synchronized with service contracts because they're computed from source of truth.
Constraint: Services must have formalized definitions. If documentation lives in Slack, this fails.
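A toy sketch of the introspection step, assuming an OpenAPI document already parsed into a dict; a production pipeline would also walk Protobuf and Thrift descriptors:

```python
# Sketch: deriving an MCP-style tool definition from an OpenAPI operation.
# Toy example; output shape is illustrative, not the MCP wire format.

def tool_from_openapi(path: str, method: str, op: dict) -> dict:
    """Compute a tool definition as a derivative artifact of the service schema."""
    body = (op.get("requestBody", {})
              .get("content", {})
              .get("application/json", {})
              .get("schema", {"type": "object"}))
    return {
        "name": op["operationId"],
        "description": op.get("summary", f"{method.upper()} {path}"),
        "inputSchema": body,  # the agent-facing schema IS the service schema
    }

op = {
    "operationId": "create_refund",
    "summary": "Create a refund for a captured payment.",
    "requestBody": {"content": {"application/json": {"schema": {
        "type": "object",
        "properties": {"payment_id": {"type": "string"},
                       "amount_cents": {"type": "integer", "minimum": 1}},
        "required": ["payment_id", "amount_cents"],
    }}}},
}
print(tool_from_openapi("/refunds", "post", op)["name"])  # create_refund
```

When the service contract changes, the tool definition changes with it on the next pipeline run; there is no second source of truth to drift.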
Gateway-Based Control Plane
A single gateway service becomes the enforcement point:
- All agent-to-service calls route through gateway
- Authentication and authorization centralized
- Response filtering (PII redaction) before agent sees data
- Complete metrics and observability by default
- Rate limiting and circuit breaking protect downstream systems
Gateway is config-driven. Policies live in version control, deployed like code.
Gateway policy requirements:
- Tool allowlist per agent
- Rate limits (calls/minute, calls/hour)
- Parameter constraints (max_files, max_size, etc.)
- Data redaction rules (api_keys, secrets, pii)
- Approval gating for write operations
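A sketch of what one agent's policy might look like, expressed here as a Python literal (in practice it would live as YAML or JSON in the policy repo); field names mirror the list above and are illustrative:

```python
# Sketch: a per-agent gateway policy, version-controlled like code.
# Field names and tool names are illustrative.
SUPPORT_AGENT_POLICY = {
    "agent": "support-triage-agent",
    "tool_allowlist": ["zendesk_get_ticket", "kb_search", "github_create_issue"],
    "rate_limits": {"calls_per_minute": 30, "calls_per_hour": 600},
    "parameter_constraints": {
        "kb_search": {"max_files": 20, "max_size_mb": 5},
    },
    "redaction_rules": ["api_keys", "secrets", "pii"],
    "approval_required_for": ["github_create_issue"],  # gate all writes
}
```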
Tool Scoping and Refinement
Not all agents need all tools.
Support:
- Workflow-specific tool sets: Agent builders select allowed tools explicitly
- Parameter overrides: Constrain parameters for known workflows to prevent hallucination
- Derived tools: Specialized tool definitions layered on base service tools
Automation gets you to 80%. Explicit scoping and overrides get you to production.
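A sketch of a derived tool: a base definition plus a workflow-specific override that pins a parameter server-side so the LLM never chooses it. All names are hypothetical:

```python
# Sketch: deriving a workflow-scoped tool from a base tool definition.
# Pinned parameters are fixed server-side and removed from the agent-facing schema.
import copy

def derive_tool(base: dict, name: str, pinned: dict, description: str) -> dict:
    tool = copy.deepcopy(base)
    tool["name"] = name
    tool["description"] = description
    for param, value in pinned.items():
        tool["inputSchema"]["properties"].pop(param, None)
        tool.setdefault("pinned_params", {})[param] = value
    return tool

base_query = {
    "name": "orders_query",
    "description": "Run a query against the orders service.",
    "inputSchema": {"type": "object",
                    "properties": {"query": {"type": "string"},
                                   "environment": {"type": "string"}}},
}
readonly = derive_tool(base_query, "orders_query_prod_readonly",
                       pinned={"environment": "prod-replica"},
                       description="Read-only order lookups against the prod replica.")
```

The derived tool ships with a narrower schema, so scoping happens in the definition itself, not in prompt instructions.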
Quality Metrics
Tools have SLAs. Track:
- Success rate: % of calls that work vs. fail
- Parameter validity: % of calls with valid parameters
- Latency distribution: p50, p99, p99.9
- User satisfaction: Correctness of results from the agent's perspective
Actions for underperforming tools:
- Refine (better descriptions, tighter constraints)
- Deprecate (remove from agent access)
- Tier by SLA (restrict to high-priority agents)
SLA tracking:

```yaml
tool_metrics:
  github_create_pull_request:
    success_rate: 0.98
    parameter_validity: 0.95
    p99_latency_ms: 1200
    user_satisfaction: 4.2/5.0
  payment_process:
    success_rate: 0.99
    parameter_validity: 0.98
    p99_latency_ms: 2500
    user_satisfaction: 4.7/5.0
```
Establish minimum thresholds. Publish to an internal SLA dashboard. Review and act on underperformers monthly.
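A sketch of that monthly sweep, reusing the metrics above; the thresholds are examples, not recommendations:

```python
# Sketch: flagging tools that breach minimum SLA thresholds.
# Threshold values are illustrative; tune them per tool tier.
THRESHOLDS = {"success_rate": 0.97, "parameter_validity": 0.95, "p99_latency_ms": 2000}

tool_metrics = {
    "github_create_pull_request": {"success_rate": 0.98, "parameter_validity": 0.95,
                                   "p99_latency_ms": 1200},
    "payment_process": {"success_rate": 0.99, "parameter_validity": 0.98,
                        "p99_latency_ms": 2500},
}

for tool, m in tool_metrics.items():
    # Rates must stay above their floor; latency must stay below its ceiling.
    breaches = [k for k, limit in THRESHOLDS.items()
                if (m[k] < limit if k != "p99_latency_ms" else m[k] > limit)]
    if breaches:
        print(f"{tool}: breaches {breaches} -> refine, deprecate, or tier")
# payment_process: breaches ['p99_latency_ms'] -> refine, deprecate, or tier
```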
Operationalizing Multiple Agent Surfaces
Different teams consume agents differently. Infrastructure must support all patterns.
No-Code Agent Builders
Product and business teams assemble agents through configuration. Select tools, set scoping rules, deploy. Platform handles orchestration.
Tradeoff: Requires rock-solid tool definitions and scoping. Bad definitions ship fast.
Code-First SDKs
Complex workflows (payments, support, supply chain). Teams write code with access to the full tool registry; they can override tool definitions and implement domain-specific validation.
SDK is thin: gateway client + registry client. Hard problems (governance, security, discovery) solved in platform layer.
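A sketch of how thin that SDK surface can be. The class and endpoints are hypothetical; the only real dependency is an HTTP client, because everything hard happens behind the gateway:

```python
# Sketch of a "thin SDK": a wrapper over the gateway and registry.
# Endpoints and class names are hypothetical; transport is an implementation detail.
import requests

class AgentSDK:
    def __init__(self, gateway_url: str, agent_id: str, token: str):
        self.gateway_url, self.agent_id = gateway_url, agent_id
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {token}"

    def list_tools(self, workflow: str) -> list:
        """Registry lookup: only tools scoped to this workflow come back."""
        resp = self.session.get(f"{self.gateway_url}/registry/tools",
                                params={"workflow": workflow, "agent": self.agent_id})
        resp.raise_for_status()
        return resp.json()

    def call_tool(self, tool: str, params: dict) -> dict:
        """Every call routes through the gateway, where policy is enforced."""
        resp = self.session.post(f"{self.gateway_url}/tools/{tool}:call",
                                 json={"agent": self.agent_id, "params": params})
        resp.raise_for_status()
        return resp.json()
```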
Autonomous Development Agents
Production ML systems (Claude and similar) that generate and execute code changes.
Non-negotiable requirements:
- Scope enforcement: What repos can the agent modify?
- Approval gating: Which changes require human review before merge?
- Rollback capability: Can the agent revert changes if tests fail?
- Complete audit trail: Every change, every decision, fully logged
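A sketch of the scope-enforcement requirement; patterns and repo names are illustrative, and a real system would enforce this at the gateway rather than in agent code:

```python
# Sketch: repo-scope enforcement for an autonomous coding agent.
# Glob patterns and repo names are illustrative.
from fnmatch import fnmatch

AGENT_SCOPE = {
    "writable_repos": ["platform/tooling-*"],        # repos the agent may modify
    "approval_required": ["platform/tooling-core"],  # merges here need human review
}

def can_modify(repo: str) -> bool:
    return any(fnmatch(repo, pat) for pat in AGENT_SCOPE["writable_repos"])

def needs_approval(repo: str) -> bool:
    return any(fnmatch(repo, pat) for pat in AGENT_SCOPE["approval_required"])

assert can_modify("platform/tooling-ci")
assert not can_modify("payments/ledger")        # out of scope: hard deny
assert needs_approval("platform/tooling-core")  # in scope, but gated on review
```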
Operating Metrics
At scale:
- 5,000+ engineers using agentic tools monthly
- 10,000+ services available as tools
- 1,500+ active agents in production
- 60,000+ agent executions per week
At this scale, infrastructure decisions multiply. Flaky tool discovery affects thousands of agents. Security gaps become incidents affecting millions in transaction volume.
The gateway, registry, schema introspection, and observability are not optional. They're the baseline cost.
Requirements from Engineering Leadership
- Architectural thinking: This is a platform, not a feature. Expect months to build, years to operationalize.
- Cross-team alignment: Service teams, security, platform engineering, and agent builders must align on schemas, policies, and tool definitions. This requires governance.
- Baseline metrics: Tool quality, agent reliability, security. Define and track these from day one; you can't improve what you don't measure.
- Staged rollout:
  - Phase 1: Code-first agents (highest control, lowest blast radius)
  - Phase 2: SDKs (broader teams, scoped access)
  - Phase 3: No-code builders (only after phases 1-2 are hardened)
- Hardened observability: Every execution traceable. Every tool call logged. Every authorization decision auditable. Non-negotiable.
Logging requirements:
- Agent execution ID
- Tool calls with parameters
- Authorization decisions (allow/deny)
- Response filtering actions
- Latency and error codes
- Audit trail retention: 2 years minimum
- Encryption at rest and in transit
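A sketch of a per-call audit record satisfying the fields above; the schema itself is illustrative:

```python
# Sketch: one audit record per tool call, structured for long-term retention.
# Field names mirror the logging requirements; the schema is illustrative.
from dataclasses import dataclass, asdict
from typing import Optional
import json, uuid

@dataclass(frozen=True)
class ToolCallAuditRecord:
    execution_id: str        # ties the call to a full agent execution trace
    tool: str
    params: dict             # redacted before persistence, per the rules above
    authz_decision: str      # "allow" or "deny"
    redactions_applied: list
    latency_ms: int
    error_code: Optional[str]

record = ToolCallAuditRecord(
    execution_id=str(uuid.uuid4()), tool="kb_search",
    params={"query": "refund policy"}, authz_decision="allow",
    redactions_applied=["pii"], latency_ms=142, error_code=None,
)
print(json.dumps(asdict(record)))  # ship to encrypted, append-only storage
```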
Alert on:
- Authorization failures > 0.1%
- Tool error rate > 5%
- PII redaction failures (zero tolerance)
- Unauthorized scope access attempts (zero tolerance)
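A sketch of evaluating those alert conditions over a metrics window; the window source is hypothetical:

```python
# Sketch: checking the alert thresholds above against one metrics window.
# Thresholds come straight from the list; the metrics source is hypothetical.
def check_alerts(window: dict) -> list:
    alerts = []
    if window["authz_failures"] / window["total_calls"] > 0.001:
        alerts.append("authorization failure rate > 0.1%")
    if window["tool_errors"] / window["total_calls"] > 0.05:
        alerts.append("tool error rate > 5%")
    if window["pii_redaction_failures"] > 0:       # zero tolerance
        alerts.append("PII redaction failure")
    if window["unauthorized_scope_attempts"] > 0:  # zero tolerance
        alerts.append("unauthorized scope access attempt")
    return alerts

print(check_alerts({"total_calls": 10_000, "authz_failures": 15,
                    "tool_errors": 120, "pii_redaction_failures": 0,
                    "unauthorized_scope_attempts": 0}))
# ['authorization failure rate > 0.1%']  (15/10,000 = 0.15%)
```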
Unsolved Problems
Hard problems remain that infrastructure alone can't solve:
- LLM hallucination in tool selection: A model problem. Better descriptions help; scoping helps; the model will still invent parameters.
- Planning under uncertainty: Agent orchestration across services with partial failures is complex. Infrastructure observes failures; it can't auto-recover.
- Cost and latency tradeoffs: More executions mean more inference and higher costs. More available tools mean longer context and higher latency. These resource tradeoffs are fundamental.
- Evaluation: "Did the agent do the right thing?" is harder to measure than "did the API respond?" It requires domain expertise and statistical rigor.
Conclusion
This is an infrastructure problem, not an AI problem. The agents are straightforward. The platform keeping them safe, observable, and reliable is the hard part.
Execute:
- Derivative tool definitions from schemas
- Central gateway for all agent-to-service calls
- Tool scoping and parameter constraints
- Measured tool quality (success rate, latency, validity)
- Multiple consumption patterns with appropriate controls
This is table stakes, and it requires engineering discipline and sustained investment. The alternative, agents operating against inconsistent and undocumented services, is catastrophic.
Build the platform first.
References
Agentic AI & Tool Use
- Model Context Protocol — Foundation for tool definition standards
- Building Effective Agents — Architecture patterns and best practices (Anthropic)
- OpenAI Function Calling — Reference implementation for constrained tool invocation
- On the Dangers of Stochastic Parrots — Foundational work on model limitations
Platform Engineering & Control Planes
- Kong API Gateway Architecture — Production gateway patterns
- OPA (Open Policy Agent) — Declarative policy enforcement
- Protocol Buffers Best Practices — Formalized service contracts
- Gartner Platform Engineering Research — Enterprise platform patterns
Security & Governance at Scale
- Google BeyondCorp — Authorization model for distributed systems
- OWASP Data Protection — Sensitive data handling patterns
- NIST Attribute-Based Access Control — Enterprise RBAC/ABAC reference
- OpenTelemetry Observability — Complete observability framework
Tool Quality & Observability
- Google SRE Book: Monitoring Distributed Systems — Measuring reliability
- Graph Databases for Service Mesh — Finding and cataloging tools at scale
- The Tail at Scale (Google Research) — Understanding p99 latency in distributed systems
- Datadog Observability 101 — Industry standard observability patterns
Staged Deployment & Risk Management
- Stripe Scaling Through Phases — Staged rollout strategy (video)
- Flagger Canary Deployment — Progressive delivery patterns
- Emma Tosch: Formal Methods for Systems — Rigorous change analysis (video)
Unsolved Research Problems
- ALFWorld: Autonomous Agents in Simulated Worlds — Benchmark for agent orchestration
- Retrieval-Augmented Generation (RAG) — Grounding agents to external facts
- LLM Inference Optimization — Reducing inference costs at scale
- HELM: Holistic Evaluation of Language Models — Rigorous agent evaluation framework