Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.

Enterprise Agentic AI Pattern: Proximity Over Specialization

posted on November 28, 2025 | tags: [ AppliedAI, AI Agents, Multi-Tenant, Architecture, Enterprise AI ]
A production-ready pattern for 2025 and beyond – Mark Roxberry
A deep dive into the architecture pattern that dominated AI agent deployments in 2025: one immutable agent binary, externalized context, and proximity-based execution.

Proximity Over Specialization

In my recent innovation projects, I’ve been designing, building, and validating agentic architectures, covering the reference architecture, the proof of concept, and the first working implementation. One approach has proven effective throughout: a single immutable runtime model shared by every agent type, including orchestrator, triage, responder, and extension agents. That shared runtime gives all agents a common orchestration specification that scales, while their behavior is driven by external context and controlled extension points.

There are no regional variants, no tenant-specific branches, and no one-off forks for compliance exceptions. The runtime is identical everywhere. The agent runs where the data lives, and all dynamic behavior comes from external artifacts and controlled extension points, not code changes. This is deliberate: it prevents the fragmentation that per-region or per-tenant runtime customization creates.

Rationale for the Architecture

Security needs consistency.
A single runtime means one SBOM, one digest, and a clear audit path. When asked what runs in a region, you provide the exact artifact identifier.
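As a minimal sketch of that audit path, assume each region reports the digest of the runtime it is running (the region names and build strings below are hypothetical). Drift detection then reduces to comparing every reported digest against the single approved one. For simplicity the digest here is SHA-256 over raw bytes; a real container image digest is computed over the image manifest.

```python
import hashlib
from typing import Dict, List


def sha256_digest(blob: bytes) -> str:
    """Return a content digest in the familiar "sha256:<hex>" form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def find_drift(reported: Dict[str, str], approved_digest: str) -> List[str]:
    """Return, sorted, the regions whose running runtime digest is not the approved one."""
    return sorted(region for region, digest in reported.items() if digest != approved_digest)


if __name__ == "__main__":
    # Hypothetical build contents standing in for the real image bytes.
    approved = sha256_digest(b"agent-runtime build 1.4.0")
    reported = {
        "eu-west": approved,
        "us-east": approved,
        "ap-south": sha256_digest(b"agent-runtime build 1.3.9"),  # stale deployment
    }
    print(find_drift(reported, approved))  # ['ap-south']
```

With only one runtime, an empty result is the complete answer to "what runs where".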

Data residency must be automatic.
The same container deploys globally. Each agent instance pulls tenant context from its region and becomes compliant without custom builds.
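One way the same container could pick up the right tenant context is to read its region from the environment and resolve an in-region store, failing closed if none exists. This is a sketch under assumptions: the `AGENT_REGION` variable, the region names, and the store URLs are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical region-to-store mapping; each region resolves only to its own store.
REGIONAL_CONTEXT_STORES = {
    "eu-west": "https://context.eu-west.example.internal",
    "us-east": "https://context.us-east.example.internal",
}


@dataclass(frozen=True)
class ContextLocation:
    region: str
    tenant_id: str
    url: str


def resolve_context(tenant_id: str, env: Optional[Mapping[str, str]] = None) -> ContextLocation:
    """Resolve where this instance loads tenant context from, based only on its own region."""
    env = os.environ if env is None else env
    region = env.get("AGENT_REGION")
    if region not in REGIONAL_CONTEXT_STORES:
        # Fail closed: never fall back to a store in another region.
        raise RuntimeError(f"no in-region context store for region {region!r}")
    base = REGIONAL_CONTEXT_STORES[region]
    return ContextLocation(region, tenant_id, f"{base}/tenants/{tenant_id}/context")


if __name__ == "__main__":
    location = resolve_context("acme", {"AGENT_REGION": "eu-west"})
    print(location.url)  # https://context.eu-west.example.internal/tenants/acme/context
```

The binary carries no region-specific logic; only the environment and the regional store differ between deployments.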

Upgrades must be predictable.
With only one runtime, global rollouts take minutes. Variation moves into versioned, verified data bundles and extension modules, not the binary.

Core Components of the Pattern

Core Immutable Runtime with Modular Extensibility

The core runtime never changes per region or tenant. It provides a fixed execution path for stability, observability, and auditability. Flexibility comes from a modular extension layer that supports plugins, adapters, or capability modules. These extensions operate under strict boundaries and are versioned, signed, and controlled.

  • The core stays immutable for predictability and consistency.
  • Extensions provide flexibility without compromising architectural integrity.
  • Behavior remains data-driven, not driven by code forks.
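A minimal sketch of that boundary, assuming extensions are identified by name, version, and a digest of their packaged artifact: the core runtime keeps an approved allowlist and refuses to load anything else. The `Extension` and `ExtensionRegistry` names are illustrative, not from a specific framework, and signature verification is left out of this sketch.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Extension:
    name: str
    version: str
    artifact_digest: str  # SHA-256 of the packaged extension artifact
    handler: Callable[[dict], dict]


class ExtensionRegistry:
    """Loads only extensions whose (name, version) maps to the expected artifact digest."""

    def __init__(self, allowlist: Dict[Tuple[str, str], str]) -> None:
        self._allowlist = dict(allowlist)
        self._loaded: Dict[str, Extension] = {}

    def register(self, ext: Extension) -> None:
        expected = self._allowlist.get((ext.name, ext.version))
        if expected is None or expected != ext.artifact_digest:
            raise PermissionError(f"extension {ext.name}@{ext.version} is not approved")
        self._loaded[ext.name] = ext

    def invoke(self, name: str, payload: dict) -> dict:
        # Pass a shallow copy so the extension cannot add, remove, or rebind
        # top-level keys in the core's own request dict.
        return self._loaded[name].handler(dict(payload))


if __name__ == "__main__":
    package = b"domain-adapter 2.1.0 package"  # hypothetical artifact bytes
    digest = hashlib.sha256(package).hexdigest()
    registry = ExtensionRegistry({("domain-adapter", "2.1.0"): digest})
    registry.register(
        Extension("domain-adapter", "2.1.0", digest, lambda p: {**p, "sku": p["sku"].upper()})
    )
    print(registry.invoke("domain-adapter", {"sku": "ab-12"}))  # {'sku': 'AB-12'}
```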

Externalized Context

Policies, product catalogs, escalation rules, and retrieval indexes exist as signed, versioned artifacts. The runtime pulls and caches them at startup. Replacing these artifacts updates behavior without changing the runtime itself.
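The load, verify, and cache step can be sketched as follows. As a stand-in for real artifact signing (production systems would typically use asymmetric signatures), this uses HMAC-SHA256 over the artifact’s name, version, and bytes, so a validly signed bundle cannot be replayed under a different version. The key, bundle name, and JSON layout are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, Tuple


def sign(key: bytes, name: str, version: str, blob: bytes) -> str:
    """Publisher side: bind the signature to name, version, and content."""
    message = f"{name}@{version}\n".encode("utf-8") + blob
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class ContextCache:
    """Verifies signed, versioned context artifacts once and caches the parsed result."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._cache: Dict[Tuple[str, str], dict] = {}

    def load(self, name: str, version: str, blob: bytes, signature: str) -> dict:
        cache_key = (name, version)
        if cache_key in self._cache:
            # Published versions are treated as immutable, so a hit skips re-verification.
            return self._cache[cache_key]
        expected = sign(self._key, name, version, blob)
        if not hmac.compare_digest(expected, signature):
            raise ValueError(f"signature check failed for {name}@{version}")
        artifact = json.loads(blob)
        self._cache[cache_key] = artifact
        return artifact


if __name__ == "__main__":
    key = b"demo-signing-key"  # hypothetical shared secret
    blob = json.dumps({"max_refund_usd": 500}).encode("utf-8")
    signature = sign(key, "refund-policy", "3", blob)
    cache = ContextCache(key)
    print(cache.load("refund-policy", "3", blob, signature)["max_refund_usd"])  # 500
```

Publishing version "4" of the bundle changes behavior on the next load without touching the runtime.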

Deterministic Model Routing

Routing follows a predictable sequence:

  • Cache first
  • Local model (8B–70B parameters) next
  • Frontier model only when required

Every escalation is logged and reviewable.
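The cache-first, local-next, frontier-last sequence can be sketched as a small router. Assumptions: each model is a plain callable returning an answer plus a confidence score, the cache is an exact-match dictionary, and the 0.7 threshold is illustrative. The escalation record is what makes each frontier call reviewable.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("model_router")

# A model call returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class ModelRouter:
    local_model: ModelFn
    frontier_model: ModelFn
    min_confidence: float = 0.7  # hypothetical escalation threshold
    cache: Dict[str, str] = field(default_factory=dict)
    escalations: List[dict] = field(default_factory=list)

    def route(self, prompt: str) -> Tuple[str, str]:
        """Return (answer, tier), where tier is "cache", "local", or "frontier"."""
        if prompt in self.cache:
            return self.cache[prompt], "cache"
        answer, confidence = self.local_model(prompt)
        tier = "local"
        if confidence < self.min_confidence:
            # Record every escalation so it can be reviewed later.
            record = {"prompt": prompt, "local_confidence": confidence,
                      "threshold": self.min_confidence}
            self.escalations.append(record)
            logger.info("escalating to frontier model: %s", record)
            answer, _ = self.frontier_model(prompt)
            tier = "frontier"
        self.cache[prompt] = answer
        return answer, tier


if __name__ == "__main__":
    router = ModelRouter(
        local_model=lambda p: ("local answer", 0.9 if "refund" in p else 0.4),
        frontier_model=lambda p: ("frontier answer", 0.95),
    )
    print(router.route("refund status?"))    # ('local answer', 'local')
    print(router.route("draft a contract"))  # ('frontier answer', 'frontier')
    print(router.route("refund status?"))    # ('local answer', 'cache')
    print(len(router.escalations))           # 1
```

Because the order and the threshold are fixed, the same prompt and model outputs always take the same path.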

Automated Context Compaction

Enterprise tenants often provide massive policy or reference documents. When that context exceeds the model’s context window, the runtime triggers a compactor module, a sub-agent backed by a lightweight long-context model. The compactor generates a deterministic, low-temperature summary and returns a fingerprint that identifies it. The agent then continues execution with the compacted context.
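A minimal sketch of the compaction trigger, assuming a rough characters-per-token estimate and a `summarize` callable standing in for the compactor sub-agent’s model call (both hypothetical). Fingerprints here are SHA-256 hashes of the raw and compacted text, so an audit can tie a response to the exact source artifact and the exact summary the agent used.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PreparedContext:
    text: str
    source_fingerprint: str  # identifies the raw tenant artifact
    text_fingerprint: str    # identifies what the agent actually sees


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def estimate_tokens(text: str) -> int:
    # Rough heuristic of ~4 characters per token; use the model's tokenizer in practice.
    return (len(text) + 3) // 4


def prepare_context(context: str, window_tokens: int,
                    summarize: Callable[[str], str]) -> PreparedContext:
    source_fp = fingerprint(context)
    if estimate_tokens(context) <= window_tokens:
        return PreparedContext(context, source_fp, source_fp)
    # Only oversized context goes to the compactor (e.g. a long-context model at low temperature).
    summary = summarize(context)
    if estimate_tokens(summary) > window_tokens:
        raise ValueError("compacted summary still exceeds the context window")
    return PreparedContext(summary, source_fp, fingerprint(summary))


if __name__ == "__main__":
    def truncate(text: str) -> str:
        # Hypothetical stand-in for the compactor model.
        return text[:200]

    policy = "policy clause. " * 100  # about 375 estimated tokens
    prepared = prepare_context(policy, window_tokens=100, summarize=truncate)
    print(len(prepared.text), prepared.source_fingerprint != prepared.text_fingerprint)  # 200 True
```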

Flow overview:

[Figure: flow diagram of one region deployment. A request from a user or client passes through the API or ingress to the core immutable agent runtime. The model router checks the response cache, then the local model, and escalates to a frontier model on low confidence. When large tenant context exceeds the window, a compactor sub-agent produces a compacted summary with a fingerprint. External context (signed, versioned policy bundles; domain data; escalation rules; RAG index sources) is loaded and cached. The extension layer holds a policy engine extension, a domain adapter, and a tool plugin. Regional data stores hold the vector store, object-store artifacts and summaries, and logs, metrics, and audit records. Additional regions run the same deployment.]

Engineering Work That Made It Practical

  • Compaction runs deterministically and is precomputed during policy deployment.
  • Every artifact, raw or summarized, carries a stable fingerprint for audit trails.
  • Compaction includes PII stripping; if too much content is removed, deployment halts.
  • Pre-loading compacted blobs reduces cold-start latency.
  • Local-model speculation reduces frontier model calls by more than half.
  • Extension modules are versioned artifacts, not ad-hoc code injections.
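The PII-stripping guard can be sketched as a redaction pass that measures how much of the original text it removed and halts deployment above a threshold. The regexes, the 25% default threshold, and the `DeploymentHalted` exception are hypothetical; production redaction would rely on a dedicated PII detector rather than two patterns.

```python
import re
from typing import Tuple

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class DeploymentHalted(Exception):
    """Raised when redaction removes so much text that the artifact is no longer usable."""


def strip_pii(text: str, max_removed_ratio: float = 0.25) -> Tuple[str, float]:
    """Redact PII and return (redacted_text, fraction of original characters removed)."""
    original_length = len(text)
    removed = 0
    for pattern, label in ((EMAIL, "[EMAIL]"), (PHONE, "[PHONE]")):
        removed += sum(len(m.group(0)) for m in pattern.finditer(text))
        text = pattern.sub(label, text)
    ratio = removed / original_length if original_length else 0.0
    if ratio > max_removed_ratio:
        raise DeploymentHalted(f"redaction removed {ratio:.0%} of the content")
    return text, ratio


if __name__ == "__main__":
    policy = ("Refunds over $500 require approval from the regional billing lead "
              "before processing. Escalations go to jane.doe@example.com.")
    redacted, ratio = strip_pii(policy)
    print(redacted.endswith("go to [EMAIL]."))  # True
```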

Results After Long-Term Use

  • Audits validated through a single runtime digest
  • Global patching consolidated into one predictable rollout
  • Frontier model usage reduced to exceptional cases only
  • No regional forks and no tenant-specific binaries

Closing

A single immutable runtime with externalized context and modular extensibility, deployed close to the data, is the pattern that consistently avoids the failure modes I experienced in other approaches.


Credits

Image

  • Image generated with DALL·E (OpenAI); edited by Mark Roxberry
This post and/or images used in it may have been created or enhanced using generative AI tools for clarity and organization. However, all ideas, technical work, solutions, integrations, and other aspects described here are entirely my own.