
Why AI Agents Need Code Execution (Not Just Bigger Context Windows)

Why code execution—not context stuffing—is the foundation for scalable AI agent infrastructure

apifunnel.ai Engineering· 2026-01-20· 5 min read

Code execution as a service is the infrastructure pattern that makes AI agents actually work at scale. The Recursive Language Model (RLM) paper↗ articulates why: instead of stuffing 150,000 tokens into context, agents that execute code can programmatically decompose problems, calling the LLM recursively on smaller chunks. But the RLM paper is theory—this article is about the production infrastructure that makes code execution reliable: progressive discovery, skills, workflows, and the six components every code execution platform needs.

Understanding the Four Concepts

Skills

Skills are packaged execution units. They can be:

  • Instructional markdown (prompt engineering)
  • Deterministic code (Python/JS functions)
  • Composable workflows (skills calling other skills)

The key property of skills is that they're persistent and reusable. Skills are versioned, immutable artifacts that can be shared and composed.
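Those two properties (persistence and immutability) can be made concrete with a small sketch. This is a hypothetical in-memory registry, not APIFunnel's actual API: a skill is keyed by name and version, and re-publishing an existing version is rejected.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Skill:
    """A versioned, immutable execution unit."""
    name: str
    version: str
    run: Callable[..., object]

REGISTRY: dict[tuple[str, str], Skill] = {}

def register(skill: Skill) -> None:
    key = (skill.name, skill.version)
    # Immutability: a published (name, version) pair can never be overwritten.
    if key in REGISTRY:
        raise ValueError(f"{skill.name}@{skill.version} is already published")
    REGISTRY[key] = skill

register(Skill("normalize_currency", "1.0.0", lambda cents: cents / 100))
skill = REGISTRY[("normalize_currency", "1.0.0")]
```

Because versions are immutable, a workflow that pins `normalize_currency@1.0.0` behaves the same way every run, even after `1.1.0` ships.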

RLM (Recursive Language Models) Pattern

RLM is an inference-time technique for context decomposition, not a special model type.

Instead of:

LLM(10M token context) → answer

You do:

```python
def solve(problem):
    # Base case: the problem fits comfortably in a single model call
    if simple(problem):
        return LLM(problem)
    # Recursive case: decompose, solve each piece, combine the results
    sub_problems = decompose(problem)
    results = [solve(p) for p in sub_problems]
    return LLM(combine(results))
```

The pattern is recursive—the LLM breaks down problems and calls itself on smaller chunks.
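The pseudocode above becomes runnable once you stub the pieces. In this toy sketch, summing a list of numbers stands in for the model call, so the recursion can be traced end to end:

```python
# Toy, runnable instance of the recursive pattern. Summation stands in
# for a real LLM call; the structure is what matters.
def llm(chunk):
    return sum(chunk)

def simple(problem, limit=4):
    # "Fits in context" becomes "short enough to handle directly"
    return len(problem) <= limit

def decompose(problem):
    mid = len(problem) // 2
    return [problem[:mid], problem[mid:]]

def solve(problem):
    if simple(problem):
        return llm(problem)
    # Solve each half recursively, then combine with one more call
    return llm([solve(p) for p in decompose(problem)])

total = solve(list(range(100)))
```

No single call ever sees more than a few elements, yet the recursion produces the answer for the full input. Swap the stubs for a real model and chunked documents and the shape is the same.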

Workflows

Workflows are collections of skills with data flow: Skill A → Skill B → Skill C, where outputs become inputs.

For example, a reconciliation bot with access to Stripe, QuickBooks, and Shopify:

  • Fetch Stripe payments for the month (Skill A)
  • Pull QuickBooks invoices and Shopify orders (Skill B)
  • Match transactions across all three platforms (Skill C)
  • Flag discrepancies and generate reconciliation report (Skill D)
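The data flow in those four steps can be sketched as plain function composition. The stubbed records below are illustrative; real skills would call the Stripe, QuickBooks, and Shopify APIs:

```python
# Hypothetical reconciliation workflow: each skill's output feeds the next.
def fetch_stripe_payments():                      # Skill A
    return [{"id": "ch_1", "amount": 5000}, {"id": "ch_2", "amount": 1200}]

def fetch_invoices_and_orders():                  # Skill B
    return [{"ref": "ch_1", "amount": 5000}]      # ch_2 never made it downstream

def match_transactions(payments, records):        # Skill C
    matched = {r["ref"] for r in records}
    return [p for p in payments if p["id"] not in matched]

def build_report(discrepancies):                  # Skill D
    return {"unmatched": len(discrepancies), "items": discrepancies}

# A, B -> C -> D: outputs become inputs at every hop
report = build_report(
    match_transactions(fetch_stripe_payments(), fetch_invoices_and_orders())
)
```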
The critical piece most platforms miss: for workflows to actually work, the infrastructure must capture inputs and outputs explicitly.

If an agent creates a skill but the platform doesn't guide them to specify:

  • What inputs the skill expects
  • What outputs it produces
  • What types those are

...then the vision of composable workflows isn't achievable.

This is one of the most meaningful advantages of code execution as a service: the ability to wire up multiple API calls, build workflows, and save on token costs. But none of it is achievable if your isolated code execution units can't be composed.

The infrastructure needs to enforce this. At APIFunnel, when you compile a skill, you specify inputs and outputs explicitly. This makes skills truly composable and enables the agent to chain them together intelligently.
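Here is one way such a contract could be expressed. The decorator and schema shapes are illustrative, not APIFunnel's actual compile API; the point is that declared outputs make compatibility checkable before anything runs:

```python
# Hypothetical decorator that captures a skill's I/O contract at definition time.
def skill(inputs, outputs):
    def wrap(fn):
        fn.inputs, fn.outputs = inputs, outputs
        return fn
    return wrap

@skill(inputs={"month": "str"}, outputs={"payments": "list[dict]"})
def fetch_stripe_payments(month):
    return {"payments": []}

@skill(inputs={"payments": "list[dict]"}, outputs={"report": "dict"})
def reconcile(payments):
    return {"report": {"unmatched": 0}}

def composable(upstream, downstream):
    # Downstream can follow upstream only if every input it needs is produced.
    return set(downstream.inputs) <= set(upstream.outputs)
```

With explicit contracts, the platform (or the agent) can verify `composable(fetch_stripe_payments, reconcile)` instead of discovering a mismatch at runtime.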

Code Execution

Code execution is the runtime environment for arbitrary logic. It's the substrate beneath everything else.

Sandboxed Python/JavaScript containers with:

  • API access (hundreds of pre-wired integrations)
  • Persistent state (session-based file storage)
  • Scheduling (cron, webhooks, callbacks)

The key property is unrestricted execution—you write the logic, it runs.
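A minimal sketch of one sandboxing ingredient: run the code in a separate interpreter process, in a throwaway working directory, with a wall-clock limit. Production platforms layer containers, resource caps, and network policy on top of this.

```python
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Execute untrusted Python in a child process; return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,          # isolated scratch directory per execution
            capture_output=True,
            text=True,
            timeout=timeout,      # kills runaway executions
        )
    return result.stdout

out = run_sandboxed("print(2 + 2)")
```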


How They Relate

RLM, skills, and workflows are different abstractions over code execution.

Code execution is the primitive. Everything else is a pattern built on top of it.

In practice, however, code execution cannot reach its full potential without the right infrastructure: one designed code-execution-first rather than bolted on as an afterthought. A code execution framework must guide the agent while staying out of its way by design, so the platform is never boxed in by the agent's current abilities.


Progressive Discovery: The Infrastructure Pattern That Scales

Traditional MCP (Model Context Protocol) is essentially serverless for agents:

  • Tools exposed directly to the LLM
  • Stateless inline calls
  • Minimal infrastructure

It's simple and clean, but it breaks down once you move beyond basic use cases.

For complex systems, you need progressive discovery—a three-step architectural pattern that prevents the agent from guessing.

The Three-Step Progressive Discovery Flow

Every agent interaction follows this pattern:

Step 1: List Services
```python
servers = await list_api_servers()
# Returns: service IDs, descriptions, tool counts
```

Called once per task (typically one conversation turn). This gives the agent a map of available integrations.

Step 2: Search Tools
```python
tools = await tools_search(
    server_name="stripe_api",
    query="list charges",
)
# Returns: tool names like stripe_api.list_charges
```

Semantic search across tool descriptions. The agent narrows down to relevant capabilities.

Step 3: Get Tool Info
```python
info = await get_tool_info(node_id="tool:stripe_api.list_charges")
# Returns: exact parameter names, types, required vs optional
```

This is the critical step—you get the exact schema. No guessing parameter names.
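Putting the three steps together, the whole flow looks like this. The discovery functions here are stubs with illustrative return values, standing in for the real platform calls:

```python
import asyncio

# Stubbed discovery calls mirroring the three steps above.
async def list_api_servers():
    return [{"id": "stripe_api", "description": "Stripe payments", "tools": 42}]

async def tools_search(server_name, query):
    return [f"{server_name}.list_charges"]

async def get_tool_info(node_id):
    return {"params": {"limit": {"type": "int", "required": False}}}

async def discover():
    servers = await list_api_servers()                            # Step 1: map
    tools = await tools_search(servers[0]["id"], "list charges")  # Step 2: narrow
    info = await get_tool_info(f"tool:{tools[0]}")                # Step 3: schema
    return tools[0], info

tool, info = asyncio.run(discover())
```

By the time the agent writes code, it holds the exact schema for exactly one tool, and nothing else has entered context.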


Why Progressive Discovery Matters

Any deviation from the three-step flow signals that context engineering is required.

If you see:

  • Agent calls list_api_servers() multiple times in one task → context gap
  • Agent guesses tool parameters without calling get_tool_info() → missing schema context
  • Agent retries code execution 5+ times → discovery context needs work

Coding errors are expected—retries for syntax bugs are normal.

But missing context that forces signature guessing? That's an infrastructure problem, not an agent problem.

Traditional MCP vs Code Execution with Progressive Discovery

| Feature | Traditional MCP | Code Execution + Discovery |
| --- | --- | --- |
| Tool surface | Fixed (compile time) | Dynamic (runtime) |
| Schema discovery | Pre-loaded in context | Progressive (3-step flow) |
| State | Stateless | Session-persistent |
| Context management | All tools loaded upfront | On-demand discovery |
| Recursive workflows | Not supported | Native |
| Error recovery | Manual retry | Automatic with context |

Traditional MCP loads all tool schemas into context upfront. With 5-10 servers, this works fine. With 50+ APIs, you hit context limits. Progressive discovery solves this by fetching schemas on-demand.
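The arithmetic behind that claim is easy to sketch. The per-schema token count and tools-per-server figures below are assumed averages, not measurements:

```python
# Back-of-the-envelope cost of upfront schema loading vs on-demand discovery.
TOKENS_PER_SCHEMA = 300   # assumed average size of one tool schema
TOOLS_PER_SERVER = 40     # assumed average tools per API server

def upfront_cost(servers):
    # Traditional MCP: every schema enters context before the task starts
    return servers * TOOLS_PER_SERVER * TOKENS_PER_SCHEMA

def progressive_cost(tools_actually_used):
    # Progressive discovery: only the schemas the task needs are fetched
    return tools_actually_used * TOKENS_PER_SCHEMA

small = upfront_cost(5)      # manageable with a handful of servers
large = upfront_cost(50)     # overwhelms most context windows
used = progressive_cost(3)   # a typical task touches only a few tools
```

Under these assumptions, 50 servers cost 600,000 tokens upfront, while a task that actually uses three tools needs only 900 tokens of schema.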


Code Execution Infrastructure: Six Key Components

Reliable code execution requires:

  1. Discovery Layer — list_api_servers(), tools_search(), get_tool_info() for schema inspection
  2. Sandboxed Execution — Isolated containers with resource limits and network policies
  3. Session Persistence — State survives across executions (/tmp/artifacts/, session IDs)
  4. Seamless Authentication — OAuth flows, token refresh, and credential injection
  5. Skill Composition — Explicit input/output wiring enables skills to call other skills
  6. Scheduling & Durability — Cron, webhooks, and human-in-the-loop patterns

In our experience, code execution represents the purest form of agency we have seen to date.



Related Reading

  • Task-Specific AI Agents — Why focused, stateless workers beat general-purpose assistants

APIFunnel↗ implements these patterns with progressive discovery, sandboxed execution, session persistence, OAuth orchestration, skill composition, and scheduling.