Towards a Modern Agent Architecture
Intelligent agent systems require more nuance than contemporary frameworks provide. In this post, I argue that operating systems are an excellent model for how to think about building agents, and propose Tokio as a foundational primitive for cognitive scheduling.
The recent explosion in LLM popularity, the flood of funding into startups and companies built on them, and the push to adopt AI systems across nearly every sector of the economy (particularly in tech) have created an artificial urgency to build agent systems that can operate across a wide range of domains. Most of the common "agent" frameworks and libraries - aisdk, LangChain, AutoGPT, and the like - rely on an architecture that resembles the following pattern:
async function agentLoop(goal: string): Promise<string> {
const context = [{ role: "user", content: goal }];
while (true) {
const response = await llm.chat(context);
if (response.done) {
return response.answer;
}
const result = await executeTool(response.tool_call);
context.push({ role: "tool", content: result });
}
}
This is the ReAct pattern (Yao et al., 2023): the LLM generates a "thought", selects an action, observes the result, and repeats.
For the most part, this code works. It solves real problems and behaves roughly the way people expect "agents" to behave. Unfortunately, it has no answer to the following questions:
- What happens when the user cancels mid-execution?
- What happens when the context exceeds the model's context window?
- What happens when two tool calls conflict?
- What happens when the system must interrupt one goal for a higher-priority goal?
- How does the agent learn from its mistakes?
These are fundamental operating conditions of any system that runs longer than a single request-response cycle.
Current agent architectures are built primarily on the mental model of web applications: stateless request-response cycles with external state management. But agents are not web applications; they are operating systems: stateful processes with resource constraints, preemption requirements, and multiple concurrent activities that must be scheduled intelligently.
I believe the Tokio async runtime provides the right foundational primitives for building agent systems, and that the architecture of such systems should be grounded in cognitive science rather than web application patterns. LLMs are excellent engines, but in order to drive, you must first design the rest of the car.
What is an Agent?
Before we proceed, we need to define terms. The word "agent" has acquired significant baggage in the current discourse, often meaning little more than "an LLM that can call tools." This, in my opinion, is an inadequate definition.
Cognitive science and artificial intelligence research, dating back to the 1950s, have defined an agent as a system that:
- Perceives its environment (sensors)
- Maintains an internal state (memory)
- Selects actions towards goals (decision-making)
- Acts on its environment (effectors)
- Learns from experience (adaptation)
Agents are characterized by their functional architecture, not their components. LLMs are one possible implementation of certain agent capabilities - specifically, they excel at one modality of perception (understanding natural language input) and certain kinds of decision-making (selecting actions given context). But LLMs are not agents. The confusion between "LLM" and "agent" has led to architectures where the LLM is asked to perform functions it is not suited for: maintaining consistent long-term state, managing resource constraints, handling interruption and resumption, and scheduling concurrent activities. These are systems programming problems, not language modeling problems.
Evidence from Cognitive Science
The study of agent systems is not new. Cognitive Science has been building and analyzing agent systems since the 1950s, with particularly rigorous work beginning in the 1980s.
Two early architectures deserve specific attention: Soar (Laird, Newell, & Rosenbloom, 1987) and ACT-R (Anderson, 1993). These systems were developed independently but converged on remarkably similar structures - a convergence that suggests something fundamental about what agent architectures require.
Soar emerged from Allen Newell's work on unified theories of cognition. Its central insight is the Problem Space Hypothesis: all goal-directed behavior can be understood as search through a space of possible states. Soar agents operate through a cycle of proposing operators (possible actions), selecting among them, and applying the selected operator to the current state. When the agent cannot make progress - a condition called an impasse - it automatically creates a subgoal to resolve the impasse.
ACT-R, developed by John Anderson, focuses on modeling human cognitive processes with quantitative precision. It distinguishes between procedural memory (skills, encoded as production rules), declarative memory (facts and episodes), and working memory (the currently active context). Critically, ACT-R models the timing of cognitive processes: memory retrieval takes time proportional to an item's activation level, and production rules fire in approximately 50ms.
Later architectures explored dimensions these early systems underemphasized. CLARION (Sun, 2002) introduced a fundamental distinction between implicit and explicit cognition. Where ACT-R separates procedural from declarative memory, CLARION argues that both can exist in implicit (sub-symbolic, neural-network-like) and explicit (symbolic, rule-based) forms. The architecture models how skills often begin as explicit knowledge and become implicit through practice - and conversely, how implicit expertise can be extracted into explicit rules. CLARION also incorporates motivational and metacognitive subsystems that modulate action-centered reasoning, addressing the question of why an agent acts, not just how.
LIDA (Franklin et al., 2007) implements Global Workspace Theory, a psychological theory of consciousness. Its key insight is that cognition operates through repeated cycles (~10Hz) in which specialized processors compete for access to a "global workspace." The winning coalition is broadcast to all other processors, enabling coordination without central control. LIDA models attention as this competitive process, with "attention codelets" selecting what becomes conscious. This provides an explicit mechanism for the limited-capacity, selective nature of attention that other architectures have often left implicit.
OntoAgent (McShane & Nirenburg, 2012) approaches the problem from the direction of language understanding. Where most architectures treat language as just another input modality, OntoAgent treats deep semantic understanding as central to intelligent behavior. Agents translate perceived inputs into an unambiguous, ontologically grounded knowledge representation before reasoning over them. This addresses a gap in architectures that assume clean symbolic inputs: real agents need to construct meaning from ambiguous, context-dependent signals.
Sigma (Rosenbloom, Demski, & Ustun, 2016) takes a different approach to unification. Rather than building separate modules for different cognitive functions and connecting them, Sigma attempts to design a single computational substrate that could implement all of them. This architecture is built on probabilistic graphical models - specifically, a generalization of factor graphs - which provide a uniform representation for symbolic rules, probabilistic inference, and (more recently) neural network computations. This means that in Sigma, memory retrieval, decision-making, learning, and perception all emerge from the same underlying inference mechanism. The architecture is driven by four explicit desiderata: grand unification (spanning cognitive and non-cognitive capabilities like perception and motor control), generic cognition (applicable to both natural and artificial minds), functional elegance (broad capability from minimal mechanisms), and sufficient efficiency.
In 2017, Laird, Lebiere, and Rosenbloom proposed what they called a "Standard Model of the Mind" - now called the Common Model of Cognition. The model synthesizes structural agreements across Soar, ACT-R, and Sigma: a hybrid of symbolic and statistical processing; distinct memories for procedural, declarative, and working-memory content; parallel processing within modules but a serial bottleneck for action selection; and perceptual and motor systems that interface with the cognitive core. Importantly, fMRI studies (Stocco et al., 2021) have found that this architecture outperforms alternative brain organization models at predicting patterns of neural connectivity across diverse cognitive tasks.
For our purposes, what matters here is not which architecture is "correct." What matters is what they agree on.
These architectures share several properties that current LLM-based agent systems lack:
Multiple memory systems with different characteristics. All of these architectures distinguish between a small, fast working memory and larger, slower long-term stores. Items compete for attention based on activation levels that decay over time and increase with use. The context window of an LLM is not working memory; it has no activation dynamics, no principled capacity limits, no competition for attention.
Structured action selection. Rather than generating actions through unconstrained text generation, these architectures use constrained selection mechanisms: production rules in Soar and ACT-R, competing codelets in LIDA, dual implicit/explicit processing in CLARION, graphical inference in Sigma. When multiple actions are possible, explicit conflict resolution determines which fires.
Goal-directed behavior with interruption. Goals can be suspended when subgoals are needed (as in Soar's impasse handling) or when higher-priority goals preempt current activity. This is the equivalent of preemption in operating-systems terminology. Current agent loops have no mechanism for one goal to cleanly interrupt another.
Discrete cognitive cycles. Processing occurs in repeated cycles - roughly 50ms in ACT-R, 100ms in LIDA - each involving perception, memory retrieval, selection, and action in a structured sequence. In contrast to the prevailing "run until done" model, this approach lets systems remain responsive to environmental changes.
Attention as a limited resource. LIDA's global workspace, ACT-R's activation spreading, and CLARION's metacognitive monitoring all model attention as something that must be allocated, not an unlimited resource.
These features emerged independently across decades of research because they solve problems that any adaptive agent system must face: How do you stay responsive while reasoning? How do you handle conflicting goals? How do you decide what to think about when you cannot think about everything?
Current LLM-based agent architectures do not - and cannot - address these problems. They have no scheduler, no memory management, and no resource constraints. They call an LLM in a loop and hope for the best.
The Three-Component Model
With this background, we can decompose agent systems into three core components:
Affordances are the agent's interfaces to the world; its sensors and effectors. In the language of the current discourse, these are tools, APIs, and perception modules. Affordances are asynchronous (the world takes time to respond), fallible (actions can fail), and rate-limited (external systems have throughput constraints). The design of affordances determines what the agent can perceive and what actions it can take.
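To make this interface concrete, here is a minimal sketch of how an affordance might be expressed in Rust. The trait and type names (Affordance, Percept, AffordanceError, RateLimit) are illustrative assumptions, not part of any existing framework:

```rust
use std::time::Duration;
use async_trait::async_trait;

/// Illustrative rate-limit descriptor: at most `max_calls` per `per` window.
pub struct RateLimit {
    pub max_calls: u32,
    pub per: Duration,
}

/// Hypothetical request/response/error shapes for an affordance.
pub struct AffordanceRequest { /* parameters for the action */ }
pub struct Percept { /* what the agent observes as a result */ }
pub struct AffordanceError { /* why the action failed */ }

/// Sketch of an affordance interface: asynchronous (the world takes time
/// to respond), fallible (actions can fail), and rate-limited.
#[async_trait]
pub trait Affordance: Send + Sync {
    /// Human-readable identifier, e.g. "web_search" or "file_read".
    fn name(&self) -> &str;

    /// Maximum sustained call rate this affordance tolerates.
    fn rate_limit(&self) -> RateLimit;

    /// Act on the world and return what was perceived as a result.
    async fn invoke(&self, request: AffordanceRequest) -> Result<Percept, AffordanceError>;
}
```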
Knowledge is what the agent knows and remembers. This is not a single store but multiple memory systems with vastly different characteristics:
- Working Memory: Small capacity, fast access; holds the current context. Items compete for limited slots based on activation levels that decay over time and increase with use. This is NOT akin to an LLM context window - working memory has principled capacity limits, activation dynamics, and eviction policies (a minimal sketch follows this list).
- Procedural Memory: Skills and learned behaviors. In classical architectures, these are production rules (condition-action pairs). In hybrid systems like CLARION, procedural knowledge can be implicit (sub-symbolic, neural) or explicit (symbolic rules).
- Declarative Memory: Facts, concepts, and events. This is often further divided into semantic memory (general knowledge, organized as conceptual networks) and episodic memory (autobiographical events - what happened, when, and in what context).
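Here is a minimal sketch of the working-memory dynamics described above, assuming a fixed slot capacity and exponential decay; the struct names and constants are illustrative, not taken from any particular architecture:

```rust
use std::time::Instant;

/// Hypothetical working-memory item: activation rises when the item is
/// used and decays as time passes.
struct WmItem {
    content: String,
    activation: f64,
    last_decay: Instant,
}

/// Sketch of a fixed-capacity working memory. When a new item arrives and
/// all slots are full, the least-activated item is evicted.
struct WorkingMemory {
    capacity: usize,
    decay_rate: f64, // activation lost per second; an illustrative constant
    items: Vec<WmItem>,
}

impl WorkingMemory {
    fn integrate(&mut self, content: String) {
        self.decay_all();
        if self.items.len() >= self.capacity {
            // Competition for limited slots: evict the least-activated item.
            if let Some(idx) = self
                .items
                .iter()
                .enumerate()
                .min_by(|a, b| a.1.activation.total_cmp(&b.1.activation))
                .map(|(i, _)| i)
            {
                self.items.remove(idx);
            }
        }
        self.items.push(WmItem {
            content,
            activation: 1.0,
            last_decay: Instant::now(),
        });
    }

    /// Using an item boosts its activation, making it less likely to be evicted.
    fn touch(&mut self, idx: usize) {
        self.decay_all();
        if let Some(item) = self.items.get_mut(idx) {
            item.activation += 1.0;
        }
    }

    /// Exponential decay since the last update, applied lazily.
    fn decay_all(&mut self) {
        let now = Instant::now();
        for item in &mut self.items {
            let dt = now.duration_since(item.last_decay).as_secs_f64();
            item.activation *= (-self.decay_rate * dt).exp();
            item.last_decay = now;
        }
    }
}
```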
Orchestration is how the components get coordinated. This encompasses several sub-concerns:
- Goal Management: Maintaining a structure of goals that can be pushed (when sub-goaling), popped (when achieved), and preempted (when higher-priority goals arise); a sketch follows this list.
- Action Selection: Determining which skills/affordances apply to the current situation and resolving conflicts when multiple actions are possible.
- Attention: Controlling what enters working memory and what gets processing time. This is the limited resource that prevents agent systems (and their underlying computational substrates) from becoming overwhelmed.
- Resource Allocation: Budgeting time, physical constraints, tool availability, etc., and ensuring that the agent system operates within its given constraints.
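A hedged sketch of the goal-management piece, assuming tokio-util's CancellationToken as the preemption signal (the Goal and GoalStack shapes here are illustrative):

```rust
use tokio_util::sync::CancellationToken;

/// Illustrative goal record: a description, a priority, and a cancellation
/// token that lets the orchestrator preempt it cleanly.
#[derive(Clone)]
struct Goal {
    description: String,
    priority: u8,
    cancel: CancellationToken,
}

/// Sketch of a goal stack: push for subgoals, pop on achievement, preempt
/// when a higher-priority goal arrives.
struct GoalStack {
    stack: Vec<Goal>,
}

impl GoalStack {
    fn current(&self) -> Option<&Goal> {
        self.stack.last()
    }

    /// Subgoaling: the new goal becomes the focus; the parent stays
    /// suspended underneath it.
    fn push(&mut self, goal: Goal) {
        self.stack.push(goal);
    }

    /// Goal achieved: remove it and resume whatever was underneath.
    fn pop(&mut self) -> Option<Goal> {
        self.stack.pop()
    }

    /// Preemption: a higher-priority goal interrupts the current one by
    /// signalling its cancellation token; otherwise it is queued beneath
    /// the current focus to run later.
    fn preempt(&mut self, goal: Goal) {
        match self.stack.last() {
            Some(current) if goal.priority > current.priority => {
                current.cancel.cancel();
                self.stack.push(goal);
            }
            _ => self.stack.insert(0, goal),
        }
    }
}
```

The running goal observes the preemption at its next await point via its cancellation token, which is exactly the hook the cognitive cycle later in this post selects on.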
Current agent frameworks focus almost entirely on affordances (e.g., adding more tools, more integrations, more ways to interact with external systems). They treat knowledge as an append-only context buffer with limited search capabilities, no activation dynamics, and no capacity management. They have minimal orchestration beyond "ask the LLM what to do next."
This is almost like building a car by connecting the wheels directly to the engine.
Agents Are Operating Systems, Not Web Applications
Many of the problems discussed so far - decision cycles, capacity limits and eviction policies, resource constraints, parallel & concurrent execution requirements - have clear parallels in problems operating systems have been solving for 60 years.
Current agent frameworks instead follow the web application model: stateless request handlers, external state management, infrastructure-level resource control, retry-based failure recovery. This model assumes that each request is independent, short-lived, and can fail without consequence beyond returning an error code.
Agent systems violate every one of these assumptions.
Long-lived processes. An agent working on a complex task may run for minutes, hours, or days. It maintains state across many interactions.
Complex internal state. The agent's beliefs, goals, and plans form an intricate data structure that cannot be reconstructed from external storage without significant computation.
Failure requires semantic cleanup. If an agent is interrupted while modifying a document, the document may be in an inconsistent state. Simply retrying is not sufficient; the agent must understand what was partially done and either complete or roll back the operation.
Concurrent activities. An agent may be pursuing multiple goals simultaneously, or executing multiple tool calls in parallel. These activities must be scheduled, and they may compete for shared resources.
Resource constraints. Token budgets, tool rate limits, memory capacity, and time deadlines are internal constraints that the agent must manage.
The category error of treating agents as web applications leads to systems that fail in predictable ways: they exhaust token budgets without completing tasks, they lose important context to truncation, they cannot recover from interruption, and they cannot prioritize among competing demands.
Why Tokio?
Tokio is Rust's most widely used async runtime. In short, it provides:
- A work-stealing scheduler that efficiently distributes tasks across CPU cores
- Cooperative multitasking through the Future trait
- Structured concurrency primitives: select! (first to complete wins), join! (wait for all), and task spawning
- Cancellation through token-based signaling and drop semantics
- Synchronization primitives: channels, semaphores, and mutexes suitable for async contexts
- Time management: timeouts, intervals, and deadlines
These primitives map directly onto the cognitive components discussed above:

| Cognitive Component | Tokio Primitive |
|---|---|
| Cognitive cycle | Async function with timeout |
| Attention (focus selection) | select! |
| Parallel execution | join! / JoinSet |
| Goal interruption | CancellationToken |
| Subgoal creation | spawn |
| Memory retrieval | Async trait method |
| Inter-module communication | mpsc / broadcast channels |
| Resource quota | Semaphore + atomic counters |
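To illustrate the last row of the table, here is one way a resource quota might combine a Semaphore (bounding concurrent tool calls) with an atomic counter (a cumulative token budget). This is a sketch; the names and the policy of debiting before acquiring a slot are assumptions, not a prescribed design:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

/// Hypothetical resource quota: at most `max_concurrent` tool calls in
/// flight, plus a cumulative token budget that is debited per call.
struct ResourceQuota {
    concurrency: Arc<Semaphore>,
    tokens_remaining: AtomicU64,
}

impl ResourceQuota {
    fn new(max_concurrent: usize, token_budget: u64) -> Self {
        Self {
            concurrency: Arc::new(Semaphore::new(max_concurrent)),
            tokens_remaining: AtomicU64::new(token_budget),
        }
    }

    /// Try to debit `tokens` from the budget and wait for a concurrency slot.
    /// Returns None when the budget is exhausted.
    async fn acquire(&self, tokens: u64) -> Option<OwnedSemaphorePermit> {
        // Debit the budget with a compare-and-swap loop; refuse if it would
        // go negative. (A fuller version would refund on later failure.)
        let mut remaining = self.tokens_remaining.load(Ordering::Relaxed);
        loop {
            if remaining < tokens {
                return None;
            }
            match self.tokens_remaining.compare_exchange_weak(
                remaining,
                remaining - tokens,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => break,
                Err(actual) => remaining = actual,
            }
        }
        // Wait for a concurrency slot; holding the permit bounds parallelism.
        self.concurrency.clone().acquire_owned().await.ok()
    }
}
```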
Tokio was designed for building concurrent systems that must handle interruption gracefully, which is a key requirement for complex cognitive architectures. Futures in Rust are lazy: they do nothing until they are polled (which is what .await does). This is the opposite of promises in JavaScript or tasks in C#, which begin executing as soon as they are created. The laziness of Rust futures means that the runtime controls when execution occurs.
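A small demonstration of that laziness; nothing in the async block runs until it is awaited:

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    // Constructing the future does nothing: "working..." is not printed yet.
    let work = async {
        println!("working...");
        tokio::time::sleep(Duration::from_millis(10)).await;
        42
    };

    println!("future created, not yet polled");

    // Only when the runtime polls it (via .await) does the body run.
    let answer = work.await;
    println!("got {answer}");
}
```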
The Cognitive Cycle (from Rust's Perspective)
A cognitive cycle proceeds through phases: perception, memory retrieval, action selection, and action execution. Each phase has a time and resource budget. The specific timing varies across architectures, but the phased structure is consistent.
Here's a sketch of what this might look like:
use std::sync::Arc;
use std::time::{Duration, Instant};
use async_trait::async_trait;
use tokio::time::timeout;
pub struct Agent {
working_memory: WorkingMemoryHandle,
long_term_memory: Arc<dyn LongTermMemory>,
action_selector: Arc<dyn ActionSelector>,
goals: GoalStack,
perception: PerceptionModule,
action: ActionModule,
}
#[async_trait]
pub trait ActionSelector: Send + Sync {
async fn select(
&self,
wm: &WMSnapshot,
goal: &Goal,
) -> Option<SelectedAction>;
}
/// The cognitive cycle abstraction. Any system that processes
/// perception → retrieval → selection → action can implement this.
#[async_trait]
pub trait Cognition: Send + Sync {
async fn cognitive_cycle(&mut self, budget: Duration) -> CycleOutcome;
}
#[async_trait]
impl Cognition for Agent {
async fn cognitive_cycle(&mut self, budget: Duration) -> CycleOutcome {
let _deadline = Instant::now() + budget;
let perception_budget = Duration::from_millis(5);
let retrieval_budget = Duration::from_millis(15);
let selection_budget = Duration::from_millis(10);
let action_budget = Duration::from_millis(20);
let goal = match self.goals.current() {
Some(g) => g.clone(),
None => return CycleOutcome::Idle,
};
// Phase 1: Perception
let percepts = tokio::select! {
biased;
_ = goal.cancelled() => return CycleOutcome::Cancelled,
result = timeout(perception_budget, self.perception.gather()) => {
result.unwrap_or_default()
}
};
self.working_memory.integrate(percepts).await;
// Phase 2: Memory Retrieval
let cue = self.build_retrieval_cue(&goal).await;
let retrieved = tokio::select! {
biased;
_ = goal.cancelled() => return CycleOutcome::Cancelled,
result = timeout(retrieval_budget,
self.long_term_memory.retrieve(&cue, 5)) => {
result.ok().flatten().unwrap_or_default()
}
};
self.working_memory.integrate_retrieved(retrieved).await;
// Phase 3: Action Selection
let wm_snapshot = self.working_memory.snapshot().await;
let selector = self.action_selector.clone();
let goal_clone = goal.clone();
let selected = tokio::select! {
biased;
_ = goal.cancelled() => return CycleOutcome::Cancelled,
result = timeout(selection_budget, async move {
selector.select(&wm_snapshot, &goal_clone).await
}) => result.ok().flatten()
};
let action = match selected {
Some(a) => a,
None => return CycleOutcome::Impasse(ImpasseType::NoApplicableAction),
};
// Phase 4: Action Execution
        tokio::select! {
            biased;
            _ = goal.cancelled() => {
                action.abort();
                CycleOutcome::Cancelled
            }
            // The execute future takes a clone so the cancellation branch
            // above can still abort the original (assumes SelectedAction: Clone).
            result = timeout(action_budget, self.action.execute(action.clone())) => {
                match result {
                    Ok(r) => CycleOutcome::Action(r),
                    Err(_) => CycleOutcome::BudgetExceeded { phase: Phase::Action },
                }
            }
        }
}
}
Let's call out a few important pieces of this code.
Every phase can be cancelled. The biased; annotation in select! polls the branches in the order they are written rather than randomly, so the cancellation branch is always checked first. If the goal has been interrupted, the cycle exits immediately.
Every phase has a budget. In this example the budget is time, and the timeout function enforces it. If a phase exceeds its budget, the cycle continues with default or partial results.
Action selection is abstracted. The ActionSelector trait allows different implementations: production rules for SOAR-style systems, neural network inference for CLARION-style implicit processing, graphical model inference for Sigma-style architectures, or codelet competition for LIDA-style systems. The cognitive cycle structure is independent of the selection mechanism.
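As a hedged illustration of that flexibility, here is what a toy production-rule selector might look like against the ActionSelector trait sketched above. The rule representation is deliberately simplistic, and it assumes SelectedAction implements Clone:

```rust
use async_trait::async_trait;

/// A toy production rule: a condition over working memory and the current
/// goal, an action to propose, and a utility used for conflict resolution.
struct ProductionRule {
    condition: fn(&WMSnapshot, &Goal) -> bool,
    action: SelectedAction,
    utility: f64,
}

struct ProductionRuleSelector {
    rules: Vec<ProductionRule>,
}

#[async_trait]
impl ActionSelector for ProductionRuleSelector {
    async fn select(&self, wm: &WMSnapshot, goal: &Goal) -> Option<SelectedAction> {
        // Conflict resolution: of all rules whose conditions match, fire the
        // one with the highest utility. Assumes SelectedAction: Clone.
        self.rules
            .iter()
            .filter(|rule| (rule.condition)(wm, goal))
            .max_by(|a, b| a.utility.total_cmp(&b.utility))
            .map(|rule| rule.action.clone())
    }
}
```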
Notice that LLMs are not mentioned. This architecture represents the cognitive cycle, where an LLM is one possible implementation of certain functions (e.g., action proposal during impasse resolution); it's not orchestrating the system.
Cancellation-Correctness
One of the promises of Rust is that important invariants can be encoded in the type system and enforced at compile time. Most properties of Rust code can be understood through local reasoning.
Async cancellation directly violates this property. Rain (sunshowers.io) has written about this extensively: futures can be cancelled at any await point, and determining whether a cancellation causes problems requires examining not just the immediate code but the entire chain of callers. This is what Rain calls the distinction between cancel safety (a local property of individual futures) and cancel correctness (a global property of system correctness under cancellation).
In agent systems, cancellation is a normal, and frequent, operating condition:
- The user interrupts a long-running task
- A higher-priority goal preempts the current goal
- A tool call times out
- A resource budget is exhausted
- The system is shutting down
Each of these requires graceful handling, and "graceful" in an agent context means something more than releasing resources. It means semantic cleanup: restoring invariants in the agent's beliefs about the world.
Consider this example:
async fn execute_plan(plan: Plan, memory: &mut WorkingMemory) {
for step in plan.steps {
let resource = acquire_resource(&step).await;
// ← Cancellation here: resource leaked
let result = perform_work(&resource).await;
// ← Cancellation here: memory doesn't know work was done
memory.record_completion(&step, &result).await;
// ← Cancellation here: resource still held
release_resource(resource).await;
}
}
If this function is cancelled after acquire_resource but before release_resource, the resource is leaked. If it's cancelled after perform_work but before record_completion, the agent's memory is inconsistent with reality: it won't know that work was performed.
In a web application, resource leaks are problematic but bounded; eventually the process restarts or the connection pool reaps idle resources. In an agent, a corrupted world model causes cascading failures. The agent makes future decisions based on incorrect beliefs.
There are a few patterns for addressing this.
Reserve then commit. Split operations so that the cancellable portion has no side effects, and the portion with side effects cannot be interrupted:
let permit = resource_queue.reserve().await; // Cancellable, no side effects
permit.commit(resource); // Infallible, has side effects
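One concrete way to get this shape with Tokio's own APIs is the mpsc reserve/Permit::send pair: the only await point is the reservation, and the send that actually has the side effect is synchronous. The surrounding types here are illustrative:

```rust
use tokio::sync::mpsc;

/// Illustrative resource type and queue wrapper.
struct Resource { /* ... */ }

struct ResourceQueue {
    tx: mpsc::Sender<Resource>,
}

impl ResourceQueue {
    /// Cancellable and side-effect free: if the caller is cancelled while
    /// waiting for capacity, nothing was sent and nothing needs cleanup.
    async fn reserve(&self) -> mpsc::Permit<'_, Resource> {
        self.tx.reserve().await.expect("queue closed")
    }
}

async fn enqueue(queue: &ResourceQueue, resource: Resource) {
    let permit = queue.reserve().await; // cancellable, no side effects
    permit.send(resource);              // synchronous and infallible commit
}
```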
Tasks for must-complete operations. Operations that must complete regardless of caller cancellation should be spawned as tasks:
let (tx, rx) = oneshot::channel();
tokio::spawn(async move {
let result = must_complete().await;
let _ = tx.send(result);
});
// Caller can be cancelled; task runs to completion
let result = rx.await;
Explicit compensation. When cancellation occurs, run cleanup logic:
tokio::select! {
result = operation.run() => result,
_ = token.cancelled() => {
operation.compensate().await; // Undo partial work
OperationResult::Cancelled
}
}
[To be continued...]
References
Anderson, J. R. (1993). Rules of the Mind. Lawrence Erlbaum Associates.
Anderson, J. R. (2007). How Can the Human Mind Occur in the Physical Universe? Oxford University Press.
Franklin, S., Ramamurthy, U., D'Mello, S. K., McCauley, L., Negatu, A., Silva, R., & Datla, V. (2007). LIDA: A computational model of global workspace theory and developmental learning. In AAAI Fall Symposium: AI and Consciousness.
Laird, J. E. (2012). The Soar Cognitive Architecture. MIT Press.
Laird, J. E., Lebiere, C., & Rosenbloom, P. S. (2017). A standard model of the mind: Toward a common computational framework across artificial intelligence, cognitive science, neuroscience, and robotics. AI Magazine, 38(4), 13–26.
Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33(1), 1–64.
McShane, M., & Nirenburg, S. (2012). A knowledge representation language for natural language processing, simulation and reasoning. International Journal of Semantic Computing, 6(1), 3–23.
Rosenbloom, P. S., Demski, A., & Ustun, V. (2016). The Sigma cognitive architecture and system: Towards functionally elegant grand unification. Journal of Artificial General Intelligence, 7(1), 1–103.
Stocco, A., Sibert, C., Steine-Hanson, Z., Koh, N., Laird, J. E., Lebiere, C. J., & Rosenbloom, P. (2021). Analysis of the human connectome data supports the notion of a "Common Model of Cognition" for human and human-like intelligence across domains. NeuroImage, 235, 118035.
Sun, R. (2002). Duality of the Mind: A Bottom-up Approach Toward Cognition. Lawrence Erlbaum Associates.
sunshowers. (2025). Cancelling async Rust. https://sunshowers.io/posts/cancelling-async-rust/
sunshowers. (2025). In defense of lock poisoning in Rust. https://sunshowers.io/posts/on-poisoning/
Oxide Computer Company. RFD 397: Challenges with async/await in the control plane. https://rfd.shared.oxide.computer/rfd/397
Oxide Computer Company. RFD 400: Dealing with cancel safety in async Rust. https://rfd.shared.oxide.computer/rfd/400
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. In International Conference on Learning Representations (ICLR).