What Makes a Program an Agent
Looking back from 2040, the confusion was understandable. In the early days of the LLM era, people called everything an “agent.” Chatbots were agents. Cron jobs were agents. Bash scripts with an API call were agents. The word had become marketing paste, spread thinly over every product that did anything automatically. But there was always a precise definition, if anyone cared to look.
Russell and Norvig's Artificial Intelligence: A Modern Approach (first edition, 1995) established the distinction that still holds. An agent is anything that perceives its environment through sensors and acts upon that environment through actuators. An agent function is the abstract mathematical mapping from percept sequences to actions. An agent program is the concrete implementation that runs on actual hardware and realizes that mapping.
The distinction between function and program is not pedantic. The agent function for a chess player maps every possible board state to the optimal move — a lookup table with roughly 10^47 entries. The agent program uses heuristic search, evaluation functions, and time limits to approximate that function on hardware with finite memory. The gap between the ideal function and the practical program is where all the interesting engineering lives.
So what separates an agent program from a regular program? Three things. Perception: the program reads from its environment at runtime, not just from its input arguments at invocation. Decision: the program selects among alternative actions based on what it has perceived. Continuity: the program operates over multiple perception-action cycles, not as a single pass from input to output. A script that runs curl and pipes the output to a file has no perception loop, no decision point, no continuity. It is a tool. An agent program that monitors an API, detects anomalies, decides whether to alert or auto-remediate, and adjusts its thresholds based on outcomes — that is an agent.
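Those three criteria can be made concrete in a dozen lines. A minimal sketch of the perception-action loop, where the sense/decide/act callables are supplied by the caller (the names and signatures here are illustrative, not a standard API):

```python
import time

def run_agent(sense, decide, act, cycles=3, interval=0.0):
    """Minimal perception-action loop: perceive, decide, act, repeat.
    `sense`, `decide`, and `act` are caller-supplied callables."""
    history = []
    for _ in range(cycles):
        percept = sense()                   # Perception: read the environment at runtime
        action = decide(percept, history)   # Decision: select among alternative actions
        act(action)                         # Actuation: act upon the environment
        history.append((percept, action))   # Continuity: state carries across cycles
        if interval:
            time.sleep(interval)
    return history
```

A script has none of these hooks; an agent is anything you could plug into a loop like this without lying about what the callables do.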
The PEAS Framework: Defining Agent Programs Properly
Before you write a single line of agent code, Russell and Norvig argued, you should specify four things. They called it PEAS: Performance measure, Environment, Actuators, and Sensors. This framework turned out to be prophetic. Most agent program failures in the 2024-2028 period trace back to under-specifying one of these four dimensions.
Performance Measure
How do you know the agent is succeeding? For a monitoring agent: uptime percentage and mean time to recovery. For a coding agent: test pass rate and code review approval rate. For a trading agent: risk-adjusted return. Without an explicit performance measure, you cannot evaluate agent behavior — and you certainly cannot build a utility function.
Environment
What does the agent operate in? Key environment properties: fully observable vs. partially observable, deterministic vs. stochastic, static vs. dynamic, discrete vs. continuous, single-agent vs. multi-agent. A DevOps agent operates in a partially observable, stochastic, dynamic, continuous, multi-agent environment — one of the hardest combinations. Environment classification directly determines which type of agent program you need.
Actuators
What can the agent do? API calls, shell commands, database writes, file modifications, network requests, message sends. The actuator set defines the agent's action space. A narrower action space means fewer failure modes. The safest agent programs have the smallest actuator sets that still accomplish the goal.
Sensors
What can the agent perceive? Log files, webhook events, API responses, file system watchers, metrics endpoints, message queues. Sensor design determines how much of the environment is observable. Under-sensing creates blind spots. Over-sensing creates noise. The best agent programs match their sensor resolution to their decision granularity.
Here is the thing that became clear only in retrospect: PEAS is not just a design framework. It is an infrastructure specification. The performance measure tells you what to monitor. The environment tells you what resources the agent needs. The actuators tell you what permissions to grant. The sensors tell you what data feeds to provision. Every PEAS specification maps directly to an infrastructure configuration. The teams that understood this in 2025 built agent programs that ran for months. The teams that skipped PEAS built agent programs that crashed in hours.
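One way to take that seriously is to write the PEAS specification as data rather than prose, so the infrastructure configuration can be derived from it mechanically. A sketch with illustrative field names and example values (the DevOps metrics, permission strings, and feed names are assumptions, not from any standard):

```python
from dataclasses import dataclass

@dataclass
class PEASSpec:
    """A PEAS specification that doubles as an infrastructure checklist."""
    performance_measure: dict  # what to monitor
    environment: dict          # what properties to plan for
    actuators: list            # what permissions to grant
    sensors: list              # what data feeds to provision

# Hypothetical spec for the DevOps agent discussed above
devops_agent = PEASSpec(
    performance_measure={"uptime_pct": ">= 99.9", "mttr_minutes": "<= 15"},
    environment={"observable": "partial", "dynamics": "stochastic",
                 "change": "dynamic", "agents": "multi"},
    actuators=["api:scale", "shell:restart", "queue:alert"],
    sensors=["metrics:prometheus", "logs:journald", "webhooks:pagerduty"],
)

def to_infra_config(spec: PEASSpec) -> dict:
    """Derive infrastructure requirements directly from the PEAS spec."""
    return {
        "monitors": list(spec.performance_measure),  # performance -> monitoring
        "permissions": spec.actuators,               # actuators -> access control
        "data_feeds": spec.sensors,                  # sensors -> provisioning
    }
```

The point is not the particular fields but the direction of the mapping: the PEAS spec is the source of truth, and the infrastructure config is a projection of it.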
The Five Types of Agent Programs in AI
Russell and Norvig did not just define what an agent program is. They classified the species. Five types of agent programs in artificial intelligence, each more capable than the last, each demanding more from its infrastructure. This taxonomy survived the LLM revolution largely intact, which is remarkable for a classification invented thirty years before GPT-4.
Type 1: Simple Reflex Agents
The simplest agent program. It maps the current percept directly to an action using condition-action rules. No memory of past percepts. No model of the environment. No goals. Just: if this, then that.
Simple Reflex Agent — Python
def simple_reflex_agent(percept):
    cpu_usage = percept["cpu_percent"]
    if cpu_usage > 90:
        return "scale_up"
    elif cpu_usage < 20:
        return "scale_down"
    else:
        return "no_action"

# This is the entire agent. No state, no memory,
# no model. It sees CPU at 92%, it scales up.
# It sees CPU at 15%, it scales down.
# It does this forever.

Simple reflex agents are brittle but predictable. The CPU monitor above has exactly three behaviors. You can test all three in 30 seconds. You can predict its resource usage to the byte. It will never hallucinate, never enter an infinite loop, never consume unexpected memory. Its failure modes are finite and enumerable: the rule is wrong, the sensor is broken, or the actuator fails.
Infrastructure cost: Negligible. 10-50 MB RAM. Can run hundreds on a single $5/month VPS. The simple reflex agent is the reason people underestimate agent infrastructure costs — they think all agents are this cheap.
Type 2: Model-Based Reflex Agents
The model-based agent adds internal state. It maintains a model of the world — how the environment works, what the current state is, and how its own actions change that state. This solves the critical problem of partial observability: when you cannot see everything at once, you need memory to fill in the gaps.
Model-Based Reflex Agent — Python
import time

class ModelBasedAgent:
    def __init__(self):
        self.cpu_history = []
        self.last_action_time = None
        self.scaling_cooldown = 300  # seconds

    def _in_cooldown(self):
        return (self.last_action_time is not None and
                time.time() - self.last_action_time < self.scaling_cooldown)

    def act(self, percept):
        self.cpu_history.append(percept["cpu_percent"])
        # Keep last 60 readings (5 min at 5s intervals)
        self.cpu_history = self.cpu_history[-60:]
        avg_cpu = sum(self.cpu_history) / len(self.cpu_history)
        trend = self.cpu_history[-1] - self.cpu_history[0]
        if self._in_cooldown():
            return "no_action"
        if avg_cpu > 85 and trend > 0:
            self.last_action_time = time.time()
            return "scale_up"
        elif avg_cpu < 25 and trend < 0:
            self.last_action_time = time.time()
            return "scale_down"
        return "no_action"

# Now the agent remembers. It tracks trends.
# It avoids flapping with a cooldown.
# The same CPU reading at 91% might produce
# different actions depending on history.

The model-based agent above makes better decisions than the simple reflex version. It does not scale up on a single spike — it waits for a sustained trend. It does not flap between scaling up and down because it enforces a cooldown. The tradeoff: it has state that must survive restarts.
Infrastructure cost: Low but non-trivial. 50-200 MB RAM for the internal model. Requires state persistence (disk or database) to survive crashes without losing context. A model-based agent that loses its state on restart effectively becomes a simple reflex agent — which is why crash recovery matters even at this level.
Type 3: Goal-Based Agents
This is where agent programs cross a qualitative threshold. A goal-based agent does not just react to conditions or track state — it has an explicit representation of what it is trying to achieve, and it searches for action sequences that reach that goal. The goal-based agent is where software learned to want things.
Goal-Based Agent — Python
class GoalBasedAgent:
    def __init__(self, goal):
        self.goal = goal  # e.g., {"response_time_p99_ms": 200}
        self.world_model = WorldModel()
        self.planner = ActionPlanner()

    def _goal_satisfied(self, state):
        # Goal holds when every target metric is at or below its bound
        return all(state.get(k, float("inf")) <= v
                   for k, v in self.goal.items())

    def act(self, percept):
        self.world_model.update(percept)
        current_state = self.world_model.get_state()
        if self._goal_satisfied(current_state):
            return "no_action"
        # Search for an action sequence that reaches the goal
        plan = self.planner.search(
            current_state=current_state,
            goal_state=self.goal,
            available_actions=[
                "scale_up", "scale_down",
                "add_cache", "optimize_query",
                "enable_cdn", "rate_limit",
            ],
        )
        return plan[0] if plan else "no_action"

# The agent has desire. It wants p99 < 200ms.
# It considers multiple paths to get there.
# Scaling up might work. Adding cache might work.
# The planner evaluates options and picks one.

Goal-based agents introduced a problem that still haunts us in 2040: the planning search can be computationally unbounded. A simple reflex agent runs in O(1) time. A goal-based agent runs in O(who-knows) time, because the search space depends on the number of available actions, the depth of the plan, and the branching factor of the environment model. In 2025, people built goal-based LLM agents that would plan for 45 minutes, consuming $12 in API calls, before producing a 3-line shell command. The plan was correct. The cost was absurd.
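The standard mitigation is to budget the search explicitly. A sketch of a breadth-first planner with hard depth and wall-clock limits; `simulate` and `goal_test` here are toy stand-ins for a real world model:

```python
import time
from collections import deque

def bounded_plan(start, goal_test, actions, simulate,
                 max_depth=5, time_budget_s=1.0):
    """Breadth-first search over action sequences with a hard depth
    limit and a wall-clock deadline, so planning cost stays bounded."""
    deadline = time.monotonic() + time_budget_s
    frontier = deque([(start, [])])
    while frontier:
        if time.monotonic() > deadline:
            return None  # Budget exhausted: fail fast, replan later
        state, plan = frontier.popleft()
        if goal_test(state):
            return plan
        if len(plan) < max_depth:
            for a in actions:
                frontier.append((simulate(state, a), plan + [a]))
    return None  # Search space exhausted without reaching the goal
```

With a toy latency model (scale_up shaves 120 ms off p99, add_cache shaves 90 ms), `bounded_plan(400, lambda s: s < 200, ...)` finds a two-step plan in well under the budget, and a pathological environment costs at most `time_budget_s` of compute rather than 45 minutes.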
Infrastructure cost: Highly variable. 200 MB to 8 GB RAM depending on the world model complexity and planner depth. CPU usage spikes during planning phases. The agent may be idle for hours, then consume 100% CPU for 10 minutes during a planning cycle. This bursty resource pattern is why goal-based agents need dedicated resources — they are terrible neighbors on shared infrastructure.
Type 4: Utility-Based Agents
A goal-based agent asks: “Does this action sequence achieve my goal?” A utility-based agent asks: “Of all the action sequences that achieve my goal, which one maximizes my expected utility?” This is not a minor upgrade. It is the difference between finding a path and finding the best path.
Utility-Based Agent — Python
class UtilityBasedAgent:
    def __init__(self):
        self.world_model = WorldModel()
        # Action set mirrors the goal-based example above
        self.available_actions = [
            "scale_up", "scale_down", "add_cache",
            "optimize_query", "enable_cdn", "rate_limit",
        ]
        self.utility = lambda state: (
            0.4 * state["uptime_score"] +
            0.3 * (1 - state["cost_normalized"]) +
            0.2 * state["performance_score"] +
            0.1 * state["security_score"]
        )

    def act(self, percept):
        self.world_model.update(percept)
        current = self.world_model.get_state()
        best_action = None
        best_utility = self.utility(current)
        for action in self.available_actions:
            predicted = self.world_model.simulate(current, action)
            expected_utility = self.utility(predicted)
            if expected_utility > best_utility:
                best_utility = expected_utility
                best_action = action
        return best_action or "no_action"

# The agent weighs tradeoffs. Scaling up improves
# performance (+0.2) but increases cost (-0.3).
# Adding rate limiting improves security (+0.1)
# but might hurt performance (-0.2). The utility
# function resolves these tradeoffs explicitly.

Utility-based agents are where most sophisticated LLM agents actually operate, even if their creators do not call them that. When an LLM agent evaluates multiple approaches to a coding task and selects the one that balances correctness, readability, and performance — it is performing utility maximization, with the utility function embedded in the model's training rather than coded explicitly.
Infrastructure cost: Substantial. 1-16 GB RAM for the world model, simulation engine, and utility computation. Each action evaluation requires a full simulation of the environment response, which may involve LLM inference calls costing $0.01-0.50 each. A utility-based agent evaluating 20 possible actions makes 20 simulation calls per decision cycle. At one decision per minute, that is 28,800 simulation calls per day. Infrastructure must support sustained high-throughput inference.
Type 5: Learning Agents
The learning agent is the apex of the taxonomy. It has everything the other four types have — reflexes, models, goals, utility — plus a learning element that modifies the agent's own decision procedure based on experience. A learning agent does not just adapt its actions to the environment. It adapts itself.
Learning Agent Architecture
class LearningAgent:
    def __init__(self):
        self.performance_element = UtilityBasedAgent()
        self.critic = PerformanceCritic()
        self.learning_element = PolicyUpdater()
        self.problem_generator = ExplorationEngine()
        self.performance_standard = {}  # target metrics the critic compares against

    def act(self, percept):
        # 1. Performance element selects action
        action = self.performance_element.act(percept)
        # 2. Critic evaluates outcome
        feedback = self.critic.evaluate(
            percept, action, self.performance_standard
        )
        # 3. Learning element updates performance element
        if feedback.score < feedback.threshold:
            self.learning_element.update(
                self.performance_element,
                feedback,
            )
        # 4. Problem generator suggests exploration
        if self.problem_generator.should_explore():
            action = self.problem_generator.explore(
                action, exploration_rate=0.1
            )
        return action

# Four components working together:
# - Performance element: the actual decision-maker
# - Critic: judges how well decisions worked
# - Learning element: modifies the decision-maker
# - Problem generator: forces exploration of unknowns

Russell and Norvig decomposed the learning agent into four components, and this decomposition turned out to be exactly right for the LLM era. Modern LLM agents that use reinforcement learning from human feedback (RLHF) map directly: the LLM is the performance element, human evaluators are the critic, the RLHF training loop is the learning element, and temperature/sampling are the problem generator. The textbook anticipated the architecture by three decades.
Infrastructure cost: The highest of all five types. 4-64 GB RAM for model weights, training buffers, and experience replay. GPU resources for online learning updates. Persistent storage for experience logs that may grow to hundreds of gigabytes. The learning agent is the most expensive agent to host, the most difficult to monitor (because its behavior changes over time), and the most dangerous to leave unsupervised (because learning can diverge). It is also the most capable by far.
The Cron Job Test: Script or Agent?
Here is a practical test I have used for fifteen years to settle the “is this an agent?” debate. Look at your program. Now ask: if I changed the environment, would the program change its behavior?
A Cron Job — Not an Agent
# /etc/cron.d/cleanup
0 3 * * * root find /tmp -mtime +7 -delete

# This runs at 3 AM every day.
# It deletes files older than 7 days from /tmp.
# If /tmp has 0 files: runs, deletes nothing.
# If /tmp has 10 million files: runs, deletes them all.
# If the disk is full: runs anyway.
# If the server is under heavy load: runs anyway.
# It does not perceive. It does not decide.
# It executes a fixed instruction at a fixed time.

An Agent Program — Doing the Same Job
class CleanupAgent:
    """Manages /tmp with awareness of system state."""
    def __init__(self):
        self.model = SystemModel()

    def run_cycle(self):
        disk = self.model.get_disk_usage("/tmp")
        load = self.model.get_system_load()
        files = self.model.get_tmp_files()
        if disk.percent < 50 and load.avg_1m > 4.0:
            # Disk is fine, system is busy. Do nothing.
            return "defer_cleanup"
        if disk.percent > 90:
            # Emergency: delete largest files first
            targets = sorted(files, key=lambda f: -f.size)
            return self._cleanup(targets, aggressive=True)
        if disk.percent > 70:
            # Normal: delete files older than 3 days
            targets = [f for f in files if f.age_days > 3]
            return self._cleanup(targets, aggressive=False)
        # Disk is fine. Archive old files instead.
        old = [f for f in files if f.age_days > 14]
        return self._archive(old)

# Same job: manage /tmp. Completely different program.
# It perceives disk usage, system load, file metadata.
# It decides between cleanup, archival, and deferral.
# Its behavior changes based on environment state.

The cron job and the agent program both manage /tmp. The cron job does it in 1 line. The agent does it in 30 lines. But the agent is adaptive — it defers when the system is busy, gets aggressive when the disk is full, and archives instead of deleting when there is room. This adaptiveness is what makes it an agent, and it is why it needs different infrastructure. The cron job needs crond. The agent needs a supervision tree.
The Infrastructure Spectrum
Here is what nobody told the first generation of agent builders, and what cost them millions in debugging time: the five types of agent programs do not just differ in capability. They differ in infrastructure requirements by orders of magnitude.
Simple Reflex Agents
10-50 MB RAM. No state persistence needed. Restart cost: zero (no state to lose). Can share resources with hundreds of siblings. Infrastructure: crond or a basic process manager. Monthly cost: <$1 per agent.
Model-Based Reflex Agents
50-500 MB RAM. State must survive restarts. Restart cost: moderate (must reload internal model from checkpoint). Needs basic process supervision. Monthly cost: $1-5 per agent.
Goal-Based Agents
200 MB - 8 GB RAM. Planning search can spike CPU to 100%. Restart cost: high (may need to re-plan from scratch). Bursty resource usage makes shared hosting dangerous. Needs health checks that detect stuck planning. Monthly cost: $5-30 per agent.
Utility-Based Agents
1-16 GB RAM. Continuous simulation and evaluation. Restart cost: high (loses utility model calibration). May make thousands of inference calls per day. Needs cost monitoring, rate limiting, and inference budget controls. Monthly cost: $15-100 per agent.
Learning Agents
4-64 GB RAM. GPU for online learning. Persistent storage growing over time. Restart cost: very high (may lose learning progress). Behavior changes unpredictably as learning progresses. Needs drift detection, rollback capability, and audit logging. Monthly cost: $50-500+ per agent.
The 500x cost ratio between the cheapest and most expensive agent types explains a persistent confusion in the market. When someone says “agent hosting is cheap,” they are thinking of simple reflex agents. When someone says “agent hosting is prohibitively expensive,” they are thinking of learning agents. Both are correct. They are just talking about different species of the same genus.
Why Agent Programs Need Infrastructure That Regular Software Does Not
A web server processes a request and forgets it. A database stores data and retrieves it. A batch job runs to completion and exits. These are well-understood software patterns with well-understood infrastructure needs. Agent programs break every one of these assumptions.
Monitoring is not optional — it is existential. A web server that returns HTTP 500 is clearly broken. An agent program that is running at full CPU, consuming increasing memory, and producing no output might be: (a) performing a deep planning search, (b) stuck in an infinite reasoning loop, or (c) experiencing a memory leak. These three states are indistinguishable from the outside without semantic health checks — checks that query the agent's internal state, not just its process liveness. Standard monitoring tools like Prometheus or Datadog track the symptoms (CPU, memory, response time). Agent monitoring must track the disease (planning depth, decision confidence, state consistency).
Health checks must be semantic, not syntactic. A health check that pings /healthz and gets 200 OK tells you the process is alive. It does not tell you the agent is functioning. An agent stuck in an infinite loop will pass liveness checks indefinitely while consuming resources and producing nothing. Agent health checks must verify: (1) the perception-action loop is cycling at the expected rate, (2) internal state is consistent, (3) the agent has produced meaningful output within its expected latency window, and (4) resource consumption is within expected bounds for the current workload. This is why standard container orchestration health checks are insufficient for agent programs.
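Those four checks can be sketched as one semantic probe. The agent attribute names here (`last_cycle_ts`, `validate_state`, `rss_mb`) are assumptions about the agent's introspection interface, not a standard:

```python
import time

def semantic_health(agent, now=None,
                    min_cycle_hz=0.1, max_output_lag_s=300,
                    max_rss_mb=4096):
    """Goes beyond liveness: verifies the perception-action loop is
    cycling, state is consistent, output is flowing, and resource
    consumption is within bounds. Returns (healthy, per-check detail)."""
    now = now if now is not None else time.time()
    checks = {
        # (1) the loop has cycled within the expected period
        "loop_cycling": (now - agent.last_cycle_ts) < (1.0 / min_cycle_hz),
        # (2) internal state passes the agent's own consistency check
        "state_consistent": agent.validate_state(),
        # (3) meaningful output produced within the latency window
        "producing_output": (now - agent.last_output_ts) < max_output_lag_s,
        # (4) resident memory within the expected bound
        "resources_in_bounds": agent.rss_mb() <= max_rss_mb,
    }
    return all(checks.values()), checks
```

A supervisor that calls this instead of pinging /healthz will catch the stuck-reasoning-loop case: the process is alive, but `producing_output` goes false.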
Crash recovery is state recovery. When a web server crashes, you restart it. It picks up new requests as if nothing happened. When an agent program crashes, you lose its internal model, its planning state, its accumulated context. A model-based agent that has been running for 6 hours has an internal state that represents 6 hours of observation. If that state is not checkpointed, a crash sends the agent back to minute zero. For goal-based agents mid-plan, the cost is worse: hours of planning computation, potentially hundreds of dollars in inference calls, all lost. Crash recovery for agent programs means checkpointing internal state at regular intervals so the agent can resume from where it left off, not from the beginning.
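A minimal checkpointing sketch, under the assumption that the agent's internal state is JSON-serializable; the atomic write (temp file plus rename) matters, because a crash mid-checkpoint must not corrupt the last good state:

```python
import json
import os
import tempfile

def save_checkpoint(state, path):
    """Write the checkpoint atomically: dump to a temp file in the
    same directory, then rename over the target. A crash mid-write
    leaves the previous checkpoint intact."""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX

def load_checkpoint(path, default=None):
    """Resume from the last checkpoint, or start fresh if there is
    none (or the file is unreadable)."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return default if default is not None else {}
```

A model-based agent would call `save_checkpoint(self.__dict__-style state, ...)` every N cycles and `load_checkpoint` on startup, turning a crash into a loss of at most N cycles of observation rather than six hours.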
Resource isolation prevents cascade failures. In 2025, a single runaway goal-based agent on a shared server consumed 14 GB of RAM during an unusually deep planning search, triggering the Linux OOM killer. The OOM killer did not just kill that agent — it killed 23 other agents on the same server. The incident took down a production deployment for 47 minutes. This failure mode does not exist with traditional software because traditional software has predictable resource consumption. Agent programs, especially types 3-5, do not. Resource isolation — dedicated cgroups, memory limits, CPU quotas — is not a luxury for agent programs. It is a requirement.
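On a POSIX host, even the launcher itself can enforce a ceiling, by setting rlimits on the child process before it starts; a sketch (cgroups, or the systemd `MemoryMax`/`CPUQuota` directives shown later, are the production-grade equivalents):

```python
import resource
import subprocess

def launch_isolated(cmd, max_mem_mb=2048, max_cpu_s=600):
    """Start an agent process with hard memory and CPU ceilings, so a
    runaway planning search kills itself instead of its neighbors."""
    def apply_limits():
        mem = max_mem_mb * 1024 * 1024
        # Cap the address space: allocations beyond this fail in the child
        resource.setrlimit(resource.RLIMIT_AS, (mem, mem))
        # Cap total CPU seconds: the kernel delivers SIGXCPU past this
        resource.setrlimit(resource.RLIMIT_CPU, (max_cpu_s, max_cpu_s))
    return subprocess.Popen(cmd, preexec_fn=apply_limits)
```

The limits apply only to the child, so one agent exceeding its allocation fails alone; the OOM-killer cascade described above cannot start.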
Supervision Trees: The Erlang Pattern That Agents Needed
In 1986, Joe Armstrong at Ericsson faced a problem that would become eerily relevant forty years later: how do you keep thousands of concurrent processes running reliably when any one of them can crash at any time for unpredictable reasons? His answer was the supervision tree, and it turned out to be the ideal architecture for agent programs.
The concept is simple. Every process has a supervisor. When the process crashes, the supervisor decides what to do: restart it (one-for-one strategy), restart all its siblings (one-for-all), or escalate to a higher-level supervisor. Supervisors form a tree, with the root supervisor at the top. If even the root supervisor fails, the entire application restarts from a known good state.
For agent programs, the supervision tree maps naturally to the agent taxonomy. Simple reflex agents get a basic one-for-one supervisor with immediate restart — they have no state to preserve, so restart is trivial. Model-based agents get a supervisor that checkpoints state before restart and restores it after. Goal-based agents get a supervisor that can detect stuck planning and apply a timeout-based restart with plan resumption. Utility-based agents get a supervisor that monitors inference costs and can throttle or pause the agent when budgets are exceeded. Learning agents get the most sophisticated supervision: drift detection, learning rate monitoring, and the ability to roll back to a previous version of the learned policy if performance degrades.
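The restart strategies themselves fit in a few lines. A toy in-process supervisor whose strategy names follow Erlang/OTP; a real supervisor manages OS processes rather than callables, but the decision logic is the same:

```python
class Supervisor:
    """Toy supervisor. Children are zero-argument factories producing
    callables; an exception in a child counts as a crash and triggers
    the restart strategy. Exceeding max_restarts escalates upward."""
    def __init__(self, strategy="one_for_one", max_restarts=3):
        self.strategy = strategy
        self.max_restarts = max_restarts
        self.children = {}  # name -> (factory, live instance)
        self.restarts = 0

    def add_child(self, name, factory):
        self.children[name] = (factory, factory())

    def _restart(self, name):
        factory, _ = self.children[name]
        self.children[name] = (factory, factory())  # fresh instance

    def run_child(self, name, *args):
        _, instance = self.children[name]
        try:
            return instance(*args)
        except Exception:
            self.restarts += 1
            if self.restarts > self.max_restarts:
                raise  # escalate to the parent supervisor
            if self.strategy == "one_for_all":
                for n in self.children:  # restart all siblings
                    self._restart(n)
            else:  # one_for_one: restart only the crashed child
                self._restart(name)
            return None
```

A crash is absorbed, the child is rebuilt from its factory (which is where state restore from a checkpoint would go), and only repeated failure propagates up the tree.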
Supervision Tree — NixOS Systemd Units
# /etc/nixos/agents/monitoring-agent.nix
{
  systemd.services.monitoring-agent = {
    description = "Production monitoring agent (goal-based)";
    after = [ "network.target" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = "/opt/agents/monitor/run";
      Restart = "always";
      RestartSec = 3;
      WatchdogSec = 30;   # Must ping within 30s
      MemoryMax = "4G";   # Hard memory ceiling
      CPUQuota = "200%";  # Max 2 cores
      StateDirectory = "monitoring-agent";
      # Checkpoint directory survives restarts
      ExecStartPre = "/opt/agents/monitor/restore-state";
      ExecStopPost = "/opt/agents/monitor/save-state";
    };
    # NixOS: atomic rollback if health checks fail
    # after deployment
    unitConfig.OnFailure = "agent-rollback@monitor.service";
  };
}

The NixOS configuration above implements a supervision tree for a goal-based agent. The systemd watchdog ensures the agent is responding (not just alive). The memory limit prevents runaway planning from consuming all RAM. The state directory and pre/post hooks implement crash recovery. And the OnFailure directive triggers NixOS atomic rollback if the agent fails after a deployment. This is what agent infrastructure looks like when it takes the taxonomy seriously.
How osModa Treats Every Agent as a First-Class Process
osModa was built on a premise that now seems obvious but was radical in 2025: every agent program, regardless of type, deserves its own supervision tree. Not a container. Not a slot in a shared process manager. A genuine supervision tree with resource isolation, semantic health checks, state persistence, and crash recovery tailored to the agent's type.
Self-healing watchdog: Every agent gets a watchdog that goes beyond process liveness. The watchdog queries the agent's internal health endpoint every 3-6 seconds, checking perception-action cycle rate, internal state consistency, output production rate, and resource consumption trends. An agent that passes liveness checks but fails semantic health checks gets a structured restart: state checkpoint, graceful shutdown, state restore, cold start. Recovery takes under 6 seconds for most agent types.
NixOS atomic rollback: When a deployment changes the agent's code, model, or configuration, osModa runs the new version under observation. If health checks degrade within the observation window (configurable, default 5 minutes), SafeSwitch reverts the entire system to the previous NixOS generation. This is not container-level rollback — it rolls back the server state, the agent code, the dependencies, and the configuration as a single atomic operation. No partial states. No dependency mismatches.
SHA-256 audit ledger: Every action an agent takes is recorded in a tamper-evident hash chain. For learning agents, this is critical: when the agent's behavior changes over time, you need a complete record of what it did, when, and what the outcome was. The audit ledger enables post-hoc analysis of agent drift and provides the evidence trail that compliance frameworks require.
Dedicated resources with type-aware scaling: osModa provisions resources based on the agent type. A simple reflex agent gets minimal allocation with fast restart. A learning agent gets reserved GPU time, persistent storage that grows with the experience buffer, and a supervisor configured for the longer restart cycles that stateful agents require. Plans start at $14.99/month on dedicated Hetzner servers, which means no noisy neighbors, no OOM kills from other tenants, and no shared-CPU throttling during planning spikes.
The Software That Learned to Want Things
There is a philosophical thread running through the taxonomy that Russell and Norvig did not emphasize, but that becomes unmissable from the vantage point of 2040. Each type of agent program adds something that looks increasingly like desire.
The simple reflex agent has no inner life. It is a thermostat — pure mechanism. The model-based agent has memory, which gives it something resembling belief about the world. The goal-based agent has something resembling desire — an explicit representation of a state it wants to reach. The utility-based agent has something resembling preference — it does not just want to reach a state, it wants to reach the best state. And the learning agent has something resembling growth — it changes what it wants based on what it experiences.
I want to be careful here. I am not claiming these programs are conscious or that they genuinely experience desire. I am pointing out that the architectural features we add to make agent programs more capable are, structurally, the same features that philosophers have traditionally associated with mental life: belief (internal models), desire (goal representations), preference (utility functions), and learning (experience-driven self-modification). Whether this structural similarity is superficial or deep remains one of the genuinely open questions of our field.
What I can say with confidence is that the infrastructure implications are real regardless of the philosophical answer. A program that has goals, preferences, and the ability to modify its own behavior is a fundamentally different operational challenge than a program that executes fixed instructions. Whether or not the agent “wants” things in any meaningful sense, it behaves as if it does, and the infrastructure must be built for that behavior. That is the practical legacy of the agent program taxonomy: a classification of software by how much it acts like it has a mind, and a corresponding classification of the infrastructure required to keep that mind running.
Frequently Asked Questions
What is an agent program in AI?
An agent program is the concrete implementation of an agent function — the actual code that maps percept sequences to actions. Russell and Norvig defined this in their 1995 textbook: the agent function is the abstract mathematical mapping, while the agent program is the software that runs on physical hardware and realizes that mapping. In practice, an agent program is any running process that perceives its environment through sensors (APIs, file systems, network sockets), selects actions through some decision procedure (rules, models, utility functions), and executes those actions through actuators (API calls, file writes, shell commands). The key distinction from a regular program is the perception-action loop: agent programs continuously sense and respond rather than executing a fixed sequence.
What are the five types of agent programs in AI?
The canonical taxonomy from Russell and Norvig identifies five types: simple reflex agents (act on current percept only, using condition-action rules), model-based reflex agents (maintain internal state to handle partial observability), goal-based agents (use goal representations to choose actions that achieve desired states), utility-based agents (use a utility function to choose among multiple goal-satisfying options based on expected value), and learning agents (improve their own performance element over time through experience). Each type builds on the previous — a learning agent incorporates utility, goals, models, and reflexes. In production systems, most modern LLM agents are goal-based or utility-based with learning components.
What is the difference between an agent program and a regular script?
A regular script executes a predetermined sequence of instructions and terminates. It has no perception loop, no internal model of its environment, and no capacity to select among alternative actions based on environmental state. An agent program, by contrast, runs continuously (or is invoked repeatedly), perceives its environment at each cycle, and selects actions based on what it perceives. A cron job that runs 'rm /tmp/*.log' every midnight is a script — it does the same thing regardless of whether there are 0 or 10,000 files. An agent program that monitors disk usage, decides whether to clean logs or archive them based on available space, and escalates to an operator when it encounters files it cannot classify — that is an agent.
What is the PEAS framework for agent programs?
PEAS stands for Performance measure, Environment, Actuators, and Sensors — the four components you must specify to fully define an agent program. The performance measure defines success (e.g., uptime percentage, task completion rate, cost per action). The environment specifies what the agent operates in (fully or partially observable, deterministic or stochastic, static or dynamic). Actuators are the agent's output channels (API calls, database writes, system commands). Sensors are the input channels (log files, webhooks, API responses, file system watchers). PEAS forces you to think about the operational context before writing code — a practice that prevents the common failure mode of building an agent that works in testing but fails in production because the environment was under-specified.
Why do agent programs need different infrastructure than regular software?
Agent programs are long-running, stateful, and non-deterministic — three properties that regular software either lacks or deliberately avoids. A web server processes requests statelessly and returns deterministic responses. An agent program accumulates state over hours or days, makes decisions that depend on that accumulated state, and can enter failure modes that are impossible to predict from the code alone. This combination demands process supervision (to restart crashed agents), health checking (to detect agents that are running but stuck), state persistence (to recover context after crashes), and resource isolation (to prevent one runaway agent from affecting others). Standard deployment tools like Docker and systemd provide some of these, but agent-specific infrastructure must add semantic health checks and intelligent recovery strategies.
What is a supervision tree for agent programs?
A supervision tree is an architectural pattern from Erlang/OTP where every process is monitored by a supervisor process, which itself is monitored by a higher-level supervisor. When a process crashes, its supervisor decides the recovery strategy: restart just that process, restart all sibling processes, or escalate to the next level. This pattern is ideal for agent programs because agents crash in diverse ways — a simple reflex agent might segfault from bad input, while a goal-based agent might enter an infinite planning loop. The supervision tree provides structured recovery for each failure mode. osModa implements this pattern at the OS level using NixOS systemd units, where each agent gets its own supervisor with configurable restart policies, health check intervals, and escalation thresholds.
Can simple reflex agents be useful in production?
Absolutely, and they are still the workhorse of most production systems. A simple reflex agent that monitors CPU temperature and triggers cooling when it exceeds 80°C is more reliable than a goal-based agent that tries to optimize thermal efficiency. The key advantage is predictability — simple reflex agents have exactly one failure mode (the condition-action rule is wrong), while goal-based agents have open-ended failure modes (the goal representation is wrong, the search is intractable, the model is stale). In 2026, approximately 70% of production agent deployments are simple reflex or model-based reflex agents, according to infrastructure surveys. The remaining 30% — the goal-based and learning agents — consume 85% of the infrastructure budget.
How does osModa handle agent program deployment and monitoring?
osModa treats every agent program as a first-class process with its own NixOS-managed supervision tree. Each agent gets dedicated resources (no shared tenancy), a systemd-level supervisor that monitors process health every 3 seconds, automatic crash recovery with configurable restart policies, SHA-256 audit logging of all agent actions, and NixOS atomic rollback if a deployment introduces regressions. The platform supports all five types of agent programs — from simple reflex agents running as lightweight daemons to learning agents requiring GPU resources and persistent state stores. Plans start at $14.99/month on dedicated Hetzner servers, with the supervision infrastructure included at every tier.