Most “build an AI agent” tutorials hand you a ChatGPT wrapper and call it done. That is not an agent. That is a function call with a personality. An agent is a system — one that perceives, reasons, acts, and persists across time. Building one that actually works in production requires thinking at the systems level: what architecture holds the reasoning loop, what framework manages the complexity, and what infrastructure keeps the whole thing alive when you are not watching.
This guide covers the full stack. We start with what an agent actually is, move through the framework landscape as it exists in early 2026, address the infrastructure question that most tutorials ignore entirely, and end with a practical deployment walkthrough. If you have shipped a web application before, you have the skills. The concepts are different, but the engineering discipline is the same.
What an AI Agent Actually Is
Strip away the marketing language and an AI agent is a software system that implements a continuous loop with three stages:
1. Perception
The agent reads its environment. This might mean polling an inbox, listening on a WebSocket, watching a file system, querying a database, or receiving webhook events. Perception is the sensor layer — without it, the agent is blind.
2. Reasoning
The agent processes what it perceived through an LLM (or chain of LLMs) to decide what to do. This is where prompting, context management, and tool selection happen. The reasoning stage transforms observations into action plans. In graph-based frameworks like LangGraph, this stage can involve branching, loops, and conditional routing.
3. Action
The agent executes its decision: calls an API, writes a file, sends a message, modifies a database, or triggers another agent. Action is where the agent affects the world. The results of the action become new perceptions, and the loop continues.
A chatbot implements a degenerate version of this loop: perception is the user message, reasoning is a single LLM call, action is the response. What makes an agent an agent is autonomy — the ability to continue the loop without waiting for human input at every step. The agent decides when to act, what tools to use, and when it has finished.
This autonomy is what creates the infrastructure challenge. A chatbot can run in a serverless function because it responds and stops. An agent needs a persistent environment because it runs, watches, decides, and acts across unbounded time horizons.
The Minimal Agent: 40 Lines of Python
Before reaching for a framework, understand the primitive. Here is a minimal agent that monitors a directory for new files and processes them:
```python
import json
import os
import time

from openai import OpenAI

client = OpenAI()
WATCH_DIR = "/data/inbox"
DONE_DIR = "/data/processed"

tools = [{
    "type": "function",
    "function": {
        "name": "save_summary",
        "description": "Persist a summary of the input file.",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "summary": {"type": "string"},
            },
            "required": ["filename", "summary"],
        },
    },
}]

def process(filepath):
    with open(filepath) as f:
        content = f.read()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize:\n{content}"}],
        tools=tools,
    )
    # Execute tool calls from the response
    for call in resp.choices[0].message.tool_calls or []:
        args = json.loads(call.function.arguments)
        with open(os.path.join(DONE_DIR, args["filename"]), "w") as f:
            f.write(args["summary"])

# The perception-action loop
while True:
    for fname in os.listdir(WATCH_DIR):
        process(os.path.join(WATCH_DIR, fname))
        os.rename(os.path.join(WATCH_DIR, fname), os.path.join(DONE_DIR, fname))
    time.sleep(5)
```

That is a complete agent. It perceives (watches a directory), reasons (calls an LLM with tool definitions), and acts (writes summaries and moves files). It loops indefinitely. Everything that a framework gives you — memory, multi-agent coordination, error recovery — is an answer to limitations you will hit as you scale beyond this pattern.
The 2026 Framework Landscape
The agentic framework ecosystem has consolidated significantly. In 2024, there were dozens of half-baked options. By early 2026, a few have emerged as genuine production tools, each with a distinct philosophy. Here is how they compare.
LangGraph — The Directed Graph Engine
LangGraph models agent workflows as directed graphs with explicit nodes, edges, and state objects. You define each step as a node (a function), connect them with edges (including conditional edges that branch based on state), and let the graph runtime handle execution. This gives you full visibility into what the agent is doing and why.
The trade-off is verbosity. A LangGraph agent requires significantly more code than CrewAI for the same task. But when you need branching logic, error recovery paths, or human-in-the-loop approval steps, the graph model makes these patterns explicit rather than hidden inside framework magic. LangGraph integrates with LangSmith for tracing and observability — critical for debugging agents that make opaque multi-step decisions.
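The graph model is easy to illustrate without LangGraph itself. The sketch below is a stdlib stand-in, not LangGraph's API: the node names, the `State` dict fields, and the routing function are all invented for illustration. It shows the core idea of nodes as functions and conditional edges that branch on state:

```python
# Illustration of the directed-graph execution model (not LangGraph's API).
# Nodes are functions from state to state; edges pick the next node.

def draft(state):
    state["text"] = state["topic"].title()
    return state

def review(state):
    state["approved"] = len(state["text"]) > 3
    return state

def route_after_review(state):
    # Conditional edge: branch on state.
    return "publish" if state["approved"] else "draft"

def publish(state):
    state["published"] = True
    return state

NODES = {"draft": draft, "review": review, "publish": publish}
EDGES = {"draft": lambda s: "review",
         "review": route_after_review,
         "publish": lambda s: None}  # None terminates the graph

def run_graph(state, entry="draft", max_steps=10):
    node = entry
    for _ in range(max_steps):  # guard against infinite loops
        state = NODES[node](state)
        node = EDGES[node](state)
        if node is None:
            return state
    raise RuntimeError("step limit exceeded")
```

In LangGraph, the framework owns this runtime (plus checkpointing and streaming), and you declare the nodes and edges; the explicit structure is what makes the agent's path through a run inspectable.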
CrewAI — The Role-Based Orchestrator
CrewAI takes the opposite approach. Where LangGraph hands you building blocks, CrewAI hands you a pre-assembled team. You define agents with roles (“Researcher,” “Writer,” “Editor”), assign them tasks, and the framework handles delegation and coordination. With over 44,000 GitHub stars, it is the most popular high-level agent framework.
CrewAI excels at multi-agent workflows where agents have distinct responsibilities. It supports memory, tool integration, and hierarchical task delegation out of the box. The cost is flexibility — when your workflow does not fit the crew-and-task model, you fight the framework instead of working with it.
AG2 (Formerly AutoGen) — The Research Workhorse
Microsoft's AutoGen, rebranded as AG2, specializes in multi-agent conversation and asynchronous task execution. Its defining feature is built-in human-in-the-loop oversight — agents can pause, request human approval, and resume. This makes it well-suited for enterprise workflows where full autonomy is not yet trusted.
AG2's v0.4 release adopted graph-based execution patterns, converging with the direction LangGraph pioneered. It supports nested agent groups, code execution sandboxes, and Azure-native monitoring. The ecosystem is smaller than LangGraph or CrewAI, but the Microsoft backing provides enterprise legitimacy.
OpenAI Agents SDK — The First-Party Option
OpenAI released their Agents SDK in early 2025 as a lightweight alternative to community frameworks. It provides a minimal API for defining agents with tools, handoffs between agents, and guardrails. If you are already committed to OpenAI models, this removes a dependency layer.
The limitation is model lock-in. The SDK is designed around OpenAI models; other providers are reachable only through OpenAI-compatible endpoints, if at all. If you need to swap in Claude or Gemini for specific tasks (common in production for cost optimization), you need a framework that abstracts the model layer.
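Abstracting the model layer does not have to mean a heavyweight framework. A minimal sketch of the idea, with stub callables standing in for real provider SDKs (the prefixes and call signature are invented for illustration):

```python
# Minimal model-abstraction layer: route each call by model-name prefix.
# The registered callables are stubs; swap in real openai/anthropic clients.

PROVIDERS = {}

def register(prefix, call_fn):
    PROVIDERS[prefix] = call_fn

def complete(model, prompt):
    """Dispatch to whichever provider owns this model name."""
    for prefix, call_fn in PROVIDERS.items():
        if model.startswith(prefix):
            return call_fn(model, prompt)
    raise ValueError(f"no provider registered for {model}")

register("gpt-", lambda m, p: f"[openai:{m}] {p}")
register("claude-", lambda m, p: f"[anthropic:{m}] {p}")
```

With one dispatch point, swapping a cheaper model into a specific task is a config change rather than a refactor.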
Emerging: Google ADK and Pydantic AI
Google's Agent Development Kit (ADK) entered the field in late 2025, offering Gemini-native agent development with Vertex AI integration. Pydantic AI, from the creators of Pydantic, brings type-safe agent development with structured outputs — a strong fit for teams that value correctness guarantees. Both are early but worth watching if their ecosystems align with your stack.
Framework Comparison
| Feature | LangGraph | CrewAI | AG2 | OpenAI SDK |
|---|---|---|---|---|
| Abstraction level | Low (graphs) | High (roles) | Medium | Low (minimal) |
| Multi-agent | Yes | Native | Native | Via handoffs |
| Model flexibility | Any LLM | Any LLM | Any LLM | OpenAI only |
| Human-in-the-loop | Yes | Limited | Native | Via guardrails |
| MCP support | Yes | Yes | Yes | Partial |
| Observability | LangSmith | Enterprise tier | Azure native | OpenAI dashboard |
| Learning curve | Steep | Gentle | Moderate | Minimal |
MCP: The Emerging Standard for Agent Tools
One of the most consequential developments in the agent ecosystem is the Model Context Protocol (MCP), introduced by Anthropic in November 2024. MCP provides a standardized interface for how agents connect to external tools and data sources — reading files, executing functions, managing context.
Before MCP, every framework implemented tool integration differently. LangGraph had its own tool schema, CrewAI had another, and switching frameworks meant rewriting all your tool integrations. MCP is converging the ecosystem toward a universal protocol — similar to how LSP standardized code editor integrations. By early 2026, every major framework supports MCP to some degree. When choosing a framework, MCP compatibility should be a factor. See our MCP vs A2A protocol guide for a deep comparison.
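On the wire, MCP is JSON-RPC 2.0: a client discovers tools with a `tools/list` request and invokes one with `tools/call`. A sketch of the invocation shape (the tool name and arguments here are hypothetical, not part of any real server):

```python
import json

# JSON-RPC 2.0 request to invoke an MCP tool. The method name follows
# the MCP spec; the tool name and arguments are invented for illustration.
call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "save_summary",
        "arguments": {"filename": "report.txt", "summary": "..."},
    },
}

wire = json.dumps(call_request)  # what actually crosses the transport
```

Because every MCP server speaks this same shape, a tool written once works from any MCP-capable framework.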
The Infrastructure Question Nobody Answers
Every framework tutorial ends with the agent running in a Jupyter notebook or a terminal session. Then what? You close your laptop and the agent dies. This is the gap between “I built an agent” and “I deployed an agent.”
AI agents have infrastructure requirements that distinguish them from traditional web services:
Persistent Runtime
Agents run indefinitely. Serverless functions time out after 5–15 minutes. You need a VM, a dedicated server, or a platform that provides always-on compute.
State Persistence
Agents accumulate context: conversation history, task state, learned preferences. This state must survive crashes and restarts. You need persistent storage — a file system or database that is not ephemeral.
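Crash-safe state also means atomic writes: if the agent dies mid-write, a naive `json.dump` leaves a corrupt checkpoint. A minimal stdlib pattern (the state fields and file name are illustrative):

```python
import json
import os

STATE_PATH = "agent_state.json"

def save_state(state):
    # Write to a temp file, then atomically replace the old checkpoint,
    # so a crash mid-write never leaves a half-written state file.
    tmp = STATE_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, STATE_PATH)

def load_state():
    try:
        with open(STATE_PATH) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"history": [], "task": None}  # fresh start
```

Call `save_state` at the end of every loop iteration and `load_state` on startup, and a restart resumes where the crash happened.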
Process Supervision
Agents crash. LLM API calls fail. Generated code throws exceptions. Memory leaks accumulate. You need a supervisor (systemd, Docker, or a watchdog) that detects failures and restarts the agent automatically.
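In production you would reach for systemd or a Docker restart policy, but the core supervision logic is small enough to sketch. This is an illustrative stand-alone watchdog, not a replacement for a real supervisor (the command and limits are placeholders):

```python
import subprocess
import sys
import time

def run_supervised(cmd, max_restarts=3, backoff=0.1):
    """Restart a crashing child process, doubling the delay each time."""
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            return restarts  # clean exit: stop supervising
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError(f"gave up after {max_restarts} restarts")
        time.sleep(backoff * 2 ** (restarts - 1))  # exponential backoff
```

Usage would be `run_supervised([sys.executable, "agent.py"])`. The backoff matters: restarting a crash-looping agent at full speed just burns API quota faster.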
Isolation
Agents that execute generated code or use tools that modify the file system need isolation. You do not want an agent that `rm -rf`'s your production database because the LLM hallucinated a cleanup command.
Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. Infrastructure is a large part of that picture: the framework is 20% of the problem, and infrastructure is the other 80%. For a deeper analysis, see our guide on AI agents and the types of AI agents that each infrastructure model supports.
Deploying Your Agent on osModa
osModa was built specifically for the infrastructure requirements that AI agents impose. Instead of cobbling together a VPS, systemd units, monitoring, and log rotation, you get a dedicated NixOS server with all of it pre-configured.
Step 1: Push your agent code. Connect your Git repository or upload your agent directly. osModa supports Python, Node.js, Go, Rust — anything that runs on Linux.
Step 2: Define your environment. Declare dependencies in a Nix flake or use the platform's auto-detection. NixOS guarantees reproducible environments — your agent runs identically every time, on every deployment.
Step 3: Deploy. The platform provisions a dedicated Hetzner server, configures 9 Rust daemons for process management, enables the watchdog with health checks, and starts your agent. Sub-6-second crash recovery is automatic.
Step 4: Monitor. The SHA-256 audit ledger logs every agent action. The watchdog performs external health checks independent of the agent process. If a deployment causes failures, SafeSwitch triggers a NixOS atomic rollback to the last working state.
Plans start at $14.99/month on dedicated hardware — not shared VMs, not serverless containers. See the frameworks integration guide for specific setup instructions for LangGraph, CrewAI, and AG2 on osModa, or the deployment quickstart to get running in minutes. If you need your agent running around the clock, see our guide on running AI agents 24/7.
Five Mistakes That Kill Agent Projects
After watching hundreds of agent deployments, these patterns recur:
1. Over-engineering the reasoning layer
Teams build elaborate multi-agent systems when a single agent with well-designed tools would suffice. Start with one agent. Add agents only when you hit a concrete limitation that multi-agent coordination solves.
2. Ignoring infrastructure until production
The agent works in your notebook. Then you deploy and discover it crashes every 4 hours, fills the disk with logs, and costs 10x what you budgeted in API calls. Test in production-like environments early.
3. No error handling for LLM failures
LLM APIs fail. They rate-limit. They return malformed JSON. They hallucinate tool calls that do not exist. Every LLM call needs retry logic, output validation, and a fallback path.
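All of that can live in one wrapper shared by every call site. A hedged sketch: the `llm_call` signature and the caught exception types are placeholders, since real SDKs raise provider-specific errors you would list instead:

```python
import json
import random
import time

def call_with_retry(llm_call, retries=3, base_delay=1.0):
    """Retry a flaky LLM call with exponential backoff and jitter,
    validating that the output parses as JSON before returning it."""
    for attempt in range(retries):
        try:
            raw = llm_call()
            return json.loads(raw)  # reject malformed output early
        except (json.JSONDecodeError, ConnectionError, TimeoutError):
            if attempt == retries - 1:
                raise  # out of retries: surface the failure
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

The jitter (`random.random()`) keeps a fleet of agents from hammering a rate-limited API in lockstep after an outage.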
4. Unbounded context growth
Agents that append every message to context without trimming eventually exceed the model's context window or consume excessive memory. Implement a sliding window or summarization strategy from the start.
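The simplest version of a sliding window keeps the system prompt pinned and drops the oldest turns. In practice you would trim by token count rather than message count, but the shape is the same (message dicts assumed to follow the common `role`/`content` convention):

```python
def trim_context(messages, max_messages=20):
    """Sliding window: keep system prompts plus the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```

A summarization strategy replaces the dropped turns with one LLM-generated recap message instead of discarding them outright.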
5. No audit trail
When an agent makes a bad decision at 3 AM, you need to know what it perceived, what it reasoned, and what it did. Without structured logging of each loop iteration, debugging is guesswork.
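Structured logging can be as simple as one append-only JSONL record per loop stage. A minimal sketch (the stage names and file path are illustrative):

```python
import json
import time

def audit_log(stage, payload, path="audit.jsonl"):
    """Append one structured record per loop stage to a JSONL audit file."""
    record = {"ts": time.time(), "stage": stage, "detail": payload}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# In the agent loop, tag each stage:
#   audit_log("perceive", {"file": fname})
#   audit_log("reason", {"tool_calls": [...]})
#   audit_log("act", {"wrote": out_path})
```

JSONL keeps the file grep-able and stream-parseable, so the 3 AM question "what did it see before it did that" is one `tail` away.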
How to Choose Your Stack
The decision tree is simpler than the framework landscape suggests:
If your agent does one thing (monitors a feed, processes documents, answers questions from a knowledge base), start with raw Python and the LLM SDK of your choice. No framework needed.
If your agent coordinates multiple capabilities with clear role separation (researcher + writer + reviewer), use CrewAI. Its mental model maps directly to team-based workflows.
If your agent has complex branching logic with conditional paths, loops, and error recovery routes, use LangGraph. The graph model makes these patterns explicit and debuggable.
If you need enterprise approval workflows with human-in-the-loop at specific steps, consider AG2 for its native support.
For infrastructure, the question is whether you want to manage it yourself (VPS + systemd + monitoring stack) or use a purpose-built platform. osModa exists because we believe agent infrastructure should be invisible — you think about the agent, not the server. Explore the frameworks page for integration details.
Deploy Your Agent on osModa
Dedicated NixOS servers with self-healing watchdog, SHA-256 audit logging, and sub-6-second crash recovery. From $14.99/month on Hetzner hardware.
Launch on spawn.os.moda
Frequently Asked Questions
What is the fastest way to create an AI agent in 2026?
The fastest path is CrewAI or OpenAI's Agents SDK. CrewAI lets you define agent roles and tasks in under 50 lines of Python, with built-in tool integration and memory. OpenAI's Agents SDK offers a minimal API for single-agent workflows. Both get you from zero to working agent in under an hour. However, speed of creation and speed of production deployment are different problems — the framework that launches fastest may not be the one that survives contact with real users.
Do I need to know machine learning to build an AI agent?
No. Modern AI agents use pre-trained LLMs (GPT-4o, Claude, Gemini) via API calls. You do not train models — you orchestrate them. The core skills you need are software engineering: API integration, state management, error handling, and systems design. Understanding prompt engineering helps, but the real challenge is building reliable infrastructure around unreliable model outputs.
What is the difference between LangGraph and CrewAI?
LangGraph gives you low-level control by modeling agent workflows as directed graphs with explicit nodes, edges, and state transitions. You build everything yourself but can express any workflow topology. CrewAI gives you high-level abstractions: define agent roles, assign tasks, and let the framework handle orchestration. LangGraph suits complex, branching workflows. CrewAI suits multi-agent teams with clear role separation. LangGraph has a steeper learning curve but more flexibility; CrewAI is faster to start but harder to customize beyond its patterns.
Why do AI agents need dedicated infrastructure?
AI agents maintain long-running state, make expensive API calls, execute generated code, and need to be reachable 24/7. Serverless functions (Lambda, Cloud Functions) have execution time limits (typically 15 minutes), no persistent file systems, cold start latency, and no way to maintain WebSocket connections or background processes. A dedicated server or VM gives your agent a persistent environment where it can maintain state, run indefinitely, and be monitored by external watchdogs.
How much does it cost to run an AI agent in production?
Infrastructure costs range from $5–15/month for a basic VPS to $50–200/month for dedicated servers with monitoring and self-healing. But infrastructure is typically only 20–40% of total cost. LLM API calls dominate: a moderately active agent using GPT-4o can generate $100–500/month in API costs. The total cost depends on call volume, model choice, and whether you can use smaller models for simpler tasks. osModa's plans start at $14.99/month for dedicated Hetzner infrastructure with built-in watchdog and audit logging.
Can I create an AI agent without a framework?
Yes. A minimal agent is just a while loop: perceive (read input), reason (call an LLM), act (execute tool calls), repeat. You can build this in raw Python with the OpenAI or Anthropic SDK in about 100 lines. Frameworks add value when you need multi-agent coordination, persistent memory, complex tool management, or production observability. If your agent does one thing well, raw code may be simpler. If it coordinates multiple capabilities, a framework saves significant engineering time.
What is the perception-reasoning-action loop?
It is the fundamental architecture of any autonomous agent. Perception: the agent observes its environment (reads messages, checks APIs, monitors file systems). Reasoning: the agent processes observations through an LLM to decide what to do next. Action: the agent executes the decision (calls a tool, writes a file, sends a message). The loop repeats continuously. Every AI agent, from a simple chatbot to a complex autonomous system, implements some version of this loop. The differences lie in the sophistication of each stage.
How do I deploy an AI agent to production?
Step 1: Containerize or package your agent with all dependencies. Step 2: Provision a persistent server (VPS, dedicated server, or managed platform like osModa). Step 3: Set up process supervision (systemd, Docker with restart policies). Step 4: Configure health checks and monitoring. Step 5: Implement graceful shutdown and state checkpointing. Step 6: Set up log rotation and alerting. osModa handles steps 2-6 automatically — you provide the agent code, and the platform provides the self-healing infrastructure.