The Year Agents Became Colleagues
If you had surveyed a hundred CTOs in January 2024 about AI agents, sixty would have called them “promising but experimental.” Run that same survey in March 2026 and the language is different. The word that keeps appearing is “operational.” Not futuristic, not aspirational — operational. Gartner now estimates that 33% of enterprise software will incorporate agentic AI by year-end 2028, up from less than 1% in 2024. McKinsey reports that companies deploying agents saw a 20-30% improvement in task throughput within six months. The market for AI agents is projected to reach $47 billion by 2030.
But aggregate numbers obscure the actual texture of what is happening. The agents of AI are not a monolith. A coding agent that writes and tests software has almost nothing in common — architecturally, economically, or infrastructurally — with a customer support agent that resolves tickets or a trading agent that executes microsecond arbitrage. To understand the 2026 agent landscape, you need to look at it species by species. That is what this field guide attempts.
1. Coding Agents
The fastest-growing and most technically demanding category. Coding agents do not just suggest code — they write it, test it, debug it, and ship it. The leap from autocomplete to autonomous software engineering happened faster than almost anyone predicted.
Devin (Cognition AI)
The system that broke the dam. Devin launched in early 2024 as the first AI “software engineer” — not a copilot, but a fully autonomous agent that receives a task in natural language, plans an implementation, writes the code, runs tests, and iterates until the build passes. Each Devin session spins up a dedicated cloud sandbox with shell access, file system, editor, and browser. By 2026, Cognition reports thousands of engineering teams using Devin for production pull requests, with an average session length of 24 minutes.
Infrastructure: Dedicated sandbox per session, 8–16 GB RAM, shell + browser access, persistent file system. Scale: Thousands of concurrent sessions across enterprise customers.
Cursor Agent Mode
Cursor evolved from a smart editor into a full agent IDE. Its agent mode can read your codebase, propose multi-file changes, run terminal commands, and iterate based on compiler errors and test output. Unlike Devin's cloud sandbox approach, Cursor runs locally in the developer's environment, which means faster iteration but shared resources. Anysphere (the company behind Cursor) reportedly reached $300 million in annual recurring revenue by early 2025 — a staggering number for a developer tool less than two years old.
Infrastructure: Local execution on developer machines, API calls to hosted models, 8–32 GB RAM recommended. Scale: Millions of developers, with agent-mode adoption accelerating.
Claude Code (Anthropic)
Anthropic's terminal-native coding agent. No IDE wrapper — Claude Code operates directly in the command line, which gives it access to the full Unix toolchain. It reads and edits files, runs shell commands, manages git operations, and iterates on test failures. The terminal-first approach appeals to infrastructure engineers and backend developers who live in the shell. By 2026, Claude Code has become a standard tool in DevOps and platform engineering workflows, particularly for codebases too large for IDE-based agents to index effectively.
Infrastructure: Terminal execution, API calls to Claude models, minimal local resources. Scale: Integrated into CI/CD pipelines at hundreds of organizations.
Where coding agents are heading: The trajectory is toward full autonomy for well-specified tasks. GitHub reports that Copilot already writes 46% of code in files where it is enabled. The next frontier is agents that handle entire features end-to-end: read the ticket, understand the codebase, write the implementation, author tests, open the PR, and respond to code review comments. We are not there yet, but we are closer than most people realize.
2. Research Agents
Research agents gather, synthesize, and analyze information at scales no human team can match. They do not just search — they read, cross-reference, evaluate source credibility, and produce structured analysis. The best ones have replaced entire preliminary research phases.
Perplexity
Perplexity transformed from an “answer engine” into a full research agent. Its Pro Search executes multi-step research plans: it formulates sub-questions, searches the web, reads full pages, synthesizes findings, identifies contradictions, and produces cited reports. Perplexity reached 15 million monthly active users by late 2024 and has continued growing. Its API serves hundreds of applications that need real-time, sourced research capabilities.
Infrastructure: Cloud-hosted, web crawling + LLM inference, massive bandwidth for real-time source fetching. Scale: Tens of millions of queries per day.
Google Deep Research
Google's Deep Research agent, built on Gemini, takes a topic and autonomously creates a multi-step research plan, browses the web, reads dozens of sources, and compiles a comprehensive report with citations. A single query can trigger 5-10 minutes of autonomous research. It represents Google's answer to the question of what happens when you give a search engine agency — it stops returning links and starts returning answers. OpenAI's Deep Research offers a comparable product, often spending 5-30 minutes on a single query to produce report-grade output.
Infrastructure: Heavy compute per query (minutes of LLM inference + concurrent web browsing), 16–64 GB RAM equivalent per session. Scale: Millions of research sessions daily across Google and OpenAI platforms.
Where research agents are heading: Toward continuous monitoring rather than one-shot queries. The next generation of research agents will watch domains persistently — tracking regulatory changes, competitor movements, academic publications — and surface insights proactively. This shifts the infrastructure requirement from burst compute to always-on persistent processes with scheduled wake cycles.
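The shift from burst compute to scheduled wake cycles is easy to picture as a scheduler loop. A minimal Python sketch, with the actual search-and-synthesize step stubbed out as `research_fn` (all names here are illustrative, not any vendor's API):

```python
import time
from dataclasses import dataclass

@dataclass
class WatchTask:
    """A persistent research watch: a query plus a wake interval."""
    query: str
    interval_s: int
    last_run: float = 0.0

def due(tasks, now):
    """Return the tasks whose wake cycle has elapsed."""
    return [t for t in tasks if now - t.last_run >= t.interval_s]

def run_cycle(tasks, research_fn, now=None):
    """One scheduler tick: run every due watch and record its wake time."""
    now = time.time() if now is None else now
    results = {}
    for task in due(tasks, now):
        results[task.query] = research_fn(task.query)  # search + synthesize
        task.last_run = now
    return results
```

An always-on monitoring agent is essentially this loop running forever, with `research_fn` expanded into the full browse-read-synthesize pipeline described above.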
3. Customer-Facing Agents
The most widely deployed category by raw instance count. Customer agents handle support tickets, answer questions from knowledge bases, route complex issues to human teams, and increasingly handle the entire resolution lifecycle autonomously. The economics are compelling: a support agent that resolves 50-80% of tier-1 tickets autonomously pays for itself within weeks.
Intercom Fin
Fin is arguably the most successful customer-facing AI agent in production. It ingests a company's entire help center, knowledge base, and historical ticket data, then handles incoming support conversations autonomously. Intercom reports that Fin resolves over 50% of support queries without human intervention for its customers, with some teams seeing resolution rates above 70%. Fin does not just retrieve articles — it reasons across multiple sources to construct personalized answers, and it knows when to escalate.
Infrastructure: RAG pipeline + LLM inference, vector database for knowledge retrieval, 24/7 uptime mandatory. Scale: Millions of conversations per month across Intercom's customer base.
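The retrieve-then-escalate-when-unsure pattern at the heart of agents like Fin can be sketched in a few lines. This is not Intercom's actual implementation; the retriever, generator, and confidence threshold below are illustrative stand-ins:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    score: float  # retrieval similarity in [0, 1]

CONFIDENCE_FLOOR = 0.6  # illustrative threshold, not a real product value

def answer_or_escalate(question, retrieve, generate):
    """Retrieve supporting docs; answer only when retrieval is confident,
    otherwise hand the conversation to a human."""
    docs = retrieve(question)
    if not docs or max(d.score for d in docs) < CONFIDENCE_FLOOR:
        return {"action": "escalate", "reason": "low retrieval confidence"}
    context = "\n".join(d.text for d in docs)
    return {"action": "reply", "text": generate(question, context)}
```

The design point is that the escalation decision is made before generation: a low-confidence retrieval short-circuits to a human instead of letting the model guess.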
Zendesk AI Agents
Zendesk's AI agents handle ticket triage, automated responses, and full resolution across email, chat, and social channels. They are pre-trained on billions of customer service interactions, which gives them domain-specific understanding that general-purpose LLMs lack. Zendesk reports that their AI resolves up to 80% of customer interactions without human involvement, with resolution quality scores matching or exceeding human agents for routine queries. Salesforce's Agentforce represents the same pattern at even larger scale — over 380 million transactions processed.
Infrastructure: Multi-channel ingestion, intent classification, knowledge graph, 24/7 with sub-second response times. Scale: Hundreds of millions of interactions monthly across both platforms.
Where customer agents are heading: Toward proactive engagement. Current agents are reactive — they wait for a ticket. The next generation monitors user behavior and intervenes before a problem becomes a ticket. Imagine an agent that notices a user struggling with a checkout flow and offers help before they abandon cart. The infrastructure shift is significant: from responding to events to processing real-time behavioral streams.
4. Trading & Finance Agents
Financial agents operate under the most unforgiving constraints of any category. Latency is measured in milliseconds. Downtime has direct monetary cost. Errors compound. And the regulatory environment demands audit-grade logging of every decision. These agents have existed in simpler forms for decades (algorithmic trading is not new), but the LLM-powered generation adds natural language reasoning over market sentiment, news events, and unstructured data.
Alpaca Markets
Alpaca provides the API infrastructure that thousands of AI trading agents use to execute trades. Their commission-free trading API has become the default execution layer for autonomous trading bots. The agents themselves run on customer infrastructure — Alpaca provides the brokerage rails. By 2026, LLM-powered agents using Alpaca's API are incorporating sentiment analysis from earnings calls, SEC filings, and social media into their trading strategies, moving beyond pure technical analysis.
Infrastructure: Low-latency connections to exchange APIs, uninterrupted uptime during market hours (around the clock for crypto), 16–64 GB RAM for backtesting. Scale: Thousands of autonomous trading agents executing millions of trades.
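Blending a technical signal with an LLM-derived sentiment score, as described above, can be sketched generically. Order execution through a brokerage API such as Alpaca's is deliberately left out; the weights and threshold below are illustrative, not a real strategy:

```python
def combine_signals(momentum, sentiment, w_momentum=0.6, w_sentiment=0.4):
    """Blend a technical signal with an LLM-derived sentiment score.
    Both inputs are assumed to lie in [-1, 1]; weights are illustrative."""
    return w_momentum * momentum + w_sentiment * sentiment

def decide(momentum, sentiment, threshold=0.3):
    """Map the blended score to an order intent. Actual order submission
    (e.g. via a brokerage API) is intentionally omitted from this sketch."""
    score = combine_signals(momentum, sentiment)
    if score > threshold:
        return "buy"
    if score < -threshold:
        return "sell"
    return "hold"
```

In a live agent the `sentiment` input would come from an LLM scoring earnings calls or filings, while `momentum` comes from conventional technical analysis; the audit-logging requirement means every `decide` call and its inputs must be recorded.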
QuantConnect
QuantConnect provides the full stack for algorithmic and AI-powered trading: backtesting engine, live trading infrastructure, and data feeds across equities, options, futures, forex, and crypto. Their LEAN engine runs thousands of community-built algorithms, and LLM-augmented strategies are now among the fastest-growing category on the platform. What makes QuantConnect notable is that it provides reproducible infrastructure for financial agents — the same code runs in backtest and live mode, which addresses the simulation-to-production gap that plagues trading agent development.
Infrastructure: GPU for model inference, high-frequency data feeds, isolated execution per strategy, complete audit logging. Scale: Over 280,000 registered quants, thousands of live algorithms.
Where trading agents are heading: Multi-modal reasoning. The current generation reads numeric data and text. The next generation will watch earnings calls on video, interpret CEO body language, read satellite imagery of retail parking lots, and cross-reference shipping container data with inventory filings. The compute requirements will increase by an order of magnitude.
5. DevOps & Infrastructure Agents
DevOps agents monitor systems, detect anomalies, diagnose root causes, and increasingly fix problems autonomously. They represent the most direct application of agents to infrastructure itself — agents that manage the systems that other agents run on. The recursive implications are not lost on anyone paying attention.
PagerDuty AIOps
PagerDuty's AIOps agents correlate alerts across hundreds of monitoring sources, suppress noise (reducing alert volume by up to 98% according to their data), identify probable root causes, and route incidents to the right team with full context. The agent layer reduces mean time to resolution by surfacing relevant runbooks, past incidents, and probable fixes alongside the alert. In 2026, their agents increasingly execute remediation steps autonomously — restarting services, scaling resources, rolling back deployments — when confidence is high enough.
Infrastructure: 24/7 uptime (non-negotiable), event stream ingestion from monitoring tools, isolated from the systems being monitored. Scale: Processing millions of events per hour across thousands of customers.
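Alert correlation of this kind reduces, at its core, to grouping raw events into incidents. A toy Python sketch — the keying scheme and time window are illustrative, not PagerDuty's algorithm:

```python
def correlate(alerts, window_s=300):
    """Group raw alerts into incidents by (service, symptom) within a
    time window -- a toy version of alert-noise suppression."""
    incidents = []
    open_incidents = {}  # (service, symptom) -> most recent incident
    for a in sorted(alerts, key=lambda a: a["ts"]):
        key = (a["service"], a["symptom"])
        inc = open_incidents.get(key)
        if inc and a["ts"] - inc["last_ts"] <= window_s:
            inc["count"] += 1          # fold the alert into the open incident
            inc["last_ts"] = a["ts"]
        else:
            inc = {"service": a["service"], "symptom": a["symptom"],
                   "first_ts": a["ts"], "last_ts": a["ts"], "count": 1}
            open_incidents[key] = inc
            incidents.append(inc)
    return incidents
```

Production correlation engines add topology awareness and learned similarity, but the payoff is the same: many alerts in, few incidents out.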
Self-Healing Infrastructure (osModa, Ciroos)
A newer category that goes beyond monitoring into autonomous infrastructure management. osModa's NixOS-based platform uses declarative system configurations that agents can reason about deterministically — if the actual system state diverges from the declared state, the system self-corrects in under six seconds. This is not ML-based anomaly detection but deterministic self-healing: reproducible builds mean the “desired state” is mathematically defined, not inferred. It represents a fundamentally different approach to reliability.
Infrastructure: NixOS for declarative configuration, process supervision, cgroup isolation, SHA-256 audit logging. Scale: Growing rapidly as agent hosting demands purpose-built infrastructure.
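The declared-versus-observed reconciliation that drives this kind of self-healing can be sketched as a diff-and-apply loop. A minimal illustration, with `apply_fn` standing in for real remediation actions (restarting a unit, rewriting a config):

```python
def reconcile(declared, observed, apply_fn):
    """Compute the drift between declared and observed state and apply
    corrective actions -- the core loop of deterministic self-healing."""
    actions = []
    for key, want in declared.items():
        have = observed.get(key)
        if have != want:
            actions.append((key, have, want))
            apply_fn(key, want)   # converge toward the declared value
    for key in observed:
        if key not in declared:
            actions.append((key, observed[key], None))
            apply_fn(key, None)   # remove undeclared state
    return actions
```

Because the declared state is fully specified, the diff is deterministic: the same drift always produces the same corrective actions, which is what makes the behavior auditable.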
Where DevOps agents are heading: Full closed-loop operations. The vision is an agent that detects a performance regression in production, identifies the offending commit, writes a fix, runs the test suite, and deploys the patch — all without waking a human at 3 AM. The pieces exist individually. Connecting them requires multi-agent orchestration across coding agents, monitoring agents, and deployment agents.
6. Creative Agents
Creative agents generate images, video, music, and 3D assets. They are the most GPU-hungry category and the most publicly visible — their output is literally what people see. The infrastructure requirements are substantially different from text-based agents because generation involves heavy neural network inference rather than API calls.
Midjourney
Midjourney remains the dominant image generation platform with over 16 million users. What started as a Discord bot has evolved into a full creative agent with its own web platform, style memory, and iterative refinement workflows. A single image generation takes 10-60 seconds of GPU compute. Multiply that by millions of daily generations and the infrastructure bill becomes extraordinary. Midjourney reportedly runs one of the largest GPU clusters outside the major hyperscalers.
Infrastructure: Massive GPU clusters (thousands of A100/H100 GPUs), image storage and CDN, queue management for burst load. Scale: Millions of images generated per day.
Runway (Gen-3 Alpha)
Runway's video generation models represent the cutting edge of creative agents. Gen-3 Alpha generates 10-second video clips from text prompts with remarkable coherence. Video generation is orders of magnitude more compute-intensive than image generation — each second of output requires generating 24-30 individual frames with temporal consistency. Suno occupies a parallel position in audio, generating full songs with vocals from text descriptions. These agents are pushing the boundary of what “content creation” means.
Infrastructure: GPU clusters with high VRAM (80+ GB per accelerator), video encoding pipelines, storage-intensive. Scale: Hundreds of thousands of video generations per day and growing.
Where creative agents are heading: Toward multi-modal pipelines. A single creative brief will trigger an agent that generates the concept art (image model), produces the video ad (video model), composes the soundtrack (audio model), and writes the copy (language model). Orchestrating these heterogeneous models requires infrastructure that can schedule across GPU types and manage complex dependency chains.
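Scheduling across heterogeneous models is, at its simplest, topological execution of a dependency graph. A sketch using Python's standard-library `graphlib`; the step names and runner callables are hypothetical stand-ins for calls to image, video, audio, and language models:

```python
from graphlib import TopologicalSorter

def run_pipeline(deps, runners):
    """Execute heterogeneous generation steps in dependency order.
    `deps` maps each step to the steps whose output it consumes;
    `runners` maps step names to callables taking a dict of inputs."""
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: outputs[d] for d in deps.get(name, ())}
        outputs[name] = runners[name](inputs)
    return outputs
```

A real orchestrator additionally has to place each step on the right GPU type and run independent branches (video and soundtrack, say) in parallel, but the dependency chain is the skeleton.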
7. Personal Agents
The most ambitious and least mature category. Personal agents attempt to be the universal interface between a human and their digital life — managing calendar, email, tasks, shopping, and information retrieval through a single conversational surface. They are the category with the highest expectations and, so far, the widest gap between promise and delivery.
Apple Intelligence
Apple's approach to personal agents is the most cautious of the major players — and possibly the smartest. Apple Intelligence runs a tiered model: lightweight tasks execute on-device for privacy, while complex tasks route to Apple's Private Cloud Compute infrastructure. The agentic capabilities are evolving from Siri's traditional assistant mode toward true cross-app agency, with the ability to take actions across apps, compose emails based on context, and summarize information across the device. The distribution advantage is overwhelming: over 2 billion active Apple devices.
Infrastructure: Hybrid on-device + cloud, Apple Silicon Neural Engine for local inference, Private Cloud Compute for heavier tasks. Scale: Billions of potential users, with rollout across iOS, macOS, iPadOS.
Rabbit R1 & Hardware Agents
The Rabbit R1 represented the first serious attempt at a dedicated hardware device for AI agents. Its Large Action Model (LAM) was designed to interact with apps on behalf of the user, not through APIs but through learned interface navigation. The initial launch was rocky — limited functionality, slow responses, missing features. But the concept of a physical device dedicated to agent interaction influenced the broader industry. Humane's AI Pin pursued a similar vision. Both revealed a fundamental truth: the agent infrastructure problem is harder than the hardware problem. You can build a beautiful device, but if the agent backend cannot reliably execute complex tasks, the device is a paperweight.
Infrastructure: Cloud backend for LAM inference, persistent user context storage, API integrations with third-party services. Scale: Hundreds of thousands of devices sold, but active usage significantly lower.
Where personal agents are heading: The winner will likely not be a dedicated device but an ambient layer across existing devices. Google's Project Astra, Apple Intelligence, and Meta AI are all pursuing this: an agent that knows your context across phone, laptop, watch, and home devices. The infrastructure requirement is persistent, personalized state storage with strict privacy controls — effectively a private database per user that the agent can query but no one else can access.
Infrastructure Requirements by Agent Category
Every category of agent has different compute, memory, uptime, and special requirements. Here is the 2026 reality for each:
| Agent Category | RAM | GPU | Uptime | Special Needs |
|---|---|---|---|---|
| Coding | 8–32 GB | Optional | Per-session | Sandboxed shell + FS |
| Research | 16–64 GB | Recommended | Burst / Scheduled | Web access, storage |
| Customer | 8–16 GB | Optional | 24/7 | Vector DB, low latency |
| Trading | 16–64 GB | Required | 24/7 | Low latency, audit logs |
| DevOps | 8–16 GB | Optional | 24/7 | Isolated from monitored systems |
| Creative | 32–128 GB | Required | Burst / Queue | High VRAM, CDN |
| Personal | 4–16 GB | On-device NPU | Always-on | Privacy, cross-device sync |
The Infrastructure Implication Nobody Is Talking About
Here is what becomes obvious when you survey the entire landscape at once: every single category of agent needs real computers. Not ephemeral containers. Not serverless functions. Not sandboxed playgrounds. Actual, persistent, reliable computing environments with process supervision, resource isolation, network access, and audit logging.
The cloud infrastructure we built over the past fifteen years was designed for stateless web applications. Request comes in, response goes out, scale horizontally. Agents break this model completely. They are stateful. They are long-running. They need shell access and file systems. They consume resources unpredictably — a coding agent might sit idle for ten minutes and then spike to 100% CPU for thirty seconds while running a test suite. They need to be monitored not just for uptime but for output quality.
This is why we are seeing the emergence of agent-native infrastructure. Platforms built from first principles for persistent, stateful, autonomous software agents rather than for web servers repurposed to run agents. The distinction matters. A web server does not need self-healing at the platform level because a load balancer can route around it. An agent running a 30-minute coding session cannot be restarted from scratch if the underlying server hiccups. The agent needs its server to be self-healing, not replaceable.
Consider the numbers. If 82% of enterprises are running agents and the median count is twelve agents per organization, and each agent needs persistent compute with 4-64 GB of RAM, the total infrastructure demand for agents alone is measured in exabytes of RAM and millions of dedicated cores. That is a new market. And the companies that build infrastructure purpose-built for this workload — rather than stretching Kubernetes to fit — will capture it.
Looking back at this moment from the vantage point of even a few years from now, the surprise will not be that agents became ubiquitous. It will be how long we tried to run them on infrastructure designed for something else entirely. The agents of AI deserve infrastructure built for agents. In 2026, that infrastructure is finally arriving.
Frequently Asked Questions
What are the main types of agents in AI as of 2026?
The major categories are coding agents (Devin, Cursor, Claude Code), research agents (Perplexity, Google Deep Research), customer-facing agents (Intercom Fin, Zendesk AI), trading and finance agents (Alpaca, QuantConnect), DevOps and infrastructure agents (PagerDuty AIOps, Ciroos), creative agents (Midjourney, Runway, Suno), and personal agents (Apple Intelligence, Rabbit R1). Each category has distinct infrastructure requirements, from sandboxed shells for coding agents to low-latency connections for trading agents to 24/7 uptime for customer support.
How many AI agents are running in production in 2026?
Conservative estimates place the number of active AI agent instances in the hundreds of millions. Salesforce alone reports over 380 million Agentforce transactions. Intercom's Fin resolves over 50% of support queries for its customer base. GitHub Copilot has over 77 million developers using its agent capabilities. The number is difficult to pin down precisely because many organizations run custom internal agents that never appear in public counts, and the definition of 'agent' versus 'automation' remains fuzzy at the edges.
What is the difference between an AI agent and an AI assistant?
An assistant responds to prompts. An agent takes actions. The practical distinction in 2026 is autonomy and persistence: an assistant generates text when asked and stops. An agent maintains state across interactions, makes decisions about what tools to use, executes multi-step plans, and can operate without human input for extended periods. A chatbot that answers questions is an assistant. A system that monitors your codebase, identifies bugs, writes fixes, runs tests, and opens pull requests while you sleep is an agent.
Which AI agent category is growing the fastest?
Coding agents are experiencing the most explosive growth. GitHub Copilot grew from 1.8 million to over 77 million users in under three years. Devin, Cursor Agent, Claude Code, and Windsurf have collectively processed hundreds of millions of coding sessions. The reason is straightforward economics: a coding agent that saves a developer even 30 minutes per day justifies its cost within the first week. Enterprise adoption is accelerating because the ROI is immediately measurable in commits, PRs merged, and tickets closed.
What infrastructure do AI agents need to run reliably?
At minimum: a persistent compute environment (not serverless), 4-64 GB RAM depending on the category, process supervision for automatic restart, structured logging for debugging, and network access for API calls. Agents running local models additionally need GPU VRAM. The requirements that teams underestimate are reliability infrastructure: health checking, memory limits via cgroups, log rotation, and external monitoring. A crashed agent that nobody notices for six hours is a business-critical failure, not a minor inconvenience.
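Process supervision, the first reliability item on that list, can be approximated in a few lines. A deliberately minimal supervisor sketch; a production deployment would use systemd, runit, or a platform-level equivalent rather than hand-rolling this:

```python
import subprocess
import time

def supervise(cmd, max_restarts=5, backoff_s=2.0):
    """Minimal process supervisor: restart a crashed agent process with
    linear backoff, and give up after too many consecutive failures."""
    restarts = 0
    while restarts <= max_restarts:
        proc = subprocess.Popen(cmd)
        proc.wait()
        if proc.returncode == 0:
            return "exited cleanly"
        restarts += 1
        time.sleep(backoff_s * restarts)  # back off before restarting
    return "gave up"
```

The "gave up" branch is exactly where external monitoring matters: a supervisor that silently exhausts its restart budget recreates the crashed-agent-nobody-noticed failure described above.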
Can AI agents run on serverless or shared hosting?
Simple, stateless agents that make a single API call and return a result can run on serverless. But the agents that matter in 2026 are stateful and long-running. A coding agent session lasts 10-45 minutes. A customer support agent maintains conversation context across multiple exchanges. A trading agent monitors markets continuously. Serverless cold starts of 500ms-5s break real-time interactions, and function timeouts of 5-15 minutes cannot accommodate extended agent sessions. Production agents need persistent, dedicated compute.
How do coding agents like Devin and Claude Code actually work?
Coding agents operate in sandboxed environments with shell access, a code editor, file system access, and often a browser. They receive a task, decompose it into sub-steps, generate code, execute it, observe the output, and iterate until tests pass. The architecture is an LLM-based agent loop: plan, act, observe, revise. Each session requires its own isolated compute environment, which is why coding agents have the highest per-session infrastructure cost of any agent category. Devin runs each session in a dedicated cloud sandbox; Claude Code operates directly in the developer's terminal.
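The plan-act-observe-revise loop can be written down abstractly. In this sketch every callable is a stand-in for an LLM call or a sandboxed execution step, so the control flow is the only real content; no vendor's actual loop is being reproduced:

```python
def agent_loop(task, plan, act, observe, revise, max_iters=10):
    """The plan-act-observe-revise loop at the core of coding agents.
    Each callable stands in for an LLM call or sandboxed execution."""
    state = plan(task)
    for _ in range(max_iters):
        action = act(state)            # e.g. edit a file, run the tests
        observation = observe(action)  # e.g. compiler errors, test output
        if observation.get("done"):    # tests pass: stop iterating
            return {"status": "success", "state": state}
        state = revise(state, observation)
    return {"status": "budget_exhausted", "state": state}
```

The iteration budget is what makes per-session cost predictable: a stuck agent burns at most `max_iters` rounds of inference and execution before it stops and reports failure.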
What will the AI agent landscape look like by 2027?
Three trends are converging. First, agent-to-agent communication protocols like MCP and A2A will enable agents from different vendors to collaborate without human orchestration. Second, the infrastructure layer will consolidate around platforms purpose-built for agents rather than repurposed cloud VMs. Third, most knowledge workers will have 3-5 agents operating on their behalf at any given time, handling tasks from email triage to meeting preparation to code review. The agents of AI in 2027 will not be remarkable individually but collectively they will constitute a parallel workforce.