How SSE/HTTP deploy works on osModa

1. Deploy transports: Production Streamable HTTP and SSE transports on dedicated NixOS servers.

2. Transport supervision: Watchdog keeps connections alive. Auto-restart on broken transports.

3. Manage from Telegram: "Restart SSE endpoint" -- OpenClaw handles the full transport lifecycle.


MCP SSE and Streamable HTTP Deployment: Production Transport Guide

Deploy MCP servers with streamable HTTP endpoints for production workloads. This guide covers the transport layer transition from SSE to streamable HTTP, connection management, session handling, scaling strategies, and monitoring. Whether you are deploying your first remote MCP server or migrating an existing SSE deployment, this is your implementation reference.

The MCP transport landscape changed significantly in 2025. The March 2025 spec update introduced streamable HTTP as a new transport option. The June 2025 update deprecated SSE entirely. Streamable HTTP uses standard HTTP POST requests with chunked transfer encoding, replacing the persistent SSE connections that were difficult to scale and secure. Microsoft launched Azure Functions support for MCP with built-in streamable HTTP in January 2026. Google Cloud Run added FastMCP integration with streamable HTTP as the default transport. The ecosystem has converged on streamable HTTP as the production standard, and osModa's mcpd daemon has supported it since the spec was finalized.

Last updated: March 2026

TL;DR

  • SSE was deprecated in June 2025 -- streamable HTTP is now the production standard for remote MCP servers
  • Streamable HTTP uses standard HTTP POST with chunked encoding, compatible with load balancers, proxies, and API gateways
  • Supports both stateless (simple tool calls) and stateful (multi-turn sessions) modes with configurable session management
  • Migration from SSE is straightforward -- transport is a config change, your MCP server code stays the same
  • osModa's mcpd daemon handles TLS, session management, connection pooling, and health monitoring automatically

SSE vs Streamable HTTP: What Changed and Why

The original MCP remote transport used Server-Sent Events (SSE) for server-to-client streaming and a separate HTTP POST endpoint for client-to-server messages. This dual-channel approach worked for simple deployments but created significant problems at scale.

Aspect | SSE (deprecated) | Streamable HTTP
Connection model | Persistent, long-lived | Request-response
Direction | Server-to-client only | Bidirectional
Streaming support | Native (event stream) | Chunked transfer encoding
Load balancing | Difficult (sticky) | Standard HTTP LB
API gateway compat | Limited | Full
Corporate proxy | Often blocked | Works natively
Traffic inspection | Difficult | Standard HTTP tools
Session model | Connection-bound | Stateless or stateful
Serverless compat | No | Yes (stateless mode)

The move to streamable HTTP is described by the MCP team as adopting a "sealed letter" model: each request is a self-contained message that can be inspected, authenticated, and routed independently. This enables standard HTTP security practices (CORS, CSP, rate limiting) and integrates with existing infrastructure without special configuration.
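In concrete terms, a "sealed letter" request can be sketched as follows. The `Mcp-Session-Id` and `Accept` headers follow the streamable HTTP spec; the bearer token and tool name are placeholders, and the helper itself is illustrative rather than any SDK's API:

```python
import json

def build_tool_call(session_id: str, tool: str, arguments: dict,
                    request_id: int) -> tuple[dict, bytes]:
    """Build headers and a JSON-RPC body for one self-contained tools/call POST."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "Mcp-Session-Id": session_id,      # any LB or gateway can route on this alone
        "Authorization": "Bearer <token>",  # standard HTTP auth, inspectable per request
    }
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }).encode()
    return headers, body

headers, body = build_tool_call("sess-123", "lookup_user", {"user_id": 42}, request_id=1)
print(json.loads(body)["method"])  # tools/call
```

Because every message carries its own auth and session identity, an intermediary can rate-limit, log, or reject it without holding any connection state.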

Deploying MCP with Streamable HTTP on osModa

osModa's mcpd daemon abstracts the complexity of streamable HTTP deployment. You define your MCP server and its transport configuration in mcpd.toml, and the daemon handles endpoint creation, TLS termination, session management, authentication, and connection lifecycle.

Step 1: Define your MCP server in mcpd.toml

[server.api-tools]
command = "python"
args = ["-m", "my_mcp_server"]
transport = "streamable-http"
port = 8443
health_check = "/health"
health_check_interval = "30s"

[server.api-tools.http]
chunked_encoding = true
max_request_size = "10MB"
request_timeout = "300s"
keepalive_timeout = "60s"
max_concurrent_requests = 100

[server.api-tools.session]
enabled = true
timeout = "3600s"
max_per_client = 5
storage = "memory"  # or "redis" for distributed

[server.api-tools.tls]
auto_cert = true  # Let's Encrypt
domain = "mcp.example.com"

Step 2: Deploy with osModa

# SSH into your osModa server
ssh root@your-server.os.moda

# Place your MCP server code
git clone your-mcp-server /opt/mcp/api-tools

# Install dependencies
cd /opt/mcp/api-tools && pip install -r requirements.txt

# mcpd picks up the config and starts the server
systemctl restart mcpd

# Verify the endpoint is live
curl https://mcp.example.com/health
# {"status": "ok", "server": "api-tools", "transport": "streamable-http"}

mcpd handles TLS certificate provisioning via Let's Encrypt, sets up the streamable HTTP endpoint with proper headers (HSTS, CSP, X-Frame-Options, CORS), and begins monitoring the server process with the watchdog daemon. No nginx, no Caddy, no manual reverse proxy configuration required.

Connection Management and Session Handling

Streamable HTTP in MCP supports two session modes: stateless and stateful. Choosing the right mode depends on whether your tools need context across multiple requests.

Stateless Mode

Each request is independent. No session state is maintained between calls. The server processes the tool call and returns the result. This mode is simplest to scale because any server instance can handle any request. Ideal for pure functions like calculations, data lookups, or API proxies that do not need context.

In mcpd.toml, set session.enabled = false.
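As a sketch (not the MCP SDK's actual API), stateless handling reduces to a registry of pure functions plus a dispatcher -- each request carries everything the tool needs, and nothing survives between calls:

```python
# Hypothetical stateless dispatcher: tool names and the dispatch shape are
# illustrative. Any instance could run this handler for any request.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}

def handle_request(request: dict) -> dict:
    """Process one tools/call request and return a JSON-RPC response."""
    params = request["params"]
    try:
        result = TOOLS[params["name"]](params["arguments"])
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}
    except KeyError:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "unknown tool"}}

print(handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                      "params": {"name": "add", "arguments": {"a": 2, "b": 3}}}))
# {'jsonrpc': '2.0', 'id': 1, 'result': 5}
```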

Stateful Mode

The server assigns a session ID on the first request. The client includes this session ID in subsequent requests, and the server maintains state (database connections, caches, conversation context) across the session. Required for tools that need multi-step workflows or persistent context.

mcpd handles session creation, validation, timeout, and cleanup automatically. Sessions can be stored in memory (single instance) or Redis (distributed).
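The bookkeeping behind those settings can be sketched as below; the class name and internals are illustrative, not mcpd's actual code, but they mirror the `timeout` and `max_per_client` knobs from the `[server.api-tools.session]` block:

```python
import time
import uuid

class SessionStore:
    """In-memory session store with idle timeout and a per-client cap."""

    def __init__(self, timeout_s: float = 3600, max_per_client: int = 5,
                 clock=time.monotonic):
        self.timeout_s, self.max_per_client, self.clock = timeout_s, max_per_client, clock
        self.sessions = {}  # session_id -> (client_id, last_seen)

    def create(self, client_id: str) -> str:
        self._expire()
        active = sum(1 for c, _ in self.sessions.values() if c == client_id)
        if active >= self.max_per_client:
            raise RuntimeError("session limit reached for client")
        sid = uuid.uuid4().hex
        self.sessions[sid] = (client_id, self.clock())
        return sid

    def touch(self, sid: str) -> bool:
        """Validate a session ID from an incoming request; refresh its idle timer."""
        self._expire()
        if sid not in self.sessions:
            return False
        client, _ = self.sessions[sid]
        self.sessions[sid] = (client, self.clock())
        return True

    def _expire(self):
        now = self.clock()
        for sid in [s for s, (_, t) in self.sessions.items() if now - t > self.timeout_s]:
            del self.sessions[sid]
```

A Redis-backed variant would replace the dict with keys carrying TTLs, which is what the `storage = "redis"` option implies for multi-instance deployments.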

Connection Pooling

mcpd maintains a pool of HTTP keepalive connections between the gateway and the MCP server process. This eliminates per-request TCP and TLS handshake overhead. The pool size is configurable based on expected concurrent requests. Idle connections are pruned after the keepalive timeout to prevent resource exhaustion.
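A minimal sketch of this pattern, with a stand-in connection type (this illustrates the pooling logic, not mcpd's implementation):

```python
import queue
import time

class ConnectionPool:
    """Reuse idle connections; prune any idle past the keepalive timeout."""

    def __init__(self, factory, max_size=10, keepalive_s=60.0, clock=time.monotonic):
        self.factory, self.keepalive_s, self.clock = factory, keepalive_s, clock
        self.idle = queue.LifoQueue(max_size)  # LIFO keeps recently used connections warm

    def acquire(self):
        while True:
            try:
                conn, idle_since = self.idle.get_nowait()
            except queue.Empty:
                return self.factory()            # pool empty: open a new connection
            if self.clock() - idle_since <= self.keepalive_s:
                return conn                      # reuse: no new TCP/TLS handshake
            conn.close()                         # stale: prune and keep looking

    def release(self, conn):
        try:
            self.idle.put_nowait((conn, self.clock()))
        except queue.Full:
            conn.close()                         # pool at capacity: drop the extra
```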

Graceful Shutdown

During server restarts (deployment, crash recovery, rollback), mcpd drains in-flight requests before terminating the old process. New requests are routed to the new process. This ensures zero dropped requests during deployments. The drain timeout is configurable, defaulting to 30 seconds.
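The drain sequence can be sketched with a counter and a condition variable. This is an illustration of the pattern (stop accepting, wait for in-flight work, bail at the timeout), not mcpd's internals:

```python
import threading
import time

class Drainer:
    """Track in-flight requests and drain them before process shutdown."""

    def __init__(self):
        self.accepting = True
        self.in_flight = 0
        self.cond = threading.Condition()

    def request_started(self) -> bool:
        with self.cond:
            if not self.accepting:
                return False        # refused: route to the replacement process
            self.in_flight += 1
            return True

    def request_finished(self):
        with self.cond:
            self.in_flight -= 1
            self.cond.notify_all()

    def drain(self, timeout_s: float = 30.0) -> bool:
        """Stop accepting, then wait for in-flight requests or the timeout."""
        deadline = time.monotonic() + timeout_s
        with self.cond:
            self.accepting = False
            while self.in_flight > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False    # drain timeout hit; requests still in flight
                self.cond.wait(remaining)
            return True             # safe to terminate the old process
```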

Scaling MCP Server Deployments

As enterprise MCP deployments scale to thousands of daily requests, the infrastructure must scale with them. The approach depends on your session model and request volume.

Vertical Scaling

The simplest approach. Upgrade your osModa plan to a server with more CPU and RAM. Effective up to thousands of concurrent connections. osModa plans range from Starter ($14.99/mo) to Enterprise ($125.99/mo), each with increasing resources. NixOS atomic switching makes plan upgrades seamless -- the new configuration applies instantly with no downtime.

Horizontal Scaling (Stateless)

For stateless MCP servers, deploy multiple instances behind a standard HTTP load balancer. Each instance handles requests independently. Streamable HTTP's request-response model makes this straightforward -- any instance can serve any request. This is the approach used by Azure Functions and Cloud Run for MCP hosting. osModa supports multi-instance deployment with automatic load balancer configuration.

Horizontal Scaling (Stateful)

For stateful MCP servers, horizontal scaling requires session affinity. Requests from the same session must reach the same server instance. osModa supports two approaches: sticky sessions (routing based on session ID in the load balancer) and shared session stores (Redis-backed sessions accessible from any instance). The shared store approach is more resilient because it survives instance failures.
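A hash-based sticky-routing sketch shows why affinity is cheap to implement (instance names are hypothetical; a real deployment would use the load balancer's own affinity feature, or consistent hashing to limit reshuffling when instances change):

```python
import hashlib

def route(session_id: str, instances: list[str]) -> str:
    """Map a session ID deterministically onto one backend instance."""
    digest = hashlib.sha256(session_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]

instances = ["mcp-1", "mcp-2", "mcp-3"]
# Every request in a session lands on the same backend:
assert route("sess-abc", instances) == route("sess-abc", instances)
```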

For most teams, vertical scaling on osModa is sufficient. A single osModa Enterprise server can handle thousands of concurrent MCP tool calls. Horizontal scaling is relevant for high-traffic production APIs or multi-tenant MCP deployments. See pricing plans for server specifications at each tier.

Monitoring and Observability

Production MCP deployments require monitoring at three layers: transport health, server process health, and tool execution metrics. osModa's mcpd daemon provides observability at all three layers out of the box.

Transport Metrics

  • Request rate (requests per second by endpoint)
  • Response latency (p50, p95, p99 by tool)
  • Error rate (4xx and 5xx by endpoint)
  • Active connections and session count
  • Connection pool utilization
  • TLS certificate expiry countdown

Process Health

  • MCP server process status (running, crashed, restarting)
  • CPU and memory usage per server
  • Restart count and last crash timestamp
  • Health check pass/fail history
  • Watchdog recovery time (target: under 6 seconds)

Tool Execution

  • Tool call count by tool name and client
  • Tool execution duration histogram
  • Tool error rate and error types
  • OAuth scope usage and denial count
  • Rate limit hit count by client

Integration

mcpd exposes metrics in Prometheus format on a configurable metrics endpoint. Structured logs are written in JSON format compatible with ELK, Datadog, and Splunk. Health check endpoints work with any uptime monitoring service. Full SSH access means you can integrate with any monitoring stack your team uses.
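For illustration, the Prometheus text exposition format that such a metrics endpoint serves looks like this; the metric names here are assumptions, not mcpd's documented names:

```python
def render_metrics(requests_total: dict, active_sessions: int) -> str:
    """Render a counter and a gauge in Prometheus text exposition format."""
    lines = [
        "# HELP mcp_requests_total Requests served, by endpoint and status.",
        "# TYPE mcp_requests_total counter",
    ]
    for (endpoint, status), count in sorted(requests_total.items()):
        lines.append(f'mcp_requests_total{{endpoint="{endpoint}",status="{status}"}} {count}')
    lines += [
        "# HELP mcp_active_sessions Currently active MCP sessions.",
        "# TYPE mcp_active_sessions gauge",
        f"mcp_active_sessions {active_sessions}",
    ]
    return "\n".join(lines) + "\n"

print(render_metrics({("/mcp", "200"): 1042, ("/mcp", "500"): 3}, active_sessions=17))
```

Any Prometheus-compatible scraper can collect output in this shape without custom integration work.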

Beyond real-time monitoring, every metric is also captured in the tamper-proof audit ledger for historical analysis and compliance reporting. For a deeper look at observability capabilities, see Watchdog Auto-Restart.

Migrating from SSE to Streamable HTTP

If you have existing MCP servers deployed with SSE transport, migration to streamable HTTP is straightforward. The MCP server code itself does not change -- transport is handled at the framework layer. Here is the migration path.

  1. Update your MCP SDK

     Ensure you are using an MCP SDK version released after June 2025 that supports streamable HTTP. The official TypeScript and Python SDKs both support it. If you are using FastMCP, update to the latest version.

  2. Change transport in mcpd.toml

     Switch transport = "sse" to transport = "streamable-http". Configure the HTTP-specific settings (chunked encoding, request timeout, max request size). osModa can run both SSE and streamable HTTP endpoints simultaneously during migration.

  3. Update client configurations

     Point MCP clients (Claude Desktop, Cursor, custom agents) to the new streamable HTTP endpoint. Most clients auto-detect the transport type. During the transition, keep the SSE endpoint running for clients that have not updated.

  4. Deprecate the SSE endpoint

     Once all clients have migrated, remove the SSE transport configuration. Monitor the SSE endpoint for any remaining connections before shutdown. NixOS atomic rollback provides a safety net -- if any client breaks, roll back to the dual-transport configuration instantly.

osModa vs Cloud Platforms for MCP Deployment

You can deploy MCP servers with streamable HTTP on various platforms. Here is how osModa compares to popular cloud alternatives.

Feature | osModa | Cloud Run | Azure Functions
Streamable HTTP | Native | FastMCP plugin | Built-in
Persistent process | Yes | Cold starts | Cold starts
Stateful sessions | Built-in | External store | External store
Audit logging | Tamper-proof | Cloud Logging | App Insights
Root SSH access | Yes | No | No
Atomic rollbacks | NixOS | Redeploy | Slot swap
Pricing model | Flat rate | Per request | Per execution

Cloud Run and Azure Functions work well for stateless MCP tool calls. osModa is the better choice when you need persistent processes, stateful sessions, tamper-proof audit trails, full infrastructure control, or predictable flat-rate pricing. For a broader infrastructure comparison, see osModa vs Traditional VPS.

Frequently Asked Questions

What is streamable HTTP in MCP?

Streamable HTTP is the production transport for remote MCP servers, introduced in the March 2025 MCP spec update and established as the standard in the June 2025 update when SSE was deprecated. It uses standard HTTP POST requests with chunked transfer encoding to support both request-response and streaming communication patterns. Unlike SSE, streamable HTTP works with standard HTTP infrastructure (load balancers, API gateways, CDNs) and supports bidirectional communication without long-lived connections.

Why did MCP deprecate SSE in favor of streamable HTTP?

SSE (Server-Sent Events) required persistent, long-lived connections that were difficult to load-balance, impossible to route through standard API gateways, and problematic behind corporate proxies that terminate idle connections. SSE connections are also unidirectional (server to client only), requiring a separate HTTP channel for client-to-server messages. Streamable HTTP solves all these issues by using standard HTTP POST requests that work with existing infrastructure, support bidirectional streaming via chunked encoding, and enable traffic inspection and CORS enforcement.

Should I still support SSE alongside streamable HTTP?

During the transition period, supporting both is reasonable. Many existing MCP clients still use SSE, and backward compatibility prevents breaking existing integrations. osModa's mcpd daemon supports both transports simultaneously on separate endpoints, so you can serve legacy SSE clients while new clients use streamable HTTP. The MCP specification officially deprecated SSE in June 2025, so new deployments should prioritize streamable HTTP.

How does session management work with streamable HTTP?

Streamable HTTP supports both stateless and stateful modes. In stateless mode, each request is independent -- ideal for simple tool calls that don't need context. In stateful mode, the server assigns a session ID that the client includes in subsequent requests, enabling multi-turn interactions and persistent state. osModa's mcpd daemon handles session creation, validation, and cleanup automatically, with configurable session timeouts and maximum concurrent sessions per client.

How do I handle connection pooling for MCP servers?

osModa's mcpd daemon manages connection pooling at the transport layer. For streamable HTTP, it maintains a pool of keepalive connections to reduce TLS handshake overhead. Connection pool size, idle timeout, and maximum lifetime are configurable in mcpd.toml. The daemon also handles graceful connection draining during server restarts, ensuring in-flight requests complete before the old server process exits.

Can I deploy MCP with streamable HTTP on serverless platforms?

Yes, but with caveats. Streamable HTTP's stateless mode works on serverless platforms like AWS Lambda and Azure Functions. However, stateful sessions require persistent processes which serverless platforms cannot guarantee. You also lose process supervision, watchdog restart, persistent state, and co-located audit logging. osModa provides dedicated server hosting specifically to avoid these serverless limitations while still supporting the streamable HTTP transport.

How do I monitor MCP server health in production?

osModa's mcpd daemon exposes health check endpoints for each MCP server, reports metrics (request latency, error rate, active connections, session count) to standard monitoring systems, and writes structured logs compatible with log aggregation tools. The watchdog daemon monitors process health and restarts crashed servers within 6 seconds. All health events are recorded in the audit ledger for forensic review.

What scaling strategies work for MCP servers?

MCP servers can be scaled vertically (larger server with more CPU/RAM) or horizontally (multiple instances behind a load balancer). Streamable HTTP's stateless mode enables horizontal scaling with standard HTTP load balancers. Stateful sessions require sticky sessions or a shared session store. osModa supports vertical scaling across its pricing tiers and can deploy multiple MCP server instances with session-aware routing when horizontal scaling is needed.