Key Takeaways
Large Language Models (LLMs) from providers like OpenAI provide the reasoning and language for your AI agents. When you connect your agent directly to a single LLM provider, that provider’s uptime and performance dictate yours. A delay or outage at the model layer immediately impacts your agents, making them appear broken or unresponsive to your customers.
LLM outages still happen at least once or twice a quarter. That’s far less often than a year ago, but still often enough to make business leaders and their customers nervous about the reliability of agents.
The solution: a resilient failover plan.
Here’s how Salesforce ensures LLM outages don’t silence agents built with Agentforce.
Agentforce’s failover solution
Keep your agents “always-on” with seamless failover. Agentforce provides automated failover to an equivalent model served through Azure OpenAI, ensuring resilience against API errors, latency spikes, or upstream outages.
At Salesforce, we built this at the gateway layer. It’s a provider-level failover system designed for scale.
This isn’t just abstract routing logic. It’s purpose-built infrastructure that rapidly detects failures, intelligently reroutes traffic, and restores service — all without requiring any changes on the client side.
Gateway failovers: Azure OpenAI backs up OpenAI
When a request to an OpenAI model fails with a 401, 403, 404, or server-side 5xx error, the gateway checks whether failover conditions are met. If so, the request is retried against the same model served via Azure OpenAI.
The gist: Spot the LLM failure fast. Reroute queries as needed. Resume normal traffic when conditions improve.
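As a rough sketch, the gateway-level check might look like the following. This is illustrative Python, not Salesforce’s actual implementation; `call_provider`, the provider names, and `ProviderError` are hypothetical stand-ins.

```python
# Illustrative sketch of provider-level failover at the gateway.
FAILOVER_STATUS_CODES = {401, 403, 404}  # plus any server-side 5xx


class ProviderError(Exception):
    """Raised by the (hypothetical) provider client on an HTTP error."""
    def __init__(self, status_code):
        super().__init__(f"provider returned {status_code}")
        self.status_code = status_code


def should_fail_over(status_code: int) -> bool:
    """Failover conditions from the text: 401, 403, 404, or any 5xx."""
    return status_code in FAILOVER_STATUS_CODES or 500 <= status_code < 600


def complete(request, call_provider):
    """Try the primary provider; on a qualifying error, retry the same
    model served via the secondary provider (Azure OpenAI here)."""
    try:
        return call_provider("openai", request)
    except ProviderError as e:
        if should_fail_over(e.status_code):
            return call_provider("azure-openai", request)
        raise
```

Because the retry targets the same model on a different provider, the client sees one successful response and never needs to know a reroute happened.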
Agentforce supports two failover modes:
- Soft failover: The retry happens at the individual request level when the primary model provider returns a 4xx or 5xx error.
- Circuit breaker: If 40% or more of OpenAI traffic fails within a 60-second window, the Agentforce platform bypasses retries entirely and routes all traffic to the equivalent model on Azure OpenAI. The circuit resets after 20 minutes if OpenAI recovers.
This protects Agentforce during both isolated failures and sustained outages.
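The circuit-breaker thresholds described above (40% failures in a 60-second window, 20-minute reset) can be sketched like this; the class and its method names are hypothetical, not Agentforce’s real code.

```python
# Illustrative circuit breaker: trip when >= 40% of requests fail in a
# 60-second sliding window, route all traffic to the backup provider for
# 20 minutes, then allow the primary to be tried again.
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, failure_ratio=0.40, window_s=60, reset_s=20 * 60,
                 clock=time.monotonic):
        self.failure_ratio = failure_ratio
        self.window_s = window_s
        self.reset_s = reset_s
        self.clock = clock
        self.events = deque()   # (timestamp, failed) pairs
        self.opened_at = None   # set while the circuit is open

    def record(self, failed: bool):
        now = self.clock()
        self.events.append((now, failed))
        # Drop events that fell out of the sliding window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        failures = sum(1 for _, f in self.events if f)
        if failures / len(self.events) >= self.failure_ratio:
            self.opened_at = now   # trip: send traffic to the backup

    def use_backup(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_s:
            self.opened_at = None   # reset: try the primary again
            return False
        return True
```

The 20-minute hold is what prevents flip-flopping: a partially recovered primary isn’t retried on every request, only after the reset interval elapses.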
Handling delays: Smart latency retries ensure fast responses
Speed matters in agentic interactions — every millisecond counts. Agentforce’s retry mechanism handles both latency and server-side errors to ensure your agents stay responsive.
If a request stalls, Agentforce automatically retries it, so your agents stay responsive even when an upstream model slows down. If the request fails outright with a 4xx or 5xx error, failover kicks in and the request is served by the secondary LLM provider.
A race for answers: Delayed Parallel Retries
Traditional LLM requests rely on a single call, with no option but to stop and wait if there’s a problem.
But we use a method called delayed parallel retries to keep agents fast and responsive when there’s a delay or other issue with the LLM.
Here’s how it works: When your agent sends a request to an LLM, a primary callout starts. If that callout doesn’t get a response within a certain time, that delay triggers a second, parallel callout. Both requests then “race” to finish, and we use whichever one returns an answer first.

Our method solves two problems that slow down traditional single-threaded execution: strict sequential processing and long waits.
We bypass sequential processing by creating parallelism. Reactive frameworks combine the primary and delayed calls, and cancel the slower one the moment a response arrives.
We end long waits with a separate, “elastic” thread that schedules the retry timer and manages the “race.”
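A minimal sketch of this race, using Python’s asyncio rather than the reactive framework described above; `call_llm`, the request shape, and the delay value are all illustrative assumptions.

```python
# Delayed parallel retry sketch: if the primary call hasn't answered
# within `delay_s`, fire a second identical call and use whichever
# finishes first, cancelling the loser.
import asyncio


async def delayed_parallel_call(call_llm, request, delay_s=2.0):
    primary = asyncio.create_task(call_llm(request))
    try:
        # Fast path: the primary answers before the retry timer fires.
        return await asyncio.wait_for(asyncio.shield(primary), delay_s)
    except asyncio.TimeoutError:
        # Slow path: start the race; the first finisher wins.
        retry = asyncio.create_task(call_llm(request))
        done, pending = await asyncio.wait(
            {primary, retry}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        return done.pop().result()
```

Note the `shield` around the primary task: a timeout should start the race, not kill the original call, because the primary might still win it.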
Agentforce boasts 99.99% availability thanks to this failover design.
Observability: Log and monitor all errors
We use observability tools to understand and improve how our systems perform.
Agentforce monitors every turn of the agentic interaction, capturing everything from a user’s initial request to the AI agent’s final response.
- We log every error that occurs, along with its error path.
- We monitor failover events and correlate them with planning and reasoning behavior. This helps us adjust our system’s thresholds. For example, we determine how many conversation turns to maintain and pass to the new LLM to ensure a seamless transition without losing context.
- We use the circuit breaker mechanism to avoid flip-flopping during partial recovery.
- We track model-specific Service Level Indicators (SLIs) to detect degradation in a specific AI model as early as possible, and switch to a backup model.
All of these model interactions are visible to customers in their audit logs.
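Per-model SLI tracking of this kind could be sketched as follows; the window size, thresholds, and class names are illustrative assumptions, not Salesforce’s actual values or telemetry code.

```python
# Hypothetical per-model SLI tracker: flags a model as degraded when its
# recent error rate or tail latency crosses a configured limit.
from collections import defaultdict, deque


class ModelSLIs:
    def __init__(self, window=100, max_error_rate=0.05, max_p95_ms=4000):
        self.max_error_rate = max_error_rate
        self.max_p95_ms = max_p95_ms
        # Per-model ring buffer of the most recent (latency_ms, ok) samples.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def observe(self, model, latency_ms, ok):
        self.samples[model].append((latency_ms, ok))

    def degraded(self, model):
        s = self.samples[model]
        if not s:
            return False
        error_rate = sum(1 for _, ok in s if not ok) / len(s)
        latencies = sorted(l for l, _ in s)
        p95 = latencies[int(0.95 * (len(latencies) - 1))]
        return error_rate > self.max_error_rate or p95 > self.max_p95_ms
```

A signal like `degraded()` is what lets the gateway switch to a backup model early, before hard failures force the circuit breaker open.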
Agentforce has failover protection built right in
LLM failures, whether due to API errors, latency, or upstream outages, aren’t just noise. They block task progression and undermine the system contract, making customers question their trust in your agents.
Successful agents depend on real-time, multi-turn LLM execution to drive planning and reasoning decisions and system actions.
What we’ve built right into Agentforce is failure-aware routing, scoped to known failure modes and tuned for real latency and error behavior.
It isn’t a general-purpose abstraction layer that merely returns a generic failover answer when things go wrong. Our solution is a proactive, targeted infrastructure to make Agentforce reliable under real-world failure conditions.
Our failover solution currently handles OpenAI and Azure OpenAI models. We’ll soon expand it to other providers, including Anthropic and Gemini. Eventually, you’ll be able to use adaptive model fallback and vendor selection for even more control. But this core principle holds: Model-level volatility is no excuse for system-level fragility.
At Salesforce, Trust is our number one value. We won’t let LLM failures undermine your customers’ trust in your AI agents. Agentforce bakes in the reliability you need by ensuring your agents work even when LLM providers fail.
What’s your agentic AI strategy?
Our playbook is your free guide to becoming an agentic enterprise. Learn about use cases, deployment, and AI skills, and download interactive worksheets for your team.

Frequently Asked Questions (FAQs)
What problem does Agentforce’s failover solution solve?
Agentforce’s failover solution addresses the problem of AI agents becoming unresponsive or appearing broken due to delays or outages from a single Large Language Model (LLM) provider.

How does the failover work?
Agentforce provides automated failover to an equivalent model in Azure OpenAI at the gateway layer, rapidly detecting failures and intelligently rerouting traffic without client-side changes.

Which failover modes does Agentforce support?
Agentforce supports “soft failover,” which retries individual requests upon errors, and “circuit breaker,” which routes all traffic to a backup model if a significant percentage of requests fail within a specific time frame.

What are delayed parallel retries?
Delayed parallel retries initiate a second, parallel call to an LLM if the primary call is delayed. Both calls “race” to finish, and the first response is used, preventing long waits and improving agent speed.

How does Agentforce monitor failover behavior?
Agentforce uses observability tools to log every error and its path, monitor failover events, and track model-specific Service Level Indicators (SLIs). All these model interactions are visible to customers in their audit logs.