AWS SageMaker SLA Credits & Refunds Guide

How the AWS SageMaker SLA works: uptime tiers, exclusions, claim windows, and how to recover the credits you're owed when SageMaker goes down.

The SageMaker SLA is one of the more nuanced commitments AWS publishes, partly because AI/ML services have multiple availability tiers depending on how you deploy them. This guide breaks down which SageMaker configurations qualify for credits, the calculation method AWS uses, and the operational data you'll need to win a claim.

What this guide covers

  • The official AWS SageMaker uptime commitment and credit tiers
  • Which incidents qualify (and which exclusions silently disqualify claims)
  • How to file a SageMaker credit request inside the AWS claim window
  • Why manual claim recovery typically leaves money on the table

Frequently asked questions about AWS SageMaker SLAs

What is the typical SLA uptime guarantee for AWS SageMaker?

SageMaker has component-specific commitments: AWS targets a 99.95% Monthly Uptime Percentage for Online Inference endpoints and 99.9% for Batch Transform jobs, with separate (lower or undefined) commitments for Training Jobs and Notebook instances. Service credits step up through 10%, 25%, and 100% the further the Monthly Uptime Percentage falls below the relevant component threshold.
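To make the tier arithmetic concrete, here is a minimal Python sketch. The 99.95% commitment and the 10%, 25%, and 100% credit levels come from the answer above; the 99.0% and 95.0% boundaries between tiers, and the function names, are illustrative assumptions, so check them against the SageMaker SLA text before relying on the result.

```python
# Assumed tier boundaries (NOT quoted from the SLA; verify before use).
# Each entry is (lowest uptime % that still falls in this tier, credit %).
ASSUMED_TIERS = [
    (99.0, 10),   # below the commitment but at least 99.0%
    (95.0, 25),   # below 99.0% but at least 95.0%
    (0.0, 100),   # below 95.0%
]


def monthly_uptime_percentage(total_minutes: int, unavailable_minutes: int) -> float:
    """Share of the month, in percent, during which the component was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= unavailable_minutes <= total_minutes:
        raise ValueError("unavailable_minutes must be between 0 and total_minutes")
    return 100.0 * (total_minutes - unavailable_minutes) / total_minutes


def credit_percentage(uptime_pct: float, commitment_pct: float = 99.95) -> int:
    """Service credit tier for a given uptime, under the assumed boundaries."""
    if uptime_pct >= commitment_pct:
        return 0  # commitment met, no credit
    for floor, credit in ASSUMED_TIERS:
        if uptime_pct >= floor:
            return credit
    return 100
```

For a 30-day month (43,200 minutes) with 60 minutes of unavailability, monthly_uptime_percentage(43200, 60) is roughly 99.86%, which credit_percentage maps to the 10% tier against a 99.95% commitment. Passing commitment_pct=99.9 models the Batch Transform threshold instead.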

How do I claim AWS SageMaker SLA credits after an outage?

Open a billing case in the AWS Support Center within 60 days of the affected billing period (the exact window is in the SageMaker SLA itself). The case needs: the affected resource IDs, timestamps of the disruption in UTC, your monitoring evidence (CloudWatch metrics, error logs, or third-party uptime monitoring) cross-referenced against the AWS Health Dashboard, and your calculation of the Monthly Uptime Percentage. AWS reviews the case manually and applies any granted credit to your next invoice rather than refunding cash. Teams that file these regularly automate the evidence-gathering step because it's the most error-prone: a claim that is missing a required field gets denied and has to be refiled.
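Since a missing field is the usual reason a claim bounces, a pre-flight check before opening the case can help. The sketch below is one possible shape for that check. The SlaClaim class, its field names, and validate_claim are hypothetical helpers invented for illustration; they mirror the list above and are not an AWS API or an official claim format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SlaClaim:
    """Hypothetical container for the items a billing case needs."""
    resource_ids: list[str]               # affected endpoint or job identifiers
    disruption_start: datetime            # expected to be timezone-aware UTC
    disruption_end: datetime              # expected to be timezone-aware UTC
    evidence: list[str]                   # e.g. paths to metric exports or logs
    monthly_uptime_pct: Optional[float]   # your own calculation


def _is_utc(ts: datetime) -> bool:
    """True only for timezone-aware timestamps with a zero UTC offset."""
    return ts.tzinfo is not None and ts.utcoffset() == timedelta(0)


def validate_claim(claim: SlaClaim) -> list[str]:
    """Return a list of problems; an empty list means nothing obvious is missing."""
    problems = []
    if not claim.resource_ids:
        problems.append("missing affected resource IDs")
    start_ok = _is_utc(claim.disruption_start)
    end_ok = _is_utc(claim.disruption_end)
    if not start_ok:
        problems.append("disruption start is not a UTC timestamp")
    if not end_ok:
        problems.append("disruption end is not a UTC timestamp")
    # Only compare the two when both are aware; mixing naive and aware raises.
    if start_ok and end_ok and claim.disruption_end <= claim.disruption_start:
        problems.append("disruption end is not after disruption start")
    if not claim.evidence:
        problems.append("missing monitoring evidence")
    pct = claim.monthly_uptime_pct
    if pct is None or not 0 <= pct <= 100:
        problems.append("missing or invalid Monthly Uptime Percentage")
    return problems
```

Running validate_claim on every draft and refusing to file until it returns an empty list catches the cheap mistakes, such as local-time timestamps or a forgotten uptime figure, before a reviewer does.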

What exclusions apply to the AWS SageMaker SLA?

SageMaker specifically excludes failures originating in your custom container images or inference code. If your container fails to start, or returns 5xx responses because of a model artifact issue, that does not count as SageMaker unavailability.

Why is it difficult to get refunds for SageMaker outages manually?

AI/ML SLAs are still maturing, and SageMaker carries some of the most nuanced terms in the cloud catalog. Rate limits, queue depths, and model availability all get measured differently, and the SLA often excludes throttling that the provider deems "expected." Teams that successfully claim SageMaker credits do so by capturing per-request latency and error-code data and matching it precisely against the published terms.
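As a rough illustration of what matching per-request data against the terms can look like, the Python sketch below buckets request records into fixed windows and computes a server-error rate per window. The five-minute window, the decision to drop throttled (HTTP 429) requests entirely, and the (timestamp, status, latency) record layout are all assumptions made for the example; the SLA's own definition of unavailability is what actually governs a claim.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 300  # assumed measurement window, not taken from the SLA


def error_rate_by_window(requests):
    """Map each window start (UTC) to the fraction of counted requests that hit a 5xx.

    `requests` is an iterable of (timezone-aware datetime, HTTP status, latency_ms).
    Latency is carried in the record as supporting evidence but not used here.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for ts, status, _latency_ms in requests:
        if status == 429:
            continue  # throttled request; assumed to be excluded from the measure
        epoch = int(ts.timestamp())
        window_start = epoch - epoch % WINDOW_SECONDS
        totals[window_start] += 1
        if 500 <= status <= 599:
            errors[window_start] += 1
    return {
        datetime.fromtimestamp(w, tz=timezone.utc): errors[w] / totals[w]
        for w in sorted(totals)
    }
```

Windows with a high error rate are the candidates to cross-reference against the AWS Health Dashboard; windows made up only of throttled requests drop out of the result under these assumptions.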

Stop leaving AWS credits unclaimed

The hardest part of recovering SageMaker credits isn't the SLA; it's the lag between an outage and the moment somebody on your team has the bandwidth to file the case. By the time the FinOps team gets around to it, the one-minute data points have aged out of CloudWatch (which keeps them for 15 days) and the claim window is closing.

Next Signal watches AWS Health and your own observability data, detects SageMaker SLA breaches in real time, assembles the evidence package the way AWS expects it, and files the billing case for you. See how it works or start a free trial.