Reclaim Leverage With The Cloud Oligarchs
In Statista’s February 2025 report on the Global Cloud Infrastructure Market, Felix Richter estimates the combined market share of the big three public cloud providers at 63%. In their look back at Q4 of 2024, CRN estimates an even more robust 68% share. Amazon has long been the leader, but the delta between the top three has shrunk considerably over the last several years while the distance between them and the rest of the pack has widened. This isn’t likely to change any time soon. Due to the extreme costs, the benefits of efficiency and economies of scale, and the benefits of consolidating expertise, this is the direction the cloud infrastructure market is headed for the foreseeable future. At least it isn’t a monopoly… All hail the cloud oligarchy!
The asks of the cloud provider are fairly straightforward: infrastructure that is easy to deploy and scale, that performs optimally when you need it, and that doesn’t break you financially. Let’s give some kudos to the Oligarchy. Straightforward doesn’t mean easy to deliver, and largely, they’ve been successful at meeting market needs. That’s reflected in their monstrous growth rates over the last decade. But size creates its own set of challenges and when they do fail you, it can be frustrating and costly to be “the little guy.” And no matter how big you are, compared to the Oligarchy, you are the little guy.
How do they fail you?
When failure occurs, it tends to hurt most in one of these three theaters.
- Performance
- Support / Customer Service
- Billing
Performance is a black box because the underlying infrastructure is a black box. That’s intentional, and actually part of the value prop – they handle it so you don’t have to. That said, when something goes wrong, it is incumbent upon the customer to first prove “it’s not at our end” and to do so with a hand tied behind the back. Sometimes this is easy and sometimes it is incredibly difficult. A service that is degraded but not fully down is the classic hard case.
Support is expensive and the quality of the support tends to vary wildly. AI tools are changing the landscape of self-service, but are incredibly frustrating when any degree of complexity is introduced. When (or if) you can get to real support, you’ll be navigating everything from communication problems to imperfect information on their end, to delays in response and follow-up. Many companies struggle with this tough choice. Do you pay for the expensive support contract knowing that when you need it most, when there is a big failure on the provider side, it is likely to be at its worst?
Billing falls third on the priority list, but is in some ways the craziest. As an example, AWS supports over 240 fully featured services, not including 3rd party services that you purchase from the marketplace and that appear on your AWS bill. Many of these services are billed in per-minute increments. Take a look at your most recent Cost and Usage Report (CUR) from Amazon and it could be tens of thousands to millions of rows of data. What’s your confidence level that, across those thousands of rows and hundreds of services, you are being billed accurately? No need to answer that, it isn’t high, and that’s why you are paying Flexera or Apptio. Intentionally or not, your cloud provider makes it very difficult to understand exactly what you are buying and paying for.
Unsurprisingly, the three are often related. Commonly, a significant performance problem generates a support case that you ask to be remedied through billing. For better or worse, it has to be remedied through billing.
What you can (or can’t) do about it
Let’s look at your levers. It seems like you have a few available, but they don’t carry equal value and you have probably ignored one of them.
A better deal. Every one to three years, you negotiate pricing with your cloud provider. Theoretically, the more you spend, the more leverage you have. This opportunity seldom presents itself, so you want to make the most of it, but the deck is stacked against you. The reality is that your sales rep has a range of discounts they are empowered to offer, derived from a formula based on your spend and their need to maintain margin. That rep may be able to request more from management, but the next person up the chain has their own (slightly) expanded range of what they can offer. Unless something extraordinary occurs, your pricing will live within this box. And beware: you negotiate at most one of these deals each year. Your team at the cloud provider negotiates dozens each year, so they are very practiced at maximizing their opportunities.
The threat to leave. The ultimate stick you carry is the threat to take your business elsewhere. If you have ever tried to wave this stick, did you worry that the sales rep was smiling on the inside? There are many forces in play to keep you where you are.
You may have invested in proprietary technology, and moving out of it means fundamental changes to your product code or architecture. You may have mountains of data stored, and extracting and shipping that data requires time and labor in addition to data transfer fees. Maybe your staff is trained or optimized to operate in Azure and there is some learning curve to picking up and operationalizing the GCP suite.
Moving is a huge project that likely does little for your growth prospects, but threatens your bottom line. You likely need to operate in two places at once for an extended period, doubling your spend until you cut over. When you do cut over, you risk data loss and downtime. Leaving happens, but as a last resort, and typically when relationships are fractured beyond repair (or even after).
Invoke the Service Level Agreement (SLA). Nearly every service you purchase from your cloud provider has a Service Level Agreement associated with it. Thresholds for service availability are aligned to calendar months, are progressive in that they grant more credit for more downtime, and are your primary remedy when performance fails. In my last post, I detailed the reasons your cloud provider doesn’t expect to be held accountable. In short, companies don’t tend to exercise an SLA claim because they don’t expect the results to be worth the effort.
Why you should embrace the SLA
While each of these providers offers an SLA for each service that you purchase, customers have historically not operationalized their use of it. Clearly, if you don’t expect “the juice to be worth the squeeze,” you aren’t going to spend any time squeezing. Here’s why the juice is worth the squeeze.
- The SLA is the easiest stick to wield and offers the most measurable return.
- It sends a clear signal that they messed up, that you noticed, and that you expect better behavior.
- The money can add up - a 0.01% failure (~4.5 mins down) can get you a 10% credit. At intervals or thresholds, more severe failures can earn you more credit.
- At renewal time, your history of SLA requests can give you additional bargaining power.
- You stop paying for downtime!
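To make the credit arithmetic above concrete, here is a minimal sketch in Python. It turns minutes of downtime into a monthly availability percentage and looks up a credit tier. Only the first tier (falling below 99.99% earns a 10% credit) comes from the example above. The other thresholds in `ASSUMED_TIERS`, the function names, and the $20,000 bill are hypothetical placeholders, so check the real numbers in the SLA for each service you buy. Note that 0.01% of a 30-day month works out to about 4.3 minutes.

```python
# Hypothetical credit tiers: (minimum availability %, credit %), checked top-down.
# Illustrative assumptions only; real thresholds vary by provider and by service.
ASSUMED_TIERS = [
    (99.99, 0),    # SLA met, no credit
    (99.0, 10),    # below 99.99% but at least 99.0%
    (95.0, 30),    # below 99.0% but at least 95.0%
    (0.0, 100),    # below 95.0%
]


def monthly_availability(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Return availability for the calendar month as a percentage."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def credit_percent(availability: float, tiers=ASSUMED_TIERS) -> int:
    """Return the credit percentage for the first tier whose floor is met."""
    for floor, credit in tiers:
        if availability >= floor:
            return credit
    return tiers[-1][1]


def estimated_credit(monthly_bill: float, downtime_minutes: float,
                     days_in_month: int = 30) -> float:
    """Estimate the credit owed on one service's monthly charge."""
    availability = monthly_availability(downtime_minutes, days_in_month)
    return monthly_bill * credit_percent(availability) / 100


if __name__ == "__main__":
    # 4.5 minutes down in a 30-day month on a hypothetical $20,000 service bill.
    print(f"{monthly_availability(4.5):.4f}% available")
    print(f"estimated credit: ${estimated_credit(20_000, 4.5):,.2f}")
```

Under these assumed tiers, four and a half minutes of downtime is enough to cross the first threshold, which is why small outages are worth tracking service by service and month by month.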
Next Signal makes it easy
Next Signal is the FinOps tool that your team isn’t using, but should be. We’ve found there is often a disconnect between the Cloud Engineering teams who know when they’ve been impacted and the Finance teams responsible for paying bills (or refusing to pay them). Next Signal bridges that gap by tracking downtime at the big three public cloud providers, tracking the SLAs in place for hundreds of services and, at the end of each month, telling customers when their SLAs haven’t been met. In cases when they haven’t, Next Signal helps customers make credit claims. If your finance organization wants more visibility into outage events and their impact, and wants to be empowered to make claims, and your Cloud Engineering team wants to remain focused on building, reach out to Next Signal and schedule a conversation.