Adam Ochayon
Solution Architect
Published on March 27, 2025
Service availability is the lifeblood of today’s hybrid enterprises. Yet Cloudflare’s March 21, 2025 outage proved that even top-tier providers can stumble on a seemingly simple task: rotating credentials.
This global misstep, which took down major Cloudflare services, is a sharp reminder of what identity and security teams already know too well: key rotation can be trickier than it looks. When it goes wrong, the fallout can be expensive, disruptive, and damaging to trust.
In this post, we’ll break down the Cloudflare outage, revisit other high-profile incidents caused by secret mismanagement, and share best practices for safe, disruption-free credential rotations.
On March 21, 2025, Cloudflare’s R2 Object Storage experienced an elevated error rate for 1 hour and 7 minutes, causing writes to fail completely and reads to fail partially across the globe. The root cause was an error in rotating credentials for the R2 Gateway, the component responsible for authenticating Cloudflare’s gateway worker to the storage backend.
Cloudflare’s team didn’t have real-time visibility into which credentials were actually in use. Even though they had a rotation process in place, it was missing a critical step: verification. Old keys were deleted without confirming whether anything still depended on them, which introduced serious risk and ultimately triggered the outage.
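To make the missing step concrete, here is a minimal Python sketch of a verification gate. It assumes you already collect authentication events that record which key was presented, by which service, and when. It is an illustration rather than Cloudflare’s or Oasis’s actual tooling, and `UsageEvent`, `services_using`, and `safe_to_revoke` are hypothetical names. The rule is simple: if anything used the old key during a quiet window, the key stays.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class UsageEvent:
    """One observed authentication (hypothetical schema)."""
    key_id: str        # credential that was presented
    service: str       # workload that presented it
    seen_at: datetime  # timezone-aware UTC timestamp


def services_using(key_id: str, events: Iterable[UsageEvent],
                   window: timedelta, now: datetime) -> Set[str]:
    """Return the services that authenticated with key_id inside the window."""
    cutoff = now - window
    return {e.service for e in events if e.key_id == key_id and e.seen_at >= cutoff}


def safe_to_revoke(old_key_id: str, events: Iterable[UsageEvent],
                   window: timedelta = timedelta(hours=24),
                   now: Optional[datetime] = None) -> Tuple[bool, Set[str]]:
    """Allow revocation only if no service used the old key during the window.

    Returns (ok, still_using) so the caller can see who is blocking revocation.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    still_using = services_using(old_key_id, events, window, now)
    return (not still_using, still_using)


if __name__ == "__main__":
    now = datetime(2025, 3, 21, 12, 0, tzinfo=timezone.utc)
    events = [
        UsageEvent("key-old", "gateway-prod", now - timedelta(minutes=5)),
        UsageEvent("key-new", "gateway-dev", now - timedelta(minutes=2)),
    ]
    ok, blockers = safe_to_revoke("key-old", events, now=now)
    print(ok, sorted(blockers))  # False ['gateway-prod']
```

The length of the quiet window is a judgment call: it has to cover batch jobs and other workloads that authenticate only occasionally, otherwise a key that looks idle can still break something the next time a scheduled job runs.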
The incident report also highlighted a manual process, which made it even easier for the missing verification step to turn into a full-blown outage. Without robust automation and guardrails, mistakes in DevOps pipelines can quickly escalate into major incidents.
The production R2 Gateway also depended on multiple underlying services and credentials. Without a clear map of these dependencies, misconfigurations could go undetected until it was too late.
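A dependency map does not need to be elaborate to be useful. The sketch below assumes two hypothetical inputs: a mapping from each credential to the workloads that authenticate with it, and a mapping from each workload to the workloads it calls. It walks the resulting call graph breadth-first, from the failing service to its callers, to estimate which workloads would fail if a given credential stopped working. The function name and all workload and credential names are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, Set


def blast_radius(credential: str,
                 uses: Dict[str, Set[str]],
                 depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Estimate which workloads fail if `credential` stops working.

    uses:       credential -> workloads that authenticate with it directly
    depends_on: workload -> workloads it calls
    A workload is affected if it uses the credential itself or calls an
    affected workload.
    """
    # Invert the call graph so we can walk from a failing service to its callers.
    callers: Dict[str, Set[str]] = defaultdict(set)
    for workload, deps in depends_on.items():
        for dep in deps:
            callers[dep].add(workload)

    affected = set(uses.get(credential, set()))
    queue = deque(affected)
    while queue:  # breadth-first; the `affected` check also handles cycles
        failing = queue.popleft()
        for caller in callers[failing]:
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


if __name__ == "__main__":
    uses = {"storage-key": {"storage-gateway"}}
    depends_on = {
        "thumbnailer": {"storage-gateway"},
        "cache": {"storage-gateway"},
        "web-app": {"thumbnailer"},
        "billing": {"postgres"},
    }
    print(sorted(blast_radius("storage-key", uses, depends_on)))
    # ['cache', 'storage-gateway', 'thumbnailer', 'web-app']
```

Even a rough answer like this changes the conversation before a rotation: instead of “who uses this key?”, the team sees the full list of services that will notice if it disappears.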
Mismanaged rotation isn’t unique to Cloudflare. Organizations across industries have grappled with breaches in which attackers exploited exposed or unrotated secrets, and with outages caused by unmonitored credentials expiring without warning.
Some recent examples include:
Cloudflare’s team faced a fundamental challenge: operational complexity became the biggest barrier to security. Without a structured, automated rotation process and real-time visibility into which credentials were actively used in production, they introduced unnecessary risk, not just to security but to business continuity.
Key rotation and other non-human identity (NHI) lifecycle tasks are vital not only for mitigating risk, maintaining compliance, and bolstering security posture, but also for ensuring overall system resilience.
At Oasis, we know managing non-human identities is about more than security - it’s about keeping systems running smoothly. Rotating secrets in a multi-cloud environment is risky when you lack full visibility, and no one wants a rotation that breaks production. Drawing on lessons from Cloudflare and customers facing similar challenges, Oasis has built an identity-centric approach that delivers continuous security without guesswork or disruptions.
Instead of relying on hope and manual processes, Oasis automatically discovers every secret, token, and identity across cloud and on-prem environments. But visibility alone isn’t enough. We map out exactly how these credentials interact - who owns them, where they’re used, and what they have access to, so teams aren’t caught off guard by dependencies they didn’t know existed.
When it comes to rotation, we take a policy-driven and automated approach. Whether it’s enforcing a strict 30-day cycle or triggering a rotation when an IT employee leaves, Oasis ensures that credentials aren’t just replaced, but verified. If an old key is still in use, we flag it immediately so nothing gets shut off before it’s safe to do so. We give you the context to decide - eliminating the “pull it and see who screams” approach. No guesswork, no surprises.
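Policy-driven rotation comes down to evaluating each secret against explicit rules. This hypothetical sketch encodes the two triggers mentioned above, a 30-day maximum age and the departure of someone who had access, and returns the reasons a secret is due. The `Secret` fields, the `MAX_AGE` constant, and the `rotation_reasons` function are assumptions for illustration, not part of any product API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, List, Set

MAX_AGE = timedelta(days=30)  # example policy: rotate at least every 30 days


@dataclass(frozen=True)
class Secret:
    """Inventory record for one secret (hypothetical fields)."""
    name: str
    rotated_at: datetime           # when the current value was issued
    accessible_by: FrozenSet[str]  # people who could read the current value


def rotation_reasons(secret: Secret, departed: Set[str], now: datetime,
                     max_age: timedelta = MAX_AGE) -> List[str]:
    """Return why `secret` should be rotated now; an empty list means no action."""
    reasons = []
    age = now - secret.rotated_at
    if age >= max_age:
        reasons.append(f"age {age.days}d exceeds {max_age.days}d")
    leavers = sorted(secret.accessible_by & departed)
    if leavers:
        reasons.append("accessible to departed: " + ", ".join(leavers))
    return reasons


if __name__ == "__main__":
    now = datetime(2025, 3, 27, tzinfo=timezone.utc)
    db = Secret("db-password", datetime(2025, 2, 1, tzinfo=timezone.utc),
                frozenset({"alice", "bob"}))
    print(rotation_reasons(db, {"bob"}, now))
    # ['age 54d exceeds 30d', 'accessible to departed: bob']
```

Deciding that a secret is due is only half the job; the replacement should still pass a verification step before the old value is revoked.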
Going beyond just rotation, Oasis keeps an eye out for orphaned identities, expired secrets, and missing owners, so security teams can stay ahead of risks instead of scrambling to fix them later.
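A basic hygiene scan can be expressed as a few checks over an identity inventory. In this sketch the inventory format and the `hygiene_findings` function are hypothetical, and “orphaned” is taken to mean an identity whose recorded owner is no longer active; some teams use the term for identities that nothing uses anymore, which would require usage data instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Identity:
    """Inventory record for one non-human identity (hypothetical fields)."""
    name: str
    owner: Optional[str]            # accountable person; None if nobody is recorded
    expires_at: Optional[datetime]  # None if the credential never expires


def hygiene_findings(identities: Iterable[Identity], active_people: Set[str],
                     now: datetime) -> Dict[str, List[str]]:
    """Group identities by the hygiene problems they exhibit."""
    findings: Dict[str, List[str]] = {"missing_owner": [], "orphaned": [], "expired": []}
    for ident in identities:
        if ident.owner is None:
            findings["missing_owner"].append(ident.name)
        elif ident.owner not in active_people:
            # Assumption: "orphaned" means the recorded owner is no longer active.
            findings["orphaned"].append(ident.name)
        if ident.expires_at is not None and ident.expires_at <= now:
            findings["expired"].append(ident.name)
    return findings


if __name__ == "__main__":
    now = datetime(2025, 3, 27, tzinfo=timezone.utc)
    inventory = [
        Identity("ci-deploy", "alice", None),
        Identity("legacy-sync", None, datetime(2025, 1, 1, tzinfo=timezone.utc)),
        Identity("report-bot", "dave", datetime(2026, 1, 1, tzinfo=timezone.utc)),
    ]
    print(hygiene_findings(inventory, {"alice"}, now))
    # {'missing_owner': ['legacy-sync'], 'orphaned': ['report-bot'], 'expired': ['legacy-sync']}
```

An identity can land in more than one bucket, as `legacy-sync` does here, which is usually a sign it should be reviewed first.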
At the end of the day, managing NHIs isn’t just about locking things down—it’s about keeping businesses running without friction. Oasis makes sure security and operations work together, so teams can focus on building, not firefighting.
The Cloudflare outage on March 21, 2025 offers a sobering reminder that mismanaged key rotation can sabotage even the most resilient infrastructures. The misstep of deploying new credentials to a dev environment, then deleting the old credentials prematurely, caused over an hour of production disruption. And Cloudflare is hardly alone - rotation-driven mishaps have plagued Microsoft, Dropbox, and countless others.
But it doesn’t have to be this way. By embracing identity-centric rotation practices, automated discovery, contextual mapping, staged deployments, and robust validation, teams can keep keys fresh without taking the business offline. Oasis’s NHI Security Cloud is designed precisely for this mission: ensuring safer, more controlled rotations, even across sprawling multi-cloud topologies.
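The staged approach can be reduced to a checklist that must be empty before the old key is revoked. In this hypothetical model (`RotationPlan` and `revocation_blockers` are illustrative names, not a real API), a rotation records where the old key is consumed, where the new key was deployed, where the new key has actually been observed authenticating, and how often the old key was used during a quiet window. The example mirrors the misstep described above: a new key deployed only to a development environment while production still depends on the old one.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RotationPlan:
    """State of one staged rotation (hypothetical model, not a real API)."""
    old_key: str
    new_key: str
    consumer_envs: Set[str]                                 # where the old key is in use
    deployed_envs: Set[str] = field(default_factory=set)    # where the new key was pushed
    new_key_seen_in: Set[str] = field(default_factory=set)  # where the new key authenticated
    old_key_recent_uses: int = 0                            # old-key uses in the quiet window


def revocation_blockers(plan: RotationPlan) -> List[str]:
    """Return every reason the old key must not be revoked yet; empty means safe."""
    blockers = []
    missing = plan.consumer_envs - plan.deployed_envs
    if missing:  # stage 1: the new key must reach every consuming environment
        blockers.append("new key not deployed to: " + ", ".join(sorted(missing)))
    unverified = plan.consumer_envs - plan.new_key_seen_in
    if unverified:  # stage 2: each consumer must actually authenticate with it
        blockers.append("new key not yet seen in: " + ", ".join(sorted(unverified)))
    if plan.old_key_recent_uses > 0:  # stage 3: the old key must have gone quiet
        blockers.append(f"old key still used {plan.old_key_recent_uses} time(s)")
    return blockers


if __name__ == "__main__":
    wrong_target = RotationPlan(
        old_key="gw-key-1", new_key="gw-key-2",
        consumer_envs={"production"},
        deployed_envs={"development"},
        new_key_seen_in={"development"},
        old_key_recent_uses=1200,
    )
    for reason in revocation_blockers(wrong_target):
        print(reason)
    # new key not deployed to: production
    # new key not yet seen in: production
    # old key still used 1200 time(s)
```

Because every check compares against where the old key is actually consumed, a rotation pushed to the wrong environment is blocked before anything is revoked. In practice, the inputs would come from your deployment system and authentication logs.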
Ready to transform your rotation process from a high-stakes guesswork exercise into an automated, interruption-free workflow? Get in touch with Oasis today and leave the fear of rotation-induced outages behind.
Have questions or comments? Reach out to our team at Oasis Security - we’re here to help.