A key part of the Cantilever brand is making our clients’ websites more stable. On our own site, we advertise that a client’s website will become more stable by working with us. Delivering on this promise is paramount to our success.
Principles
- Downtime is a P0: Any downtime incident is serious, and resolving it should take priority over any other work. This goes across teams – even if you don’t typically work on a specific site, if it is down, please focus on fixing that ahead of any other tasks.
- Reluctant Acceptance: Downtime is a major problem for our clients and goes against our brand promises, so we must work hard to minimize it. At the same time, all websites experience downtime, and sometimes it is not in our control. We must do everything in our control to prevent it while not promising that we can head off all risk.
- Planning & Transparency: When planned downtime must happen, we communicate openly to the client about it. When a site goes down unexpectedly, we are transparent with the client about why.
📏 Rules
- All sites on Core Coverage must have uptime monitoring through StatusCake or equivalent.
- Whenever there is downtime on a site with Core Coverage, the Strategist must send a downtime report to the client within 24 hours (ideally much sooner) outlining the cause of the downtime. If the downtime was preventable, we should send a remediation plan for the fix. This should be a planned project which is free or discounted if the original problem was our fault.
- If there is downtime more than once in a month on a site, the team should meet to ensure we know the cause and have solved, or will solve, the underlying problem. Normal work for this site may not continue until the downtime is fully resolved.
- If we ever must have downtime on a site, such as for a large deploy where downtime is impossible to avoid, we must inform the client well in advance and have their consent for the downtime window.
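
The two time-based rules above (the 24-hour downtime report and the "more than once in a month" review) can be sketched as simple checks. This is only an illustration, not a tool we currently run: the function names are hypothetical, and it assumes "a month" means a calendar month rather than a rolling 30-day window.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Rule: the downtime report is due within 24 hours of the incident.
REPORT_WINDOW = timedelta(hours=24)


def report_deadline(detected_at: datetime) -> datetime:
    """Latest time the Strategist may send the downtime report (hypothetical helper)."""
    return detected_at + REPORT_WINDOW


def needs_team_review(incidents: Sequence[datetime], year: int, month: int) -> bool:
    """True if a site had more than one downtime incident in the given calendar month.

    Assumption: "in a month" is read as a calendar month, not a rolling window.
    """
    in_month = [t for t in incidents if t.year == year and t.month == month]
    return len(in_month) > 1
```

For example, an incident detected at 20:00 on January 31 would need its report sent by 20:00 on February 1, and two incidents on the same site in March would trigger the team meeting.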