Case-study · Dec 11, 2025 · 6 min read
Context
Booksy is a global, cloud-based marketplace that connects professionals and clients in the beauty, health, and wellness sectors. Originating in Poland, the company has expanded internationally, providing a robust platform that helps entrepreneurs manage appointments, streamline operations, and grow their businesses. With a focus on reliability, scalability, and user experience, Booksy enables service providers to focus on their craft while empowering clients to easily access and book trusted professionals. The company’s mission is to create technology that supports sustainable growth across local and global communities.

A Shared Vision for Reliability
In February 2025, Ewake’s founding team reached out to Booksy’s SRE leadership with a shared vision: to reduce the operational burden on Site Reliability Engineering (SRE) teams and improve overall service resilience. Booksy, a technology-driven company with a large, global engineering organization of over a dozen teams, operates a complex, high-traffic platform handling billions of monthly requests.
At the time, Booksy maintained a strong DevOps culture and an internal group of SRE Evangelists dedicated to democratizing reliability practices among developers. However, as the platform scaled, so did its operational challenges, from increasing system noise to the growing effort required to manage incidents effectively. During the initial discussions, the team described a high volume of concurrent alerts, which reflected both the number of signals the platform generated and the toll on SREs.
Booksy’s near-term priorities included reducing alert fatigue, improving incident response efficiency, and ultimately lowering Mean Time to Recovery (MTTR). The long-term goal was to further increase DevOps maturity across teams while keeping them closely aligned with the existing observability stack.
Collaboration duration | Investigations completed | Alerts handled |
|---|---|---|
6 months | 1,439 | 748 |
Implementation & Collaboration
Building on the initial discussions, Ewake was integrated into Booksy’s existing observability and operations ecosystem. The goal was to fit seamlessly within their established workflows — leveraging their observability platform as the primary signal source, their alerting system for escalation, and their central collaboration platform as the collaboration layer. The integration also connected to their CI/CD deployment tool and feature flag management system—all without introducing friction or additional overhead for engineering teams.

To support a smooth engineering handoff and ensure Ewake fit naturally into Booksy’s operating model, the teams collaborated in two-week iteration cycles with regular syncs to capture feedback and surface emerging pain points. This rhythm enabled iterative changes to the automation logic, alert-analysis workflows, and integration touchpoints. The collaboration remained highly cross-functional, involving SRE leadership, SRE Evangelists, platform engineers, and developers to validate both technical fit and day-to-day usability. As part of the evolving tools and integration workflow, the Ewake team first integrated directly into Booksy’s alert-response flow, then—based on insights gathered—extended the integration into the canonical SRE channel to help developers quickly access reliability context and answers to SRE-related questions.
Results & Impact
Use Case 1: SRE Support
Metrics | Value |
|---|---|
Satisfaction ratio | 64% |
Number of investigations | 691 |
Scope | SRE Team |
Adoption | Ewake is globally available to the engineering teams, helping them investigate production issues faster and answering their day-to-day questions. |
Average response time | <80 seconds |
Direct developer support emerged as one of the most impactful applications.
“The second use case provided the most measurable relief for our SRE team. Previously, our SREs spent a significant portion of their day answering repetitive developer questions. Ewake, acting as a support layer, was highly effective at addressing these common software engineer queries and accessing our internal knowledge base, often citing or linking to the precise documentation needed. By taking over 700 of these interactions, Ewake dramatically reduced our operational toil, freeing our SREs to focus on strategic reliability projects instead of being constantly diverted.”
Booksy SRE Team
This use case, the second to be rolled out, emerged naturally during the collaboration. In one of the early discovery sessions, Booksy’s teams identified a recurring pain point: SREs were stretched thin, constantly switching contexts between handling incidents and answering day-to-day developer questions. From debugging pipeline failures to explaining deployment issues, the operational toil was high, and it often diverted focus from more strategic reliability work.
To address this, Ewake was invited to join the public SRE communication channels, becoming a direct support layer for developers. Whenever a question or issue arose — “Why did my pipeline fail?” or “What’s causing this alert?” — Ewake would step in, analyze the context, and provide data-driven answers or guidance, freeing up valuable SRE time.
Use Case 2: Agentic Alert Response
Metrics | Value |
|---|---|
Satisfaction ratio | 52% |
Number of investigations | 748 |
Scope | Faulty deployment alerts |
Adoption | Ewake automatically responds to the Slack alert message received by the engineering team. The response contains the initial triage information gathered by the AI agents, with the aim of reducing investigation toil for developers. |
Average response time | <120 seconds |
“When it came to automated alert response, the value was in the filtration of noise. Ewake delivered initial, low-level triage for hundreds of alerts, instantly correlating deployment data and bringing context together. However, this was not a ready-to-implement solution; for our senior SREs, this output was often too basic for critical incidents. Still, by handling the initial context-gathering for all deployment-related alerts, Ewake proved effective at significantly reducing alert fatigue across the team.”
Booksy SRE Team
Booksy’s goal was to have Ewake automatically respond to faulty deployment alerts — cases where a specific monitor indicated that a new release had triggered a failure. When such an alert fired, Ewake would instantly launch an investigation, gathering context from the observability tool, correlating with recent deployment activity, and pinpointing the likely cause of the issue.
By turning reactive firefighting into proactive diagnosis, Ewake helped the Booksy team start cutting through the noise — transforming alert floods into focused, data-backed investigations.
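The correlation step described above, matching a faulty deployment alert to recent deployment activity, can be illustrated with a minimal Python sketch. It is not Ewake's implementation, which the article does not describe. The `Alert` and `Deployment` types, the `likely_culprit` function, the 30-minute window, and the service names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Deployment:
    # Hypothetical record of a release, e.g. as reported by a CI/CD tool.
    service: str
    version: str
    deployed_at: datetime


@dataclass(frozen=True)
class Alert:
    # Hypothetical record of a monitor firing for a service.
    service: str
    monitor: str
    fired_at: datetime


def likely_culprit(
    alert: Alert,
    deployments: Iterable[Deployment],
    window: timedelta = timedelta(minutes=30),  # assumed look-back window
) -> Optional[Deployment]:
    """Return the latest deployment to the alerting service that landed
    within `window` before the alert fired, or None if there is none."""
    candidates = [
        d
        for d in deployments
        if d.service == alert.service
        and timedelta(0) <= alert.fired_at - d.deployed_at <= window
    ]
    return max(candidates, key=lambda d: d.deployed_at, default=None)


# Usage with made-up data: only the v2 release falls inside the window.
fired = datetime(2025, 6, 1, 12, 0)
history = [
    Deployment("booking-api", "v1", fired - timedelta(hours=2)),
    Deployment("booking-api", "v2", fired - timedelta(minutes=10)),
    Deployment("payments", "v9", fired - timedelta(minutes=5)),
]
suspect = likely_culprit(Alert("booking-api", "error-rate", fired), history)
print(suspect.version if suspect else "no recent deployment")  # prints "v2"
```

A time-window match like this only yields a candidate. In the workflow the article describes, that candidate would still be weighed against context gathered from the observability tool before a likely cause is reported.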
Lessons Learned
During the six-month Proof of Value, Ewake was deployed across Booksy’s operational workflows, automating alert investigations, supporting developers, and expanding SRE bandwidth.
By the end of the engagement, Ewake achieved a 60% satisfaction rate, meaning six out of ten responses were viewed as useful or actionable by Booksy engineers.
This confirmed:
- The product’s fit within complex, high-scale environments
- Its ability to reduce operational toil
- Its value in spreading reliability best practices
- Its potential to serve as a real reliability multiplier across teams
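The overall satisfaction figure can be related to the two per-use-case tables with a short calculation. The article does not state how the overall rate was derived, so weighting each use case by its number of investigations is an assumption made here for illustration. The function name `combined_satisfaction` is likewise hypothetical.

```python
# Figures taken from the two use-case tables: (investigations, satisfaction ratio).
use_cases = {
    "SRE Support": (691, 0.64),
    "Agentic Alert Response": (748, 0.52),
}


def combined_satisfaction(cases):
    """Return (total investigations, investigation-weighted satisfaction)."""
    total = sum(count for count, _ in cases.values())
    satisfied = sum(count * ratio for count, ratio in cases.values())
    return total, satisfied / total


total, ratio = combined_satisfaction(use_cases)
print(total)           # prints 1439
print(f"{ratio:.1%}")  # prints 57.8%
```

The total matches the 1,439 investigations in the summary table. Under this weighting assumption the combined ratio comes out slightly below the reported 60%, which may simply reflect rounding or a different counting method.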
Together, Booksy and Ewake demonstrated how AI-driven reliability assistance can convert operational data into immediate human value, enabling teams to work smarter, respond faster, and build more resilient systems.