SAP’s Blueprint for Faster, Higher-Quality Releases
How SAP Delivers Reliable Mission-Critical Software its Customers Can Trust
Overview
SAP, a global leader in enterprise software, is renowned for helping businesses operate more effectively through cutting-edge technology. At the heart of SAP’s infrastructure is the HANA Cloud database team, which builds the high-performance, cloud-native systems that power some of the most demanding enterprise workloads.
To uphold the highest standards of reliability and speed—especially in such a mission-critical environment—SAP needed a more efficient way to uncover and fix hard-to-reproduce bugs in their large-scale multithreaded C++ codebase before they hit customers.
The Challenge
SAP HANA engineers are responsible for a cloud-native data management system with:
- High concurrency (dozens of CPU cores)
- Massive memory footprints (terabytes of RAM)
- Zero tolerance for failure
Despite rigorous stress testing, issues would sometimes appear that were:
- Difficult to reproduce
- Hard to root-cause
- Time-consuming to debug
A typical scenario involved a test that triggered a crash. But when developers attempted to replicate the issue—even with the same workload—they would either:
- Fail to reproduce the bug entirely, or
- Encounter a completely different issue
This made debugging unpredictable, expensive, and frustrating.
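To make this failure mode concrete, here is a minimal C++ sketch of schedule-dependent behavior. It is an illustration only and is not taken from SAP's codebase. The function name `run_lossy_counter` and its parameters are hypothetical. Each worker does a read-modify-write built from two separate atomic operations, so an update is lost whenever another thread runs between the load and the store.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper, for illustration only.
// Every load and store is atomic, so there is no undefined behaviour,
// but the load/store pair is not atomic as a whole: increments can be lost
// depending on how the scheduler interleaves the worker threads.
std::uint64_t run_lossy_counter(unsigned num_threads,
                                unsigned increments_per_thread) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (unsigned i = 0; i < increments_per_thread; ++i) {
                std::uint64_t seen = counter.load(std::memory_order_relaxed);
                // Yielding here widens the window in which another
                // thread can overwrite the value we just read.
                std::this_thread::yield();
                counter.store(seen + 1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    const std::uint64_t result = counter.load();
    // The counter can never exceed the total number of stores performed.
    assert(result <= static_cast<std::uint64_t>(num_threads) * increments_per_thread);
    return result;
}
```

With one thread, the function always returns `increments_per_thread`. With several threads, the same call can return a different value on every run, which is the same-workload, different-outcome pattern described above. A recording of one failing run fixes the interleaving, so it can be examined repeatedly.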
The Solution
SAP integrated Undo’s time-travel debugging and thread fuzzing technology directly into its test infrastructure. This allowed engineers to:
- Record system behavior during tests
- Replay each recording, stepping backwards in time to locate the root cause of a bug in a single debug cycle
Key Results
Faster Root-Cause Analysis
- Engineers are able to cut debugging cycles short and troubleshoot hard-to-reproduce issues within hours, not months
- Engineers have complete visibility into what the program is doing over time
Accelerated Development Cycles
- New features can be developed faster, with fewer interruptions from complex defects
- Developers spend more time shipping value and less time stuck in reproducibility loops
Enhanced Developer Productivity
- With Undo integrated into its test farm, SAP lets machines run nonstop to provoke corner-case failures automatically
- When issues are detected, developers receive recordings, not mystery puzzles
In mission-critical systems with demanding data management workloads, debugging bottlenecks can stall innovation. SAP’s adoption of Undo provided a step change in their development process—removing friction, reducing risk, and unlocking engineering velocity.
“Undo helps you to be way more productive: the creation of new features and new software is dramatically accelerated.”
Alexander Boehm, Distinguished Engineer, SAP HANA Cloud Team
A Trusted Partnership
SAP views Undo not just as a tool vendor, but as a long-term engineering partner. Undo’s team worked closely with SAP to adapt to its environment and ensure that the integration supports the scale and complexity of SAP’s systems.
Conclusion
With Undo’s debugging technology embedded in their toolchain, SAP now releases with greater speed, reliability, and confidence. For teams building the foundational software of global enterprises, that’s a game-changer.