Files
Abstract
Retry---the re-execution of a task on failure---is a common mechanism to enable resilient software systems. Yet, despite its commonality and long history, retry remains difficult to implement and test.
Guided by our study of real-world retry issues, we propose a novel suite of static and dynamic techniques to detect retry problems in software. We find that the ad-hoc nature of retry implementation poses challenges for traditional program analysis but can be well suited for large language models; and that carefully repurposing existing unit tests can, along with fault injection, expose various types of retry problems.