Automated test generation for REST APIs: no time to rest yet
Published in Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2022), 2022
Modern web services routinely provide REST APIs for clients to access their functionality. These APIs present unique challenges and opportunities for automated testing, driving the recent development of many techniques and tools that generate test cases for API endpoints using various strategies.
Problem Statement
Understanding how REST API testing techniques compare to one another is difficult, as they have been evaluated on different benchmarks and using different metrics. This creates a gap in understanding the landscape of automated REST API testing and makes it challenging to guide future research in this area.
Our Approach
We performed a comprehensive empirical study to understand the landscape in automated testing of REST APIs. Our study involved:
- Systematic Tool Selection: Through a thorough literature search, we identified 10 state-of-the-art REST API testing tools, including both academic and practitioners’ tools
- Comprehensive Benchmark: We applied these tools to 20 real-world open-source RESTful services
- Multi-faceted Evaluation: We analyzed performance in terms of:
  - Code coverage achieved (lines, branches, and methods)
  - Unique failures triggered (500 errors, failure points, and library failure points; see the sketch after this list)
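To make the failure metrics concrete, here is a small sketch (our own simplification, not the study's measurement scripts) of grouping 500-error stack traces by their topmost frame; the package names and library prefixes are hypothetical.

```python
# Sketch: count unique failure points from 500-error stack traces.
# Traces are lists of fully qualified frame names, topmost frame first.
from collections import Counter

LIBRARY_PREFIXES = ("java.", "javax.", "org.springframework.")  # assumed library packages

def count_failure_points(stack_traces):
    service_points = Counter()   # failure points in the service's own code
    library_points = Counter()   # failure points inside library code
    for trace in stack_traces:
        if not trace:
            continue
        top = trace[0]  # the failure point: the frame where the exception was raised
        bucket = library_points if top.startswith(LIBRARY_PREFIXES) else service_points
        bucket[top] += 1
    return service_points, library_points

# Two 500 errors that share a controller but fail at different points,
# one of them inside library code.
traces = [
    ["com.example.api.UserService.get", "com.example.api.UserController.show"],
    ["java.lang.Integer.parseInt", "com.example.api.UserController.show"],
]
print(count_failure_points(traces))
```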
Tools Evaluated
The study included 10 tools with diverse testing strategies:
White-box Tool
- EvoMasterWB: Uses evolutionary algorithms with code coverage feedback
Black-box Tools
- EvoMasterBB: Random testing with stateful request generation
- RESTler: Dependency-based algorithm with dictionary-based fuzzing
- RestTestGen: Dependency-based with mutation and dynamic value generation
- RESTest: Model-based testing with constraint solving
- Schemathesis: Property-based testing (see the sketch after this list)
- Dredd: Sample-value-based testing
- Tcases: Model-based combinatorial testing
- bBOXRT: Robustness testing
- APIFuzzer: Random-mutation-based fuzzing
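To ground these strategies, the snippet below sketches how property-based testing of a REST API is typically driven with Schemathesis; the spec URL is a placeholder, and the exact API surface may vary across Schemathesis versions.

```python
# Sketch of Schemathesis-style property-based API testing (run with pytest).
import schemathesis

# Load the OpenAPI specification of the service under test (placeholder URL).
schema = schemathesis.from_uri("http://localhost:8080/openapi.json")

@schema.parametrize()
def test_api(case):
    # Generates inputs from the schema, sends the request, and checks built-in
    # properties such as "no 5xx responses" and response conformance.
    case.call_and_validate()
```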
Key Findings
Coverage Results
- Overall, all tools achieved relatively low coverage on many benchmarks
- Best performer (EvoMasterWB) achieved less than 53% line coverage on average
- Best black-box tool (EvoMasterBB) achieved 45.41% line coverage
Key Limitations Identified
- Parameter Value Generation: Tools generate many invalid requests that services reject because of:
  - Domain-specific value requirements
  - Data-format restrictions
  - A lack of sophisticated input-generation strategies
- Operation Dependency Detection: Most tools fall short in at least one of these ways:
  - They do not account for producer-consumer dependencies between operations (illustrated in the sketch after this list)
  - They rely on simple heuristics that produce false positives and false negatives
  - They fail to generate stateful tests effectively
- Specification-Implementation Mismatch: Discrepancies between API specifications and their actual implementations hinder tool effectiveness
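The following hand-written sketch (hypothetical /users endpoints, not taken from the benchmark services) illustrates the producer-consumer dependency that stateful tools need to infer: a POST produces an identifier that later operations consume. Tools that miss this dependency call GET /users/{id} with random ids, get rejected at the validation layer, and never exercise deeper code.

```python
# Sketch of a stateful request sequence that respects a producer-consumer
# dependency (hypothetical endpoints and payloads).
import requests

BASE = "http://localhost:8080"  # placeholder service URL

def stateful_sequence():
    # Producer: create a resource and capture the server-generated identifier.
    created = requests.post(f"{BASE}/users", json={"name": "alice"}).json()
    user_id = created["id"]

    # Consumers: reuse the produced id so the requests are accepted and the
    # code behind /users/{id} is actually reached.
    assert requests.get(f"{BASE}/users/{user_id}").status_code == 200
    assert requests.delete(f"{BASE}/users/{user_id}").status_code in (200, 204)
```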
Fault Detection
- Strong positive correlation between code coverage and number of faults exposed
- Failures in library methods often have more serious consequences than failures in service code
- Exercising operations with different parameter combinations and input types helps reveal more faults
Implications for Future Research
Based on our findings, we identify several promising directions:
- Better Input Parameter Generation:
  - Leverage information embedded in API specifications
  - Extract sample values from parameter descriptions using NLP (see the sketch after this list)
  - Use symbolic execution for white-box approaches
  - Apply more sophisticated testing techniques (e.g., higher-level combinatorial testing)
- Improved Stateful Testing:
  - Develop more accurate dependency-detection mechanisms
  - Use static analysis when source code is available
  - Apply NLP techniques to specification descriptions
  - Leverage machine learning for dependency inference
- Enhanced Validation:
  - Analyze server logs and error messages for guidance
  - Handle dynamic responses more robustly
  - Improve matching of fields across different types
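As a rough illustration of the first direction, the sketch below pulls candidate values out of free-text parameter descriptions with simple pattern matching; the example description and regexes are ours, and a real implementation would use proper NLP.

```python
# Sketch: extract developer-suggested values from a parameter description.
import re

def suggested_values(description):
    quoted = re.findall(r'"([^"]+)"', description)      # values in double quotes
    code_spans = re.findall(r"`([^`]+)`", description)   # values in back-ticks
    return quoted + code_spans

print(suggested_values('Sort order; one of "asc" or "desc" (default: "asc").'))
# -> ['asc', 'desc', 'asc']
```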
Proof-of-Concept Results
We implemented preliminary prototypes to validate our suggestions:
- Parameter Description Analysis: Identified developer-suggested values for 32% of parameters in two services
- NLP-based Dependency Detection: Automatically detected 8 of 12 unique inter-parameter dependencies using dependency parsing
- Textual Similarity for Dependencies: Correctly identified almost 80% of operations involved in dependency relationships using top-3 textual matches (a simplified sketch follows)
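The last prototype can be approximated in a few lines: for each input parameter, rank the response fields of other operations by string similarity and keep the top-3 candidates as likely producers. The field names below are invented, and the similarity measure is plain difflib rather than the prototype's actual matching strategy.

```python
# Sketch: top-3 textual matching between a parameter name and candidate
# response fields, as a stand-in for dependency inference.
from difflib import SequenceMatcher

def top_k_matches(parameter, response_fields, k=3):
    return sorted(
        response_fields,
        key=lambda field: SequenceMatcher(None, parameter.lower(), field.lower()).ratio(),
        reverse=True,
    )[:k]

# Invented field names: the producer of "userId" should rank near the top.
print(top_k_matches("userId", ["id", "user_id", "createdAt", "ownerId"]))
```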
Key Contributions
- A comprehensive empirical study of 10 REST API testing tools on 20 benchmarks
- Analysis of strengths and weaknesses of existing techniques and their underlying strategies
- Concrete suggestions for improvement with proof-of-concept evaluations
- Implications for future research in REST API testing
- Publicly available artifact with tools, benchmarks, and experimental infrastructure
BibTeX
@inproceedings{kim2022restapi,
author = {Kim, Myeongsoo and Xin, Qi and Sinha, Saurabh and Orso, Alessandro},
title = {Automated test generation for REST APIs: no time to rest yet},
year = {2022},
isbn = {9781450393799},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3533767.3534401},
doi = {10.1145/3533767.3534401},
booktitle = {Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis},
pages = {289--301},
numpages = {13},
keywords = {RESTful APIs, Automated software testing},
location = {Virtual, South Korea},
series = {ISSTA 2022}
}
Recommended citation: Myeongsoo Kim, Qi Xin, Saurabh Sinha, and Alessandro Orso. 2022. Automated test generation for REST APIs: no time to rest yet. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2022). Association for Computing Machinery, New York, NY, USA, 289–301.
Download Paper
