Enhancing REST API Testing with NLP Techniques
Published in Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023), 2023
RESTful services are commonly documented using OpenAPI specifications. Although numerous automated testing techniques have been proposed that leverage the machine-readable part of these specifications to guide test generation, their human-readable part has been mostly neglected.
Problem Statement
Natural-language descriptions in OpenAPI specifications often contain relevant information, including:
- Example values for parameters
- Inter-parameter dependencies
- Parameter constraints
- Format specifications
This information can significantly improve test generation, but existing REST API testing tools ignore the human-readable descriptions and only use the machine-readable portions of specifications.
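As a concrete (and entirely hypothetical) illustration, consider a query parameter whose machine-readable schema declares only a string type, while its description carries an allowed-value constraint, a default, and an inter-parameter dependency that specification-driven tools never see; the fragment is shown as a Python dictionary for readability:

```python
# Hypothetical OpenAPI parameter: the machine-readable fields say only "string",
# while the description holds a constraint, an example/default, and an
# inter-parameter dependency that existing tools ignore.
parameter = {
    "name": "unit",
    "in": "query",
    "schema": {"type": "string"},
    "description": (
        "Unit of measurement. Must be one of 'metric' or 'imperial'; "
        "defaults to 'metric'. Cannot be used together with 'raw_output'."
    ),
}
```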
Our Approach: NLPtoREST
We propose NLPtoREST, an automated approach that applies natural language processing techniques to assist REST API testing. The approach consists of three main phases:
1. NLP-based Rule Extraction
Vocabulary Terms Identification:
- Uses a custom Word2Vec model (restW2V) trained on 1,875,607 text sets from 4,064 REST API specifications
- Identifies sentences containing OpenAPI vocabulary terms using semantic similarity
- Relies on 55 mappings between search terms and OpenAPI keywords
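A minimal sketch of this matching step, assuming the restW2V model has been saved as gensim KeyedVectors; the file name, similarity threshold, and the small subset of mappings are placeholders:

```python
# Sketch of vocabulary-term identification via Word2Vec similarity.
# "restW2V.kv", the threshold, and MAPPINGS are illustrative assumptions.
from gensim.models import KeyedVectors

wv = KeyedVectors.load("restW2V.kv")

MAPPINGS = {          # search term -> OpenAPI keyword (illustrative subset of the 55)
    "maximum": "maximum",
    "default": "default",
    "example": "example",
    "required": "required",
}

def matched_keywords(sentence: str, threshold: float = 0.7) -> list[str]:
    """Return OpenAPI keywords whose search terms are semantically close to
    some token of the sentence."""
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t in wv]
    return [kw for term, kw in MAPPINGS.items()
            if term in wv and any(wv.similarity(term, t) >= threshold for t in tokens)]

# matched_keywords("The page size defaults to 20 and cannot exceed 100.")
# might flag the 'default' and 'maximum' keywords via 'defaults' and 'exceed'.
```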
Value and Parameter Name Detection:
- Uses regular expressions for enumerated or quoted strings
- Applies constituency parse tree analysis as fallback
- Extracts parameter names and values from natural language descriptions
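A sketch of the regular-expression part of this step; the patterns and the example description are illustrative, and the actual pipeline falls back to constituency-parse-tree analysis when no pattern applies:

```python
import re

# Illustrative patterns for quoted literals and enumerated value lists.
QUOTED = re.compile(r"['\"`]([^'\"`]+)['\"`]")
ENUMERATED = re.compile(r"\b(?:one of|either)\b\s*:?\s*(.+)", re.IGNORECASE)

def extract_candidate_values(description: str) -> list[str]:
    """Collect quoted literals and comma/'or'-separated enumerations."""
    values = QUOTED.findall(description)
    match = ENUMERATED.search(description)
    if match:
        parts = re.split(r",|\bor\b", match.group(1))
        values += [p.strip(" .'\"") for p in parts if p.strip(" .'\"")]
    return list(dict.fromkeys(values))  # de-duplicate, preserve order

# extract_candidate_values("Sort order. Must be one of: asc, desc.")
# -> ["asc", "desc"]
```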
Rule Generation:
- Creates OpenAPI-compliant rules from extracted information
- Supports four rule categories:
- Parameter type/format rules
- Parameter constraint rules
- Parameter example rules
- Operation constraint rules (inter-parameter dependencies)
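Continuing the hypothetical 'unit' parameter from above, the extracted information could be turned into OpenAPI-compliant rules roughly as follows; the x-dependencies extension used for the operation-level rule is one possible encoding, not necessarily the tool's exact output:

```python
# Hypothetical rules generated for the 'unit' parameter introduced earlier,
# one per rule category; keyword names follow the OpenAPI standard.
parameter_rules = {
    "type_format": {"type": "string"},                 # parameter type/format rule
    "constraint":  {"enum": ["metric", "imperial"]},   # parameter constraint rule
    "example":     {"example": "metric"},              # parameter example rule
}

# Operation constraint rule (inter-parameter dependency), encoded here with an
# x-dependencies extension: "unit cannot be used together with raw_output".
operation_rules = {"x-dependencies": ["ZeroOrOne(unit, raw_output);"]}
```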
2. Rule Validation
Static Pruning:
- Analyzes rule combinations for syntactic compatibility
- Discards combinations incompatible with OpenAPI standard
- Substantially reduces the number of rule combinations that must be checked dynamically
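A minimal sketch of static pruning; the specific compatibility checks shown are illustrative rather than an exhaustive list of what the approach verifies:

```python
# Reject rule combinations that are already syntactically incompatible under
# the OpenAPI standard, so they never reach dynamic checking.
def is_syntactically_compatible(rules: dict) -> bool:
    if "minimum" in rules and "maximum" in rules and rules["minimum"] > rules["maximum"]:
        return False  # empty numeric range
    if rules.get("type") == "integer" and "enum" in rules:
        if not all(isinstance(v, int) for v in rules["enum"]):
            return False  # enum values contradict the declared type
    if "example" in rules and "enum" in rules and rules["example"] not in rules["enum"]:
        return False  # example falls outside the allowed values
    return True

candidate_combinations = [
    {"type": "integer", "minimum": 1, "maximum": 100, "example": 10},
    {"type": "integer", "minimum": 100, "maximum": 1},  # pruned statically
]
surviving = [c for c in candidate_combinations if is_syntactically_compatible(c)]
```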
Dynamic Checking:
- Generates validation test cases for rule combinations
- Executes tests against deployed API instance
- Identifies maximal combination of valid rules
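A minimal sketch of dynamic checking, assuming the service under test is deployed locally; the base URL, endpoint, and parameter values are placeholders:

```python
# A rule combination is kept only if a request built from its rule-derived
# values is accepted by the running service.
import requests

BASE_URL = "http://localhost:8080"  # placeholder for the deployed API instance

def rule_combination_is_valid(path: str, params: dict) -> bool:
    """Send a request using rule-derived values and report whether the
    deployed service accepts it (2XX status)."""
    response = requests.get(BASE_URL + path, params=params, timeout=10)
    return 200 <= response.status_code < 300

# e.g. keep {"enum": ["metric", "imperial"], "example": "metric"} only if
# rule_combination_is_valid("/v1/measurements", {"unit": "metric"}) succeeds.
```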
Fine Tuning:
- Further validates potentially valid rules through mutations
- Repairs inter-parameter dependency rules (Or, OnlyOne, AllOrNone, ZeroOrOne)
- Applies rule-specific validation strategies to 22 of the 26 supported rule types
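For inter-parameter dependency rules, repair can be sketched as probing the deployed service with the four presence combinations of the two parameters and matching the observed acceptances against each dependency type's expected behavior; the probe function and sample values are assumptions:

```python
# Mutation-style repair of a two-parameter dependency: observe which of the
# four presence combinations the service accepts and pick the matching type.
from itertools import product

EXPECTED = {  # accepted? keyed by (first parameter present, second present)
    "Or":        {(True, True): True,  (True, False): True,  (False, True): True,  (False, False): False},
    "OnlyOne":   {(True, True): False, (True, False): True,  (False, True): True,  (False, False): False},
    "AllOrNone": {(True, True): True,  (True, False): False, (False, True): False, (False, False): True},
    "ZeroOrOne": {(True, True): False, (True, False): True,  (False, True): True,  (False, False): True},
}

def repair_dependency(probe, a: str, b: str) -> str | None:
    """probe(params) -> bool sends a request with the given parameters present
    and reports whether the service accepted it (hypothetical callable)."""
    observed = {}
    for use_a, use_b in product([True, False], repeat=2):
        params = {}
        if use_a:
            params[a] = "sample"
        if use_b:
            params[b] = "sample"
        observed[(use_a, use_b)] = probe(params)
    matches = [dep for dep, table in EXPECTED.items() if table == observed]
    return matches[0] if len(matches) == 1 else None
```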
3. Enhanced Specification Generation
- Adds validated rules to original OpenAPI specification
- Uses OpenAPI-supported keywords and extensions
- Creates enhanced specification transparently usable by any REST API test generator
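A minimal sketch of the final merge step, assuming validated parameter rules are keyed by (path, method, parameter name); the spec fragment and merge strategy are illustrative:

```python
# Merge validated rules into a copy of the original specification using
# standard OpenAPI keywords, so any spec-driven tool can consume the result.
import copy

original = {
    "openapi": "3.0.0",
    "paths": {"/v1/measurements": {"get": {"parameters": [
        {"name": "unit", "in": "query", "schema": {"type": "string"}},
    ]}}},
}

validated_rules = {("/v1/measurements", "get", "unit"):
                   {"enum": ["metric", "imperial"], "example": "metric"}}

def enhance_spec(spec: dict, validated: dict) -> dict:
    """Return a copy of the spec with each validated rule merged into the
    schema of the parameter it refers to."""
    enhanced = copy.deepcopy(spec)
    for (path, method, param_name), rules in validated.items():
        for param in enhanced["paths"][path][method].get("parameters", []):
            if param["name"] == param_name:
                param.setdefault("schema", {}).update(rules)
    return enhanced

enhanced = enhance_spec(original, validated_rules)
# enhanced[...]["parameters"][0]["schema"]
# -> {"type": "string", "enum": ["metric", "imperial"], "example": "metric"}
```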
Key Features
Custom NLP Model:
- Pre-trained Word2Vec model (restW2V) specific to OpenAPI terminology
- Trained on extensive dataset from thousands of API specifications
- Provides flexible matching beyond hard-coded patterns
Comprehensive Rule Support:
- Extracts 4 categories and 26 types of rules
- Handles complex inter-parameter dependencies
- Supports various constraint and format specifications
Validation Pipeline:
- Static analysis eliminates ~75% of false positives
- Dynamic validation against actual service implementation
- Fine-tuning phase repairs and validates remaining rules
Evaluation Results
Rule Extraction Effectiveness
- Recall: 94% (313 out of 333 rules extracted)
- Precision before validation: 50% (313 TP, 314 FP)
- Precision after validation: 79% (304 TP, 79 FP)
- Overall improvement: a 58% relative increase in precision at the cost of only a 3% decrease in recall
Comparison with RestCT
NLPtoREST significantly outperformed RestCT, a pattern-matching-based approach:
- NLPtoREST: Identified 15 out of 19 inter-parameter dependencies
- RestCT: Identified 0 out of 19 inter-parameter dependencies
Impact on Testing Tools
The enhanced specifications significantly improved the performance of 8 state-of-the-art REST API testing tools:
Coverage Improvements:
- Branch coverage: +103% (11.35% → 23.10%)
- Line coverage: +50% (24.96% → 37.52%)
- Method coverage: +52% (22.13% → 33.54%)
Request Success Rates:
- Successful requests (2XX): +20% (21.8% → 26.2%)
- Rejected requests (4XX): -7% (59.9% → 56.0%)
- Server errors (5XX): +2% (18.3% → 18.6%)
- Unique server errors: +4% on average (up to +98.9% in the best case)
Tools Evaluated
The following 8 testing tools were evaluated with both the original and the enhanced specifications:
- EvoMasterBB: EvoMaster in black-box mode
- bBOXRT: Robustness testing
- Morest: Model-based testing
- RESTest: Constraint-based testing
- RESTler: Stateful fuzzing
- RestTestGen: Dependency-based testing
- Schemathesis: Property-based testing
- Tcases: Combinatorial testing
Key Contributions
- A novel NLP-based technique for extracting rules from natural-language descriptions in OpenAPI specifications
- A validation approach that significantly improves rule accuracy through static and dynamic checking
- Enhanced OpenAPI specifications that existing REST API testing tools can transparently use
- Comprehensive empirical evaluation on 9 real-world services demonstrating significant performance improvements
- Publicly available tool and experimental infrastructure
Benchmark Services
Evaluated on 9 industrial-sized REST services (>10K LoC):
- Federal Deposit Insurance Corporation (FDIC)
- LanguageTool
- OhSome
- Open Movie Database (OMDb)
- REST Countries
- Genome Nexus
- OCVN
- Spotify
- YouTube Mock
BibTeX
@inproceedings{kim2023nlp,
author = {Kim, Myeongsoo and Corradini, Davide and Sinha, Saurabh and Orso, Alessandro and Pasqua, Michele and Tzoref-Brill, Rachel and Ceccato, Mariano},
title = {Enhancing REST API Testing with NLP Techniques},
year = {2023},
isbn = {9798400702211},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3597926.3598131},
doi = {10.1145/3597926.3598131},
booktitle = {Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis},
pages = {1232–1243},
numpages = {12},
keywords = {OpenAPI Specification Analysis, Natural Language Processing for Testing, Automated REST API Testing},
location = {Seattle, WA, USA},
series = {ISSTA 2023}
}
Recommended citation: Myeongsoo Kim, Davide Corradini, Saurabh Sinha, Alessandro Orso, Michele Pasqua, Rachel Tzoref-Brill, and Mariano Ceccato. 2023. Enhancing REST API Testing with NLP Techniques. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023). Association for Computing Machinery, New York, NY, USA, 1232–1243.
