Program debloating via stochastic optimization

Published in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER 20), 2020

Programs typically provide a broad range of features. Because different typologies of users tend to use only a subset of these features, and unnecessary features can harm performance and security, program debloating techniques, which can reduce the size of a program by eliminating (possibly) unneeded features, are becoming increasingly popular.

Resources:

Problem Statement

Most existing debloating techniques tend to focus on program-size reduction alone and, although effective, ignore other important aspects of debloating. We believe that program debloating is a multifaceted problem that must be addressed in a more general way.

Our Approach

In this spirit, we propose a general approach that formulates program debloating as a multi-objective optimization problem. Given a program to be debloated, our approach lets users specify:

A usage profile for the program (i.e., a set of inputs with associated usage probabilities)
The factors of interest for debloating
The relative importance of these factors (expressed as weights)

Based on this information, the approach defines a suitable objective function for associating a score to every possible reduced program and aims to generate an optimal solution that maximizes the objective function.

Debop: A Multi-Objective Debloating Tool

We present and evaluate Debop, a specific instance of our approach that considers three objectives:

Size reduction (sr): Reducing the program’s footprint measured in bytes
Attack-surface reduction (ar): Minimizing potential security vulnerabilities (measured by the number of ROP gadgets)
Generality (g): The extent to which the reduced program correctly handles inputs in the usage profile provided

Technical Approach

Debop formulates the debloating problem as:

p_deb = arg max O(p, p')
        p'∈Sub(p)

where the objective function O(p, p') is defined as:

General reduction: r(p, p') = (1-α)·sr(p, p') + α·ar(p, p')
Overall objective: O(p, p') = (1-β)·r(p, p') + β·g(p, p')

The weights α and β (both between 0 and 1) control the relative importance of different objectives.

Optimization via MCMC Sampling

Since enumerating all possible reduced programs is infeasible, Debop leverages Markov Chain Monte Carlo (MCMC) sampling, specifically the Metropolis-Hastings algorithm, to efficiently explore the search space and find approximate optimal solutions. The algorithm:

Starts with the original program
Iteratively generates new reduced programs by randomly adding/removing statements
Accepts or rejects candidates based on their O-score and acceptance probability
Avoids local maxima through probabilistic acceptance of worse solutions
Returns the sample with the highest O-score as the debloated program

Implementation

Debop is implemented in C++ and uses:

Clang for AST construction and statement identification
GCC (v. 7.4.0, -O3) for compilation and size measurement
ROPgadget tool for attack surface measurement (counting ROP gadgets)

Evaluation

Benchmark

We evaluated Debop on mkdir (version 5.2.1, ~28K LOCs), a Unix utility for creating directories, using:

26 tests provided with the program
2 additional tests from the BusyBox version
1,000 samples per trial
27 combinations of α (0.25, 0.5, 0.75) and β (0.1 to 0.9)

Key Results

Multiple Trade-offs (RQ1): Debop successfully generates debloated programs with different size reduction, attack-surface reduction, and generality trade-offs when provided with different α and β values. As β increases (favoring generality), generality scores increase while reduction scores decrease, confirming the approach’s flexibility.
Comparison with Chisel (RQ2): We compared Debop with Chisel, a state-of-the-art single-goal (size-only) debloating technique based on delta debugging:
- Chisel achieves higher size reduction (Debop ratio: 0.82-0.86)
- Debop achieves better attack-surface reduction (ratio: 1.07-1.13 vs. Chisel)
- Debop’s advantage: simultaneously optimizes multiple objectives, whereas Chisel focuses only on size

Insights

Debop pays a small price for generality compared to specialized techniques
Multi-objective optimization provides better overall results when considering multiple goals
The approach enables exploration of the Pareto frontier between conflicting objectives

Key Contributions

A novel, general formulation of program debloating as a multi-objective optimization problem
Debop: an instance supporting size, attack-surface, and generality objectives via MCMC sampling
Proof-of-concept evaluation demonstrating effectiveness and trade-offs
Open-source implementation and experimental infrastructure at https://sites.google.com/view/debop19

BibTeX

@inproceedings{10.1145/3377816.3381739,
  author = {Xin, Qi and Kim, Myeongsoo and Zhang, Qirun Zhang and Orso, Alessandro},
  title = {Program debloating via stochastic optimization},
  year = {2020},
  isbn = {9781450371261},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3377816.3381739},
  doi = {10.1145/3377816.3381739},
  abstract = {Programs typically provide a broad range of features. Because different typologies of users tend to use only a subset of these features, and unnecessary features can harm performance and security, program debloating techniques, which can reduce the size of a program by eliminating (possibly) unneeded features, are becoming increasingly popular. Most existing debloating techniques tend to focus on program-size reduction alone and, although effective, ignore other important aspects of debloating. We believe that program debloating is a multifaceted problem that must be addressed in a more general way. In this spirit, we propose a general approach that allows for formulating program debloating as a multi-objective optimization problem. Given a program to be debloated, our approach lets users specify (1) a usage profile for the program (i.e., a set of inputs with associated usage probabilities), (2) the factors of interest for debloating, and (3) the relative importance of these factors. Based on this information, the approach defines a suitable objective function for associating a score to every possible reduced program and aims to generate an optimal solution that maximizes the objective function. We also present and evaluate Debop, a specific instance of our approach that considers three objectives: size reduction, attack-surface reduction, and generality (i.e., the extent to which the reduced program handles inputs in the usage profile provided). Our results, albeit still preliminary, are promising and show that our approach can be effective at generating debloated programs that achieve a good trade-off between the different de-bloating objectives considered. Our results also provide insights on the performance of our general approach when compared to a specialized single-goal technique.},
  booktitle = {Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results},
  pages = {65–68},
  numpages = {4},
  location = {Seoul, South Korea},
  series = {ICSE-NIER '20}
}

Recommended citation: Qi Xin, Myeongsoo Kim, Qirun Zhang, and Alessandro Orso. 2020. Program debloating via stochastic optimization. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER 20). Association for Computing Machinery, New York, NY, USA, 65–68.
Download Paper

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)