Exploiting Implicit Beliefs to Resolve Sparse Usage Problem in Usage-based Specification Mining

Frameworks and libraries provide application programming interfaces (APIs) that serve as building blocks in modern software development. While APIs offer the opportunity for increased productivity, they must also be used correctly to avoid buggy code. Usage-based specification mining techniques have shown great promise in addressing this problem through a data-driven approach. These techniques leverage uses of an API in large corpora to understand its recurring usage and to infer behavioral specifications (preconditions and postconditions) from that usage. A challenge for such techniques is therefore inference in the presence of insufficient usage, in terms of both frequency and richness. We refer to this as the sparse usage problem. This paper presents the first technique to solve the sparse usage problem in usage-based precondition mining. Our key insight is to leverage implicit beliefs to overcome sparse usage. An implicit belief is a fact about language structure and semantics that is known to the programmer and thus not explicitly documented in code. The technical underpinnings of our new precondition mining technique include a technique that observes the semantics of code structures leading to an API call in order to infer preconditions that are implicitly present in the code corpus, a strategy to detect these implicit beliefs based on data-flow and control-flow properties, a catalog of 35 code elements that can be used to derive implicit beliefs from a program, and an empirical evaluation of all of these ideas. We have analyzed over 350 million lines of code and 7 libraries that suffer from the sparse usage problem. Our approach realizes 6 implicit beliefs, and we have observed that adding a single level of context sensitivity can further improve the results of usage-based precondition mining. The results show that we achieve 60% precision and 69% recall overall, a relative improvement of 32% in precision and 78% in recall over the base usage-based mining approach for these libraries.

Approach overview

From the input code corpus, we build a control flow graph for each method. Implicit beliefs are derived by recognizing the corresponding code elements. Each implicit belief is then propagated along the subsequent paths. Finally, preconditions are inferred from the explicit conditions and implicit beliefs that guard API calls.
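
The pipeline described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. The Node type, the node kinds ("new", "assume", "call", "other"), the single belief rule (an object creation implies that the assigned variable is non-null), and the function names are all assumptions made for illustration; the catalog of 35 code elements and the 6 implicit beliefs are not reproduced here. The sketch treats explicit conditions as "assume" nodes, merges paths by intersection so that only facts holding on every path survive, and ignores reassignments that would invalidate a fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str       # "new", "assume", "call", or "other" (hypothetical kinds)
    var: str = ""   # variable assigned by a "new" node
    fact: str = ""  # predicate guaranteed by an "assume" node (explicit condition)
    api: str = ""   # API method invoked by a "call" node


def generated_facts(node):
    """Facts that start to hold after a node executes."""
    if node.kind == "assume":
        return {node.fact}               # explicit condition
    if node.kind == "new":
        return {f"{node.var} != null"}   # illustrative implicit belief
    return set()


def mine_preconditions(nodes, edges, entry):
    """Propagate facts forward and collect those guarding each API call.

    nodes maps a node name to a Node, edges is a list of (src, dst) pairs,
    and entry is the name of the method's entry node.
    """
    preds = {name: [] for name in nodes}
    for src, dst in edges:
        preds[dst].append(src)

    facts_out = {}   # node name -> facts holding after the node
    guards = {}      # call node name -> (api, facts holding before the call)
    changed = True
    while changed:   # iterate to a fixed point
        changed = False
        for name, node in nodes.items():
            known = [facts_out[p] for p in preds[name] if p in facts_out]
            if name == entry:
                facts_in = set()
            elif known:
                # Keep only the facts that hold on every incoming path.
                facts_in = set.intersection(*known)
            else:
                continue  # no predecessor has been processed yet
            new_out = facts_in | generated_facts(node)
            if facts_out.get(name) != new_out:
                facts_out[name] = new_out
                changed = True
            if node.kind == "call":
                guards[name] = (node.api, frozenset(facts_in))
    return guards


# Straight-line method: sb = new StringBuilder(); if (i >= 0) sb.insert(i, ...)
nodes = {
    "n0": Node("new", var="sb"),
    "n1": Node("assume", fact="i >= 0"),
    "n2": Node("call", api="StringBuilder.insert"),
}
edges = [("n0", "n1"), ("n1", "n2")]
result = mine_preconditions(nodes, edges, "n0")
print(result["n2"][0], sorted(result["n2"][1]))
# prints: StringBuilder.insert ['i >= 0', 'sb != null']
```

In this toy input the call is guarded by one explicit condition (i >= 0) and one implicit belief (sb != null). The belief is never written as a check in the code, so a miner that looks only at explicit guards would miss it.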