Exploiting Implicit Beliefs to Resolve Sparse Usage Problem in Usage-based Specification Mining

Download the Ground-truth of Preconditions

We used a large code corpus (Allamanis and Sutton 2013) consisting of 14,785 projects. The corpus is curated using GitHub's social fork system to filter out low-quality projects that are rarely forked. It contains over 350 million lines of source code, and only files written in Java are considered. Figure 9a shows the complete statistics of the datasets used. In total, the dataset includes 1,212,124 API method calls (Figure 9b) from 7 different libraries of interest.