Exploiting Implicit Beliefs to Resolve Sparse Usage Problem in Usage-based Specification Mining

We studied the characteristics of the preconditions that our approach mined correctly, mined incorrectly, or missed.

Correctly Mined Preconditions

For the APIs in our dataset, correctly mined preconditions fall into three categories: null comparison, primitive comparison, and method invocation. The first category contains simple preconditions that compare arguments with null, such as ARG!=null. The second contains preconditions that compare arguments of primitive types to constants or to other primitive arguments, such as ARG1≥0 and ARG1≤ARG2. The last contains preconditions with method calls on receivers, such as !Receiver.hasNext(), on arguments, such as !ARG.isEmpty(), or on both. Table 1 shows, for each category, the number of preconditions correctly mined by the base approach and the additional ones mined by our approach using implicit beliefs. The numbers in parentheses are percentages of all expected preconditions in the corresponding category.
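To make the three categories concrete, the following sketch shows client code that guards a JDK call with one condition of each kind. The class and method names are hypothetical, and the guarded calls are chosen only for illustration; they are not preconditions reported in Table 1.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical client code: each method guards a JDK call with one kind of precondition.
final class PreconditionCategories {

    // Null comparison (ARG != null): String.concat throws NullPointerException
    // when its argument is null.
    static String nullComparison(String prefix, String arg) {
        if (arg != null) {
            return prefix.concat(arg);
        }
        return prefix;
    }

    // Primitive comparison (ARG1 >= 0, ARG1 <= ARG2): String.substring(begin, end)
    // throws IndexOutOfBoundsException when the indices are negative or out of order.
    static String primitiveComparison(String text, int begin, int end) {
        if (begin >= 0 && begin <= end && end <= text.length()) {
            return text.substring(begin, end);
        }
        return "";
    }

    // Method invocation on the receiver: Iterator.next() throws
    // NoSuchElementException when the iteration has no more elements.
    static String methodInvocation(List<String> items) {
        Iterator<String> it = items.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(nullComparison("a", "b"));
        System.out.println(primitiveComparison("hello", 1, 3));
        System.out.println(methodInvocation(List.of("x")));
    }
}
```

A usage-based miner would see the guard conditions in front of each call and, if they recur often enough across projects, report them as preconditions of the guarded API.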

The results show that our approach mined additional preconditions that the base approach missed in all three categories. However, the improvement was mainly in simpler preconditions, which compare arguments against null or compare primitive values. This is because the code elements we analyze mostly carry implicit beliefs for inferring those kinds of preconditions. Three of the six implemented components, Object Instance Creation (OIC), Type Comparison (TC) and Null Dereference (ND), look for the non-null property of API components. Our tool mined these preconditions for the chosen libraries (Table 1) whenever a related implicit belief was present. The Count-controlled Loop (CCL) component infers only conditions that check the loop index or counter against its bounds. Local Exception (LE) can only remove incorrectly mined preconditions, not add correctly mined ones. Short Circuit Evaluation (SCE) is the only component that can infer new complex preconditions involving multiple API components.
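As an illustration of where such beliefs can sit in client code, consider the sketch below. The rules each component applies are not restated in this section, so the mapping in the comments is an assumption based on the component names, and the class and method names are hypothetical. LE is omitted because it only removes conditions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client code; comments mark code elements that may carry implicit beliefs.
final class ImplicitBeliefSites {

    static int totalLength(Object extra, List<String> names) {
        // ND: dereferencing 'names' suggests the developer believes names != null,
        // so that belief also holds at the later call copy.addAll(names).
        int expected = names.size();

        // OIC: 'copy' was just constructed, so it is non-null wherever it is used below.
        List<String> copy = new ArrayList<>(expected);
        copy.addAll(names);

        // TC: instanceof is false for null, so 'extra' is non-null inside the branch.
        if (extra instanceof String) {
            copy.add((String) extra);
        }

        // CCL: the loop header implies 0 <= i < copy.size() at the call get(i).
        int total = 0;
        for (int i = 0; i < copy.size(); i++) {
            total += copy.get(i).length();
        }
        return total;
    }

    // SCE: charAt(0) is evaluated only if both operands to its left hold,
    // which yields a compound condition (s != null and !s.isEmpty()).
    static boolean startsWithDigit(String s) {
        return s != null && !s.isEmpty() && Character.isDigit(s.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(totalLength("ab", List.of("x", "yz")));
        System.out.println(startsWithDigit("7a"));
    }
}
```

None of the API calls in totalLength is preceded by an explicit guard, which is why a purely guard-based miner would learn nothing from this method.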

Incorrectly Mined Preconditions

We examined the incorrectly mined preconditions to find out why they occur. Table 2 shows the categories of incorrectly mined preconditions for the different libraries. The first category contains incorrectly mined preconditions that are stronger than required. For example, the API parse(InputSource, DefaultHandler) in class javax.xml.parsers.SAXParser parses the InputSource parameter using the specified DefaultHandler parameter. From a usage point of view, it makes sense not to pass a null default handler. Using implicit beliefs, the tool therefore also mines the precondition that the second parameter cannot be null. However, the API does not require that condition, so it counts as an incorrectly mined precondition. Such cases increase the number of stronger preconditions relative to the base usage-based mining approach and, therefore, the number of preconditions incorrectly mined by our approach. For the libraries CHART, DATA and SWING, our approach mined more stronger-than-required conditions than the base usage-based mining approach.
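The SAXParser example can be checked with a small probe such as the one below. It is a sketch under the assumption that the JDK's default parser implementation is in use; the class and method names are hypothetical. The documented requirement is only that the InputSource is non-null, while the handling of a null handler is observed behavior rather than a documented guarantee.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe of which arguments parse(InputSource, DefaultHandler) rejects.
final class StrongerThanRequired {

    private static SAXParser newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if parsing well-formed XML completes with a null handler.
    static boolean parsesWithNullHandler(String xml) {
        try {
            newParser().parse(new InputSource(new StringReader(xml)), (DefaultHandler) null);
            return true;
        } catch (IllegalArgumentException | NullPointerException e) {
            return false;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if a null InputSource is rejected with IllegalArgumentException.
    static boolean rejectsNullSource() {
        try {
            newParser().parse((InputSource) null, new DefaultHandler());
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesWithNullHandler("<a/>"));
        System.out.println(rejectsNullSource());
    }
}
```

Client code rarely passes a null handler because doing so discards all parse events, which is exactly why the usage data suggests a non-null condition that the API itself does not impose.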

We also observed incorrectly mined preconditions that are weaker than required. For example, the mined precondition is ARG!=-1 while the required one is ARG≥0. However, all of these weaker preconditions came from explicit guard conditions present in the code rather than from implicit beliefs. Another major type of incorrect condition in our results is the project-specific condition. To some extent, such conditions are also stronger than needed. However, we put them in a separate category because they are frequent only in certain projects. As some APIs do not have rich use cases across multiple projects, many project-specific conditions were considered preconditions for these APIs. The results for the XML library suffer from this the most: we found that most call sites in the dataset call the APIs repeatedly on a document until reaching its end. Hence, conditions such as hasNext() or !isEndDocument() are not part of the specification of most APIs in this library, but they frequently appear before the API calls because of this access pattern. Despite that, the implicit beliefs from our Local Exception component successfully removed conditions guarding project-specific exceptions and lowered the number of incorrect preconditions arising from project-specific conditions.
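The weaker-than-required case can be illustrated with a common idiom. The sketch below assumes, for illustration only, that such guards arise when an index returned by indexOf is passed to an index-taking API; the source does not name the API involved, and the class and method names are hypothetical.

```java
// Hypothetical illustration of a mined guard (ARG != -1) that is weaker than
// the required lower bound (ARG >= 0).
final class WeakerGuard {

    // Guard as written by clients: sufficient locally because indexOf
    // returns either -1 or a valid non-negative index.
    static boolean minedGuard(int arg) {
        return arg != -1;
    }

    // Lower bound that an index-taking API such as String.substring(int) requires.
    static boolean requiredGuard(int arg) {
        return arg >= 0;
    }

    // Typical call site: the explicit guard checks only the sentinel value.
    static String suffixFrom(String text, char marker) {
        int idx = text.indexOf(marker);
        if (idx != -1) {
            return text.substring(idx);
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(suffixFrom("key=value", '='));
        // -2 passes the mined guard but violates the required one.
        System.out.println(minedGuard(-2) + " " + requiredGuard(-2));
    }
}
```

The guard is correct at this call site, yet taken out of context as a precondition of the API it admits values such as -2 that the API rejects.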

Missing Preconditions

Although our approach detected more preconditions through implicit beliefs than the existing usage-based mining approach, it still misses some preconditions. We analyzed the classes of preconditions that our approach could not find in the dataset we used for the 7 libraries of interest. We did not miss any precondition that the base approach could mine. We observed three main categories of missing preconditions, shown in Table 3.

No Check. The first category consists of preconditions that appear neither explicitly nor implicitly anywhere in the code corpus before the API calls. The absence of such preconditions is mainly caused by four factors:

Exception Handling: Programmers who are unsure about the preconditions of an API sometimes surround the API call with a try statement and catch clauses to handle the exceptions. This is often why the code corpus does not contain the necessary preconditions before the API call.
Semantics Guarantee No Exception: Sometimes developers are well aware that the domain semantics of their project guarantee that certain preconditions of an API always hold. For example, if they know the files are always non-empty, then the list of lines read from a file is always non-empty too. Therefore, accessing the first element of the list is always possible. In such cases, checking whether the precondition is met before calling the API becomes extraneous, resulting in the absence of such preconditions.
Intentional Throw Exception: Programmers often use test cases to ensure code correctness. In such scenarios, programmers may intentionally provide inputs for which the API will surely throw an exception. The thrown exception confirms to the programmer that the code behaves as intended. Such cases usually do not contain any preconditions before the API call.
Unintentional Throw Exception: The last subcategory that we observed in our code corpus is that programmers used the API incorrectly, without checking the required preconditions. Such incorrect usage can result in exceptions being thrown. This type of buggy code does not contain the required preconditions.
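The first and third factors can be sketched as follows. This is an illustrative example with hypothetical class and method names, not code from our corpus. In both methods the call to List.get(0) is preceded by no condition, explicit or implicit, from which the precondition that the list is non-empty could be mined.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical call sites that carry no usable precondition for List.get(int).
final class NoCheckCallSites {

    // Exception Handling: the client recovers in a catch clause
    // instead of checking !lines.isEmpty() before the call.
    static String firstOrDefault(List<String> lines, String fallback) {
        try {
            return lines.get(0);
        } catch (IndexOutOfBoundsException e) {
            return fallback;
        }
    }

    // Intentional Throw Exception: test-style code that violates the
    // precondition on purpose and expects the exception.
    static boolean getOnEmptyListThrows() {
        List<String> empty = new ArrayList<>();
        try {
            empty.get(0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstOrDefault(new ArrayList<>(), "none"));
        System.out.println(getOnEmptyListThrows());
    }
}
```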

Infrequent. These preconditions are present in the client code but are not frequent enough for the mining tool to consider them correct preconditions.
Private. The final category consists of preconditions that involve private or internal members of the APIs, which cannot be accessed outside the API implementation. They therefore cannot be observed in client code before the API call.