Static application security testing (SAST) is a common and essential step in the development lifecycle of large software companies like SAP. It enables the detection of critical vulnerabilities in an application's source code before deployment, when fixing the problem is least expensive.
While SAST tools have many known limitations, the impact of coding style on their ability to discover vulnerabilities has remained largely unexplored, and the following questions emerge:
What does it mean when a SAST tool reports the green traffic light indicating that no vulnerability was detected?
Was the entire source code fully analyzed or were there any code areas left unexplored, leaving dangerous vulnerabilities under the carpet?
While we targeted the two most used web application languages (PHP and JavaScript), similar code patterns can be created for other programming languages and, very likely, similar results would be obtained.
By discovering the presence of testability tarpits (code patterns that hinder static analysis) during the software development lifecycle, our approach can provide important feedback to developers about the testability of their code. It can also help them better assess the residual risk that the code still contains vulnerabilities even when static analyzers report no findings. Finally, our approach can point to alternative ways to transform the code to increase its testability for SAST.
Our experiments show that testability tarpits are very common. For instance, an average PHP application contains over 21 of them, and even the best state-of-the-art static analysis tools fail to analyze more than 20 consecutive instructions before encountering one of them.
To assess the impact of tarpit transformations on static analysis findings, we experimented with both manual and automated code transformations designed to replace a subset of patterns with equivalent, but more testable, code. These transformations allowed existing SAST tools to better understand and analyze the applications, and led to the detection of 440 new potential vulnerabilities in 48 projects. We responsibly disclosed all these issues: 31 projects have already answered, confirming 182 vulnerabilities. Of these confirmed issues, which had previously remained unknown due to the poor testability of the applications' code, 38 affect popular GitHub projects (>1k stars), such as PHP Dzzoffice (3.3k), JS Docsify (19k), and JS Apexcharts (11k). 25 CVEs have already been published, and others are in process.
That was a short summary of our work; we hope you enjoyed it. If you want to see some technical details, just keep reading...
An example of tarpit for SAST
To illustrate what a tarpit for SAST is, let us consider the code example shown in the previous picture and reproduced here, enlarged, for convenience.
That code is an excerpt of the Mantis Bug Tracking application, which was found to have a file injection vulnerability in 2011 that allowed remote attackers to include and execute arbitrary local files (more details in CVE-2011-3357).
Specifically, the file that is executed via the require_once instruction is dynamically defined by the value of the $act_file variable (line 20 in our example). In static analysis terminology, this instruction is referred to as a sink, i.e., a dangerous instruction that must only receive clean (sanitized) data. Indeed, if an attacker can influence $act_file (the data processed by the dangerous operation), then she can influence the file that will be executed. By following the propagation backward from the variable $act_file, we can see that it depends on $_POST[$name] and thus on whatever a user passes to the application via the HTTP POST method as the value of the parameter $name.
In static analysis terminology, $_POST[$name] is referred to as a source, i.e., a location in the program where data is read from a potentially untrusted origin. When a dataflow path exists between a source and a sink without proper sanitization, an injection attack may be built and SAST tools should report it. For instance, if an attacker makes an HTTP POST request including action=<attacker_payload>, then the file that will be included and executed in line 20 will depend on <attacker_payload>. This is referred to as a file injection attack.
Unfortunately, 4 out of the 6 SAST tools we tried in our experiments (two commercial tools and four state-of-the-art open-source ones) were not able to detect that vulnerability. Our hypothesis is that the call_user_func_array dynamic dispatching feature in line 12 confuses the SAST tools that miss the file injection vulnerability.
How can we validate this hypothesis and so evaluate whether call_user_func_array, as used in line 12, is indeed a tarpit for those SAST tools? Our idea is to craft a test case for the SAST tools based on that tarpit. We will refer to these test cases as testability pattern instances.
Testability pattern instances share a common structure, shown in this puzzle picture, where the tarpit is embedded in a source-to-sink dataflow vulnerable to cross-site scripting (XSS, the most common injection vulnerability that SAST tools can detect). The idea is that if a SAST tool cannot detect the XSS, it must be because of the tarpit. The tarpit may require some additional companion code to be fully executable.
Applied to our example, the common skeleton (blue part in the puzzle) for a testability pattern instance is a minimal, deliberately vulnerable page.
This code reads a parameter from an HTTP GET request and simply prints the parameter’s value in the web page without any sanitization. Note that SAST tools that fail to detect the expected XSS on this trivial skeleton are excluded from the evaluation.
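The original listing of the skeleton is not reproduced in this text, so the following is only a minimal sketch of what the paragraph above describes. The parameter name p1 and the variable name $a are assumptions chosen for illustration.

```php
<?php
// Skeleton sketch: an unsanitized flow from a source to a sink.
// Assumption: the parameter name "p1" is illustrative only.
$a = $_GET["p1"] ?? "";  // source: value taken from the HTTP GET request
echo $a;                 // sink: printed into the page without sanitization (XSS)
```

Any SAST tool with basic taint tracking is expected to flag the echo statement here, which is what makes the skeleton usable as a baseline.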
Adding the call_user_func_array tarpit, as in line 12 of our example, between the source and the sink of the skeleton yields the testability pattern instance.
The only additions to the skeleton are the tarpit and its companion code (the function being called). Running a SAST tool against this testability pattern instance amounts to evaluating whether that tool gets confused by that usage of call_user_func_array. Indeed, if the SAST tool does not report the expected XSS, then we can conclude that the tarpit confuses the tool.
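As a hedged illustration, a pattern instance of this kind could look like the sketch below. The function name F, the parameter name p1, and the variable names are assumptions; the actual instance from the pattern library is not reproduced here.

```php
<?php
// Companion code (assumed name F): a pass-through function that is only
// reached through dynamic dispatch.
function F($var) {
    return $var;
}

$a = $_GET["p1"] ?? "";                 // source: attacker-controlled input
$b = call_user_func_array("F", [$a]);   // tarpit: callee resolved from a string
echo $b;                                // sink: unsanitized output (XSS)
```

The data reaching the sink is exactly the data read at the source, so a tool that flags the bare skeleton but not this file is being stopped by the dispatch call and nothing else.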
To further validate our hypothesis, we removed the tarpit via a simple refactoring of line 12 into:
$r = gpc_get($args); // no tarpit anymore
Indeed, line 12 used a dynamic feature of PHP even though the function to be called was hardcoded and therefore already known statically. This simple refactoring was sufficient to remove the tarpit and to enable the SAST tools to detect the file injection vulnerability, confirming that the tarpit was indeed at line 12 and only there.
What to do with tarpits for SAST
You cannot do much with one or a few pattern instances. You can only claim that a SAST tool does not support this or that tarpit. However, when you start creating many of them, trying to be comprehensive with respect to a programming language, you can do very interesting things.
We reviewed the documentation, the internal specifications, and the APIs of both PHP and JS and distilled this information into hundreds of potential tarpits that emphasize different functionalities. We then embedded these tarpits into testability pattern instances like the one we illustrated above. For instance, 6 pattern instances were created just to capture different variants of call_user_func_array (e.g., a variant where the first parameter is not hardcoded but is a variable; another where that parameter is a variable concatenated with a constant string; etc.). Similar pattern instances are further clustered into a testability pattern, which provides an overall textual description of the tarpits that are captured and simplifies their presentation to end users.
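To make the variants mentioned above concrete, here is a minimal sketch of three of them. They are shown in one file for brevity, whereas each pattern instance would normally be its own test case; all function and variable names are assumptions.

```php
<?php
// Companion functions (assumed names) that simply pass the value through.
function F($var) { return $var; }
function G_handler($var) { return $var; }

$a = $_GET["p1"] ?? "";  // source

// Variant 1: the callee name is a hardcoded string.
$b1 = call_user_func_array("F", [$a]);

// Variant 2: the callee name is held in a variable.
$name = "F";
$b2 = call_user_func_array($name, [$a]);

// Variant 3: the callee name is a variable concatenated with a constant string.
$prefix = "G";
$b3 = call_user_func_array($prefix . "_handler", [$a]);

echo $b1, $b2, $b3;  // sinks: each value reaches the page unsanitized
```

Each variant demands a little more from the analyzer: the first needs only string-literal resolution, the second needs constant propagation, and the third needs string concatenation to be evaluated statically.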
Now that we have all these testability pattern instances, capturing many tarpits and covering a significant spectrum of the targeted programming language, we aim to perform three key activities:
Measurement: evaluate SAST tools against our pattern instances
Discovery: make developers aware of the tarpits in their code via automated discovery rules
Mitigation: make apps more testable for SAST by removing tarpits via transformations, or improve the SAST tool
Measurement of SAST tools
As mentioned, each testability pattern instance acts as a test case that determines whether a SAST tool supports the tarpit captured in the instance. We tested all our pattern instances against a set of commercial and open-source SAST tools to identify the tarpits that could impede the testability of an application for each of these tools. We used 6 SAST tools for PHP: 2 commercial and 4 open-source ones (RIPS, PHPsafe, WAP, and Progpilot). Similarly, we used 5 SAST tools for JS: 3 commercial and 2 open-source ones (LGTM and NodeJSScan).
The detailed results are presented in the graphs below, and we refer the interested reader to our technical report for more details. Here we focus only on the overall score (see the bars in blue labelled “All”). The best commercial tools were only able to handle 50% of the PHP and 60% of the JS tarpits, thus potentially leaving large parts of an application's code unexplored.
SAST measurement over PHP tarpits
SAST measurement over JS tarpits
Our testability pattern instances are available to the community, so SAST tool owners can use them to measure the progress of their tools against tarpits and to improve their support rate over time.
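The support rate discussed in this section can be computed along the following lines. This is a minimal sketch: the function name, the instance names, and the outcomes are all hypothetical and invented for illustration, not results from our experiments.

```php
<?php
/**
 * Computes the share of pattern instances on which a tool reported the
 * expected XSS. $detected maps an instance name to true when it did.
 * Returns null when the tool misses the bare skeleton (it is then excluded)
 * or when there are no instances to score.
 */
function supportRate(bool $skeletonDetected, array $detected): ?float {
    if (!$skeletonDetected || count($detected) === 0) {
        return null;
    }
    $supported = count(array_filter($detected));
    return $supported / count($detected);
}

// Hypothetical outcomes for one tool, invented for illustration only.
$outcomes = [
    "call_user_func_array_hardcoded" => false,
    "call_user_func_array_variable"  => false,
    "instance_c"                     => true,
    "instance_d"                     => true,
];

printf("support rate: %.0f%%\n", 100 * supportRate(true, $outcomes));
```

Tracking this number per tool release is one way a tool owner could monitor progress against the pattern library.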
Discovery: make developers aware of SAST tarpits
Measuring SAST tools against tarpits is only meaningful if those tarpits are used in the real world. If they are not, the fact that a SAST tool does not support them has less impact. To evaluate the impact of those unsupported tarpits, we implemented automated discovery rules for all our PHP patterns and used them to scan 3341 open-source PHP applications taken from the following four datasets:
GH: 1000 applications with high popularity on GitHub (more than 1000 stars)
GM: 1000 applications with medium popularity on GitHub (between 200 and 700 stars)
GL: 1000 applications with low popularity on GitHub (between 20 and 70 stars)
SC: all 341 applications from Sourcecodester, which hosts open-source PHP projects that serve as references for other developers who want to implement their websites
The results, shown in the graph below, demonstrate that the prevalence of our tarpits is very high in the real world. The horizontal axis indicates how many pattern instances per line of code were discovered. The vertical axis indicates how many of the discovery rules created for our pattern instances returned tarpit occurrences in an application. The average project contains 21 different tarpits, and even the best SAST tool cannot process more than 20 consecutive instructions without encountering a tarpit that prevents it from correctly analyzing the code. Again, we refer the interested reader to our technical report for more details about the discovery rules and the prevalence analysis of our tarpits.
Prevalence of PHP tarpits
The ability to automatically discover each tarpit brings many benefits. It can provide immediate and precise feedback to developers about the tarpits in their code (e.g., by integrating the discovery rules into an IDE). This information can then be used to make an informed decision about which combination of SAST tools is best suited to analyze the code, which parts of the application are blind spots for a static analyzer and thus may require a more extensive code review process, and which regions of code could be refactored into more testable alternatives.
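To give an idea of what a discovery rule can look like, here is a minimal sketch that locates calls to call_user_func_array using PHP's built-in tokenizer. It is not one of the discovery rules used in the experiments; the function name and the sample source are assumptions, and the rule deliberately ignores edge cases such as namespaced calls or methods of the same name.

```php
<?php
/**
 * Returns the line numbers at which call_user_func_array is called
 * in the given PHP source text.
 */
function findCallUserFuncArray(string $source): array {
    $tokens = token_get_all($source);
    $count = count($tokens);
    $lines = [];
    for ($i = 0; $i < $count; $i++) {
        $tok = $tokens[$i];
        // Identifiers are array tokens of type T_STRING: [id, text, line].
        if (!is_array($tok) || $tok[0] !== T_STRING) {
            continue;
        }
        if (strcasecmp($tok[1], "call_user_func_array") !== 0) {
            continue;
        }
        // Skip whitespace, then require an opening parenthesis (a call).
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j < $count && $tokens[$j] === "(") {
            $lines[] = $tok[2];
        }
    }
    return $lines;
}

// Sample input (assumed): a small pattern instance held as text.
$sample = <<<'SRC'
<?php
function F($v) { return $v; }
$a = $_GET["p1"];
$b = call_user_func_array("F", [$a]);
echo $b;
SRC;

print_r(findCallUserFuncArray($sample));
```

Working on tokens rather than raw text avoids false matches inside comments and string literals, which matters when such a rule is run across thousands of applications.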
Mitigation: make apps more testable for SAST or improve SAST tools
Experimental results, from both measurement and discovery, show that our tarpits are problematic for SAST tools and that they are prevalent in the real world. All in all, testability for SAST, understood as how well SAST tools can test applications, is problematic. This is further demonstrated by the outcomes of our additional experiments (see below).
How can this problem be mitigated? Two options can be envisaged:
Improve SAST tools
Make applications more testable for SAST
Indeed, owners of SAST tools can use our publicly available libraries of testability patterns for SAST to determine which tarpits are not supported and then improve the tool in forthcoming releases to increase its support rate. (We are currently enriching these libraries, so get in touch with us if you want to use the latest version.) The libraries can also be used to monitor the progress of SAST tools against tarpits. Since we do not own a SAST tool, we did not explore this option.
In our research, we explored the second option, which is more interesting in the context of a software company like SAP. In doing so, we achieved very good results that demonstrate that we can increase testability for SAST and detect more vulnerabilities.
Make applications more testable for SAST
We performed two experiments to assess the use of code refactoring as a means to make an application more testable for SAST tools. In the first, we manually investigated five PHP and five JS applications for which SAST tools were unable to discover the presence of known vulnerabilities. By manually transforming the testability tarpits in those applications, we enabled the tools to detect the vulnerabilities. Moreover, over 200 additional bugs were reported, leading us to the disclosure of 71 confirmed vulnerabilities, as some of the discovered issues still applied to the latest version of the tested projects. In the second experiment, we instead targeted thousands of popular real-world applications (the same ones we used for the prevalence experiment), to which we applied five pattern transformations in a fully automated fashion. Our tool modified 1170 applications by transforming 32,192 occurrences of the five tarpits. By running SAST tools both before and after the transformations, we could observe the improvement in overall testability, supported by the detection of ~9000 new findings, of which we inspected ~2700 entries, uncovering hundreds of previously unknown vulnerabilities. In particular, we discovered 370 vulnerabilities in 43 different applications, 55 of which affected very popular projects with more than 1000 stars on GitHub. We responsibly disclosed all issues, and we have received 111 confirmations from the development teams (36 confirmations for the popular projects). These outcomes confirm the added value of our approach and the impact of removing tarpits to increase testability for SAST tools.
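As an illustration of what an automated transformation can look like, the sketch below rewrites a call_user_func_array call with a hardcoded function name into a direct call. It is a hypothetical example and not necessarily one of the five transformations applied in the experiment; the function name is an assumption, and the regular expression is used only for brevity, whereas a robust tool would operate on the syntax tree.

```php
<?php
/**
 * Rewrites call_user_func_array("name", ARGS) into name(...ARGS) when the
 * callee is a hardcoded plain function name. Calls whose callee is a
 * variable or an expression are left untouched.
 * Caveat: this text-based rewrite does not understand comments or strings.
 */
function removeHardcodedDispatch(string $source): string {
    $pattern = '/\bcall_user_func_array\(\s*([\'"])([A-Za-z_][A-Za-z0-9_]*)\1\s*,\s*/';
    // Keep the captured function name and open a direct call that
    // unpacks the original argument array.
    return preg_replace($pattern, '$2(...', $source) ?? $source;
}

$before = '$b = call_user_func_array("F", [$a]);';
echo removeHardcodedDispatch($before), "\n";
```

Because the callee is already known statically, the direct call with argument unpacking behaves the same way for positional arguments, so this kind of rewrite keeps the program's meaning while removing the dynamic dispatch.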
More details about these two transformation experiments are available in our technical report.
Transformation of testability tarpits is a very interesting and challenging research topic. Clearly, not all tarpits can be automatically transformed while preserving the semantics of the program. Sometimes we can apply a transformation that loses the semantics but ensures that if the original program was vulnerable, then the transformed program, which is easier to test, is vulnerable too (a transformation that preserves the vulnerability). In all other cases, automated transformations would be impossible without some help from the development team of the application.
SAST tools are subject to testability issues that may prevent them from detecting important vulnerabilities. Accepting a green light from the SAST tool without knowing which fragments of the application were analyzed may simply sweep vulnerabilities under the carpet.
By devising measurable, discoverable, and possibly transformable tarpits for SAST, we can gain a better awareness of what a SAST tool is analyzing and even improve the testability for SAST by acting on the SAST tool itself or on the application code.