Data breaches are among the scariest threats for a company. Exposure of confidential data, whether accidental or caused by criminals, may lead to a loss of competitive advantage, and even to fines when personal information is exposed. According to a
report from IBM, the global average cost of a data breach for an organisation is about $3.86M in 2020, rising by roughly $1M to $4.77M when the breach is caused by an employee's compromised credentials.
In such a scenario, companies are making huge investments to reduce their attack surface and prevent possible data breaches. Yet, among the many threats to guard against, one is often underrated even though it can compromise the life of an entire business:
hardcoded secrets in source code.
Hardcoding secrets on a public codebase is like locking the front door of a house and leaving the key in the lock: it is the most straightforward and obvious way to cause a data breach, since exploiting hardcoded credentials requires no particular skill.
A recent publication from researchers at North Carolina State University shows that more than 100k public GitHub repositories are leaking secrets. Even worse, this is likely an underestimate, since the study targeted only cryptographic keys and the access tokens of a limited number of providers; it did not consider hardcoded passwords.
Recently, many startups and companies (
including GitHub itself)
have started relying on secret scanners, i.e., tools that scan public (and/or proprietary) source code projects looking for hardcoded secrets.
Yet, even if scanners reduce the risk, they have some limitations:
- existing scanners target only the access tokens of a limited number of providers, and usually implement unsophisticated techniques to distinguish a true positive (i.e., a real token) from a false positive (i.e., a detected string that is not a real token);
- none of these scanners target passwords, mostly because of the difficulty of reducing false positives. When trying to identify passwords with a code scanner, false positives make up the vast majority of the findings (more than 95%) and are very hard to reduce, since any sequence of characters can be chosen as a password, regardless of its complexity (the sketch below illustrates why).
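To see the problem, consider a naive regex-based matcher, similar in spirit to what simple scanners do (a hypothetical sketch, not the implementation of any specific tool):

```python
import re

# A typical "password assignment" pattern used by naive scanners:
# any quoted value assigned to a password-like variable name.
PASSWORD_RE = re.compile(r"(?i)(password|passwd|pwd)\s*[:=]\s*['\"]([^'\"]+)['\"]")

snippets = [
    'db_password = "s3cr3t!2020"',           # real leak: true positive
    'password = "changeme"',                  # placeholder: false positive
    'password = "your password here"',        # documentation: false positive
    'passwd: "${DB_PASSWORD}"',               # template variable: false positive
    'PASSWORD = os.environ["DB_PASSWORD"]',   # safe code: correctly ignored
]

for line in snippets:
    match = PASSWORD_RE.search(line)
    if match:
        print(f"candidate secret: {match.group(2)!r}  in: {line}")
```

Three of the four matches are placeholders, documentation, or template variables; nothing in the matched string itself allows a regex to tell them apart from a real password.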
SAP Security Research recently open sourced
Credential Digger, a source code scanner that, in addition to access tokens, also targets passwords, and uses machine learning to reduce the number of false positive discoveries. Even if it is not a miracle cure for hardcoded secrets (will one ever exist?), it is a first step towards identifying and remediating them with low manual effort.
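As a reference, here is a minimal usage sketch of its Python client, based on the project's README at the time of writing (treat the class name, model names, and file paths as assumptions that may differ across versions):

```python
from credentialdigger import SqliteClient

# Local SQLite database storing the scan rules and the discoveries.
client = SqliteClient(path='./data.db')

# Load the regex rules shipped with the project
# (the file location is an assumption; adapt it to your setup).
client.add_rules_from_file('./rules.yml')

# Scan a repository; the ML models filter out discoveries
# that are likely false positives.
new_discoveries = client.scan(
    repo_url='https://github.com/user/repo',
    models=['PathModel', 'PasswordModel'],
    debug=True,
)
print(f'{len(new_discoveries)} potential leaks to review')
```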
The high cost of bad culture
Nowadays, it is common for big tech companies to encourage their employees to develop personal projects, contribute to open source, or test new tools. This approach has many advantages: the employer invests in its employees' continuous development, giving them the freedom to explore innovative ideas, and employees can in turn open new opportunities for themselves and the company. A win-win situation.
Yet, this positive and forward-looking attitude brings some risks. It may happen that employees leak credentials of their professional accounts in their personal open source projects. In the first section we used the metaphor of leaving the key in the front door lock. This scenario is more like locking the front door and taking the key away, but leaving the windows open.
According to our observations, this phenomenon is not due to a lack of skills: many top-tier developers unintentionally publish their credentials in their personal projects. Rather, it appears to be a matter of culture: a lack of security awareness, together with carelessness, makes people underestimate, and feel indifferent to, the impact of publishing something that should be treated as the most confidential piece of information.
It is worth noting that publishing credentials may lead to a wide range of nasty personal consequences, like receiving
unpleasant bills, or having to fix a
compromised web server. In addition, the leaked credentials can also be professional ones,
resulting in a significant risk for companies. This scenario is very hard for companies to prevent, because
people often contribute to open source projects that are outside the controls and prevention measures a company has in place.
As an example, in 2016
Uber suffered a major data breach that leaked the personal information of 57 million customers. The breach originated from Uber employees who published their corporate credentials in their personal public repositories. As another example, in 2019 an employee of Comodo, a cybersecurity company, exposed his corporate email and password in a public repository on GitHub,
giving access to the company's Microsoft-hosted cloud services.
Everybody is aware of the issue of hardcoded credentials, and everybody knows that job credentials must not appear in personal projects (and that credentials must absolutely not be hardcoded in source code). Nevertheless, it still happens.
A code scanner can help in this situation too, but it is up to developers to use it and secure their code. If the rationale behind the need for a code scanner is not clear, it means there is still a poor security culture. And that can cause huge damage.
What is internal stays internal... doesn't it?
We would like to add a further concern that receives little attention at the moment, even though it is happening more and more frequently: internal/private source code leakage.
Obviously, a company may not open source its entire codebase: part of it can stay in internal repositories accessible only to its employees. Nevertheless, the phenomenon of hardcoded credentials does not affect just public repositories, but private ones too. Indeed, there is a psychological effect that (wrongly) suggests that what is internal will stay internal. This is a harmful but common belief that, in some cases, leads to abandoning good coding practices.
In fact, private source code can be leaked too, in many ways.
First, internal source code can be used in production. In this case, hardcoded credentials can be exploited if someone reverse engineers the application. As an example, MyCar, an app to remotely control cars
"compatible with all vehicles, which includes luxury makes, hybrids, manual-transmissions, and diesel vehicles", contained a hardcoded admin password in its source code. Since this app can be used to lock/unlock the doors and arm/disarm the alarm,
users were exposed to car theft.
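Recovering such a credential requires no advanced skills: hardcoded strings survive compilation and can be pulled out of a shipped binary with the classic `strings` approach. A minimal illustration in Python (hypothetical, not the actual MyCar analysis):

```python
import re
import sys

def printable_strings(path: str, min_len: int = 6):
    """Yield runs of printable ASCII from a binary file, like the
    classic `strings` utility; hardcoded credentials show up as-is."""
    with open(path, 'rb') as f:
        data = f.read()
    for match in re.finditer(rb'[ -~]{%d,}' % min_len, data):
        yield match.group().decode('ascii')

if __name__ == '__main__':
    for s in printable_strings(sys.argv[1]):
        # An attacker would simply grep for suggestive keywords.
        if any(k in s.lower() for k in ('password', 'admin', 'secret')):
            print(s)
```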
Second, internal source code can be leaked as a consequence of unauthorised access to the company's servers, leaving all the bad coding practices (including hardcoded credentials) exposed and publicly visible. As an example, many of Samsung's proprietary projects were leaked because the storage server was
unintentionally configured as public. And not only source code was leaked: so were employees' credentials hardcoded in these repositories, including many git private keys that could have led to a second breach, even after the visibility of the server was fixed.
Proprietary source code leakage in the wild
Besides traditional underground communities hiding in the dark web, source code leaks are becoming more frequent on public servers too. In particular, we analyzed the case of a private GitLab instance that recently made the news for sharing proprietary source code of famous companies like Disney, Lenovo, Mercedes-Benz, and many others.
Every new leak was openly announced on Twitter or in a private Telegram group.
While some of these leaks may be due to intrusions, hacking activities, or insider threats, many others are the consequence of faulty access control configurations, as in the cases of
Intel,
Mercedes-Benz, and the above-mentioned Samsung.
The threats posed by a leak of private source code go beyond a data breach caused by hardcoded credentials: vulnerabilities may be discovered through static analysis of the code, and then exploited or sold; once access to the internal network is gained, backdoors could be installed in products and ransomware in internal systems; the leak itself may damage the image of the company, blamed for failing to protect its products and thus its customers; and the occasional epic-fail moment may go down in history.
Better safe than sorry
So, to wrap up, why are hardcoded secrets a thing, and why should we care?
- Hardcoding secrets does not strictly depend on the developer's skills; it is also a matter of culture and education
- Hardcoding secrets in private code repositories is not safer than doing it in public ones
- Hardcoding personal credentials is no more (and no less) dangerous than hardcoding professional ones
- Hardcoded secrets can have many consequences, and can be the root cause of a data breach, since they can give access to private servers, databases, and accounts
- Code scanners can help prevent, or quickly remediate, the problem of hardcoded secrets, but there must be a willingness to use them, and education to understand their usefulness
If, while reading this blog post, you felt like we were talking about you:
- Always use 2FA, when available
- Rotate the exposed passwords/tokens/secrets
- Delete the secrets from the repository history, if using version control (e.g., with tools such as git filter-repo or BFG Repo-Cleaner)
- Use a code scanner! (a minimal sketch follows this list)
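Even a tiny client-side check is better than nothing. The sketch below is a hypothetical pre-commit hook that blocks a commit when the staged changes match a couple of illustrative secret patterns; a real scanner such as Credential Digger covers far more cases and filters false positives much better:

```python
#!/usr/bin/env python3
import re
import subprocess
import sys

# Illustrative patterns only: an AWS-style access key id and a
# generic quoted assignment to a secret-like variable name.
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)(password|secret|token)\s*=\s*['\"][^'\"]+['\"]"),
]

def staged_additions():
    """Return the lines added by the staged diff."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [l[1:] for l in diff.splitlines()
            if l.startswith("+") and not l.startswith("+++")]

def main() -> int:
    hits = [l for l in staged_additions()
            if any(p.search(l) for p in PATTERNS)]
    if hits:
        print("Possible hardcoded secrets; commit aborted:")
        for h in hits:
            print("  ", h.strip())
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Saved as .git/hooks/pre-commit and made executable, this runs on every commit and fails it whenever a candidate secret is staged.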
Discover how
SAP Security Research serves as a security thought leader at SAP, continuously transforming SAP by improving security.