In a previous article, we discussed why hash values are crucial in evidence collection and digital forensics. Following on from that, it’s worth discussing why Pagefreezer specifically makes use of the SHA-256 hashing algorithm when applying a digital signature to one of our records.
The Benefits of SHA-256
We use SHA-256, which produces a 256-bit hash value, because it is much more secure than other common hashing algorithms. Without going into too much technical detail, here are the key benefits of SHA-256:
- It’s a secure and trusted industry standard: SHA-256 is standardized by NIST in FIPS 180-4, trusted by leading public-sector agencies, and widely used by technology leaders.
- Collisions are incredibly unlikely: There are 2^256 possible hash values when using SHA-256, which makes it nearly impossible for two different documents to coincidentally have the exact same hash value. (More on this in the following section.)
- The avalanche effect: Even a very minor change to the original information completely changes the hash value, a property known as the avalanche effect.
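To put “incredibly unlikely” into numbers, the birthday bound gives the approximate chance that any two of k randomly chosen inputs share a hash value when there are N = 2^bits possible values: p ≈ 1 - e^(-k(k-1)/2N). The Python sketch below rests on one big assumption: it treats the hash as an ideal random function. That is a reasonable model for accidental collisions with SHA-256, but it says nothing about deliberate collision attacks, which is exactly where MD5 and SHA-1 fall down. The function name collision_probability is hypothetical.

```python
import math


def collision_probability(num_items: int, digest_bits: int) -> float:
    """Approximate chance that at least two of `num_items` random inputs
    share the same hash value, assuming an ideal hash with `digest_bits` bits.

    Birthday-bound approximation: p ≈ 1 - exp(-k(k-1) / (2N)), N = 2**digest_bits.
    """
    n_values = 2 ** digest_bits
    # Python's int / int true division stays accurate even for huge N.
    exponent = num_items * (num_items - 1) / (2 * n_values)
    # -expm1(-x) computes 1 - exp(-x) without losing precision when x is tiny.
    return -math.expm1(-exponent)


# Example: one trillion records hashed with a 128-bit vs. a 256-bit digest.
for bits in (128, 256):
    p = collision_probability(10**12, bits)
    print(f"{bits}-bit digest, 10^12 records: collision probability ~ {p:.1e}")
```

With one trillion (10^12) records and a 256-bit digest, the estimate comes out on the order of 10^-54, far below any practical concern.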
The main reason technology leaders use SHA-256 is that it has no known vulnerabilities that make it insecure and, unlike some other popular hashing algorithms, it has not been “broken.”
To better understand what this means, we need to look at the history of some other popular hashing algorithms. But first, let’s recap what exactly hash values are. I recommend reading the above-mentioned article in full (and downloading this handy reference guide) for a more complete explanation, but the section below provides a quick overview.
What Is a Hash Value?
The Cybersecurity and Infrastructure Security Agency (CISA) defines a hash value, the output of a hash function, as:
A fixed-length string of numbers and letters generated from a mathematical algorithm and an arbitrarily sized file such as an email, document, picture, or other type of data. This generated string is unique to the file being hashed and is a one-way function—a computed hash cannot be reversed to find other files that may generate the same hash value. Some of the more popular hashing algorithms in use today are Secure Hash Algorithm-1 (SHA-1), the Secure Hashing Algorithm-2 family (SHA-2 and SHA-256), and Message Digest 5 (MD5).
In simple terms, a hash value is a fixed-length string of characters (usually written in hexadecimal) that an algorithm calculates from a particular file. If the file is altered in any way and you recalculate the value, the resulting hash will be different. In other words, with a secure algorithm it is practically impossible to change the file without changing the associated hash value as well. So if two copies of a file have the same hash value, you can be confident that they are identical.
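As a rough illustration of that comparison, here is a minimal Python sketch using the standard library’s hashlib module. The helper names sha256_of_file and same_contents are hypothetical, and this is not a description of Pagefreezer’s own tooling; it simply hashes each file in fixed-size chunks (so very large files never have to fit in memory) and compares the resulting hex strings.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of_file(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return a file's SHA-256 hash as a 64-character hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read chunk_size bytes (1 MiB by default) until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a: str | Path, path_b: str | Path) -> bool:
    """Treat two files as identical if (and only if) their SHA-256 hashes match."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

For example, a call such as same_contents("evidence.pdf", "evidence_copy.pdf") (hypothetical file names) returns True only if both files hash to the same value, which for SHA-256 means they are identical for all practical purposes.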
A hash value guarantees authenticity thanks to four particular characteristics:
- It is deterministic, meaning that a specific input (or file) will always deliver the same hash value (number string). This means that it is easy to verify the authenticity of a file. If two people independently (and correctly) check the hash value of a file, they will always get the same answer.
- The odds of “collisions” are low. This means that the chances of two different inputs (files) coincidentally having the exact same hash value are incredibly small—practically non-existent.
- A hash can be calculated quickly. Generating a hash value is quick and easy (provided you have the right tool). The size of the file doesn’t change the process or the length of the result: generating a hash value for a large file is as simple as creating one for a small file, although it naturally takes longer.
- Any change to the input will change the output. Even the smallest change to the input file will result in a different hash value. This means that it is practically impossible to alter a file without changing the associated hash value, which makes it very easy to prove (or disprove) the authenticity of a piece of digital evidence.
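Several of these properties are easy to see for yourself. The short Python sketch below uses hashlib to show that SHA-256 is deterministic, that its output is always 64 hex characters (256 bits) regardless of input size, and that changing a single character produces a completely different hash. The helper names sha256_hex and differing_bits, and the sample strings, are purely illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two hex-encoded digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = b"Invoice total: $1,000.00"
altered = b"Invoice total: $1,000.01"  # a single character changed

# Deterministic: hashing the same input twice gives the same value.
assert sha256_hex(original) == sha256_hex(original)

# Fixed length: a 1-byte input and a 10 MB input both give 64 hex characters.
assert len(sha256_hex(b"x")) == len(sha256_hex(b"x" * 10_000_000)) == 64

# Avalanche effect: a one-character edit typically flips about half of the 256 bits.
print(sha256_hex(original))
print(sha256_hex(altered))
print(differing_bits(sha256_hex(original), sha256_hex(altered)), "of 256 bits differ")
```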
Not All Hashing Algorithms Are Created Equal
As CISA mentions in its definition, some of the most popular algorithms are Message Digest 5 (MD5), Secure Hash Algorithm-1 (SHA-1), and the Secure Hash Algorithm-2 family (SHA-2, which includes SHA-256).
Predictably, these are also the hashing algorithms that are often used when generating digital signatures and authenticating digital records.
The problem is that, while all three are often used to verify data integrity, only SHA-256 is still considered secure: MD5 and SHA-1 have known vulnerabilities.
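One visible difference between the three is output size. The Python sketch below (digest_profile is a hypothetical helper that relies only on hashlib.new and each hash object’s digest_size) prints each algorithm’s digest length and the generic “birthday” cost of finding a collision, roughly 2^(bits/2) hash computations for an ideal hash. Keep in mind that these generic figures are only upper bounds on an attacker’s work: MD5 and SHA-1 are considered broken precisely because published attacks find collisions with far less effort, so a longer digest alone is not what makes SHA-256 secure.

```python
import hashlib


def digest_profile(algorithm: str, data: bytes) -> dict:
    """Summarize the output of one hashlib algorithm for `data`."""
    h = hashlib.new(algorithm, data)
    bits = h.digest_size * 8  # digest_size is reported in bytes
    return {
        "algorithm": algorithm,
        "digest_bits": bits,
        # Generic birthday-attack cost against an *ideal* hash of this size.
        "generic_collision_work": f"2^{bits // 2}",
        "hex": h.hexdigest(),
    }


sample = b"The quick brown fox jumps over the lazy dog"
for name in ("md5", "sha1", "sha256"):
    p = digest_profile(name, sample)
    print(f"{p['algorithm']:>6}  {p['digest_bits']:3d} bits  "
          f"generic collision work ~{p['generic_collision_work']:>5}  {p['hex']}")
```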
MD5 has been around since 1991 and is now thoroughly “broken.” As mentioned in the previous section, for a hash value to guarantee authenticity, the odds of a collision need to be incredibly low—meaning the chances of two different inputs coincidentally having the same hash value must be practically zero.
The issue with MD5 is that it is highly susceptible to collision attacks, which deliberately produce two different inputs with the same hash value. In fact, with an ordinary computer and a tool like HashClash, an attacker can now generate MD5 collisions in minutes, if not seconds.
The Great SHA-1 Collision
Like MD5, the popular SHA-1 algorithm is also broken. As far back as 2005, researchers proposed a convincing theoretical attack on SHA-1, and the National Institute of Standards and Technology (NIST) promptly recommended that federal agencies move to SHA-2. In 2017, this theoretical vulnerability became very real when Google announced the first practical SHA-1 collision.
“Today, more than 20 years after of SHA-1 was first introduced, we are announcing the first practical technique for generating a collision,” read a statement released by Google. “This represents the culmination of two years of research that sprung from a collaboration between the CWI Institute in Amsterdam and Google. We’ve summarized how we went about generating a collision below. As a proof of the attack, we are releasing two PDFs that have identical SHA-1 hashes but different content.
“For the tech community, our findings emphasize the necessity of sunsetting SHA-1 usage. Google has advocated the deprecation of SHA-1 for many years, particularly when it comes to signing TLS certificates. As early as 2014, the Chrome team announced that they would gradually phase out using SHA-1. We hope our practical attack on SHA-1 will cement that the protocol should no longer be considered secure.”
This YouTube video provides a good overview of the SHA-1 collision.
If you’re looking for a deeper dive, you can also have a look at this detailed presentation by the team responsible for the collision.
Since 2017, this work has been taken even further. Researchers have succeeded in creating what are known as chosen-prefix collisions, which give an attacker much more control over the forged data.
“Finding a practical collision attack breaks the hash function badly of course, but the actual damage that can be done with such a collision is somewhat limited as the attacker will have little to no control on the actual data that collides,” one of the researchers, Thomas Peyrin, told ZDNet after the paper was published. “A much more interesting attack is to find a so-called ‘chosen-prefix collision,’ where the attacker can freely choose the prefix for the two colliding messages. Such collisions change everything in terms of threat because you can now consider having collisions with meaningful data inside (like names or identities in a digital certificate, etc).”
Don’t Rely on Old Technology with Vulnerabilities
Since a much better option is available, there is no reason to make use of hashing algorithms that have known vulnerabilities.
NIST’s official stance on SHA-1 is the following: “Federal agencies should stop using SHA-1 for generating digital signatures, generating timestamps and for other applications that require collision resistance.”
Yet, despite this, many private-sector companies continue to use SHA-1 (and sometimes even MD5)—a decision that opens up their data to questions of accuracy and authenticity.
We believe in always taking a best-practices approach. And we take data security very seriously at Pagefreezer. That’s why we are ISO 27001 certified and SOC 2 compliant. It’s also why we use SHA-256. We want the authenticity of our records to be beyond question.
Want to learn more about hash values and the authentication of digital evidence? Download our reference guide, Authenticating Digital Evidence Under FRE 902(13) and (14): Using Digital Signatures (Hash Values) and Metadata to Create Self-Authenticating Digital Evidence.