Why You Should Use SHA-256 in Evidence Authentication

In a previous article, we discussed why hash values are crucial in evidence collection and digital forensics. Following on from that, it’s worth discussing why Pagefreezer specifically makes use of the SHA-256 hashing algorithm when applying a digital signature to one of our records.

The Benefits of SHA-256

We use SHA-256 because its 256-bit hash value (digest) is much more resistant to attack than the output of other common hashing algorithms. Without going into too much technical detail, here are the key benefits of SHA-256:

  • It’s a secure and trusted industry standard: SHA-256 is an industry standard that is trusted by leading public-sector agencies and used widely by technology leaders. 
  • Collisions are incredibly unlikely: There are 2^256 possible hash values when using SHA-256, which makes it nearly impossible for two different documents to coincidentally have the exact same hash value. (More on this in the following section.)
  • The avalanche effect: Even a very minor change to the original information completely changes the hash value, which is known as the avalanche effect. On average, about half of the 256 output bits flip when a single bit of the input changes.
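The avalanche effect can be observed directly with a few lines of Python. The sketch below uses only the standard-library hashlib module; the two sample "records" and the helper names are made up for illustration and are not taken from any Pagefreezer tooling.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 digest bits differ between two inputs."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


# Two hypothetical records that differ by a single character.
original = b"Invoice total: $1,000.00"
altered = b"Invoice total: $1,000.01"

print(sha256_hex(original))
print(sha256_hex(altered))
print(differing_bits(original, altered), "of 256 bits differ")
```

Running it should print two hex strings with no visible resemblance and a bit count somewhere near 128, which is what a well-behaved hash function is expected to produce.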

The main reason technology leaders use SHA-256 is that no practical collision attack against it is known and, unlike some other popular hashing algorithms, it has not been “broken.”

To better understand what this means, we need to look at the history of some other popular hashing algorithms. But before we do that, we should recap what exactly hash values are. I would recommend reading the above-mentioned article in full, and downloading this handy reference guide, for a more complete explanation, but the section below provides a quick overview.

What Is a Hash Value?

The Cybersecurity and Infrastructure Security Agency (CISA) defines a hash value, or hash function, as:

A fixed-length string of numbers and letters generated from a mathematical algorithm and an arbitrarily sized file such as an email, document, picture, or other type of data. This generated string is unique to the file being hashed and is a one-way function—a computed hash cannot be reversed to find other files that may generate the same hash value. Some of the more popular hashing algorithms in use today are Secure Hash Algorithm-1 (SHA-1), the Secure Hashing Algorithm-2 family (SHA-2 and SHA-256), and Message Digest 5 (MD5).

In simple terms, a hash value is a string of letters and numbers that an algorithm computes from a particular file and that is, for all practical purposes, unique to that file. If the file is altered in any way, and you recalculate the value, the resulting hash will be different. In other words, with a secure algorithm it is practically impossible to change the file without changing the associated hash value as well. So if you have two copies of a file, and they both have the same hash value, you can be confident that they are identical.

A hash value guarantees authenticity thanks to four particular characteristics:    

  • It is deterministic, meaning that a specific input (or file) will always deliver the same hash value (number string). This means that it is easy to verify the authenticity of a file. If two people independently (and correctly) check the hash value of a file, they will always get the same answer.
  • The odds of “collisions” are low. This means that the chances of two different inputs (files) coincidentally having the exact same hash value are incredibly small—practically non-existent.
  • A hash can be calculated quickly. Generating a hash value is quick and easy (provided you have the right tool). The process is also the same regardless of file size: a large file takes proportionally longer to read, but generating its hash value is as simple as creating one for a small file.
  • Any change to the input will change the output. Even the smallest change to the input file will result in a change to the resulting hash value. This means that it is impossible to alter a file without changing the associated hash value, which makes it very easy to prove (or disprove) the authenticity of a piece of digital evidence.
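These four characteristics are what make a simple verification workflow possible: record the hash when a file is collected, then recompute and compare it later. The following is a minimal sketch of that idea using Python's standard library. The function names, the file name, and the sample content are assumptions made for illustration, not a description of how Pagefreezer or any other product implements it.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so memory use stays flat for large files."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path: Path, recorded_hex: str) -> bool:
    """Compare a freshly computed hash with the value recorded earlier."""
    return sha256_file(path) == recorded_hex.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    evidence = Path(tmp) / "capture.html"  # hypothetical file name
    evidence.write_bytes(b"<html>archived page</html>")

    recorded = sha256_file(evidence)  # value stored at collection time
    unchanged = matches_recorded_hash(evidence, recorded)

    evidence.write_bytes(b"<html>archived page!</html>")  # simulate tampering
    tampered = matches_recorded_hash(evidence, recorded)

print(unchanged, tampered)  # prints: True False
```

Reading the file in chunks means the same code works for a small text file or a multi-gigabyte archive; only the running time grows with the file size.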

Not All Hashing Algorithms Are Created Equal

As CISA mentions in its definition of a hash function, some of the most popular algorithms are Message Digest 5 (MD5), Secure Hash Algorithm-1 (SHA-1), and the Secure Hashing Algorithm-2 family (SHA-2 and SHA-256). 

Predictably, these are also the hashing algorithms that are often used when generating digital signatures and authenticating digital records. 

The problem is that, while they are all often used to verify data integrity, only SHA-256 is still secure—MD5 and SHA-1 have known vulnerabilities.  

MD5 has been around since 1991 and is now thoroughly “broken.” As mentioned in the previous section, for a hash value to guarantee authenticity, the odds of a collision need to be incredibly low—meaning the chances of two different inputs coincidentally having the same hash value must be practically zero.

The issue with MD5 is that it is very susceptible to intentional collisions—known as collision attacks—that try to produce two different inputs which result in the same hash value. In fact, a basic computer and a tool like HashClash can now generate collisions in no time at all—we’re talking about minutes, if not seconds.
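To make the idea of a collision concrete, the sketch below searches for two different inputs that share the same hash value. It does not reproduce the MD5 attack used by tools like HashClash, which exploits structural weaknesses in the algorithm. Instead it uses plain brute force against a deliberately shortened digest: only the first 24 bits of SHA-256 are kept. The 24-bit truncation, the function names, and the "document-N" inputs are all assumptions chosen for illustration.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def truncated_sha256(data: bytes, bits: int) -> int:
    """Return only the first `bits` bits of the SHA-256 digest as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int) -> Tuple[bytes, bytes, int]:
    """Brute-force two different inputs whose truncated digests are equal."""
    seen: Dict[int, bytes] = {}
    for attempt in count(1):
        message = f"document-{attempt}".encode()
        tag = truncated_sha256(message, bits)
        if tag in seen:
            # A different, earlier message already produced this tag.
            return seen[tag], message, attempt
        seen[tag] = message


first, second, attempts = find_collision(bits=24)
print(first, second, "found after", attempts, "attempts")
```

A generic search like this needs on the order of 2^(n/2) attempts for an n-bit hash, so a 24-bit digest falls after a few thousand tries on any laptop. For the full 256 bits the same approach would need roughly 2^128 attempts, which is far out of reach. MD5 and SHA-1 are considered broken because researchers found shortcuts that need far less work than this generic bound.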

The Great SHA-1 Collision

Like MD5, the popular SHA-1 algorithm is also broken. As far back as 2005, researchers proposed a convincing theory for how SHA-1 could be broken, and the National Institute of Standards and Technology (NIST) immediately suggested that federal agencies move to SHA-2. In 2017, this theoretical vulnerability was made very real when Google announced the first official SHA-1 collision:

“Today, more than 20 years after of SHA-1 was first introduced, we are announcing the first practical technique for generating a collision,” read a statement released by Google. “This represents the culmination of two years of research that sprung from a collaboration between the CWI Institute in Amsterdam and Google. We’ve summarized how we went about generating a collision below. As a proof of the attack, we are releasing two PDFs that have identical SHA-1 hashes but different content.

“For the tech community, our findings emphasize the necessity of sunsetting SHA-1 usage. Google has advocated the deprecation of SHA-1 for many years, particularly when it comes to signing TLS certificates. As early as 2014, the Chrome team announced that they would gradually phase out using SHA-1. We hope our practical attack on SHA-1 will cement that the protocol should no longer be considered secure.”

This YouTube video provides a good overview of the SHA-1 collision.

If you’re looking for a deeper dive, you can also have a look at this detailed presentation by the team responsible for the collision.

Since 2017, this work has been taken even further. Researchers have succeeded in creating what are known as chosen-prefix collisions, which allow for much more manipulation of forged data.

“Finding a practical collision attack breaks the hash function badly of course, but the actual damage that can be done with such a collision is somewhat limited as the attacker will have little to no control on the actual data that collides,” one of the researchers, Thomas Peyrin, told ZDNet after the paper was published. “A much more interesting attack is to find a so-called ‘chosen-prefix collision,’ where the attacker can freely choose the prefix for the two colliding messages. Such collisions change everything in terms of threat because you can now consider having collisions with meaningful data inside (like names or identities in a digital certificate, etc).”

Don’t Rely on Old Technology with Vulnerabilities

Since a much better option is available, there is no reason to make use of hashing algorithms that have known vulnerabilities. 

NIST’s official stance on SHA-1 is the following: “Federal agencies should stop using SHA-1 for generating digital signatures, generating timestamps and for other applications that require collision resistance.” 

Yet, despite this, many private-sector companies continue to use SHA-1 (and sometimes even MD5)—a decision that opens up their data to questions of accuracy and authenticity.         

We believe in always taking a best-practices approach. And we take data security very seriously at Pagefreezer. That’s why we are ISO 27001 certified and SOC 2 compliant. It’s also why we use SHA-256. We want the authenticity of our records to be beyond question.

Want to learn more about hash values and the authentication of digital evidence? Download our reference guide, Authenticating Digital Evidence Under FRE 902(13) and (14): Using Digital Signatures (Hash Values) and Metadata to Create Self-Authenticating Digital Evidence.

Peter Callaghan
Peter Callaghan is the Chief Revenue Officer at Pagefreezer. He has a very successful record in the tech industry, bringing significant market share increases and exponential revenue growth to the companies he has served. Peter has a passion for building high-performance sales and marketing teams, developing value-based go-to-market strategies, and creating effective brand strategies.
