Before hash values came into common use, proving the authenticity of digital evidence could be tricky, especially if opposing counsel was determined to exclude the evidence.
Opposing counsel could argue, for example, that screenshots had been manipulated with basic photo editing tools. Without a method of authentication like hash value verification, legal teams had to spend significant time and resources producing a sponsoring witness who could testify to the authenticity of digital evidence.
That's just one reason why hash values are crucial in digital evidence authentication. In this article, we’ll dig into what exactly hash values are, their role in digital forensics, and how you can authenticate your digital evidence with hash verification.
Table of Contents
- What is a Hash Value or Hash Function in Digital Forensics?
- The Federal Rules of Evidence for Digital Evidence Authentication
- What The Federal Rules of Evidence Amendments Say About Hash Values
- The Importance of Hash Values in Digital Forensics
- How Do Hash Values Authenticate Digital Evidence?
- What are the Main Differences Between Popular Hashing Algorithms MD5, SHA-1, CRC32, SHA-2, and SHA-256?
- How To Use Free Online Tools to Generate or Verify a Hash Value for a Digital File
- DISCLAIMER: Risks of Using Free Tools to Generate Hash Values
- How to Generate Defensible Digital Evidence with Hash Values
What Is a Hash Value or Hash Function in Digital Forensics?
The Cybersecurity and Infrastructure Security Agency (CISA) defines hash value(s) or hash function(s) as:
A fixed-length string of numbers and letters generated from a mathematical algorithm and an arbitrarily sized file such as an email, document, picture, or other type of data.
This generated string is unique to the file being hashed and is a one-way function—a computed hash cannot be reversed to find other files that may generate the same hash value.
In simple terms, a hash function (or hashing algorithm) takes a file as input and produces a hash value: a specific string of numbers and letters associated with that one particular file.
If the file is altered in any way, the hashing algorithm will produce a different string.
In practice, it is not feasible to change the file without changing its hash value as well. So if you have two copies of a file and they both have the same hash value, you can be confident that they are identical.
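A quick way to see these properties is to hash two inputs that differ by a single character. The minimal Python sketch below uses the standard-library `hashlib` module with SHA-256; the sample byte strings are hypothetical stand-ins for file contents.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash value of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical file contents that differ by one character ("$500" vs "$900")
original = b"Invoice #1042: amount due $500"
altered = b"Invoice #1042: amount due $900"

h1 = sha256_hex(original)
h2 = sha256_hex(original)  # hash the same input a second time
h3 = sha256_hex(altered)

print(h1 == h2)  # True: same input, same hash value
print(h1 == h3)  # False: a one-character change gives a different hash value
print(len(h1))   # 64: the length is fixed regardless of input size
```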
The Federal Rules of Evidence Amendments for Digital Evidence Authentication
Thanks to Federal Rules of Evidence 902(13) and 902(14), added by amendment in 2017, a written certification can now take the place of in-court witness testimony about the authenticity of many kinds of digital evidence.
To streamline evidence submission and authentication, electronically stored information (ESI) such as social media posts and comments, cellphone images, text messages, and website content can now be treated as self-authenticating evidence when it is accompanied by the required certification.
To understand what this means in a practical sense, let’s take a closer look at the amendments themselves:
FRE 902(13): Certified Records Generated by an Electronic Process or System
A record generated by an electronic process or system that produces an accurate result, as shown by a certification of a qualified person that complies with the certification requirements of Rule 902(11) or (12). The proponent must also meet the notice requirements of Rule 902(11).
This rule allows for the certification of records by a qualified person who can verify the accuracy of the process or system that generated the records, eliminating the need for in-court testimony to establish authenticity.
FRE 902(14): Certified Data Copied from an Electronic Device, Storage Medium, or File
Data copied from an electronic device, storage medium, or file, if authenticated by a process of digital identification, as shown by a certification of a qualified person that complies with the certification requirements of Rule 902(11) or (12). The proponent also must meet the notice requirements of Rule 902(11).
Amendment 902(14) allows for data copies to be authenticated through a process of digital identification—typically using hash values. Like 902(13), this rule requires a certification by a qualified person who can attest to the integrity of the process used to copy the data.
What The Federal Rules of Evidence Amendments Say About Hash Values for Evidence Authentication
While the rules themselves don’t name any specific ‘electronic process or system that produces an accurate result,’ hash values are discussed in the Committee Note that accompanies the 2017 amendment:
Today, data copied from electronic devices, storage media, and electronic files are ordinarily authenticated by "hash value."
A hash value is a number that is often represented as a sequence of characters and is produced by an algorithm based upon the digital contents of a drive, medium, or file.
If the hash values for the original and copy are different, then the copy is not identical to the original. If the hash values for the original and copy are the same, it is highly improbable that the original and copy are not identical.
Thus, identical hash values for the original and copy reliably attest to the fact that they are exact duplicates.
The Importance of Hash Values in Digital Forensics
Hash values are fundamental to digital forensics, providing a reliable, efficient, and secure method for verifying the integrity and authenticity of digital evidence. By incorporating hash values into their investigative processes, forensic experts can ensure that digital evidence is trustworthy and defensible in court.
Here are four fundamental ways hash values ensure the authenticity, integrity, and reliability of digital evidence:
1. Ensuring Data Integrity
As we’ve discussed, one of the primary functions of hashing in digital forensics is to verify the integrity of data, meaning that it has not changed since it was collected. This matters legally because, under FRE 902(13) and 902(14), electronically stored information can be admitted as self-authenticating evidence without witness testimony when it is accompanied by a certification from a qualified person. For data copied under Rule 902(14), that certification ordinarily rests on a hash value comparison.
2. Authenticating Evidence
When digital evidence is collected, a hash value is generated from the original data using a hashing function like SHA-256. This hash value acts as a unique digital fingerprint for that specific piece of evidence.
Now, at any point in the investigation, the collected evidence can be hashed again and compared to the original hash value.
If the hashes match, the data has remained unchanged. Any discrepancy between the hash values indicates tampering or corruption, alerting forensic analysts to potential issues with the evidence. In court, the hash value can be used to demonstrate that the evidence has not been altered since its collection.
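The collect-then-re-hash workflow described above might look like this in Python. It’s a minimal sketch, not a forensic tool: `HashRecord`, `record_hash`, and `reverify` are hypothetical names, and a real investigation would also need the records themselves stored and documented as part of the chain of custody.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files never need to fit in memory

def sha256_file(path: Path) -> str:
    """Compute the SHA-256 hash value of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()

@dataclass(frozen=True)
class HashRecord:
    """Hypothetical record created at the moment of collection."""
    path: Path
    sha256: str
    collected_at: str  # ISO 8601 timestamp in UTC

def record_hash(path: Path) -> HashRecord:
    """Hash a file at collection time and note when it was hashed."""
    return HashRecord(path, sha256_file(path), datetime.now(timezone.utc).isoformat())

def reverify(record: HashRecord) -> bool:
    """Re-hash the file later; True means it is unchanged since collection."""
    return sha256_file(record.path) == record.sha256
```

For example, calling `record_hash(Path("capture.html"))` at collection time and `reverify(record)` before trial (with `capture.html` as a hypothetical file name) returns `False` if even one byte has changed. Reading in chunks keeps memory use flat, so the same code handles a small screenshot or a multi-gigabyte disk image.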
3. Preventing and Detecting Tampering
Digital evidence is vulnerable to tampering, either intentionally or unintentionally. Hash values provide a robust mechanism for detecting any changes to the evidence.
Even the smallest alteration to a file will result in a completely different hash value. Forensic tools that generate and compare hash values can quickly detect such changes, so any modification to the evidence after collection is flagged. This is particularly useful for identifying unauthorized or malicious modifications.
4. Facilitating Evidence Comparison
In cases involving multiple copies of digital evidence, hash values simplify the process of comparing these copies to ensure they are identical.
Rather than manually examining the contents of each file, forensic analysts can compare the hash values of the original and duplicate files. Matching hash values confirm that the copies are identical, streamlining the verification process and reducing the risk of human error.
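Comparing copies by hash value rather than by content can be sketched in a few lines. This assumes the original and its copies are local files; `compare_copies` is a hypothetical helper, not part of any forensic product.

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path) -> str:
    """SHA-256 hash value of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_copies(original: Path, copies: list[Path]) -> dict[Path, bool]:
    """Map each copy to True if its hash value matches the original's."""
    reference = sha256_file(original)  # hash the original once
    return {copy: sha256_file(copy) == reference for copy in copies}
```

Any copy mapped to `False` differs from the original and would need to be investigated before it is relied on.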
How Do Hash Values Authenticate Digital Evidence?
Hash values can support the authenticity of evidence thanks to five characteristics of a well-designed cryptographic hash function:
1. Hash values are deterministic.
A specific input (or file) will always deliver the same hash value (number string). This means that it is easy to verify the authenticity of a file. If two people independently (and correctly) check the hash value of a file, they will always get the same answer.
2. The odds of “collisions” are low.
If you’re using a modern hashing algorithm like SHA-256, the chances of two different inputs (files) coincidentally having the exact same hash value are so small as to be practically non-existent.
3. A hash can be calculated quickly.
Generating a hash value is quick and easy (provided you have the right tool). The process is the same regardless of file size: hashing a large file is just as simple as hashing a small one, although it takes proportionally longer because every byte must be read.
4. Any change to the input will change the output.
Even the smallest change to the input file will result in a change to the resulting hash value. With a secure algorithm, it is computationally infeasible to alter a file while keeping the same hash value, which makes it straightforward to prove (or disprove) that a piece of digital evidence has not been altered.
5. It’s secure.
Because of a property called ‘pre-image resistance’, it should be computationally infeasible to reverse a hash value, meaning you cannot derive the original input given only the hash output.
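To put "incredibly small" in perspective, the standard birthday-bound approximation estimates the chance that any two of n files share a b-bit hash value purely by accident: P ≈ 1 - exp(-n(n-1) / 2^(b+1)). The Python sketch below assumes an idealized hash whose outputs behave like uniformly random values; it does not model deliberate collision attacks, which is exactly where MD5 and SHA-1 fall short. The function name is hypothetical.

```python
import math

def accidental_collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday-bound estimate that at least two of num_files share a hash value.

    Assumes an idealized hash with uniformly random outputs; it does not
    model deliberate collision attacks.
    """
    pairs = num_files * (num_files - 1) // 2
    expected_collisions = pairs / 2**hash_bits  # expected number of colliding pairs
    # 1 - exp(-x), computed with expm1 so tiny probabilities don't round to 0.0
    return -math.expm1(-expected_collisions)

# One billion files hashed with SHA-256 (256-bit output): about 4.3e-60
print(accidental_collision_probability(10**9, 256))
# Just 100,000 files checked with CRC32 (32-bit output): roughly 0.69
print(accidental_collision_probability(100_000, 32))
```

The contrast shows why output size matters: with a 256-bit algorithm, accidental matches are not a realistic concern, while a 32-bit checksum like CRC32 is more likely than not to produce at least one coincidental match in a collection of 100,000 files.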
The Computerphile YouTube channel has a video that offers a great explanation of how hashing and hash values are used in the realm of digital signatures and data authentication.
What are the Main Differences Between Popular Hashing Algorithms MD5, SHA-1, CRC32, SHA-2, and SHA-256?
Not all hashing algorithms are created equal. Though we’ll save you the in-depth technical details, it’s valuable to have a basic understanding of which algorithms are useful for digital forensics and which hashing algorithms are obsolete.
MD5 (Message-Digest Algorithm 5)
- Developed in 1991 by Ronald Rivest.
- Was once widely used for data security, including digital signatures, but because of its vulnerabilities, today it is mainly used as a checksum to verify file integrity and detect accidental corruption.
- Due to susceptibility to collision attacks, MD5 is not suitable for cryptographic security in new systems.
SHA-1 (Secure Hash Algorithm 1)
- Designed by the National Security Agency (NSA) and published by NIST in 1995.
- Formerly a staple in security applications like SSL certificates; NIST announced its retirement in 2022 and recommends phasing it out by 2030 in favor of more secure options.
- Vulnerable to collision attacks, making it inadequate for modern cryptographic requirements.
CRC32 (Cyclic Redundancy Check)
- Primarily used to detect accidental changes to raw data in digital networks and storage devices. It’s common in file compression, file verification, and in applications where fast and simple error-detection is needed.
- NOT suitable for cryptographic security because it is not designed to withstand malicious alterations.
SHA-2 (Secure Hash Algorithm 2)
- Set of hash functions developed in 2001 by the NSA.
- Variants include SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256.
- Recommended for current cryptographic applications, including compliance with security standards.
- Highly secure against known cryptographic attacks.
SHA-256 (Part of SHA-2 family)
- Widely used in blockchain technologies and digital signatures, underscoring its robustness and reliability.
- Inherits SHA-2’s strong security features, providing solid protection against cryptographic threats.
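One visible difference between these algorithms is the length of the value each one produces. The Python sketch below hashes the same hypothetical input with `zlib.crc32` and several `hashlib` algorithms and prints each output size in bits. Length alone does not determine security: MD5 and SHA-1 are considered broken because of attacks on their design, not only because their outputs are shorter.

```python
import hashlib
import zlib

data = b"Example evidence file contents"  # hypothetical input

digests = {
    "CRC32": format(zlib.crc32(data), "08x"),     # 32-bit checksum, not a cryptographic hash
    "MD5": hashlib.md5(data).hexdigest(),         # 128 bits
    "SHA-1": hashlib.sha1(data).hexdigest(),      # 160 bits
    "SHA-256": hashlib.sha256(data).hexdigest(),  # 256 bits (SHA-2 family)
    "SHA-512": hashlib.sha512(data).hexdigest(),  # 512 bits (SHA-2 family)
}

for name, value in digests.items():
    # Each hexadecimal character encodes 4 bits
    print(f"{name:8} {len(value) * 4:4} bits  {value}")
```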
How To Use Free Online Tools to Generate or Verify a Hash Value for a Digital File
Generating a hash value for a particular file is fairly easy and can be done with an online tool in a few simple steps:
- Visit https://www.toolsley.com/hash.html
- Upload the file that you want to generate a hash value for
- Select the hashing algorithms you want to use (we recommend SHA-256)
- The hash value will automatically appear next to your selected algorithms.
- To verify the hash, you can click on the green link icon next to the hash value. A window will pop up with a web address you can copy.
- Paste the web address into a new tab in your browser.
- If you upload the same file that you originally hashed, you’ll get a green bar that says “VALID”.
- If you edit your original file in any way (change one letter even) and try to validate it, the bar will be red and say, “INVALID”.
DISCLAIMER: Risks with Using Free Online Tools to Generate Hash Values for Authenticating Evidence
As we’ve discussed, hash values act as a digital signature or fingerprint that can authenticate digital evidence.
However, a hash value only shows that data has not changed since the hash was generated. If the original was tampered with before it was hashed, the hash would simply fingerprint the altered version; it cannot flag that the evidence had already been tampered with.
That is why FRE 902(13) and 902(14) require a certification from a qualified person: under 902(13), that the record was “generated by an electronic process or system that produces an accurate result,” and under 902(14), that copied data was “authenticated by a process of digital identification.” In both cases, the qualified person must be able to attest to the integrity of the process used to generate or copy the data.
So, as long as a piece of evidence was correctly collected and processed, anyone using the same algorithm to authenticate it at a later stage will see that exact same resulting hash value.
Any change to the data will result in the hash value changing — making the evidence easy to authenticate.
However, if hashing is not done correctly, opposing counsel will be quick to challenge the evidence’s authenticity.
This means collecting and attempting to authenticate evidence yourself using free hashing tools, as we’ve shown you above, may be risky.
And if you’re trying to process many pieces of digital evidence and relying on copy-paste to keep track of which hash value is associated with which document, it would be easy to miss errors or lose track.
The bottom line: If you’re submitting this evidence in court, you’ll want to make use of only the most reliable hashing methods and tools.
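Much of the copy-paste risk described above can be reduced by generating and checking hash values programmatically. The Python sketch below builds a manifest that maps each file in a folder to its SHA-256 hash value, saves it as JSON, and later reports any file that is missing or changed. The function names and JSON layout are hypothetical, and a script like this does not replace a certified collection process.

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path) -> str:
    """SHA-256 hash value of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder: Path) -> dict[str, str]:
    """Map every file under `folder` (by relative path) to its SHA-256 hash value."""
    return {
        path.relative_to(folder).as_posix(): sha256_file(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }

def save_manifest(manifest: dict[str, str], out_file: Path) -> None:
    """Write the manifest as human-readable JSON."""
    out_file.write_text(json.dumps(manifest, indent=2, sort_keys=True))

def check_manifest(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose hash value no longer matches."""
    problems = []
    for rel_path, expected in manifest.items():
        path = folder / rel_path
        if not path.is_file() or sha256_file(path) != expected:
            problems.append(rel_path)
    return problems
```

Store the saved manifest outside the evidence folder; otherwise rebuilding the manifest would pick up the manifest file itself.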
How to Generate Defensible Self-Authenticating Digital Evidence with Hash Values
If you want to generate defensible self-authenticating digital evidence, there’s quite a lot to consider. We’ve created a free reference guide that summarizes hundreds of pages of documentation, explaining how you can generate self-authenticating evidence that will stand up in court.
How Pagefreezer Can Help You Generate Defensible Digital Evidence with Hash Values
If you want to generate hash values for evidence you’re collecting online from websites, social media platforms, or team collaboration platforms like MS Teams and Slack, we can help.
Not only can we help you authenticate digital evidence with SHA-256, a widely recommended hashing algorithm, but our software also simplifies the process of collecting ESI from online data sources. You’ll be able to:
- Capture dynamic online media like YouTube and Instagram videos
- Automate website, social media, and other evidence collection
- Capture all associated metadata revealing key information
- Easily search and organize collected evidence
- Export in a variety of formats, complete with hash values, digital signatures, and metadata
If you’d like to learn more about our digital evidence collection services, visit our evidence collection page.