eDiscovery: Frequently Asked Questions

Traditional discovery is the initial phase of litigation, when all parties are required to provide records and evidence relevant to a specific case. However, thanks to the explosion of electronically stored information (ESI), discovery must now work alongside eDiscovery, a process that involves the identification, preservation, collection, retention, and review of data in an electronic format. This makes discovery far larger and more complex.

Consider how many emails and instant messages are exchanged within an organization each day. Now add external messages and social media posts, and multiply that by millions of people all working and communicating online each day – and the fact that for each communication, there is a history of exchanges.

Within these petabytes of data, legal teams must find anything that might be relevant, collect and retain it, and then review what has been collected to decide whether or not it is relevant to the legal case. It’s an expensive and time-consuming process that has resulted in the development of sophisticated technology to support eDiscovery, as well as the emergence of its own set of processes and jargon.

So, let’s take a look at some of the most common eDiscovery terms, and the most frequently asked questions.


What is EDRM?

The Electronic Discovery Reference Model (EDRM) is a valuable resource for understanding eDiscovery processes. The framework outlines nine distinct stages of the eDiscovery process and it is useful for legal teams in understanding the demands of the eDiscovery process and optimizing their internal practices for each and every stage.



However, it is not a prescriptive model aimed at telling eDiscovery professionals exactly how they should manage the eDiscovery process. Instead, it represents a conceptual view of the eDiscovery process.

Companies with a good understanding of the EDRM benefit from better implementation and better practices, such as being litigation ready, saving time and costs, defending themselves more successfully, and avoiding fines for late submission of data, etc.


The nine stages of the EDRM are:

  1. Information Governance: How information is properly captured and stored.
  2. Identification: Organizations are legally obligated to preserve any ESI that might be relevant to a legal matter. “Identification” includes any activities, such as case reviews and interviews, that assist in identifying key pieces of electronic information that are likely to be important to a case down the line.
  3. Preservation: Once crucial ESI has been identified, the next step is preserving that evidence for litigation. Failing to do so can result in what is officially known as spoliation—the tampering or destruction of evidence.
  4. Collection: Once evidence has been preserved by placing it on legal hold, the legal team needs a way to collect and present it in a way that’s defensible, meaning that the authenticity of the data is beyond question.
  5. Processing: Collected evidence is ‘cleaned up’ ahead of attorney review. This involves deleting irrelevant data, converting files, and ultimately collecting it all in a single folder.
  6. Review: This is one of the most important stages of the EDRM, and also one of the most costly. Once all the ESI has been collected and processed, it has to be reviewed by legal teams to understand how it relates to the case at hand.
  7. Analysis: This phase involves evaluating ESI for particular use during a legal matter. More than reviewing the information, the Analysis identifies key patterns, topics, and people involved in the case — and digs into the content and context.
  8. Production: Once crucial ESI has been identified and incorporated into a legal strategy, evidence must be produced in a way that makes it usable during formal legal proceedings.
  9. Presentation: Once legal evidence has been produced, all that remains is presenting it during a legal proceeding like a trial or deposition. Just as ESI has started to dominate the discovery process, presentation has also shifted away from paper towards digital formats.

Read our blog posts on the eDiscovery EDRM model and understanding the 9 phases of EDRM for more information.


What is Information Governance—And How Is It Related to eDiscovery?

Too little data and businesses cannot competitively operate. Too much data and the result is just as debilitating. Today’s enterprises are producing more data each day than they generally know what to do with. With information overload, trying to make smart decisions can feel overwhelming at best, and impossible at worst.

Enter information governance. Gartner defines information governance as “the specification of decision rights and an accountability framework to ensure appropriate behavior in the valuation, creation, storage, use, archiving and deletion of information. It includes the processes, roles, and policies, standards and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals.”

To address the new challenges presented by information governance in a digital age, the Association of Records Managers and Administrators (ARMA) International created The Principles (also known as the Generally Accepted Recordkeeping Principles) in 2009 to help companies improve their approach to information governance. They were updated in 2017 and contain eight principles, which act as a standard of conduct. The principles illustrate how modern enterprises should:

  • Oversee information management to ensure accountability within the organization
  • Manage information in a way that is open and transparent
  • Guarantee the authenticity and reliability of information
  • Classify and protect information that should not be accessible by all
  • Comply with all relevant recordkeeping regulations
  • Maintain the availability and accuracy of information
  • Retain information for regulatory, legal, and historical requirements
  • Dispose of information no longer required.

Information governance is an important element in eDiscovery as well, because without these best practices in place, an organization will not be able to easily and quickly identify, preserve, or retain the correct information should a legal matter arise.

For insights into how enterprises can organize and leverage their data and implement the Information Governance Maturity Model, read more on our blog.


What is the Difference Between Information Governance and Records Management?

Records management and information governance are not the same thing. Although they are sometimes used synonymously—and the two concepts do admittedly have a similar purpose within the average organization—there are several important differences.

To better understand these differences, let’s look at how ARMA defines the two terms:


ARMA’s Information Governance Definition

A strategic framework composed of standards, processes, roles, and metrics that hold organizations and individuals accountable to create, organize, secure, maintain, use, and dispose of information in ways that align with and contribute to the organization’s goals.


ARMA’s Records Management Definition

Records management is the systematic management of records and information through its various lifecycles. It includes the analysis, design, implementation, and management of manual and automated systems regardless of format or medium.


At first glance, these definitions seem similar, but a careful reading highlights key differences between the two.

While records management tends to focus on the systems used to manage the lifecycle of documents, information governance approaches data from a more high-level, strategic position.

Similarly, a few decades ago, records management referred to the management of discrete (and largely physical) documents. The proliferation of online data sources, from social media channels to enterprise collaboration platforms, the Internet of Things (IoT), and everything in between, has resulted in a landscape that’s too complex for traditional records management alone.

In addition, good information management cannot be accomplished by records and compliance teams alone. IT teams are crucial to success, which is why information governance places such a great emphasis on IT.

Finally, as information governance takes a broader view of the management of records than traditional approaches, it also provides the opportunity to utilize information in more strategic ways. Not only can it be crucial in mitigating risk within an organization, it also makes it easier to use large volumes of unstructured data to the benefit of the company.

To understand the roles of recordkeeping and information governance in enterprises today, read our blog post, The Difference Between Information Governance and Records Management.


What is Unstructured Data?

Structured data can best be described as anything that has a high consistency in terms of fields and values across database entries. Bank account records and employee directories are good examples of this. They are highly structured data with a relational nature that is easy to understand and quick to search.

Unstructured data is everything else. Approximately 80% of the data in modern organizations qualifies as unstructured, including PDFs, text documents, spreadsheets, presentations, images, videos, and audio files.

Importantly, all the communication tools that modern enterprises use—email, mobile text messages, social media, and enterprise collaboration platforms—obviously also exist within the realm of unstructured data (although they do have some structure, thanks to metadata). And given how much communication takes place, these sources are responsible for massive amounts of data that organizations aren’t always sure how to deal with.        

To find out more about monitoring, data loss prevention, and legal holds with regards to unstructured data, read our blog post on understanding unstructured data hiding in enterprise social networks.


What is Chain of Custody?

Chain of Custody is the process of documenting the handling of evidence in a chronological order. It is case-specific and provides a clear record of who was responsible for evidence at any given time, as well as the way that this evidence was stored, transferred, handled, and formatted.

It is essential to guarantee defensible reporting that confirms who, when, how, and what electronically stored information (ESI) was preserved, collected, and processed. Without this, you are leaving your organization vulnerable to legal obstacles and sanctions for the spoliation of evidence. This presents challenges for both an organization’s legal department and its IT staff who are responsible for managing corporate data.

Without an adequate chain of custody, it is impossible to fully authenticate evidence. A clear process should document the collection and retention of all data, and chain of custody ensures full transparency of the process.

You need to answer these three key questions to ensure your ESI evidence is authentic and accepted by the Court:


How was it collected? You will need a systematic process in place to document your data. You need to gather information specifying when, where, and who collected and preserved the evidence.


How was it preserved? The chain of custody must establish that the evidence was not subject to alteration from its collection until its presentation. The process needs rigorous and meticulous documentation, as a simple view might lead to a case running into serious problems.


Who collected the evidence? It is likely that the litigant or organization’s legal department will be responsible for collecting or preserving a piece of evidence. If questioned by the court, the respondents might be asked to testify and, potentially, be accountable for charges of tampering.


Chain of Custody should include: Date and time of collection, pick-up or delivery location, delivering party, delivery detail, and receiving party.
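As a rough illustration, the fields listed above can be captured as a simple record, with a basic check that the log is chronological and that each handoff starts with the party who last received the evidence. This is only a minimal sketch: the class name, field names, the check itself, and the sample people and locations are hypothetical, not part of any standard or product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an entry cannot be modified once recorded
class CustodyEntry:
    timestamp: datetime        # date and time of collection or transfer
    location: str              # pick-up or delivery location
    delivering_party: str
    delivery_detail: str
    receiving_party: str


def is_unbroken(entries: list[CustodyEntry]) -> bool:
    """Return True if entries are in chronological order and each
    handoff is delivered by the party that received the previous one."""
    for prev, nxt in zip(entries, entries[1:]):
        if nxt.timestamp < prev.timestamp:
            return False
        if nxt.delivering_party != prev.receiving_party:
            return False
    return True


# Hypothetical example log
log = [
    CustodyEntry(datetime(2023, 3, 1, 9, 0, tzinfo=timezone.utc),
                 "Head office server room", "IT administrator",
                 "Export of mailbox archive to encrypted drive", "In-house counsel"),
    CustodyEntry(datetime(2023, 3, 2, 14, 30, tzinfo=timezone.utc),
                 "Legal department", "In-house counsel",
                 "Encrypted drive handed over by courier", "Outside counsel"),
]

print(is_unbroken(log))
```

A check like this only confirms that the written record is internally consistent. It does not prove that the evidence itself was left unaltered.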

For more information, read our blog post, What is Chain of Custody?


What is a Legal Hold?

A legal hold, also known as a litigation hold, is a process that ensures an organization acts to preserve all forms of relevant information and data associated with litigation matters, investigations, or other legal disputes.

A legal hold is initiated when a notification is sent from an organization’s legal department to instruct custodians and data stewards not to delete electronically stored information (ESI) or discard archived documents that may be relevant to an upcoming legal matter.

There are no specific legal requirements to determine the content of a legal hold notice, but the communication must be clear, succinct, and easy to understand. The custodians receiving legal hold notices could be from a variety of roles and departments, so the legal hold must be straightforward and include all necessary information.

This includes but is not limited to:  

  • The subject matter of the information that the organization must preserve
  • Which materials employees should preserve
  • The time frame to which the preservation duty applies
  • Whom employees may contact for any questions about the legal hold
  • How employees should handle any relevant information that they possess or collect 
  • The deadlines for acknowledging the hold and any actions.

The legal hold notice should also explain the recipients’ duty to preserve all potentially relevant information without modification until further notice.

Organizations could face serious consequences if they fail to preserve and present information related to the subjects of the legal matter, ranging from financial penalties to default judgment or even dismissal. Organizations can be penalized for failing to meet legal hold requirements if they lose ESI or fail to restore or replace ESI.

Read more on our blog to learn about legal holds and how they relate to your organization.


What is a Request for Production of Documents?

A request for production is a discovery device used to gain access to documents, electronic data, and physical items held by an opposing party in a legal matter. The aim is to gain insight into any relevant evidence that the opposing party holds.

A request for tangible things and physical documents is one thing, but what about Electronically Stored Information (ESI)? The prevalence of eDiscovery and ESI in modern legal matters has complicated the production of documents. There are a number of reasons for this. First, with so much ESI being created through different online platforms and communication tools, it can be difficult for organizations to know what information they hold and to put the necessary retention policies and preservation processes in place.

Finding a particular piece of evidence in a mountain of data can also be difficult, and delivering modern ESI in a format that satisfies both the expectations of opposing counsel and Article IX of the Federal Rules of Evidence is a complex task.

But, simply stating that you can’t deliver requested information is not good enough. Rules (like Rule 37 of the US Federal Rules of Civil Procedure) have been put in place to keep all parties accountable. A legal team is legally obligated to respond to this request, either by producing the information, or alternatively, by providing a written explanation as to why the documents cannot be delivered.

It is not enough to just deliver requested information either. Data must be produced in the correct form, and when it comes to content that exists in an online platform like WordPress, Slack, Twitter, or Facebook, finding an export format that complies with the rule (and the specific request of the opposing party) is challenging. Screenshots are impossible to authenticate, while the typical JSON exports that platforms provide lack the context needed during the litigation process.  

The best way to deal with it is to leverage a purpose-built solution that’s specifically aimed at facilitating the eDiscovery of this sort of ESI. 

To find out more about how to meet the legal requirements for a request for production of Electronically Stored Information (ESI), read our blog post, Understanding the Request for Production of Documents.


What is Data Retention?

Retention is how an organization stores data to meet regulatory and recordkeeping obligations. It is a proactive, ongoing process and a central component of records management and information governance. Many industries have very explicit recordkeeping rules. For example, SEC and FINRA recordkeeping requirements mandate that financial services organizations keep records of all communication. Government organizations must similarly retain data in order to comply with the Freedom of Information Act (FOIA) and state-specific Open Records laws.

Retention periods differ by industry, but typically range anywhere from three to 10 years. Once data falls outside the retention period, companies often choose to dispose of it, for two reasons. First, hanging onto so much information in an age when massive amounts of new data are constantly being created can quickly become expensive and overwhelming. Second, once data is no longer needed to meet regulatory requirements, it can become little more than a liability. Retaining sensitive customer data can invite security and privacy risks, so if the information is no longer needed, it’s better to delete it.

To manage the retention process, businesses apply internal plans and policies that outline how and for how long data should be stored across the organization. They also implement systems that automatically dispose of data according to a set retention schedule.
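A retention schedule of this kind can be sketched as a simple date check. The example below is an illustration only: the function names and the sample dates are hypothetical, and the rule that a record on legal hold is never disposed of, even after its retention period has ended, is an assumption based on the preservation duties described elsewhere in this post.

```python
from datetime import date


def retention_end(created: date, years: int) -> date:
    """Return the date on which the retention period ends."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Record created on Feb 29 and the target year is not a leap year
        return created.replace(year=created.year + years, day=28)


def is_due_for_disposal(created: date, years: int,
                        on_legal_hold: bool, today: date) -> bool:
    """A record is due for disposal once its retention period has ended,
    unless it is on legal hold (assumed here to always block disposal)."""
    if on_legal_hold:
        return False
    return today >= retention_end(created, years)


today = date(2023, 1, 1)
print(is_due_for_disposal(date(2015, 1, 10), 7, False, today))  # retention period over
print(is_due_for_disposal(date(2015, 1, 10), 7, True, today))   # same record, on legal hold
```

Passing `today` in as a parameter, rather than reading the system clock inside the function, keeps the check easy to test and audit.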

Read more on our blog for in-depth insights into data preservation. 


What is Data Preservation?

Data preservation is the safekeeping of Electronically Stored Information (ESI) for an anticipated legal matter. It is closely related to eDiscovery and litigation readiness because once litigation seems reasonably likely, an obligation to collect and preserve relevant ESI is triggered.

Rule 37(e) of the US Federal Rules of Civil Procedure (FRCP) broadly states that ESI must be preserved when litigation can reasonably be anticipated. In this context, “reasonable” includes, but is not limited to:

  • Direct communication from opposing counsel
  • Receiving a subpoena
  • A media report of an investigation or impending litigation
  • A notification that a formal complaint has been lodged

Unfortunately, knowing what is “relevant” is a complex question without a neat and simple answer. Every individual case requires legal teams to confer with clients to identify ESI that could be relevant to the case at hand. In most instances, multiple stakeholders across an organization would need to get involved, including in-house legal teams, compliance personnel, records managers, IT personnel, HR managers, etc.

For more information on data preservation, read our blog post, The Difference Between Data Retention and Data Preservation.


What is Evidence Spoliation?

Evidence spoliation refers to the intentional, negligent, or reckless withholding of evidence, the fabrication or alteration of evidence, or the destruction of evidence relevant to a legal proceeding.

How does this relate to something as ephemeral as social media though? While a tweet or comment doesn’t necessarily feel like an official piece of Electronically Stored Information (ESI) in the same way that a PDF or even an email does, the same duty to preserve evidence that applies to other forms of ESI also applies to social media.

The duty to preserve is triggered when a party reasonably foresees that evidence may be relevant to issues in litigation. Due to the increasing relevance of social media data in the realm of litigation, organizations need to put formal systems and processes in place to ensure that relevant social media content is properly preserved. As mentioned above, social media evidence should obviously not be deleted intentionally, but it’s equally important to ensure that data isn’t accidentally disposed of. 

Read more on our blog to learn how to place social media evidence on legal hold and avoid spoliation.


What is Optical Character Recognition (OCR)?

Optical Character Recognition is the electronic conversion of handwritten content, printed text, or image-only digital documents into a machine-readable and searchable digital data format.

For the purposes of eDiscovery, OCR software can identify and convert text characters from discoverable materials, such as physical contracts, typed letters, JPEGs of photographed documents, and image-only PDFs. Once done, this content can be searched in the same way you would search a Word document—simply type your query into the search bar and you’ll see all references in the document.

OCR not only speeds up the discovery process, it reduces costs too, since time-consuming human review is no longer needed. Instead, digitized information can be instantly searched for keywords, names, dates, and so on.

Although the concept of OCR is straightforward, in practice the technology can be challenging to implement due to a number of factors. For example, different fonts and methods of letter formation can make the job of identifying characters more difficult.

The process of OCR is therefore divided into image pre-processing, character recognition, and the post-processing of the output. The steps include: scanning the document, refining the image through OCR software, aligning text and converting colors or shades of grey to black and white only, identifying which characters are on the page, ensuring accuracy and producing a fully searchable, digital text file that can be manipulated, examined, and edited in any way the owner wishes.
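The pre-processing step of converting shades of grey to black and white can be sketched in a few lines. This is a simplified illustration, not how any particular OCR product works: the function name and the fixed threshold of 128 are assumptions, and real OCR software typically uses more sophisticated, adaptive methods.

```python
def binarize(gray_rows: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Map each grayscale pixel (0 = black, 255 = white) to pure black
    or pure white, depending on which side of the threshold it falls."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray_rows]


# A tiny hypothetical 2x3 grayscale patch from a scanned page
patch = [
    [0, 100, 200],
    [130, 127, 255],
]

print(binarize(patch))  # [[0, 0, 255], [255, 0, 255]]
```

After this step, each pixel is clearly either ink or background, which makes it easier for the character recognition stage to pick out letter shapes.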

To find out more about OCR and its most common uses across different industries, read our blog post, What is Optical Character Recognition (OCR)?


What is Technology Assisted Review?

Successful eDiscovery demands a comprehensive document review process, but finding and tagging documents and data that are discoverable and relevant to the case is becoming increasingly complex and time-consuming.

Whether a legal firm is litigating an average case or a much larger dispute, the result is that legal teams are finding themselves sifting through thousands of gigabytes of data per case, with clients who aren’t always sure what they need, where it’s stored, or even if they’ve preserved all their data correctly in a defensible format. 

To reduce the costs associated with the electronic review process, technology-assisted review (TAR) has gradually become standard practice in the eDiscovery process.

TAR uses artificial intelligence and machine learning to analyze massive data sets, and then identify and tag potentially discoverable documents. It can provide statistics, categorization, and reporting data that is superior to traditional human review and requires fewer hours to produce.

To understand the key benefits of TAR, how it is used in the courts, and its limits, read more on our blog.


What are Hash Values?

In any legal case, Electronically Stored Information (ESI), like social media posts and comments, cell phone images, text messages, and website content can be submitted as machine-generated authenticated evidence. This means witness testimony is often no longer necessary.

But it also means that making use of hashing algorithms when collecting and authenticating evidence is crucial. This follows from the 2017 amendments to the US Federal Rules of Evidence, Rules 902(13) and 902(14), which allow certified records generated by an electronic process or system, and certified data copied from an electronic device, storage medium, or file, to be self-authenticating.

A hash value is a fixed-length string of numbers and letters that a mathematical algorithm (a hash function) generates from an arbitrarily sized input such as an email, document, picture, or other type of data. The generated string is, for practical purposes, unique to the file being hashed, and the function is one-way: a computed hash cannot be reversed to recover the original file or to find other files that would generate the same hash value. Some of the more popular hashing algorithms in use today are Secure Hash Algorithm 1 (SHA-1), the Secure Hash Algorithm 2 (SHA-2) family, which includes SHA-256, and Message Digest 5 (MD5).


A small change to the input results in a dramatic change to the hash value. This makes it nearly impossible to alter a digital file without changing the hash value and making it clear that the data has been tampered with.


In simple terms, a hash value is a specific number string that’s created through an algorithm, and that is associated with a particular file. If the file is altered in any way, and you recalculate the value, the resulting hash will be different. This means you cannot change the file without changing the associated hash value as well. If you have two copies of a file, and they both have the same hash value, you can be certain that they are identical. Because of this, a hash value acts as a digital signature (or fingerprint) that authenticates evidence. As long as a piece of evidence was correctly collected and processed, any other party independently examining the hash value will find the same number string.
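The behavior described above can be demonstrated with Python's standard `hashlib` module. This is a minimal sketch: the helper name `sha256_hex` and the sample messages are made up for illustration, and SHA-256 is used simply as one member of the SHA-2 family mentioned earlier.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"Meeting moved to 3pm."
exact_copy = b"Meeting moved to 3pm."
altered = b"Meeting moved to 4pm."   # a single character changed

original_hash = sha256_hex(original)
copy_hash = sha256_hex(exact_copy)
altered_hash = sha256_hex(altered)

print(original_hash == copy_hash)     # True: identical content, identical hash
print(original_hash == altered_hash)  # False: any change produces a different hash
print(len(original_hash))             # 64: the length is fixed regardless of input size
```

In practice, the hash would be computed over the bytes of the collected file at the time of collection and recorded, so that any party can later recompute it and compare.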

Read our blog post, The Importance of Hash Values in Evidence Collection and Digital Forensics, to understand the excellent DIY tools that exist to help you generate defensible, self-authenticating digital evidence.


What is Metadata?

Metadata is data about data: it provides information about digital data. For example, the metadata of a social media post would include information about the author of the post, when it was posted (date and time), versions of the post, links (un-shortened), location, likes, and comments.


Figure: An example of the metadata associated with a simple tweet.


Metadata typically falls into one of five categories: descriptive, structural, administrative, statistical, or reference data.

Metadata plays a major role in compliance and litigation because it is crucial for the authentication of online data like social media and website content. Whenever you need to prove that records of website content, comments, or social media posts look exactly like they did when they were first published, you need metadata that shows when, where, and how they were created. Screenshots are not enough because they cannot be authenticated—it’s simple enough to fake a screenshot. 

For regulated industries, such as financial services, or public-sector entities governed by FOIA/Open Records laws, metadata is needed to prove that records are authentic. Two primary use cases are when an auditor asks a financial services firm for official website records, or a journalist places an open records request for a city’s social media data.

Because information from emails, social media comments, and enterprise collaboration conversations is central to litigation today, every piece of online data that is relevant to litigation must be authenticated. For highly-litigated industries in particular, metadata is critical, since the authenticity of records is often heavily contested. Anyone entering data from these sources into evidence needs to be able to prove that it hasn’t been tampered with, as well as when, where, and how a record was created. Without metadata, it’s very probable that the digital evidence will be denied in court.

To understand more about the types of metadata or how it is used in legal cases, read our blog post, What is Metadata and Why is it Important?


What Role Do Emojis Play in eDiscovery?

Each day, billions of communications are relayed through mobile text messages, collaboration platforms, and social media comments. Emojis have become an important tool to clarify a message, give it context, and even add emotion to a sentiment—words themselves cannot always convey that a person is joking or angry, for example, but an emoji can.

The challenge is that while emojis give context, they are also open to interpretation. For example, a message containing an eggplant in this emoji string, 🍎🍐🍆🌽🍌 is probably just a grocery list. This on the other hand 😍😏🍆😜 could very well be inappropriate innuendo relevant to a sexual harassment case.

To complicate matters, that second string can’t automatically be classified as inappropriate either. In this case, emojis can be likened to face-to-face verbal communications rather than a written sentence. Think of how much you imply through your tone, stance and hand gestures when you speak. The expression on your face could completely change the meaning of your words, but it’s difficult to successfully decode meaning if you’re not familiar with the context.

This is why it’s crucial for legal teams and eDiscovery professionals to see posts, messages, and conversations exactly as they originally appeared when it comes to collecting evidence from social media, team collaboration tools, and mobile text messages. Looking purely at a CSV spreadsheet of exported data often will not provide the necessary context.

Only by looking at messages and conversations exactly as they originally appeared can legal professionals begin to identify subtext and understand the implicit meaning all those emojis, Twitter terms, and chat abbreviations are actually trying to convey.

Read our blog post, Decoding Emojis During eDiscovery, to unpack how emojis are used during the eDiscovery process. 

Want to learn more? Check out our guide, The Essential Online Investigation Guide for Websites, Social Media, and Team Collaboration Tools.

George van Rooyen
George van Rooyen is the Content Marketing Manager at Pagefreezer.