How Optical Character Recognition (OCR) Revolutionizes eDiscovery and Online Investigations

Documentation and data in modern business are generated at a rapid pace, both online and offline. Many industries, such as banking and financial services, routinely manage thousands of different types of documents. Keeping up with this growing volume of documentation can be difficult and costly. Going digital and paperless helps an office stay organized, work more efficiently and produce more accurate results.

Optical Character Recognition (OCR) technology gives businesses a better digital handle on their information: once documents are in a digital format, stored data is far easier to control and work with. For the basics of what OCR is and how it applies to different types of businesses, see our introductory article. This article takes a deeper look at OCR in the context of eDiscovery, its growing importance and the difference it makes in online investigations.


What is OCR? 

Optical Character Recognition (OCR) refers to technology that converts printed characters, and even handwritten text and numbers, into machine-readable digital data. With OCR, businesses can digitize, store and manage their documentation in a readable and searchable digital format.

OCR is more sophisticated than simple scanning, which only records a document as a copied image. OCR recognizes individual letters, characters and numbers and converts them into documents that can be edited, manipulated and searched.

The first OCR technology appeared more than 100 years ago with the invention of the Optophone, a reading device for the blind that translated letters into sounds, making printed text accessible. In the 1990s, OCR became more popular as it was used to digitize newspapers. Later, in the early 2000s, OCR became available as a cloud-based service that could be accessed on desktop and mobile devices.

OCR evolved rapidly over the years, and modern OCR software is far more advanced, recognizing and converting clean, well-printed documents with very high accuracy. A familiar example of modern OCR is the Google Translate app, which lets you point your smartphone camera at a document or sign and see it translated in real time.


How Does OCR Technology Work?

The OCR process can be divided into three distinct stages: image pre-processing, character recognition, and post-processing of the output. First, the document is scanned and prepared, and the image is converted to black and white to make the characters easier to recognize. Next, the software identifies the characters, which can be done in several ways. The most basic forms of OCR match the pixels of each character against a database of known font shapes, while more sophisticated forms break each character down into constituent features, such as curves or corners, and match those features to actual letters. Finally, the software can use a dictionary and AI techniques to correct likely errors and ensure a high level of accuracy. The result is a digital text file that turns the scanned material into searchable documentation.
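To make the first two stages concrete, here is a toy sketch of binarization followed by the basic pixel-matching approach (often called matrix matching). The 3x3 glyph "font database" (`TEMPLATES`), the threshold of 128 and the function names are illustrative assumptions; real OCR engines work on much larger bitmaps and usually rely on feature extraction or neural networks rather than raw pixel comparison.

```python
# Hypothetical 3x3 "font database": 1 = ink, 0 = background.
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}

def binarize(gray, threshold=128):
    """Stage 1 (pre-processing): dark pixels become ink (1), light pixels background (0)."""
    return tuple(tuple(1 if px < threshold else 0 for px in row) for row in gray)

def match_glyph(bitmap):
    """Stage 2 (recognition): return the template character with the most matching pixels."""
    def score(template):
        return sum(
            t == b
            for t_row, b_row in zip(template, bitmap)
            for t, b in zip(t_row, b_row)
        )
    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))

def recognize(glyph_images):
    """Binarize and classify each pre-segmented glyph image, then join the results."""
    return "".join(match_glyph(binarize(img)) for img in glyph_images)

# Grayscale glyphs (0 = black, 255 = white), slightly noisy as a scan would be.
t_img = [[20, 20, 20], [230, 20, 230], [230, 20, 230]]
i_img = [[230, 20, 230], [230, 90, 230], [200, 20, 140]]
l_img = [[20, 230, 230], [20, 230, 230], [20, 20, 20]]
print(recognize([t_img, i_img, l_img]))  # TIL
```

Thresholding absorbs the mid-gray noise in `i_img` (90 and 140 fall on opposite sides of 128), which is exactly why binarization comes before recognition.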

Banks, law firms, insurers, healthcare providers and tourism businesses are among the heaviest users of OCR, which helps them scale their services and improve customer experience. OCR and supporting technologies automate many types of business processes, and the ability to convert a paper document or image into a digital format saves companies time and streamlines their workflow.

The video below by Techquickie provides an excellent overview of the definition, evolution and main uses of OCR in businesses.


OCR in eDiscovery and Online Investigations

Considering all its uses and benefits, OCR represents a great opportunity for eDiscovery and online investigations. Businesses are obligated to produce electronically stored information when it is requested, and a system that makes files searchable regardless of their original format makes responsive documents much easier to locate. Simply scanning documents, which produces only images, cannot provide this functionality.

Because OCR can rapidly convert images and paper-based documentation into searchable, readable digital files, it helps companies locate specific pieces of information faster. OCR also reduces the margin for human error by cutting down the manual review of masses of paper documents.

Digitized information can also be categorized and searched by keyword, name, date and so on, which supports better information governance. Instead of trawling through high volumes of information, you can simply search for the date or keyword you need. Streamlining the search process in this way can drastically reduce the cost of eDiscovery.
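Keyword and date search over OCR output is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal illustration under stated assumptions: the OCR text is already available per document, dates appear in ISO `YYYY-MM-DD` form, and the document IDs are hypothetical.

```python
import re
from collections import defaultdict

DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # assumes ISO-style dates in the OCR text

def build_index(docs):
    """Map each lowercase word, and each ISO date, to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
        for date in DATE_RE.findall(text):  # dates are indexed whole, not as split digits
            index[date].add(doc_id)
    return index

def search(index, *terms):
    """Return IDs of documents containing every term (an AND search)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

# Hypothetical OCR output for three scanned documents.
docs = {
    "DOC-001": "Invoice from Acme Corp dated 2021-03-15",
    "DOC-002": "Memo: Acme merger discussion, 2021-04-02",
    "DOC-003": "Lunch menu",
}
index = build_index(docs)
print(sorted(search(index, "acme")))                # ['DOC-001', 'DOC-002']
print(sorted(search(index, "Acme", "2021-03-15")))  # ['DOC-001']
```

The index is built once after OCR, so each later query is a few dictionary lookups rather than a rescan of every document.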

With OCR, paper documents are converted into electronically stored information (ESI). ESI is easy to back up and protect, and it can be copied and distributed as needed, whereas paper-based documentation is much more vulnerable to damage or destruction. Holding information digitally therefore helps keep it more secure.

It is important to note that OCR processing can take time, especially with a large volume of data. Companies should therefore have a process in place that applies OCR to all their documents as standard practice, before the documents are needed, for example in response to a legal hold.

This practice also gives businesses time to verify that the process has been completed properly. Although OCR is not infallible, modern OCR is highly accurate and helps businesses scale their document archiving.
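A completeness check like the one described above can be as simple as comparing the full document inventory against the OCR results and flagging gaps. This is a minimal sketch: the `OcrResult` structure, its `confidence` field (assumed to be a 0-100 score reported by the OCR engine) and the 80.0 cutoff are hypothetical choices, not features of any particular product.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass
class OcrResult:
    text: str
    confidence: float  # hypothetical mean confidence (0-100) reported by the OCR engine

def find_ocr_gaps(
    all_doc_ids: Iterable[str],
    results: Dict[str, OcrResult],
    min_confidence: float = 80.0,
) -> Dict[str, str]:
    """Return {doc_id: reason} for every document that still needs attention."""
    issues: Dict[str, str] = {}
    for doc_id in all_doc_ids:
        result = results.get(doc_id)
        if result is None:
            issues[doc_id] = "not processed"      # OCR never ran on this document
        elif not result.text.strip():
            issues[doc_id] = "no text extracted"  # e.g. blank page or failed recognition
        elif result.confidence < min_confidence:
            issues[doc_id] = "low confidence"     # route to a person for spot-checking
    return issues

# Hypothetical inventory and results.
results = {
    "A": OcrResult("Signed contract", 95.0),
    "B": OcrResult("   ", 90.0),
    "C": OcrResult("blurry fax", 42.5),
}
print(find_ocr_gaps(["A", "B", "C", "D"], results))
```

Running a check like this routinely, well before any legal hold, leaves time to re-scan or manually review the flagged documents.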


Why Is Searchability In Legal Documents So Important?

OCR automates the process of digitizing images or paper documents, which can have a transformative effect on the way that information is preserved. Paper-based discovery is replaced by searchable digital data, such as a PDF file or Word document. But what makes searchability in legal documents so important? 

  • Court requirements: Many courts require eFiled documents to be text-searchable, and they can check whether your filings have been processed with OCR.
  • Time and cost savings: Manually digitizing a high volume of paper discovery is not only time-consuming but expensive. With OCR, businesses save time and budget.
  • Higher accuracy: Compared with manual retyping, OCR minimizes transcription errors such as typos, so you can produce an accurate replica of the original when needed.
  • Manage handwritten discovery: Many legal notes and paper discovery documents are handwritten, and modern OCR software can often process and digitize them.
  • Quick and easy access to files: OCR makes it possible to quickly search for and locate individual words within extensive files. When you need to work at speed or pinpoint a very specific section of text, this can be a game-changer.
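Locating individual words inside long files usually means returning each hit with its page number and a little surrounding context, a pattern known as keyword-in-context search. The sketch below assumes the OCR output is available as a list of page strings; the function name and the `context` parameter are illustrative.

```python
import re

def find_in_pages(pages, word, context=20):
    """Return (page_number, snippet) for each whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for page_no, text in enumerate(pages, start=1):  # pages are numbered from 1
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            hits.append((page_no, text[start:end]))  # snippet around the hit
    return hits

# Hypothetical OCR output, one string per page.
pages = [
    "The parties signed the agreement.",
    "No relevant text here.",
    "Agreement terminated in 2020; see Agreement B.",
]
for page_no, snippet in find_in_pages(pages, "agreement", context=10):
    print(page_no, repr(snippet))
```

`re.escape` keeps search terms containing punctuation from being treated as regular-expression syntax, and the `\b` anchors prevent partial matches such as "agreements" matching "agreement" inside a longer word.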

More Ways That OCR Enables Better eDiscovery 

The benefits of OCR go beyond simply making physical documents digitally searchable. It also has a range of other uses in eDiscovery, reducing the manual work of handling legal documents, which can be time-consuming and costly.

OCR enables businesses to identify text in images. When an image is scanned and interpreted with OCR, the text it contains becomes searchable. The software segments the image so that it recognizes the individual characters and the spaces between words. That means you can search for keywords throughout your image files, just like any other document.
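One classic way to find characters and spaces in a line of text is a vertical projection: columns containing no ink separate characters, and wider blank runs mark word spaces. This is a minimal sketch of that technique on an already-binarized bitmap; the `space_width` threshold and the function name are illustrative assumptions, and modern engines use more robust segmentation.

```python
def segment_columns(bitmap, space_width=2):
    """Split a binarized text line (rows of 1 = ink, 0 = background) into glyph spans.

    Returns a list of (start_col, end_col) spans, end exclusive, with None
    inserted wherever a blank gap of at least `space_width` columns separates words.
    """
    width = len(bitmap[0])
    ink = [any(row[c] for row in bitmap) for c in range(width)]  # vertical projection
    spans, start, gap = [], None, 0
    for c, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                if spans and gap >= space_width:
                    spans.append(None)  # a wide blank gap is treated as a word space
                start = c
            gap = 0
        else:
            if start is not None:
                spans.append((start, c))  # a blank column closes the current glyph
                start = None
            gap += 1
    if start is not None:
        spans.append((start, width))  # glyph touching the right edge
    return spans

line = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
]
print(segment_columns(line))  # [(0, 2), (3, 4), None, (7, 9)]
```

Each span can then be cropped and passed to the character-recognition stage, while the `None` markers become spaces in the output text.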

Another valuable feature of OCR technology is the ability to make video captions searchable. This saves a great deal of time when long videos, such as witness statements or testimonials, need to be searched for key phrases or conversations, and it helps pinpoint exactly who said what and when. Captions are easily generated, but OCR takes things a step further by turning the on-screen captions into a fully searchable text document.
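When captions are read from sampled video frames, the same caption typically appears on many consecutive frames, so the per-frame OCR output has to be collapsed into timed segments before it can be searched. The sketch below assumes a hypothetical list of `(timestamp_seconds, ocr_text)` pairs in time order, and assumes speaker labels such as "WITNESS:" are part of the on-screen captions.

```python
def build_caption_track(frames):
    """Collapse per-frame OCR output into (start, end, text) caption segments."""
    segments = []
    prev = None
    for ts, raw in frames:
        text = " ".join(raw.split())  # normalize stray whitespace from OCR
        if text and text == prev:
            start, _, _ = segments[-1]
            segments[-1] = (start, ts, text)  # same caption still on screen: extend it
        elif text:
            segments.append((ts, ts, text))   # a new caption appeared
        prev = text  # an empty frame breaks the run
    return segments

def find_phrase(segments, phrase):
    """Return every caption segment containing the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [seg for seg in segments if needle in seg[2].lower()]

# Hypothetical OCR output from frames sampled every 0.5 seconds.
frames = [
    (0.0, "WITNESS: I left at nine."),
    (0.5, "WITNESS: I left  at nine."),
    (1.0, ""),
    (1.5, "COUNSEL: Alone?"),
    (2.0, "WITNESS: Yes, alone."),
]
track = build_caption_track(frames)
print(find_phrase(track, "alone"))
```

Each search hit carries its start and end time, so a reviewer can jump straight to that point in the recording.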

OCR also supports translation. More advanced OCR systems include their own dictionaries and models of how words and sentences are properly constructed, and they use these to correct recognition errors and improve clarity and correctness. This helps ensure that your document is digitized accurately and with few errors.
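Dictionary-based post-processing can be illustrated with Python's standard `difflib.get_close_matches`, which snaps a misread word to the most similar known word above a similarity cutoff. This is a minimal sketch: the small `LEXICON`, the 0.8 cutoff and the function names are assumptions for illustration, and production systems use full dictionaries and statistical language models instead.

```python
import difflib
import re

# Hypothetical domain lexicon; a real system would use a full dictionary.
LEXICON = ["agreement", "contract", "defendant", "plaintiff", "settlement", "witness"]

def correct_token(token, lexicon=LEXICON, cutoff=0.8):
    """Replace a token with its closest lexicon word if it is similar enough."""
    if token.isdigit() or token.lower() in lexicon:
        return token  # numbers and already-known words are left alone
    matches = difflib.get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token

def correct_text(text, lexicon=LEXICON):
    """Apply correct_token to every alphanumeric run, keeping punctuation and spacing."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: correct_token(m.group(0), lexicon), text)

# Typical OCR confusions: "1" for "l", and "rn" for "m".
print(correct_text("The p1aintiff signed the settlernent on 12 May"))
```

Corrected words come back in the lexicon's lowercase form; a fuller version would also restore the original capitalization.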


How To Implement & Benefit From OCR

There are a number of different ways that OCR can be integrated into your business, according to your needs. Here we will outline the main ways you can apply and use OCR technology in your company.

  • Built-in OCR software: Some scanners come with built-in proprietary OCR software, so you can make sure your documents are searchable as soon as they are scanned. 
  • Third-party OCR software: It is possible to implement stand-alone, third-party OCR software and install it on your teams’ computers. This can be very effective, but your employees need to remember to apply OCR consistently to any paper-based documentation they generate. 
  • Integrated OCR software: You can use a document management system with integrated, automatic OCR. This type of implementation allows you to store, organize, manage, and automatically apply OCR to all your documents. 
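The advantage of the integrated approach is that OCR runs as part of ingestion, so nobody has to remember to apply it. The sketch below shows that design in miniature; `DocumentStore` is a hypothetical class, and `ocr_engine` is a stand-in for whatever OCR engine a real document management system would plug in.

```python
from typing import Callable, Dict, List

class DocumentStore:
    """Toy document store that applies OCR automatically whenever a document is added."""

    def __init__(self, ocr_engine: Callable[[bytes], str]):
        self._ocr = ocr_engine          # hypothetical OCR callable: image bytes -> text
        self._text: Dict[str, str] = {}

    def add(self, doc_id: str, image_bytes: bytes) -> None:
        # OCR is part of ingestion, so every stored document is searchable.
        self._text[doc_id] = self._ocr(image_bytes)

    def search(self, keyword: str) -> List[str]:
        """Return sorted IDs of documents whose OCR text contains the keyword."""
        kw = keyword.lower()
        return sorted(doc_id for doc_id, text in self._text.items() if kw in text.lower())

# Stand-in engine for demonstration: pretends the "image" bytes are already text.
store = DocumentStore(lambda data: data.decode("utf-8"))
store.add("scan-1", b"Board minutes: approve merger")
store.add("scan-2", b"Expense report")
print(store.search("merger"))  # ['scan-1']
```

Because the OCR engine is injected as a parameter, the same store could be tested with a fake engine like this one and deployed with a real one.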

Regardless of the type of implementation you prefer, some form of OCR is essential to producing searchable digitized documents. As discussed above, OCR can benefit an organization in various ways, especially when it receives an ESI request as a result of litigation. OCR ensures your documentation is digitized, stored accurately and easy to access.


OCR & Social Media

Online activity plays a major role in legal matters, because so much data is generated, stored and shared online.

Because social media is so relevant in litigation, companies need a proper system to ensure that all of their relevant online data is preserved. Discoverable information may be stored on users' social media accounts, so organizations need to treat chat and social media platforms as crucial sources of ESI.

Thanks to OCR, social media eDiscovery does not have to be a problem for companies. Using OCR, solutions such as WebPreserver allow businesses to export captured content in a range of fully searchable formats.

You can capture a social media timeline, posts, web pages, comments, images and more, and export all of this data as a searchable OCR PDF. This approach speeds up eDiscovery, because relevant evidence can be located with greater precision.


OCR: For Every Modern Business & Better eDiscovery

The benefits of modern OCR apply across many sectors, where it often runs in the background in a supporting role. It may not always be customer-facing, but OCR underpins many everyday practices. It makes digital files more dynamic and usable, and it is highly beneficial to modern businesses when it comes to eDiscovery. OCR has major potential to save time and money and to ensure your business is properly prepared in case of litigation.

OCR technology is truly revolutionary when it comes to document searchability. Fuller access to evidence leads to better outcomes for justice. Without OCR, it would be very difficult to extract fine detail from the same quantities of analog data, and crucial evidence could be missed or overlooked.

If you would like to know more about the collection and preservation of data, download this paper.


Peter Callaghan
Peter Callaghan is the Chief Revenue Officer at Pagefreezer. He has a very successful record in the tech industry, bringing significant market share increases and exponential revenue growth to the companies he has served. Peter has a passion for building high-performance sales and marketing teams, developing value-based go-to-market strategies, and creating effective brand strategies.
