What Is a Data Inventory (Data Mapping)?

We’ve mentioned before how important information governance is. However, with the sudden shift to remote work caused by COVID-19, having thorough systems and processes in place to manage information has proven more important than ever. 

As employees suddenly depended entirely on platforms and solutions like Slack, Workplace from Facebook, Microsoft Teams, Zoom, and G Suite to communicate and collaborate, monitoring and managing this flow of data became crucial as well. Simply put, if there were any weaknesses in an organization’s approach to information governance before, the coronavirus pandemic has highlighted them.

One indispensable tool in this world of distributed teams and cloud-based solutions is the data inventory. For companies looking to understand what data they’re holding and where exactly it lives, the first crucial step is a comprehensive data inventory.

What Is a Data Inventory?

A data inventory (sometimes also called a data map) is simply a single source of truth—with relevant metadata—that provides instant insight into all sources of data a company has, what information these sources collect, where this data is stored, and what ultimately happens to it. 

“At a minimum, data inventory is important because knowing what data your business collects leads to improved efficiency and increased accountability for everyone in the organization. The results from data inventory can also lead to better overall reporting, decision-making and operational performance optimization,” says Steve Boston, Director of Information Technology Services at consulting firm GBQ. “Without an accurate inventory, it is far more challenging to assess any underlying risk, which can further make it difficult to identify the controls that your organization needs to protect your valuable information assets.”

Only when you start taking a detailed inventory of all your data sources do you realize how much information you’re actually collecting. Here are just some of the data sources many organizations have:

  • Accounting and point-of-sale software
  • Customer relationship management (CRM) software
  • Third-party applications and cloud-based solutions, like:
    • Slack
    • Salesforce
    • Hubspot
    • Microsoft 365
    • G Suite
  • Electronic Data Interchange (EDI) software and solutions
  • Websites with password-protected pages, forms, and chat bots
  • Social media accounts that anyone can comment on or send a direct message to

What’s obvious from the list above is that sensitive and valuable information is collected across the organization. Every department is collecting and storing data for its own purposes—and because of this, it can be easy to overlook important data sources. 

Just consider the average website. It’s likely to exist on top of some kind of content management system (CMS), but might also have a section behind a user login screen with data hosted elsewhere. Then it could also have multiple forms that feed information to cloud-based sales and CRM solutions, as well as a chat bot from a third-party vendor. 

The same is true of internal tools and solutions. Employees could be creating documents in Microsoft Office, G Suite, and countless other lesser-known solutions (like Dropbox Paper), and then sharing them through email and team collaboration tools, which include everything from Slack and Microsoft Teams to Asana and Trello. And on top of that, they could be hosting (and recording) Zoom calls, during which sensitive information is discussed and displayed.

Needless to say, keeping track of all of this can be tricky. But given the regulatory environment most companies are dealing with these days, ignoring the problem is not an option.

Data Inventory and the GDPR

It’s not only the sudden prevalence of remote work that makes information governance and a proper data inventory so important. The stringent requirements of modern privacy legislation, like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), matter just as much.

These regulations demand that organizations know exactly what user data they hold and what they do with it. Additionally, companies are expected to respond effectively to a Data Subject Access Request (DSAR) or Right to Erasure request.

Even though the GDPR does not explicitly require a data inventory, there’s no doubt that having one makes compliance with these regulations much easier. For this reason, the International Association of Privacy Professionals (IAPP) states that proper data inventory is a foundational step to complying with a regulation like the GDPR.

“One can search the GDPR in vain for the terms ‘data inventory’ or ‘mapping’. They are simply not obliged by the plain language of the law,” states Rita Heimes, General Counsel and Privacy Officer for the IAPP. “But unquestionably, the first operational response to GDPR, essential to building a program that aims to comply with the law, is a comprehensive exercise of data mapping and inventory.”

According to Heimes, this process should consist of the following:

  • Know what the definition of personal data is under the GDPR

  • Identify exactly what personal data is collected and how it is used

  • Find out where data is stored (geographical location and servers)—this should also be done for any third-party systems

  • Map the travel of data through the organization from the very first moment of collection—third parties, vendors, and partners should again be considered

  • Know how long data is retained for

  • Understand what this data looks like—is it structured data in a relational database, or is it unstructured data that could be harder to identify, export, or delete?
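To make the checklist above a little more concrete, here is a minimal sketch of how a single inventory entry could be recorded. It is only an illustration: the `InventoryEntry` class, its field names, the `unstructured_sources` helper, and the sample values are all assumptions made for this example, not part of the GDPR, the IAPP guidance, or any particular product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InventoryEntry:
    """One row of a hypothetical data inventory; all field names are illustrative."""
    source: str                # system that collects the data
    personal_data: List[str]   # categories of personal data collected
    purpose: str               # how the data is used
    storage_location: str      # geographic location or hosting provider
    third_parties: List[str]   # vendors and partners the data flows to
    retention_days: int        # how long the data is retained
    structured: bool           # True for relational data, False for unstructured


def unstructured_sources(inventory: List[InventoryEntry]) -> List[str]:
    """Return the sources whose data may be harder to identify, export, or delete."""
    return [entry.source for entry in inventory if not entry.structured]


# Made-up example entries, not real configuration.
inventory = [
    InventoryEntry("CRM", ["name", "email"], "sales follow-up",
                   "EU (Frankfurt)", ["email vendor"], 730, True),
    InventoryEntry("Slack", ["name", "chat messages"], "internal collaboration",
                   "US (Virginia)", [], 365, False),
]

print(unstructured_sources(inventory))
```

Each field maps to one item on the list: what is collected and why, where it lives, who else receives it, how long it is kept, and whether it is structured. A spreadsheet with the same columns would serve the same purpose.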

The Challenge of Creating a Data Inventory

The ultimate goal of a data inventory is simple in theory but can be challenging in practice. 

“Ideally, the inventory and processes created to support it allow—eventually, at least—the capacity to identify data location and storage information at the level of an individual data subject: What data do I have on Jane Doe, and where is it located? If Jane wants access to her data, how can I be sure to find it all for her?” says Heimes. 
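The “Jane Doe” question can be pictured as a lookup across every inventoried source. The sketch below assumes, purely for illustration, that each source can be indexed by email address. The `SUBJECT_INDEX` structure, the `locate_subject` function, and the sample data are hypothetical; real systems rarely share one clean identifier, which is a large part of why this is hard in practice.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index: for each inventoried source, its storage location and
# the set of subject identifiers (here, lowercase emails) known to appear there.
SUBJECT_INDEX: Dict[str, Tuple[str, Set[str]]] = {
    "CRM": ("EU (Frankfurt)", {"jane.doe@example.com", "sam@example.com"}),
    "Support chat": ("US (Virginia)", {"jane.doe@example.com"}),
    "Newsletter": ("EU (Dublin)", {"sam@example.com"}),
}


def locate_subject(email: str,
                   index: Dict[str, Tuple[str, Set[str]]]) -> List[Tuple[str, str]]:
    """Return (source, location) pairs for every source that holds this subject."""
    key = email.strip().lower()  # normalize so lookups are case-insensitive
    return sorted(
        (source, location)
        for source, (location, subjects) in index.items()
        if key in subjects
    )


for source, location in locate_subject("Jane.Doe@example.com", SUBJECT_INDEX):
    print(f"{source}: {location}")
```

The list returned is the starting point for an access or erasure request: it shows which systems need to be searched or purged for that individual.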

How do you accomplish the above? Predictably, it isn’t easy. The 2019 IAPP-EY Privacy Governance Report found that “fewer than half of all respondents report being ‘very’ or ‘fully’ compliant with the GDPR. Among EU respondents alone, 43% report they are only ‘moderately compliant’ with the GDPR, even when GDPR compliance is their primary responsibility. One in 10 admit they are only ‘somewhat’ compliant with the GDPR.”

And for many organizations, data mapping remains a very manual and time-consuming process. “Many data protection and privacy professionals, perhaps assisted by outside counsel or consultants, begin with a questionnaire,” says Heimes. “Those with adequate time can engage in an initial discovery exercise to unearth their organization’s general personal data life cycles, followed by deeper-dive questionnaires and follow-up interviews, and even workshops.

“While less scalable than technological data mapping tools, traditional questionnaires have the benefit of being comprehensive and can be sent to many people within an organization, allowing for a potentially comprehensive and wide-spread investigation. Their risks, however, include the potential for weak or inaccurate responses, and misunderstanding on the part of those completing the questionnaire who make assumptions and do not or cannot get clarification before submitting their answers. The task of answering the questionnaire may be tasked to someone with inadequate knowledge or awareness.

“Privacy professionals who are in a rush, then, may not be able to use a questionnaire followed by interviews. Instead, it may be necessary to jump directly to in-person meetings. This may take more personnel time—and at a higher level of management within the organization—but is likely the best way to get useful information about data processing as quickly, accurately, and efficiently as possible in the shortest time.”

But what about massive organizations where this sort of labor-intensive approach isn’t an option? While questionnaires, interviews, and Excel spreadsheets can work, dedicated solutions are much better at streamlining the process. And given the risks that come with non-compliance, any solution that improves the data mapping process is well worth the investment.

Pagefreezer has created an information governance model that specifically addresses data from online sources such as websites, social media accounts, and enterprise collaboration platforms. Download this document to see exactly how online data should be stored, indexed, managed, and disposed of.

Peter Callaghan
Peter Callaghan is the Chief Revenue Officer at Pagefreezer. He has a very successful record in the tech industry, bringing significant market share increases and exponential revenue growth to the companies he has served. Peter has a passion for building high-performance sales and marketing teams, developing value-based go-to-market strategies, and creating effective brand strategies.