API-Based Preservation Challenges

If you’ve recently spent time investigating the various technical approaches to capturing, preserving, and organizing online content, you’ve likely heard of tools that use application programming interfaces (APIs, for short) to archive social media feeds and other websites. While you may have heard a good deal about APIs and their use, it’s understandable that you may remain unclear on some of the finer points of what APIs are, and how exactly they function to collect data.

This is important to understand, because the collection method often determines which data you can capture, and whether and when you can capture it at all. When it comes to collecting evidence that could make all the difference in your case, you need to be certain that you can capture the evidence you want, when you want it, and in the easily authenticated format you need.

What is an “API”?

In essence, an API allows a third-party developer to access the functions or data of another application. To use a widely known example, Facebook offers APIs that enable companies to collect various types of data about Facebook users. That data then allows the third parties to do things like target advertisements, create games, and tailor posts to suit their audience and function best within the Facebook platform.

Often, the primary platform offers an API because third-party use ultimately benefits the platform itself. For example, an event organizer's application may be given API access by Facebook because it feeds the “events” section of Facebook, or by Google because it adds events to Google Calendar. The relationship, in this way, is symbiotic, as both the third-party developer and the primary platform receive a benefit.

Social networks essentially offer two entry points for the collection of data: first, interfaces for users, and second, interfaces designed for programs, namely APIs. While APIs were initially intended for programmers building apps, various researchers and evidence collection vendors have used them to collect social media postings online. When they function properly, APIs are typically able to capture the majority of the data and metadata contained on a site. Even so, there are a number of potential pitfalls with API-based data collection that many might not realize without some investigation into the realities involved.

Challenges in API-Based Collection

API-based collection methods, though generally effective at collecting data when they function properly, have more than a few drawbacks. One is that those who use the API to collect the data don’t always get a fully recognizable “rendering” of the data that is collected.

Instead, the API often returns a string of data, which may include the information you need but does not necessarily resemble the website it was collected from. Another issue often encountered is that there may not even be an API available for the specific site you need to collect, or, if one exists, it may come with terms and conditions that you can’t or don’t want to agree to. Furthermore, instead of using just one web capture tool that can search, crawl, and capture any online content, each website that is archived through an API must have its own software to interact with that API. This often creates difficulty as well.
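To make the last two points concrete, here is a minimal Python sketch. It assumes two hypothetical platforms whose APIs return differently shaped JSON strings; the payloads, field names, and function names are invented for illustration and are not taken from any real platform's API. It shows what a raw API "string of data" can look like, and why each platform ends up needing its own piece of parsing software.

```python
import json

# Hypothetical payloads: these field names are invented for illustration and
# are not taken from any real platform's API documentation.
PLATFORM_A_RAW = (
    '{"id": "123", "from": {"name": "Jane Doe"}, '
    '"message": "Hello", "created_time": "2019-03-01T12:00:00+0000"}'
)
PLATFORM_B_RAW = (
    '{"id_str": "456", "user": {"screen_name": "jdoe"}, '
    '"text": "Hi there", "created_at": "Fri Mar 01 12:00:00 +0000 2019"}'
)


def parse_platform_a(raw: str) -> dict:
    """Map platform A's (hypothetical) response shape to a common record."""
    post = json.loads(raw)
    return {
        "author": post["from"]["name"],
        "text": post["message"],
        "posted": post["created_time"],
    }


def parse_platform_b(raw: str) -> dict:
    """Map platform B's (hypothetical) response shape to the same record."""
    post = json.loads(raw)
    return {
        "author": post["user"]["screen_name"],
        "text": post["text"],
        "posted": post["created_at"],
    }


# One parser per platform: supporting a new site means writing a new function.
PARSERS = {"platform_a": parse_platform_a, "platform_b": parse_platform_b}


def normalize(platform: str, raw: str) -> dict:
    """Pick the platform-specific parser and return a common record."""
    return PARSERS[platform](raw)


if __name__ == "__main__":
    print(PLATFORM_A_RAW)  # the raw string, which looks nothing like the web page
    print(normalize("platform_a", PLATFORM_A_RAW))
    print(normalize("platform_b", PLATFORM_B_RAW))
```

Neither output resembles the post as a visitor would have seen it on the site, and the two parsers cannot be swapped, since each depends on one platform's field names.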

Perhaps the biggest drawback of an evidence collection tool that depends on APIs, and this is important, is that you are by default subject to the whims and decisions of the platform developer providing the API, as well as to the conduct of other developers who use it. What does this mean from a practical standpoint? Simple. If the API provider (Facebook, for example) decides to change, remove, or limit API use in response to something like a data leak or inappropriate use of its data, then the tool you are using to collect the data may no longer work.
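As a rough illustration of that dependency, the Python sketch below simulates a collection tool that expects certain fields in an API response. Everything here is assumed for the example: the field names, the two sample responses, and the `CollectionError` class are hypothetical and do not describe any real platform or product. The point is only that a provider-side change can break a tool whose own code has not changed at all.

```python
class CollectionError(Exception):
    """Raised when a response no longer contains what the tool depends on."""


# Fields this hypothetical tool needs in order to preserve a post.
REQUIRED_FIELDS = ("id", "message", "created_time")


def extract_evidence(response: dict) -> dict:
    """Return the required fields, or raise if the provider withheld any."""
    if "error" in response:
        raise CollectionError(f"provider refused the request: {response['error']}")
    missing = [field for field in REQUIRED_FIELDS if field not in response]
    if missing:
        raise CollectionError("response is missing fields: " + ", ".join(missing))
    return {field: response[field] for field in REQUIRED_FIELDS}


# Hypothetical responses before and after a provider-side policy change.
BEFORE_CHANGE = {
    "id": "123",
    "message": "Hello",
    "created_time": "2019-03-01T12:00:00+0000",
}
AFTER_CHANGE = {"id": "123"}  # provider no longer returns the post content

if __name__ == "__main__":
    print(extract_evidence(BEFORE_CHANGE))
    try:
        extract_evidence(AFTER_CHANGE)
    except CollectionError as exc:
        print("collection failed:", exc)
```

The same function succeeds on the first response and fails on the second. The only thing that changed is what the provider chose to return.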

Well, how likely is that to really happen, you wonder? The answer is that it already has, and it is just as likely to continue to occur in the future. As press coverage made clear, following the Cambridge Analytica data leak, Facebook and a number of other social media providers, including Instagram and Twitter, announced significant changes to their platforms, including API limitations restricting the amount and types of information that developers can access about network users. These policy-level changes closed a door that no third-party software developer can open. In fact, Facebook treats any attempt to pull metadata through its API as an attack by a “third party malicious actor”, which is why we are routinely told that examiners’ Facebook accounts are locked and suspended when they use an API-based tool. This has created a critical challenge for investigators, legal service firms, and law firms that previously relied on these methods.

Consider these thoughts from leading thinkers in the tech industry on the API-based collection difficulties as a result of the changes to social media platforms:

Since 2018’s Cambridge Analytica scandal, collection has been difficult due to platforms like Facebook limiting API availability.
-Wired Magazine-

Facebook is significantly limiting data available from Facebook’s Events, Groups, and Pages APIs, plus Facebook Login, as well as Instagram accounts. Facebook is also shutting down search by email or user name and changing its account recovery system after discovering malicious actors were using these to scrape people’s data.
-TechCrunch-

If your practice, and the positive resolution of your cases, depends on the efficient, effective collection of evidence when you need it, you probably recognize a clear problem here.

The truth is that, at any time, Facebook, Instagram, Twitter, or any mainstream social platform could decide to restrict API usage, as some have already done. When that happens, those using API-based evidence collection tools will find themselves in a very difficult position indeed. As TransPerfect Legal Solutions recently stated, “With recent technological changes to Facebook and Instagram, WebPreserver has become our preferred method for capturing those accounts.”

Why WebPreserver?

WebPreserver is API-free, fully automated, and innocuous. It is future-proof, and not beholden to any third party. Built with an awareness of the inherent challenges in the tiny niche market of legally admissible web capture technologies, WebPreserver presents investigators and attorneys with the only fully featured alternative to broken API-based collection practices. WebPreserver can collect the data you need at any time, at any location, in just seconds. It fully meets all of the standards of forensic electronic evidence, and solves the problems associated with API-based social collection methods.

Don’t pin the hopes of your case (or clients) on collection methods that are outside your ability to control. Don’t let the policy decisions of social media platforms control the outcome of your evidence collection. With WebPreserver, you can be sure you’re collecting the evidence you need, in the format you need it – every time, now, and in the future. If your firm collects social media evidence often, WebPreserver is the only product that makes sense in today’s technology climate.
