What is WARC and Why is it Important?

A Brief History of The Internet Archive

For over 20 years, the Internet Archive has been building a library of websites and other cultural artifacts in digital form, with web pages captured and served through the Wayback Machine. In that time, it has collected over 279 billion web pages.

Preserving this information and making it accessible to the public is what the Internet Archive is best known for, but it is also known for developing the Web ARChive (WARC) file format. WARC is a file format for the long-term preservation of digital data. It stores web pages and other digital resources, including images, in their original form, together with metadata about each capture.
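To make the format more concrete, here is a minimal sketch of how a single WARC "response" record could be written by hand in Python. It assumes the WARC 1.1 record layout (a version line, named header fields, a blank line, the captured HTTP response, then two line breaks). The function name `build_warc_response` and the example URL are illustrative only, and real archiving tools handle many more details, such as compression and additional record types.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response(target_uri: str, http_response: bytes, captured_at: datetime) -> bytes:
    """Serialize one WARC 'response' record (hypothetical helper, not a library API)."""
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # WARC dates are written in UTC.
        ("WARC-Date", captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        # Content-Length counts the bytes of the captured HTTP response only.
        ("Content-Length", str(len(http_response))),
    ]
    head = "WARC/1.1\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    # Every record ends with two CRLF pairs after the content block.
    return head.encode("utf-8") + http_response + b"\r\n\r\n"


# The captured HTTP response is stored as received: status line, headers, body.
http_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html><body>Hello, archive</body></html>"
)

record = build_warc_response(
    "https://example.com/",  # placeholder URL for illustration
    http_response,
    datetime(2017, 8, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

Because the HTTP response is stored byte for byte alongside the capture metadata, a page can later be replayed or inspected exactly as it was served.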

WARC Files - The Standard for Long Term Preservation

The WARC format eventually evolved into an international standard (ISO 28500:2017) for digital asset archival. Since then, WARC has been adopted by many software vendors, libraries, and government agencies across the globe as the standard for archiving digital records, specifically web pages and full websites.

Governments have also embraced this standard. In the United States, the National Archives and Records Administration (NARA), the nation’s recordkeeper, and the Library of Congress adopted WARC as the only acceptable file format for the long-term preservation of website and social media records, according to NARA Bulletin 2014-04, "Format Guidance for the Transfer of Permanent Electronic Records".

With WARC as the standard, the ability to create and present WARC files has become an expectation and a need.

WARC for Social Media

PageFreezer has offered WARC exports for websites for many years, but WARC exports for social media records are a completely new concept.

PageFreezer is proud to announce that it is the first vendor to release WARC exports for social media data. Customers can now export single social media posts, complete social media timelines or selections of social media records in WARC with a single click from the PageFreezer dashboard.

WARC for Digital Forensic Investigations

But WARC is more than a convenient way for government agencies to comply with FOIA and Open Records laws. The standard is also relevant for corporations that need social media evidence for eDiscovery. A WARC export of your social media records combines the actual social media message with all the metadata provided by the social media API, the HTTP header metadata, and every digital resource used in the message, such as video, audio, and images. This makes it a valuable source for digital forensics investigations and legal authentication.
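As a rough illustration of why this matters for authentication, the sketch below reads one uncompressed WARC record and checks that its payload still matches the digest recorded at capture time. It assumes the record carries a `WARC-Payload-Digest` header written as a base32-encoded SHA-1 hash, which is a common convention but not the only one. The sample record, the `social.example` URL, and the helper names are made up for the example and do not represent PageFreezer's actual export.

```python
import base64
import hashlib


def parse_warc_record(raw: bytes):
    """Split one uncompressed WARC record into (version, headers, content block)."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length says how many bytes of the remainder belong to this record.
    block = rest[: int(headers["Content-Length"])]
    return lines[0], headers, block


def payload_digest(block: bytes) -> str:
    """SHA-1 of the HTTP body (the part after the HTTP headers), base32 encoded."""
    _, _, payload = block.partition(b"\r\n\r\n")
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


# A made-up captured API response for a single social media post.
http_block = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"post_id": "123", "text": "Hello"}'
)
header_lines = [
    b"WARC/1.1",
    b"WARC-Type: response",
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>",
    b"WARC-Date: 2017-08-01T12:00:00Z",
    b"WARC-Target-URI: https://social.example/api/posts/123",
    b"WARC-Payload-Digest: " + payload_digest(http_block).encode("ascii"),
    b"Content-Type: application/http; msgtype=response",
    b"Content-Length: " + str(len(http_block)).encode("ascii"),
]
raw_record = b"\r\n".join(header_lines) + b"\r\n\r\n" + http_block + b"\r\n\r\n"

# Verify the untouched record, then a copy whose message text was altered.
version, headers, block = parse_warc_record(raw_record)
is_intact = headers["WARC-Payload-Digest"] == payload_digest(block)

tampered = raw_record.replace(b"Hello", b"Jello")
_, t_headers, t_block = parse_warc_record(tampered)
is_tampered_intact = t_headers["WARC-Payload-Digest"] == payload_digest(t_block)

print(is_intact, is_tampered_intact)
```

The untouched record passes the check, while the copy with a single changed character fails it. A digest check like this only shows that the content was not altered after capture, so it is one part of authenticating evidence rather than a complete process.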

By taking advantage of PageFreezer’s new WARC export capability, your social media archives will now automatically comply with NARA’s record-keeping guidelines.

If you want to learn more about how WARC can help your organization comply with Open Records regulations or support eDiscovery, contact us now.
