OSINT Expert Series: Meet Jesse Ward

Few people have had a front-row seat to the evolution of online investigations like Jesse Ward.

As Director of Customer Support and Services at Pagefreezer, Jesse has helped thousands of investigators, regulators, and legal teams conduct investigations and leverage WebPreserver, Pagefreezer’s web capture browser plugin, to collect defensible digital evidence at scale.

With roots in journalism and hands-on digital investigations, Jesse brings a grounded, real-world perspective to how OSINT work actually happens in the messy, high-stakes environments investigators navigate every day.

In this interview, he shares what he’s learned from years of guiding users through complex collections:

The most important factors in a successful investigation
Where people routinely underestimate the work
How platform shifts are reshaping the OSINT landscape

Read the interview or watch the video below to get Jesse’s perspective on what’s changing in OSINT and what still matters most.

Editor’s Note: This interview has been edited for length and clarity.

Kyla Sims: OK, Jesse, how did you end up at Pagefreezer? What is your background?

Jesse Ward: My background is in journalism and compliance. I have a Bachelor of Journalism with honors from the University of King’s College in Halifax.

I was the editor of the campus paper at Dalhousie University for two years and won the Student Journalist of the Year award from the Canadian University Press.

Then I was a documentary associate producer with The Fifth Estate at CBC Toronto, making documentaries and freelancing for various Canadian newspapers. I also wrote for Vice for a while, and then I realized journalism would not be a lucrative career path and I wanted to do something different.

So I started learning to code, self-taught, and I had a job with the Nova Scotia Association of Realtors for two years, working on compliance automation solutions for them.

Then I found an opportunity at WebPreserver.

This was when WebPreserver was still a separate startup from Pagefreezer and only had 30 customers and two full-time employees. I applied for a customer support specialist role that I thought would be a great fit for my combination of investigative, writing, communication, and compliance skills.

There’s a very long, funny story about how that played out.

Kyla Sims: Was it a big change for you when Pagefreezer and WebPreserver merged, or did you continue to primarily look after WebPreserver?

Jesse Ward: Within six months of joining, WebPreserver merged with Pagefreezer.

So my first two months at WebPreserver, I was one of two full-time employees, responsible for the product roadmap, customer support, sales, QA testing, and planning. Timur, who’s still with us as a software engineer, co-programmed the fixes we needed and wrote the new features that I determined we needed from talking to customers.

So it was a real trial by fire.

At the time, that was actually sustainable because we only had about 30 customers and it was a bit of an experiment in the market.

It was a really exciting time to be part of the rapid growth of a product, going in just a couple of years from a product with only two team members working on it to one with a full team.

Kyla Sims: Did WebPreserver look a lot different than it looks today?

Jesse Ward: The basic functionality [of WebPreserver] was the same, but there were far fewer features.

Customers were finding real value in automating processes they were previously using entire teams of paralegals or contractors for. With the time we were freeing up for them, they were happy to talk to us about what they would like to see the product do.

That’s where we started learning more about the file types they wanted to export, the ways they wanted to perform discovery on the evidence they collected, and how to scale that out.

For our private investigator customers, we learned more about how they wanted to present the evidence they were collecting to their end users.

And we learned all of the really niche things that you only learn from talking to customers. That made the product really fit people’s desired workflows. Things as niche as all the different formatting types PDFs should fit so they’re compatible both with digital e-filing systems used by courts and with being printed and put on someone’s physical desk — and what information needs to be contained in the metadata on every page and how that needs to be formatted to still work if it’s printed.

Now it is quite a bit more comprehensive and feature-rich.

Kyla Sims: It’s funny, I think of you as one of the WebPreserver doulas, who brought it into the world and raised it.

Jesse Ward: Yeah...

I mean, Diego was the first person at Pagefreezer to work on WebPreserver. When I started, WebPreserver was already about two years old. Diego had initially been tasked with drafting specs for a tool that would basically fill the niche left in the market by Pagefreezer’s web archiving and social media archiving products.

At that point, those products were already past their infancy and we were doing a great job in the market attracting some enterprise customers for web and social.

But the question we kept hearing from prospects who wanted a comprehensive solution was: “Not only do I need to keep a record of everything I’m publishing and how my intellectual property is appearing in the market online, I also want to know how my intellectual property is being represented by other people online.”

One such use case: I’m a tobacco company’s lawyer. The company exists in more than 30 countries, and every country has a different team of social media managers. It’s simply too difficult to scale out the work of authenticating all these social media accounts to a tool like Pagefreezer Social Media Archiving, but I still want to keep tabs on everything I legally have to.

How can I still accomplish that if I don’t have authentication?

Well, here’s a tool that will go and collect all the URLs to the posts published by these accounts, and you can use it to collect everything they’ve published. It’s more time-consuming for you, more manual, but it’s how you can still accomplish that if you don’t have authentication.

There are similar use cases we’re familiar with today for things like archiving your own retail luxury product website, but also keeping records of what other people are publishing when they could be infringing on your copyright.

So the initial intention was: build tools on one side that can capture everything you publish on the web and social media, then on the other side, capture other people’s web and social media.

The product development cycle was really led from the perspective of: what are the biggest pain points there, and how do we extrapolate from there?

Try to scale up the scope of what we can capture and automate, while being aware that for every new automation we add, there’s extra debt to pay because we have to keep them up to date.

Kyla Sims: That’s interesting. I didn’t realize you were here right at the beginning.

Jesse Ward: Almost six years now.

Kyla Sims: That’s very cool. So nowadays, what does your typical week look like at Pagefreezer?

Jesse Ward: My week is a combination of tasks associated with ensuring my team is unblocked and has the resources, training, access to information, and space to experiment and make mistakes that they need to fulfill our mandate:

providing a stellar pre-sales experience in a sales engineering role with the sales team,
ensuring that when the sales team lands new deals, we perform onboarding, implementation, and training to a spec that earns a customer’s positive sign-off within 30 days and preferably under 14 business days, and
supporting our existing customers through best-of-industry technical customer support.

Practically, that looks like a lot of meetings.

Kyla Sims: That makes sense. What kind of customers reach out for support the most? Is there a particular flavor?

Jesse Ward: It’s going to be customers who are dealing with the highest-priority deadlines.

For WebPreserver, we can have customers reach out on a Friday evening. We set the expectation in our contracts that we have customer support 40 hours a week, from morning to evening Eastern time. But that’s not the hours that lawyers or investigators necessarily work, because they need to stay on top of collections.

So anyone who’s dealing with a tight deadline on the WebPreserver and collections/investigation side – legal teams and investigators, not so much actual regulatory bodies – will have high-priority collections.

The scenario we encounter multiple times a year is: “I finally found the smoking gun of the social media profile for someone they thought I’d never find. It looks like there’s a lot of gold here, but they could be hot on my trail and close this down any minute. How can I collect everything as soon as possible?”

The nature of these investigations is that as soon as you find anything you want to collect, it becomes an immediate priority. So we hear from those users if they’re running into any challenges.

But the nature of support is also that we aim to train customers well enough that they can successfully overcome challenging collections or audit projects on their own.

Kyla Sims: What’s something you find yourself explaining pretty often to customers?

Jesse Ward: People want to know what WebPreserver is actually doing.

When we’re showing WebPreserver in a demo, we often get questions like, “Can WebPreserver get content from profiles if I’m not friends with that profile and the content is private?”

People don’t have a concept of how it works. The way that social media or websites deliver information to an end user is really opaque. You don’t know how it gets there; it just shows up.

So when you’re considering a tool that collects the end result, it’s similar: you don’t know how the information got there in the first place, and you don’t know how this tool is going to access it and then save it.

The first question people usually have about a tool like this, especially at the demo stage, is: “Is this legit and above board?” There are people online who will promise services to go collect evidence that is behind a login, so basically they want to know we’re not hackers.

That’s a check people make because there aren’t many tools that do what WebPreserver does.

Then the next question people have is: “How can I trust that what I am capturing is comprehensive and legitimate?”

Most people we talk to don’t have much idea about the difference between social media APIs – where platforms allow data from their platform to be translated into other platforms – versus how social media serves content to an end user on mobile or desktop.

That’s something we have to educate people on. We tell people that what we’re really doing is providing a managed service to overcome the challenge posed by these social media companies keeping their APIs opaque and sealed off from researchers, investigators, and law enforcement.

We’re providing a managed service of reverse-engineering how they serve content to an end user to perform very elegant automations that capture the end result of what they serve you, in as comprehensive a manner as possible given the environment we’re working in.

So your question was, what do people ask the most? But my answer is that we try to preempt as many questions as we can about how it works, because the questions are:

Is this legitimate and above board?
Is this just a screenshot tool?
Is this using their APIs, and do you have API access?
Is this happening locally or in the cloud?

We try to preempt all of that and capture this general question of “What is this and how does it work?” by saying that WebPreserver is a tool that automates activity in your browser, to automate the discovery, expansion, and collection of social media and website records in a manner that is comprehensive, transparent, and happens completely locally so that you maintain a complete chain of custody.

That summary usually answers the top five questions people have.

Kyla Sims: What is something that tends to make the biggest difference in how smoothly a capture or an investigation goes?

Jesse Ward: The thing that makes an investigation or collection process as easy as possible is identifying the scope of what you’re looking for as quickly as possible.

The greatest enemy you’re going to run into is the entropy of collecting and reviewing more than you need.

If we’re sticking specifically to WebPreserver: ideally, if you’re an investigator and your investigation enters the online world – as it almost inevitably will – the perfect tool for you would be a database that has a record of everything that’s ever been published online, easily searchable, where you can be as greedy with keywords as you want.

Of course, that’s not the environment you’re working in. The environment you’re working in is one where everything is changing constantly, records are being removed and added very quickly, and the scope of records you can access is the cliché tip of an iceberg.

Your time is your most valuable asset.

So how can you understand what you need and narrow that down as fast as possible?

If you identify the Facebook page of someone, and you want to understand, “Have they ever mentioned this keyword?” or “Has this person ever interacted with their posts, ever commented on their posts or photos?” — the immediate value proposition of WebPreserver is that it will enable you to collect the URL of every record this person has published.

There’s no native feature within the social media platform to be able to search everything they’ve ever published on their profile and every comment they’ve ever received, but we can reverse-engineer what the platform serves you to allow you to do that.

What it takes is scrolling through everything, scraping the URL, copying all of that, opening up every record one page at a time, automating the expansion of everything. When it’s done, you’ll have a record you can export, and then you can search those keywords.
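
Editor’s Note: The final step Jesse describes, searching an exported set of records for keywords, might look something like the sketch below. The Record shape, the field names, and the example URLs are hypothetical and are not WebPreserver’s actual export format; this only illustrates the idea of keyword search over records that have already been collected.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One captured post or comment (hypothetical export shape)."""
    url: str
    author: str
    text: str


def search_records(records, keywords):
    """Return records whose text contains any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [r for r in records if any(k in r.text.lower() for k in lowered)]


# Made-up sample data standing in for an exported collection.
records = [
    Record("https://example.com/post/1", "subject", "Selling the boat this weekend"),
    Record("https://example.com/post/2", "friend", "Happy birthday!"),
    Record("https://example.com/post/3", "subject", "New BOAT trailer arrived"),
]

hits = search_records(records, ["boat"])
for hit in hits:
    print(hit.url, "-", hit.text)
```

A plain substring match like this is deliberately simple; a real review would likely also need to handle misspellings, slang, and text inside images.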

You can do that for one person, but you’re not going to do that for everyone they’re friends with. So understanding how long things will practically take and shrinking the size of your investigation down to a realistic scope – knowing the principles to follow to narrow your scope as soon as possible and figure out where your best leads are – is crucial.

Like any project plan, the first and most important thing is to figure out where you can afford to invest your time before you get greedy with trying to capture as many records as possible.

Kyla Sims: Would you say that’s the part of the process people underestimate or overlook the most?

Jesse Ward: There are lots of aspects of social media and web collections that people underestimate, ranging from the sheer volume and scale of content on the social media side to how complex some of it can be.

We deal with lots of cases where we have to help investigators explain the scope of what’s been asked of them to the legal team they’re working with.

Commonly they’ll say, “Great, I found this Facebook page, but I started using WebPreserver to capture it and I realized this person posts on Facebook more than a dozen times a day – and they have since 2007.”

That’s not an exaggeration. There are a lot of people who use platforms that way. As a result, if you were going to collect everything they’ve ever published, it would be more than 50,000 records.
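
Editor’s Note: The arithmetic behind that figure can be sketched roughly as follows. Only the dozen posts a day and the 2007 starting year come from Jesse’s example; the exact start date, the as-of date, and the function name are illustrative assumptions.

```python
from datetime import date


def estimate_records(posts_per_day, first_post, as_of):
    """Rough record count: posting rate multiplied by days the account was active."""
    days_active = (as_of - first_post).days
    return posts_per_day * days_active


# Assumed dates: the interview gives only "since 2007", not exact days.
estimate = estimate_records(12, date(2007, 1, 1), date(2025, 1, 1))
print(estimate)  # 78900, comfortably above the 50,000 mentioned
```

Running a quick estimate like this before starting a capture is one way to show a legal team what "collect everything" would actually involve.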

What people underestimate most is the depth of what they’re embarking on with any website or social media profile, even as they’re trying to narrow their scope.

Similarly, on the website capture side, we’ve dealt with many cases where someone thinks they have a simple collections project. For example: a company that sells household furnishings, and the task is simply to capture a record of how they present all of their products because there could be potential copyright infringement.

Then you realize that when you open the page for a lamp, there are options to expand the description for every different type of fabric for the lampshade. This is a real example. Those descriptions might have text directly parsed from your client’s website.

You realize: “Oh s***, I need these. For every lampshade, for every lamp page, I need all of these variations.”

Then you get into an exercise of combinatorics. It becomes mathematical: you’re running the equations and realizing, “I don’t just need a thousand pages. I might need 16,000 pages” because they can all require the page to be interacted with in a certain way to show certain text. Those are real examples.
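
Editor’s Note: As a rough illustration of that combinatorics, the sketch below multiplies the number of pages by the number of variations each page can display. The 1,000 and 16,000 figures are Jesse’s; the 16 fabric options per lamp and the second option group are assumptions chosen only to make the numbers line up.

```python
from math import prod


def states_to_capture(page_count, option_group_sizes):
    """Distinct page states when every combination of options must be shown.

    option_group_sizes lists how many choices each independent option
    group has (for example, 16 fabrics). Group sizes multiply together.
    """
    return page_count * prod(option_group_sizes)


# 1,000 lamp pages with an assumed 16 fabric descriptions each.
fabric_only = states_to_capture(1000, [16])
print(fabric_only)  # 16000

# If a second, hypothetical option group (3 sizes) also changed the text,
# the count would multiply again.
fabric_and_size = states_to_capture(1000, [16, 3])
print(fabric_and_size)  # 48000
```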

Additionally, there are things like retail products and the reviews left on them. We’ve worked with regulators who are looking into things like: are youth commenting on the reviews for tobacco products or other regulated products that are not for minors?

If people are leaving comments that seem like they are coming from minors, or sometimes posting pictures of themselves using it and they’re clearly a minor, you realize you need to capture all of that.

What in your mind was just capturing a thousand retail product pages and their reviews becomes much bigger.

You thought maybe it was just “click open the reviews tab” or “see if WebPreserver can help automate opening the reviews tab,” but for some of these products there are 30 pages of reviews or more.

You have to click a “next” button to get through all those reviews. Then you realize you’re in over your head because now you need to measure how many pages of reviews each product has and account for that.
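
Editor’s Note: Measuring how many pages of reviews each product has can be approximated with a small calculation like the one below. The product names, review counts, and the ten-reviews-per-page figure are all made up for illustration; real sites paginate differently.

```python
from math import ceil


def review_pages(review_count, reviews_per_page):
    """Number of paginated review pages needed to show every review."""
    return ceil(review_count / reviews_per_page)


# Hypothetical review counts per product.
review_counts = {"lamp-a": 295, "lamp-b": 12, "lamp-c": 0}
REVIEWS_PER_PAGE = 10  # assumption

pages_per_product = {
    name: review_pages(count, REVIEWS_PER_PAGE)
    for name, count in review_counts.items()
}
total_review_pages = sum(pages_per_product.values())

print(pages_per_product)   # {'lamp-a': 30, 'lamp-b': 2, 'lamp-c': 0}
print(total_review_pages)  # 32
```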

So it’s really important when you set out on these projects not just to consider how narrow you can make your scope, but also to understand the specific terrain you’re working in.

We have some great guides for WebPreserver investigations, for TikTok, Facebook, Discord, Reddit, Instagram, X/Twitter, LinkedIn, and Bluesky as well. We’re prepared to help walk people through those very structured terrains, which are actually the best-case scenario. For social media, at least you can understand the full scope of what you may need to collect for the accounts or groups you’re interested in.

For websites, it’s a completely different picture where there’s no standard implementation and you need to know what really matters to your client or to your case so that you can quickly survey the scene and understand: “Is this something where I can run a straightforward bulk capture, or is this something where I might need help from an expert to understand it?”

Kyla Sims: Were you doing social media investigations in your journalism career? How different does the space look now?

Jesse Ward: There were competitor products in the market at the time that ended up completely dead in the water; the people running those projects decided they wouldn’t maintain them after social media APIs cut off access.

It used to be as easy as paying for an API license to something like Facebook and then building a wrapper around that to build a tool that could submit queries against their social media graph API – things like, “Find people with this name who live in this town and went to this college.” They made it super easy.

Once that all got cut off, WebPreserver… well, the urban legend around WebPreserver is that we simply didn’t want to pay for those APIs, so we built a tool that would circumvent that by using automations we would maintain to interact with content that’s served to the user.

That turned out to be our saving grace, because we had this rush of new customers saying, “All right, I was using this other tool but it doesn’t work. Are you guys affected by this?”

And we could say, “Happily, no, we’re not, because we know APIs are always unreliable. We pay engineers to stay on top of this stuff and we have test suites to make sure we can keep automating this. So we’re happy to set you up with a license right now to continue your investigations.”

That’s one way things were very different. It was really the Wild West in terms of permissions, but WebPreserver was already prepared to adapt.

The other thing that was different in the mid-2010s and earlier for social media investigations was simply that people used to publish so much more publicly on social media.

Most people are not so laissez-faire now with posting pictures of last night’s party or their recent criminal activity. You still see some of that, but social media is increasingly becoming a walled garden.

The internet is turning into a dark forest where people are comfortable sharing pictures of their children or last night’s shenanigans in what they believe are more tightly controlled group settings.

Increasingly, for regulators, investigators, insurance investigators – people trying to use social media as a data source to understand what’s happening in a scene – it’s so much harder. You need to look at Discord, Telegram, figure out what Discords and Telegrams exist in a community.

Within those public-facing Telegrams and Discords, you find links to other spaces where people are hopping off: “This is the Los Angeles Discord, but for those of us who like racing, we’re branching off into this other Discord and that one’s private.” Now you want to join that one because the people who are into racing might have information you need.

You have to chain yourself through as many private online communities as possible where some of the stuff might surface.

Kyla Sims: Have the expectations or the skill sets of investigators also shifted?

Jesse Ward: There’s a very slow emergence of awareness in this industry for standards for discovery, collection, and presentation of evidence because the legal standards are ambiguous and the nature of the information is so tenuous.

When I say the nature of the information is tenuous, I mean: what metadata is required, what file types you should keep of what you collect. It takes a while for anything to catch on.

For example, we’re seeing more and more investigators ask us about our ability to capture websites in a WARC format or an MHTML format and export that, because they’re starting to face questions around the legal defensibility of content and they’re looking for as comprehensive a capture as possible.

We’ve also heard more questions in the last couple of years than ever before around authentication of digital evidence. It takes so long for law to catch up to technology that it’s really only through precedent-setting cases over the last six or seven years that people have started to consider whether screenshots are enough at all, or whether they should be covering themselves by having digital authentication for every capture.

I’d say you’re starting to see some burgeoning awareness in the last few years of why screenshots aren’t enough and why you need digitally authenticated captures.
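
Editor’s Note: One common building block for authenticating a digital capture is a cryptographic hash recorded at collection time, so that any later change to the file can be detected. The sketch below illustrates that general idea only; it is not a description of how WebPreserver authenticates captures, and the function names and record fields are hypothetical. A hash on its own does not prove who made a capture or when, which is why real workflows typically add signatures or trusted timestamps.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint_capture(content: bytes, source_url: str) -> dict:
    """Record a SHA-256 digest of the captured bytes with basic metadata."""
    return {
        "source_url": source_url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "captured_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def matches(content: bytes, record: dict) -> bool:
    """Check whether the bytes still hash to the digest stored at capture time."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


original = b"<html><body>Example post</body></html>"
record = fingerprint_capture(original, "https://example.com/post/1")

print(matches(original, record))                       # True
print(matches(original + b"<!-- edited -->", record))  # False
```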

In terms of staying up to date on your ability to collect and discover information online, the other big change is an increasing alignment in investigative circles that you can’t rely on other people’s tools because they’ve become too unreliable, which is really interesting.

I think WebPreserver has gained a lot of trust in the investigative market over the years. But look at the leading thought influencers in investigations. There’s the textbook on OSINT investigations by former FBI agent Michael Bazzell, Open Source Intelligence Techniques, which gets updated every year.

Previously, this was mostly an index of tools you could use and Google queries you could run when you wanted to follow up on a keyword, plus lists of databases that exist.

In the last two years in that textbook, he’s actually stopped referencing any tools on the market and instead is trying to teach people to build their own tools and build their own Swiss Army knife for the types of investigations they want to run.

His argument, which he’s written about, is basically: “We’re done trying to use other people’s tools that rely on APIs or automations at all, because it’s just not reliable enough when it gets down to it. You need to know how to manage this for yourself.”

Kyla Sims: Interesting.

Jesse Ward: I completely see where he’s coming from. But I would argue that only works in an ideal world where every investigator is also a software engineer and can whip up what they need on the spot.

Education for OSINT people today is becoming very technical, far beyond what I think is sustainable or scalable for the types of people who are usually in these roles, because their background isn’t in software engineering.

The overhead is high, and so is the likelihood they make a mistake managing their own virtual machines and running Python scripts they got from tutorials. Maybe in a few years we can get to a point where that’s easier for people through AI tools, but we’re not there yet.

[Interview End]

Talking with Jesse offers a window into the lived reality of modern OSINT work: the ambiguity, the constraints, and the creativity required to keep pace. We hope you enjoyed this interview. Thanks for reading, and stay tuned for more voices from the OSINT community.

Want to hear more OSINT insights from Jesse? Make sure to follow him on LinkedIn here.

Need to collect online evidence fast? Book a WebPreserver demo here.
