Every innovation organisation we work with has the same problem, even if they don’t describe it in the same way.

Years of reports. Hundreds of PDFs. Feasibility studies, programme evaluations, market analyses, technology assessments. All of it sitting in shared drives or document management systems, theoretically accessible, practically invisible.

This is an age-old issue. Search for something specific (a methodology used in a 2021 pilot, a statistic from a sector review, a diagram showing how two technologies interact) and you’ll either get nothing useful or spend an hour clicking through documents that looked relevant but weren’t.

This isn’t a filing problem. It’s a representation problem. And new research from our infrastructure partner Weaviate shows exactly why traditional approaches fall short and what genuinely works instead.

Research: https://arxiv.org/pdf/2602.17687v1 / https://weaviate.io/ 

Why PDFs are hard for machines to understand

When most search systems process a PDF, they do one of two things.

They either extract the text and index that, discarding every chart, diagram, table layout, and visual relationship in the process, or they treat the whole page as an image, which preserves the visual content but loses the ability to search by meaning.

Innovation documents are inherently visual. A technology readiness assessment doesn’t just live in its paragraphs; it lives in its matrices, its comparison tables, its ecosystem maps. A programme evaluation report communicates as much through its charts as through its prose. Strip those out and you haven’t just lost formatting. You’ve lost knowledge.

What the research shows

Weaviate’s IRPAPERS research, published in February 2026, tested how well AI systems can search through large collections of complex documents. Think of it as a rigorous stress test of exactly the kind of challenge your organisation faces every day.

The headline finding is simple: AI that reads documents as images and AI that reads documents as text find different things. Neither catches everything on its own. But when you combine them, you get something significantly more powerful than either approach alone: a system that can find the right document whether the answer lives in a paragraph or in a chart.
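One way to picture the combination: run the text search and the image search separately, then merge the two ranked lists. The sketch below uses reciprocal rank fusion, a common merging technique. That choice is our assumption for illustration, since this article does not say how the research fused its results. The document IDs and the constant k=60 are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query from two separate retrievers.
text_hits = ["evaluation-2021.pdf#p4", "sector-review.pdf#p12", "pilot-report.pdf#p2"]
image_hits = ["ecosystem-map.pdf#p7", "evaluation-2021.pdf#p4", "trl-matrix.pdf#p3"]

fused = reciprocal_rank_fusion([text_hits, image_hits])
print(fused)
```

A page that both retrievers surface rises to the top, while pages found by only one of them still stay in the list, which is the behaviour the finding describes.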

The second finding is equally important for anyone thinking about AI-assisted knowledge work. When the AI was given five related documents to draw on rather than just one, its answers got dramatically better. It turns out that the richest answers come from connecting information across multiple sources, exactly the way a knowledgeable colleague would respond, rather than simply pointing you to a single page.
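A minimal sketch of the "five documents rather than one" idea, assuming a retriever has already returned ranked passages. The function name build_context, the character budget, and the placeholder passages are all hypothetical; the resulting string would be handed to whichever language model writes the answer.

```python
def build_context(ranked_passages, k=5, max_chars=4000):
    """Join the top-k retrieved passages into one context string.

    ranked_passages: list of (source, text) tuples, best match first.
    Each passage is labelled with its source so an answer can cite it.
    max_chars is a rough budget (separators are not counted).
    """
    parts = []
    used = 0
    for source, text in ranked_passages[:k]:
        block = f"[{source}]\n{text.strip()}"
        if used + len(block) > max_chars:
            break  # stop before exceeding the rough character budget
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


# Hypothetical retrieval results, best match first.
passages = [(f"report-{i}.pdf", f"Placeholder passage {i}.") for i in range(1, 8)]

single = build_context(passages, k=1)   # one source only
several = build_context(passages, k=5)  # five sources to connect
print(several)
```

The only difference between the two calls is k, which makes it easy to compare answers built from one source with answers built from several.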

What this means in practice

For innovation programme managers and marketing leads, the implications are concrete.

Your document library is a knowledge asset that is currently underperforming. The information is there, in reports, in evaluations, in the accumulated output of years of programme delivery. But it’s locked in formats that traditional search cannot properly interpret.

A properly architected AI search system over your document library should be able to answer questions like:

  • Which of our past feasibility studies addressed battery storage in maritime applications?
  • What methodologies have we used to evaluate SME engagement across our programmes?
  • Which technology areas show the most activity in our 2023 and 2024 sector reports?
  • What outcomes did our previous cohorts report at six months post-programme?

These aren’t questions that require a human to spend a day searching. They’re questions that a well-designed system should answer in seconds, drawing on the full visual and textual content of your documents, not just the text that survived extraction.

How we build this

At Temper Digital, our Connected Infrastructure is built using Weaviate, the same vector database infrastructure behind the IRPAPERS research. We chose Weaviate because its team is actively pushing the boundaries of what AI-powered document search can do, and this research demonstrates it.

When we build a knowledge infrastructure for an innovation organisation, we’re not just indexing documents. We’re creating a system that understands your content at multiple levels simultaneously: the words on the page, the structure of the document, and the visual information that text alone cannot convey.

The result is a searchable, queryable knowledge layer across your entire document estate. Programme teams can find what they need. Leadership can surface insights across years of activity. External stakeholders can be given structured, AI-powered access to relevant outputs, without requiring manual curation.

The system can be built for internal use and extended with public-facing controls embedded in your website, where it acts as a lead generation tool for people who want meaningful access to your reports.

Where to start

Before thinking about technology, it’s worth taking stock of what you actually have.

Most innovation organisations are sitting on more structured knowledge than they realise, but it’s scattered across years of outputs with no clear picture of what’s there, what’s useful, and what questions it could answer.

A simple exercise can be illuminating:

1. Map your document estate. What PDFs, reports, and documents does your organisation hold? Where do they live, how far back do they go, and who currently has access to them?

2. Identify what’s genuinely valuable. Not everything needs to be searchable. Which documents contain knowledge that people actually need to find: evaluations, sector analyses, programme outputs, technical assessments?

3. Imagine what you could build. If someone could ask a question and get a synthesised answer drawn from all of that material, what would that unlock? Who would benefit, and what decisions would it improve?

That third question is often where the real value becomes visible. And it’s a conversation we’re always happy to have.
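The first step of the exercise, mapping the document estate, can be started with a short script. This is a rough sketch, assuming the documents sit in folders you can reach from one machine. The function names and the example path are hypothetical, and a file's last-modified date is only a proxy for when a report was actually published.

```python
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def inventory_pdfs(root):
    """Return one record per PDF found under root (searched recursively)."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        # Match the extension case-insensitively (.pdf, .PDF, ...).
        if path.is_file() and path.suffix.lower() == ".pdf":
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            records.append({
                "path": str(path),
                "folder": path.parent.name,
                "size_kb": round(stat.st_size / 1024, 1),
                "modified_year": modified.year,
            })
    return records


def count_by_year(records):
    """Count documents per last-modified year, oldest year first."""
    counts = Counter(r["modified_year"] for r in records)
    return dict(sorted(counts.items()))


# Example usage (the path is a placeholder for your own shared drive):
# records = inventory_pdfs("/path/to/shared-drive")
# print(len(records), "PDFs found")
# print(count_by_year(records))
```

The per-year counts give a quick answer to "how far back do they go", and the folder column shows where the documents live.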

Download our worksheet and run your own audit.