The knowledge base is auditable. The AI isn’t.

Part two of a two-part series. Part one covers why your AI architecture may not be fit for regulated environments.


I spoke with a client a few months ago. A legal dispute had landed, pretty standard for a regulated contact centre: a customer claiming an agent had given them incorrect information.

The legal team’s request was standard. Produce documentation of exactly what information was available to the agent at the date and time of the interaction.

In the old world, that’s a simple fifteen-minute job. Access the knowledge article. Check the version history for the version that was live on that date. Timestamped, version-controlled, exported, done.

They pulled the knowledge article like normal. That part worked fine.

Then they went looking for the AI chatbot interaction. What did the agent ask? How did the AI respond? Which sources informed that response?

Nothing.

No transcript. No log. No record of what the AI had told the agent, or even confirmation the agent had consulted it at all.

The knowledge article was auditable. The AI was invisible.

Why you can’t find what the AI said

There are two reasons for this, and both matter.

The first is architectural. When AI knowledge systems are built on general-purpose enterprise indexes (SharePoint, Graph, vector stores), those indexes have no concept of content authority. They don’t know which procedure was approved last quarter and which draft someone left in a shared drive. They know semantic proximity. They know who has access. They don’t know what’s authoritative. You can read more about this in Data Permissions vs Knowledge Governance.

So even if you could go back to a specific date and time and reconstruct “what sources were in the index,” you still couldn’t prove which version of which content actually grounded the AI’s response. The index doesn’t record that. It was never designed to.
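To make that concrete, here’s a minimal sketch of what response-level grounding would need to capture at generation time. All the names are hypothetical; the point is that attribution has to be stamped when the response is produced, because it can’t be reconstructed afterwards.

// Hypothetical sketch: the provenance a general-purpose index never records.
interface GroundingSource {
  documentId: string;   // stable identifier, not just a URL
  version: string;      // the exact approved version that was retrieved
  contentHash: string;  // proves the text hasn't changed since
  approvedAt: string;   // ISO 8601; when this version became authoritative
}

interface GroundedResponse {
  responseId: string;
  generatedAt: string;        // ISO 8601
  responseText: string;
  sources: GroundingSource[]; // stamped at generation time, per response
}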

The second problem is more straightforward, and even more widespread. Most AI chatbots and agent assist tools don’t keep usable interaction history. They log for performance: query volume, response time, satisfaction scores. They don’t log for evidence. There’s no forensic-level transcript. There’s no source attribution at the response level. There’s no chain of custody.

The system knows the AI was used. It can’t tell you what the AI said.
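For contrast, here’s a sketch of the shape an evidence-grade interaction record might take, as opposed to a performance one. Field names are mine, not any vendor’s schema.

// Hypothetical evidence record: logging for proof, not for dashboards.
interface InteractionEvidence {
  interactionId: string;
  agentId: string;
  askedAt: string;      // ISO 8601; when the agent asked
  agentQuery: string;   // verbatim, exactly what the agent typed
  aiResponse: string;   // verbatim, not a summary or a satisfaction score
  sources: { documentId: string; version: string }[]; // per-response attribution
  modelVersion: string; // which model and configuration produced the answer
  retainUntil: string;  // driven by your regulatory retention schedule
}

A deflection-rate dashboard needs almost none of these fields, which is exactly why almost none of them get captured.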

Both problems need fixing. But you can’t fix either if you don’t know they exist.

The “AI told me to” problem

QA teams in contact centres follow a pretty standard workflow. Listen to the call recording. Check the agent’s statements against the knowledge base. Determine whether the agent followed documented process. Mark and coach accordingly.

That workflow assumes one source of truth. The call recording tells you what the agent said. The knowledge article tells you what they should have known. You compare them.

AI breaks that assumption.

Now there’s a third layer. The call recording tells you what the agent said to the customer. But it doesn’t tell you what the AI told the agent. And those two things may not be the same.

I’m watching this play out right now. An agent tells a customer something that turns out to be wrong. QA flags it. The agent says “but the AI told me to”. QA can’t verify either way.

Now what?

Discipline the agent, and you may be penalising someone who followed the system exactly as intended. Let it go, and you’ve accepted a compliance error you can’t explain. Try to fix the AI, and you’ve passed the problem on to an IT team who can’t identify what it actually said, so they don’t know what to fix.

You can’t coach what you can’t see. You can’t fix what you can’t prove.

The “AI told me to” defence isn’t agents making excuses. In most cases it’s a legitimate question. And without interaction logs, neither the agent nor QA can answer it.

From one error to a pattern

A single wrong interaction is a quality issue. A pattern of wrong interactions across months is a compliance issue.

The difference between the two is usually whether you have the data to see it.

With traditional knowledge articles, compliance drift is visible. If a procedure was published with incorrect information on January 15th, you can identify every agent who had access to it and every interaction that occurred while it was live. You can scope the problem, assess the exposure, and brief the regulator.
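With that version history, the scoping exercise is mechanical. A sketch, assuming interactions and version validity windows are queryable; every name here is hypothetical.

// Hypothetical scoping query: every interaction that occurred while
// the incorrect version of an article was live.
interface Interaction {
  agentId: string;
  occurredAt: Date;
  articleId: string;
}

interface ArticleVersion {
  articleId: string;
  liveFrom: Date;
  liveUntil: Date;
}

function scopeExposure(
  interactions: Interaction[],
  badVersion: ArticleVersion,
): Interaction[] {
  return interactions.filter(
    (i) =>
      i.articleId === badVersion.articleId &&
      i.occurredAt >= badVersion.liveFrom &&
      i.occurredAt < badVersion.liveUntil,
  );
}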

With AI-generated responses, you’re flying blind in both directions. You can’t see a single wrong interaction clearly enough to determine root cause. And you can’t surface patterns across thousands of interactions if those interactions aren’t logged in enough detail to make pattern analysis possible.

A proper KMS handles this proactively. Articles have owners. Review cycles are built into the content lifecycle. When a review is due, the system notifies the SME or document owner. They check it, update it if needed, and approve it. The KMS records all of it: who reviewed, when, what changed, what was approved. That’s not compliance doing extra work. That’s the KMS doing what it was designed to do.
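In data terms, that lifecycle is small. A minimal sketch of the record a KMS keeps per article; this is a hypothetical shape, not any specific product’s model.

// Hypothetical lifecycle record: the audit trail lives in the content itself.
interface ReviewEvent {
  reviewedBy: string;    // the SME or document owner
  reviewedAt: string;    // ISO 8601
  changeSummary: string; // what changed, if anything
  approved: boolean;
}

interface ArticleLifecycle {
  articleId: string;
  owner: string;              // every article has one
  reviewIntervalDays: number; // drives the review-due notifications
  nextReviewDue: string;      // ISO 8601
  history: ReviewEvent[];     // who reviewed, when, what was approved
}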

That infrastructure is exactly what makes the reactive case possible. If a procedure was wrong, you can see who owned it, when it was last reviewed, and whether the review cycle was followed. The audit trail runs through the content lifecycle, not alongside it.

With AI-generated responses, there’s no equivalent. No ownership model. No review cycle. No approval record. And if something goes wrong, no trail to follow.

Proactive content governance and reactive compliance investigation need the same infrastructure. A proper KMS provides it. Most AI deployments don’t even come close.

This isn’t just a logging problem

It’s worth being clear about what I’m arguing here, because the tempting fix is to add more logging and call it done.

Logging matters. But the deeper issue is that most AI tools being deployed in regulated contact centres weren’t designed with auditability as a requirement. They were designed for deflection rates and CSAT scores, or for exposing everything the index can reach, authoritative or not. The engagement dashboard your vendor showed you in the sales cycle doesn’t include a section on forensic evidence trails.

The knowledge article was auditable because auditability was built into the content lifecycle from day one. Version control, approval workflows, content ownership, retention schedules. None of that happened after the fact. It was the design.

Your AI deployment needs the same design thinking. Not as a custom development project you scope after something goes wrong. As a baseline requirement before you go live in a regulated environment.

AI prompt logs aren’t an analytics feature. They’re the evidence layer that makes your AI deployment legally defensible.

Ask your vendor, before you sign, whether they can produce a forensic-level transcript of any interaction, with source attribution, retained long term to meet your regulatory requirements. Most can’t. Some will tell you they can but can’t show it to you in practice. Some won’t even understand the question.

That’s worth knowing now, not after the legal team calls.


Where are you in this? I’m particularly interested in how QA, legal, and compliance teams are thinking about AI interaction logs. Do you have a seat at the table to ask for them, or are you only discovering the gap after the fact?
