Building AI is easy. Building AI well is not.
Any CTO who has taken a proof of concept to production knows the difference between success in a test environment and success in the real world. A model can run perfectly on clean data, but only when real customers, real variation and real edge cases arrive do you see where the solution cracks.
In our first episode of the Blinqx Tech TalQX podcast, I spoke with Martijn (Product Owner AI) and Rashan (AI Engineer) about exactly that tension: the promise of AI versus the responsibility of making it safe, explainable and scalable. In this blog, I look back on that conversation, and on what it teaches us about how to build AI in B2B SaaS without breaking things.
Most problems do not arise in engineering
I opened the episode with a simple question to Martijn: Are AI problems more often caused by technology or people?
He didn’t hesitate: people.
Technology is mature enough today. What is usually missing is structure. Who is responsible for the behavior of the model? How is a prompt managed? What safeguards prevent someone from accidentally making the system unsafe?
The real risk is almost never in the architecture, but in speed without governance. I see this across the market as well: teams want to move fast, but along the way dig holes they later fall into themselves.
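To make that structure concrete, here is a minimal sketch of what prompt governance could look like. Everything in it (the PromptVersion and PromptRegistry names, the rule that an owner cannot approve their own change) is a hypothetical illustration, not a description of how Blinqx actually works:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PromptVersion:
    """A prompt treated like code: versioned, owned and reviewed."""
    name: str
    text: str
    version: int
    owner: str                    # who is accountable for this behavior
    approved_by: str | None = None

class PromptRegistry:
    """Hypothetical registry: no prompt reaches production unreviewed."""

    def __init__(self):
        self._prod: dict[str, PromptVersion] = {}

    def promote(self, prompt: PromptVersion, reviewer: str) -> None:
        # Four-eyes rule: the owner cannot approve their own change.
        if reviewer == prompt.owner:
            raise PermissionError("owner cannot approve their own prompt")
        approved = PromptVersion(prompt.name, prompt.text, prompt.version,
                                 prompt.owner, approved_by=reviewer)
        self._prod[prompt.name] = approved
        print(f"{datetime.now(timezone.utc).isoformat()} promoted "
              f"{prompt.name} v{prompt.version} (review by {reviewer})")
```

The point is not the code itself, but that the answers to “who is responsible?” and “how is a prompt managed?” live in the system instead of in someone’s head.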
AI only really works once customers make mistakes
I then asked Rashan whether it’s better to test for a long time or to go live early. He came down firmly on the side of “go live early.”
“When customers start complaining, that’s when your product really takes off.”
Not because errors are desirable, but because the world in production is nothing like your test environment. The test environment is clean, logical and predictable. In production, you deal with variation, different phrasings, incomplete files and patterns you can never fully simulate.
As Rashan aptly put it: “In your test environment, you’re at 5%. The rest you only learn in production.”
Successful AI products are therefore not created by one big release, but through continuous observation, analysis and adjustment. The real work starts as soon as your users give feedback, click thumbs-down or flag incorrect behavior. You need to be prepared for that, with monitoring, fallbacks, traceability and processes that make improvement as normal as releasing.
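What does being prepared look like in code? As a minimal sketch, assuming a hypothetical service (the Interaction, answer and record_feedback names are illustrative, not Blinqx code): every call gets a trace ID, a failed model call degrades to a fallback instead of an error, and user feedback is tied back to the original trace.

```python
import logging
import uuid
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-service")

@dataclass
class Interaction:
    """One traceable model interaction: input, output and user feedback."""
    trace_id: str
    user_input: str
    model_output: str | None = None
    feedback: str | None = None   # e.g. "thumbs_down"

def answer(user_input: str, model_call: Callable[[str], str],
           fallback: str) -> Interaction:
    """Call the model, but always log a trace and degrade gracefully."""
    interaction = Interaction(trace_id=str(uuid.uuid4()), user_input=user_input)
    try:
        interaction.model_output = model_call(user_input)
    except Exception:
        # Fallback path: a model error should never reach the customer raw.
        log.exception("model call failed, serving fallback (trace=%s)",
                      interaction.trace_id)
        interaction.model_output = fallback
    log.info("trace=%s input=%r output=%r", interaction.trace_id,
             user_input, interaction.model_output)
    return interaction

def record_feedback(interaction: Interaction, signal: str) -> None:
    """Attach a user signal (e.g. a thumbs-down click) to its trace."""
    interaction.feedback = signal
    log.info("trace=%s feedback=%s", interaction.trace_id, signal)
```

With traces like these, a thumbs-down click is not just a complaint; it points straight at the input and output that caused it.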
Why we chose one central AI infrastructure
Because AI behaves like a living system, you can’t organize it into separate islands. That’s why we built Qore/AI: the central foundation under all AI development within Blinqx.
Martijn described it in the podcast as a power grid: “If you plug it in, you can use AI.”
With one shared AI backbone, teams don’t have to keep reinventing prompts, security, logging or evaluation processes. Those are already baked in. This lets them focus on their domain, their product and their customer, without each team creating its own risks.
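The episode didn’t go into Qore/AI’s internals, so treat the sketch below as an illustration of the pattern rather than the real API: one shared client owns the prompt templates, input guardrails and audit logging, and product teams only call run().

```python
from typing import Callable

class AIBackbone:
    """Hypothetical sketch of a shared AI client: prompts, guardrails
    and audit logging live in one place instead of in every team."""

    def __init__(self, model_call: Callable[[str], str]):
        self._model_call = model_call
        self._templates: dict[str, str] = {}

    def register_prompt(self, name: str, template: str) -> None:
        # Prompts are managed centrally: a change is reviewed once,
        # and every product that uses the template picks it up.
        self._templates[name] = template

    def run(self, prompt_name: str, **variables: str) -> str:
        prompt = self._templates[prompt_name].format(**variables)
        self._check_input(prompt)                   # shared guardrail
        output = self._model_call(prompt)
        self._audit(prompt_name, prompt, output)    # shared logging
        return output

    def _check_input(self, prompt: str) -> None:
        if len(prompt) > 10_000:
            raise ValueError("prompt exceeds backbone limit")

    def _audit(self, name: str, prompt: str, output: str) -> None:
        print(f"[audit] prompt={name} in={len(prompt)} out={len(output)} chars")
```

This is the power-grid metaphor in miniature: a team “plugs in” by registering a template and calling run().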
The effect is noticeable: innovations arise faster and are much easier to transfer to other products. What works in Legal often also works in Finance or HR. A building block for insurance fits accountancy or legal workflows with minimal adaptation. Qore/AI does not lift just one product, but the entire portfolio.
Transparency is the basis for trust
In industries such as insurance, accounting and legal, the question about AI is never: does it work?
The question is: why does it work this way? Customers want to know why a system gives a certain answer, what happens to their data and how it handles exceptions. As Martijn put it: “As a product owner, you have to be able to explain why AI reacts the way it does.”
That’s not about explaining internal model mechanisms, but about understandability: what steps the system takes, when it hesitates, when it switches logic, and when a human needs to validate. Without that clarity, AI will have no place in compliance-driven environments. Transparency is not a nice-to-have. It is a prerequisite.
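One way to make that kind of understandability tangible, as a hedged sketch (the names and the 0.8 threshold are illustrative assumptions, not a Blinqx standard): record each step of the pipeline as a plain-language trace, and route to a human when confidence is low.

```python
from dataclasses import dataclass, field

@dataclass
class DecisionTrace:
    """Plain-language record of the steps a pipeline took, so a product
    owner can explain why the system answered the way it did."""
    steps: list[str] = field(default_factory=list)
    needs_human: bool = False

    def note(self, step: str) -> None:
        self.steps.append(step)

def classify_document(text: str, model_score: float,
                      threshold: float = 0.8) -> DecisionTrace:
    trace = DecisionTrace()
    trace.note(f"received document ({len(text)} characters)")
    trace.note(f"model confidence: {model_score:.2f}")
    if model_score < threshold:
        # This is the system "hesitating": below the threshold,
        # a human validates before anything reaches the customer.
        trace.note("confidence below threshold, routed to human review")
        trace.needs_human = True
    else:
        trace.note("confidence sufficient, automated path taken")
    return trace
```

The trace exposes no model internals, yet it answers exactly the questions a compliance-driven customer asks: what happened, when did the system hesitate, and who validated.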
Experimentation is a must, but within safe limits
Within Blinqx, we encourage teams to experiment on their own. With centrally available tools, R&D teams in each sector build small AI experiments that address real user challenges directly. This produces ideas we could never have come up with centrally.
But experimentation and production are two different worlds. That’s why we have a clear transition moment: once an experiment proves valuable, we transfer it to Qore/AI. There it is made robust, testable and secure, and scalable for rollout to other sectors.
That’s the balance that works: freedom to learn quickly, combined with protection once things get serious.
Start small, but build for big success
At the end of the episode, we came to the question of where organizations should start. Martijn and Rashan’s answers were remarkably close:
Start small. But do it right.
Not with “AI across our entire platform,” but with one task. One concrete problem. And make sure you measure from day one, build in explainability and design model-agnostically. Your pipeline today is not the pipeline of two years from now, and your design has to account for that.
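What designing model-agnostically can look like in practice, as an illustrative sketch (the Protocol and the vendor classes are hypothetical): product code depends on one narrow interface, and each provider is a swappable adapter behind it.

```python
from typing import Protocol

class TextModel(Protocol):
    """The only contract product code depends on; providers are swappable."""
    def complete(self, prompt: str) -> str: ...

class VendorAModel:
    def complete(self, prompt: str) -> str:
        # A real implementation would call vendor A's SDK here.
        return f"[vendor-a] {prompt[:40]}"

class VendorBModel:
    def complete(self, prompt: str) -> str:
        # Swapping vendors touches this adapter, never the product code.
        return f"[vendor-b] {prompt[:40]}"

def summarize(model: TextModel, document: str) -> str:
    # Product code never imports a vendor SDK directly, so the pipeline
    # of today can be replaced without rewriting the product.
    return model.complete(f"Summarize this document:\n{document}")
```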
It is much easier to build something well from the start than to fix something unsafe after the fact. The thread of the episode is clear: AI rarely fails because of technology. It fails because of careless organization.
Frequently Asked Questions
Why do AI projects so often fail in production?
Because teams underestimate the complexity and don’t have a clear framework to fall back on. A model works fine on structured test data, but breaks down as soon as real variation, noise and edge cases come in. Without monitoring, fallback mechanisms and ownership, that only becomes apparent once it goes to production.
Wouldn’t it be safer to test longer before going live?
You can’t simulate production behavior. What you can do is go live early safely, provided the basics are in place: logging, metrics, guardrails, evaluation paths and the ability to adjust quickly. Without that infrastructure, “long testing” mainly postpones inevitable problems.
Why one central AI infrastructure instead of one per team?
Because AI behaves like a system, not a feature. If each product team builds its own prompts, validation rules and security, you get fragmentation, inconsistent behavior and compliance risks. A shared foundation like Qore/AI ensures repeatability, traceability and scale.
Where should an organization start with AI?
Start small, with one concrete task where you can practice the full lifecycle: monitoring, evaluation, transparency, fallbacks and updates. Only once that works should you scale up. How you start determines whether you can scale easily later.