AI Doesn’t Need a Conscience. It Needs a Stack Trace.
Forget morals! Real AI safety starts with logs, traceability, and systems you can actually debug.
Whenever the conversation turns to the future of artificial intelligence, it seems to drift almost inevitably into the realm of philosophy. People invoke ideas like responsibility, dignity, fairness, and ethics as if we’re designing sentient beings instead of distributed systems. These are important conversations, but after spending years deep in the trenches, training models, fine-tuning pipelines, and patching LLMs in production, I’ve come to a different conclusion.
AI doesn’t need a conscience. It needs a stack trace.
The more time I spend building and debugging real-world AI systems, the more I realize that trust doesn’t come from principles. It comes from visibility. If a model misbehaves, say it generates biased content, calls the wrong function or tool, or hallucinates an API, it’s not enough to ask whether it was aligned with human values. The real question is: can we trace the failure? Can we reproduce it, understand it, and fix it?
This is what operational trust looks like. Not philosophy. Not theory. Debuggability.
I Don’t Trust the Model. I Trust the Stack.
Every time I ship a model into production, whether it's powering a classifier, a search engine, or an LLM-driven agent, I ask myself a very basic but crucial question:
“If this thing does the wrong thing in front of real users, how long will it take us to figure out why?”
And I mean really figure it out. Not hand-wave and guess. Not vaguely blame the prompt or the data. I mean actually trace the behavior through the layers of logic, context, inputs, outputs, and intermediate state. Because the truth is: these systems will fail. They’ll misread intent, hallucinate knowledge, regress on edge cases, or behave differently under subtly shifted inputs. And when that happens, the only thing that stands between a misfire and a disaster is our ability to trace the error to its root cause.
That’s what I trust: not the model’s reasoning, but the stack we built around it.
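To make that concrete, here is a minimal sketch of what a per-call trace record could look like. Everything in it is my own illustration, the names ModelCallTrace and log_model_call, the field list, the JSONL file, not any particular framework’s API; the point is simply to capture enough state on every call that a failure can be replayed and debugged.

```python
# Minimal sketch of a per-call trace record. All names here
# (ModelCallTrace, log_model_call, the field list) are hypothetical.
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCallTrace:
    """Everything needed to reproduce and debug one model call."""
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)
    model_version: str = ""
    prompt_template_version: str = ""
    user_input: str = ""
    retrieved_context: list[str] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    raw_output: str = ""
    final_output: str = ""


def log_model_call(trace: ModelCallTrace, path: str = "traces.jsonl") -> None:
    """Append the full trace as one JSON line so the call can be replayed later."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(trace)) + "\n")
```

With a record like this attached to every response, “why did it say that?” becomes a query over logs instead of a guessing game.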
We Talk About Alignment. But We Deploy Ghosts.
There’s a huge disconnect in the way we talk about AI. We discuss values and alignment as if the model were some rational actor, while the reality on the ground is a brittle stack of heuristics, templates, and duct-taped toolchains. We’re deploying systems into the world, including agents that write emails, answer legal questions, make financial recommendations, and even initiate transactions, all based on chains of prompt fragments and context injections that we barely understand ourselves.
And the truth is, when something goes wrong in these systems, it’s often unclear whether the issue lies in the LLM’s internal representation, in the prompt structure, in the retrieved documents, or in the tool-calling layer. We’re not deploying intelligent beings. We’re deploying ghosts: opaque, unpredictable, and at times terrifyingly untraceable. If a backend service behaved like this, we’d never put it in production. But with AI? We ship it and hope the user doesn’t notice.
Principles Are Useless Without Mechanisms
I’ve read all the AI ethics manifestos. They sound great on paper: statements like “AI must reflect human values” or “AI should promote fairness, accountability, and transparency.” I don’t disagree with the intention. But here’s the reality: a principle without a mechanism is just wishful thinking.
When an AI system misclassifies, discriminates, or misleads, who notices? Who’s responsible? What process do we have to isolate the issue and ensure it doesn’t happen again? It’s not enough to say we care about fairness or safety. We need tangible, technical systems to enforce them (a sketch of what that can look like follows this list):
Full logging of every prompt, tool call, intermediate decision, and output
Versioned ground truth and snapshot-based evaluation pipelines
Real-time feedback integration loops with human oversight
Runtime policy enforcement that catches unsafe behavior before it leaves the system
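As a rough sketch of the first and last items, here is what a logging policy gate could look like. The names (Policy, PolicyViolation, enforce, no_raw_pii) and the placeholder check are hypothetical illustrations under the assumption of a simple callable policy interface, not a real library’s API.

```python
# Sketch of a runtime policy gate: every output is checked against explicit
# policies before it leaves the system, and every decision is logged.
# Policy, PolicyViolation, enforce, and no_raw_pii are hypothetical names.
import json
import time
from typing import Callable, Optional

Policy = Callable[[str], Optional[str]]  # returns a violation reason, or None if OK


class PolicyViolation(Exception):
    pass


def no_raw_pii(output: str) -> Optional[str]:
    # Placeholder check; a real system would use a proper PII detector.
    return "possible email address in output" if "@" in output else None


def enforce(output: str, policies: list[Policy], log_path: str = "policy_log.jsonl") -> str:
    """Run every policy, log the verdicts, and block the output on any violation."""
    verdicts = [(p.__name__, p(output)) for p in policies]
    record = {"ts": time.time(), "output": output, "verdicts": dict(verdicts)}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    violations = [reason for _, reason in verdicts if reason]
    if violations:
        raise PolicyViolation("; ".join(violations))
    return output
```

The specific checks matter less than the shape: unsafe behavior is caught at runtime, and every allow/block decision leaves an auditable trail.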
This isn’t a values problem. It’s a software engineering problem. And the solution isn’t more declarations; it’s infrastructure.
I Want DevOps for Moral Failure
What if we treated failures of AI behavior the same way we treat infrastructure outages? In DevOps, when a system goes down, we have incident response playbooks. We open a ticket, isolate the blast radius, issue a patch or a rollback, and conduct a postmortem. We learn from the failure and make the system more resilient.
Imagine doing the same when an AI system behaves inappropriately, whether by marginalizing a user, misunderstanding an identity, or generating content that violates safety constraints. Imagine having (a minimal sketch follows the list):
Structured incident workflows for behavioral misalignment
Evaluation suites for sensitive topics and fairness regressions
Dashboards that show not just accuracy, but behavioral drift
Postmortems for edge-case outputs that violated the system’s contract
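A behavioral incident record doesn’t have to be elaborate. Here is a minimal sketch; the BehavioralIncident fields and the file_incident helper are hypothetical, the point is that every misfire becomes a durable, queryable artifact that links back to the trace that produced it.

```python
# Sketch of a DevOps-style incident record for behavioral failures,
# mirroring what we already do for outages. All names are hypothetical.
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class BehavioralIncident:
    incident_id: str
    trace_id: str            # links back to the logged model call that misbehaved
    severity: str            # e.g. "sev2"
    description: str         # what the system did, in plain language
    blast_radius: str        # which users or flows were affected
    mitigation: str          # rollback, prompt patch, new policy, ...
    regression_added: bool   # was the case added to the eval suite?
    opened_at: float = 0.0


def file_incident(incident: BehavioralIncident, path: str = "incidents.jsonl") -> None:
    """Persist the incident so postmortems and dashboards have something to query."""
    incident.opened_at = incident.opened_at or time.time()
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(incident)) + "\n")
```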
This is the kind of infrastructure we need, not just to make AI safer, but to make it governable. You don’t need an ethics board for that. You need observability, feedback loops, and runtime constraints.
Don’t Make AI Moral. Make It Legible.
We’re asking the wrong question. AI doesn’t need to be moral. It doesn’t need to understand human dignity or possess empathy. It needs to be predictable, auditable, repairable, and bounded.
We need systems where we can trace decisions from the user input to the final output across every component and abstraction layer in the pipeline. We need infrastructure that allows us to catch regressions early, understand when the model’s behavior changes, and apply targeted fixes without retraining everything from scratch.
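One simple way to catch behavior changes early is to pin a versioned evaluation snapshot and diff every new model or prompt version against it. A sketch under that assumption follows; run_pipeline, the JSONL layout, and the exact-match comparison are all placeholders (a real suite would use graded or semantic comparisons), but the idea is that behavioral drift shows up as a visible diff instead of a surprise.

```python
# Sketch of a snapshot-based regression check: run the current pipeline over a
# pinned evaluation set and report every case whose behavior changed.
# run_pipeline and the file layout are hypothetical.
import json
from typing import Callable


def check_against_snapshot(
    run_pipeline: Callable[[str], str],
    snapshot_path: str = "eval_snapshot.jsonl",
) -> list[dict]:
    """Return the cases where the current output no longer matches the pinned baseline."""
    regressions = []
    with open(snapshot_path, encoding="utf-8") as f:
        for line in f:
            case = json.loads(line)  # {"input": ..., "expected_output": ...}
            current = run_pipeline(case["input"])
            if current != case["expected_output"]:
                regressions.append(
                    {"input": case["input"],
                     "expected": case["expected_output"],
                     "got": current}
                )
    return regressions
```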
Let humans deal with values and meaning. Let AI systems show their work. Because in the end, trust isn’t about whether the model is "good"; it’s about whether we can trace its behavior, understand its failures, and fix them when they matter. So yes, I want ethical AI. But I don’t want it to have a conscience.
I want it to have a stack trace.
Mike Erlihson, PhD, Chief AI Expert at https://metaor.ai/