Claude Code: Helpful, But Not a Free Pass

I use Claude Code every day. Setting up my self-hosted infrastructure, writing Gitea Actions pipelines, debugging Nginx configurations at midnight. It's a powerful tool — and that's precisely why I'm writing this article.

Because powerful tools demand judgment. That has always been true. And I see it in my own circles — in conversations with friends and colleagues from the industry, in community discussions, and if I'm honest: I fell into this trap myself. The hype around AI coding tools is crowding out exactly that kind of thinking. Fast, uncritical, enthusiastic — and then, at some point, surprised.

This article is not an anti-AI manifesto. It's the opposite: a case for using these tools in a way that pays off long-term — without making the mistakes that can turn expensive, embarrassing, or both.

💡 The question isn't whether AI tools are useful. The question is what standard we hold ourselves to when using them.

Looking Back: The Evolution of Quality Assurance

Software development was never simple. But with every generation it has become more powerful and more complex — and with every leap in complexity, the industry has learned to build new safety nets. Not out of bureaucracy, but out of hard-won experience.

timeline
    title Evolution of Quality Assurance in Software Development
    1960s - 1970s : Solo development
                  : Manual testing
    1980s         : Code reviews
                  : First formal QA processes
    1990s         : Unit tests
                  : Integration tests
                  : Peer reviews
    2000s         : Architecture reviews
                  : Penetration testing
                  : Security by Design
    2010s         : CI/CD pipelines
                  : Automated testing
                  : Quality gates
                  : DevSecOps
    2020s         : AI-generated code
                  : → New layer of oversight needed

Every increase in capability has brought new quality assurance practices with it. This list is not exhaustive — it shows a selection of the most important milestones. AI is step N+1, not the exception.

The Early Years: Code as Solo Work

In the early decades of programming, code was often the work of a single person. One developer, one program, one machine. Mistakes had manageable consequences — the hardware was expensive, but the software wasn't.

That changed when systems moved into critical domains.

The Therac-25 is one of the best-known cautionary tales in software history. Between 1985 and 1987, this computer-controlled radiation therapy machine killed at least three patients and severely injured three others. The cause was not a complex algorithm but a race condition in the control software, compounded by the removal, without replacement, of the hardware safety interlocks that had protected earlier models. The developers had delegated safety checks to software that was never tested with the rigor that role demanded.

The lesson: greater capability without greater quality assurance is a risk that eventually materialises.

The Industry's Response: Layer by Layer

The reaction to growing complexity was never "less control" — it was always "better control":

Code reviews emerged because two pairs of eyes see more than one, and because code that only its author understands is a maintenance problem waiting to happen. As teams grew and development became distributed, reviews became the norm.

Architecture reviews arrived when local decisions started having global consequences. A database choice in module A can render module B unusable — but only if nobody is keeping the big picture in view.

Unit tests and integration tests followed, because no human being can hold the entire code path of a modern system in their head. They are not a sign of distrust toward the developer — they are a net beneath the high wire.

CI/CD pipelines finally automated what had previously required manual discipline: building, testing, and validating every commit before it reaches production. And with manual approval gates — as I use them in my own Gitea Actions pipeline — the human remains the last checkpoint before deployment.

Every one of these developments was a response to growing complexity and capability. More power meant more surface area for failure — so more safeguards followed. This is not a weakness of software engineering. It is its maturity.

graph TD
    A[Solo developer\nSimple systems] -->|grows into| B[Teams\nDistributed systems]
    B -->|grows into| C[Critical infrastructure\nComplex architectures]
    C -->|grows into| D[AI-generated code\nAutomated systems]

    A -.->|requires| E[Manual testing\nCode discipline]
    B -.->|requires| F[Code reviews\nArchitecture reviews]
    C -.->|requires| G[Unit tests\nCI/CD\nPenetration testing]
    D -.->|requires| H[???\nNew QA layer]

Why should AI suddenly be the exception?


What Claude Code Can Actually Do — and What It Can't

To be fair — because I owe the tool that much — Claude Code is impressive. It understands context surprisingly well, suggests idiomatic code, explains concepts clearly, and has saved me genuine hours while building my infrastructure. That's not an exaggeration.

But daily use has also taught me the downsides — and those are worth knowing before you hand over control.

Confidence Without Accountability

Claude Code phrases its answers in a tone that signals competence. Even when the answer is wrong.

An experienced colleague hesitates when uncertain. A junior developer asks for clarification. A compiler throws an error. Claude Code, like AI tools in general, often does none of those things. It produces plausible-sounding answers with a self-assurance that doesn't always match its actual reliability.

That is the subtlest and most dangerous characteristic of these tools: not that they get things wrong — humans do too. But that they don't know they're wrong. No hesitation, no follow-up question, no error message. The mistake only surfaces when someone pushes back — and that requires genuine technical competence to even ask the right questions.

| Situation | Experienced developer | Claude Code |
| --- | --- | --- |
| Uncertain about solution | Hesitates, asks | Answers confidently |
| Unfamiliar API | Admits it | References it anyway |
| Missing context | Asks for it | Makes assumptions |
| Spots an error | Flags it proactively | Often only when challenged |

Hallucinated APIs and Libraries

Working with Claude Code, I have seen it reference functions or parameters on several occasions that simply don't exist in the named version of the library. The code looked right: the syntax was valid and the logic was coherent. The problem only became apparent when running it, or when comparing it carefully against the documentation.

Anyone who doesn't know this and doesn't review generated code is building on sand. And "the code compiles" is not a sufficient quality criterion.
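One cheap guard against hallucinated APIs is to ask the installed library itself before trusting a generated call. The following is a minimal sketch, not a complete verification tool: the helper name `api_exists` is my own invention for illustration, and it only inspects whatever version is installed in the current environment.

```python
import importlib
import inspect
from typing import Optional


def api_exists(module_name: str, attr_path: str, param: Optional[str] = None) -> bool:
    """Check that module_name.attr_path exists and, optionally, accepts `param`.

    Hypothetical helper for illustration. It inspects the version that is
    actually installed, which is the one the generated code will run against.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    # Walk dotted paths such as "path.join" one attribute at a time.
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    if param is None:
        return True
    try:
        signature = inspect.signature(obj)
    except (TypeError, ValueError):
        # No introspectable signature: report False so a human checks the docs.
        return False
    # Limitation: names swallowed by **kwargs are reported as missing.
    return param in signature.parameters
```

For example, `api_exists("json", "dumps", "indent")` confirms a real parameter, while a made-up name such as `api_exists("json", "dumps", "pretty_print")` is rejected. A passing check only proves the name exists. It says nothing about whether the call does what the generated code assumes.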

No Architectural Memory

Claude Code only sees what I show it. That's a fundamental limitation that's easy to forget in everyday use.

Decisions made three files up the stack. Naming conventions in my repo. Security requirements specific to my setup. Dependencies between modules not visible in the current context — all of that has to be actively included in the prompt. Without it, the result may be locally correct but globally incompatible code.

In my self-hosted environment — Ghost, Astro, Gitea Actions, and Cloudflare Tunnel all working together — the interplay between components is everything. An Nginx configuration that's technically correct but ignores the Cloudflare headers is still wrong.
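To make the header problem concrete, here is a small sketch of the decision a correctly configured proxy has to make: take the visitor address from Cloudflare's `CF-Connecting-IP` header, but only when the request really arrived through a trusted hop. The function name `client_ip` and the `TRUSTED_PROXIES` list are assumptions for illustration. I assume the tunnel daemon runs on the same host, so only loopback is trusted. A different topology needs different ranges.

```python
import ipaddress
from typing import Mapping

# Assumption: the tunnel daemon runs on the same host, so only loopback is trusted.
TRUSTED_PROXIES = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
]


def client_ip(peer_ip: str, headers: Mapping[str, str]) -> str:
    """Return the visitor address, honouring CF-Connecting-IP only from trusted peers."""
    peer = ipaddress.ip_address(peer_ip)
    # Simplification: real header lookups are case-insensitive.
    forwarded = headers.get("CF-Connecting-IP")
    if forwarded and any(peer in network for network in TRUSTED_PROXIES):
        try:
            return str(ipaddress.ip_address(forwarded.strip()))
        except ValueError:
            pass  # Malformed header: fall back to the direct peer address.
    return str(peer)
```

A configuration that ignores the header logs the proxy address for every visitor. One that trusts the header from any peer lets clients spoof their address. Both variants can pass a syntax check, which is why this kind of error survives a purely local review.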

How I navigated these pitfalls safely and efficiently when building my own infrastructure — and what can go wrong — is the subject of a dedicated article. Stay tuned.

Optimised for "Works", Not for "Maintainable"

AI-generated code has a tendency toward the pragmatic. That's often a virtue. But technical debt accumulates quietly when nobody is watching for long-term maintainability.

This isn't a new problem — it's a familiar one in a new form. In classical software development, teams that neglect proven quality attributes from the start — maintainability, extensibility, readability, testability — pay the bill later. With AI-generated code, the same applies, only faster: the speed at which code is produced is also the speed at which technical debt can accumulate — if no one is paying attention.

Code that works today but that nobody can understand in six months is not a solution. It's a deferred problem.


The Myth of the "Right" Way to Use AI

There's no shortage of advice on the "correct" way to work with AI tools. Prompt engineering courses are booming. Frameworks are multiplying. Everyone has the one method that solves everything.

My honest answer: there is no single right way.

That has always been true in software development. Unit tests are valuable — but not every function needs 100% coverage. Code reviews matter — but not every one-liner needs a two-hour session. CI/CD pipelines make sense — but the configuration depends on the context.

AI tools are exactly the same kind of case-by-case judgment call:

A Bash script I run locally on my Raspberry Pi, touching no external data, has different requirements than a module processing user data in a production deployment. For the former, I can be more relaxed — I understand the system, I see the results immediately, the blast radius of a failure is limited.

For the latter, I need the same standards I'd apply to any other code — regardless of whether a human or an AI wrote it.

The deciding question is not: "Did an AI write this?"

The deciding question is: "What are the consequences if this code is wrong?"

Anyone who asks that question consistently will find their own right approach — situational, context-aware, and professional.


Regulation: Where It Gets Embarrassing and Expensive — In That Order

This is where things get serious. And I mean that literally: embarrassing first, expensive second.

A fine can be absorbed in the balance sheet, addressed internally, managed with advisors. A headline — "Company X accidentally exposed customer data with AI-generated code" — stays. It affects customer trust, partner relationships, share price. In B2B contexts, it affects the next tender.

And nobody accepts "the AI wrote it that way" as an excuse. Not regulators, not customers, not the press. Responsibility rests with the person who hit the deploy button — without adequate review.

Three regulatory frameworks are particularly relevant for developers and organisations in the EU:

| Framework | In force | Fully applicable | Max. fine |
| --- | --- | --- | --- |
| GDPR | May 2018 | Immediately | €20M / 4% of global turnover |
| EU AI Act | August 2024 | August 2026 (high-risk) | €35M / 7% of global turnover |
| Cyber Resilience Act | December 2024 | December 2027 | €15M / 2.5% of global turnover |

timeline
    title Regulation & Industry Response
    2018 : GDPR enters into force
         : Data protection becomes law
         : DPO roles created
         : Privacy by Design takes hold
    2020 : NIS2 Directive in preparation
         : First GDPR fines in the millions
         : Security awareness grows
    2022 : NIS2 adopted
         : AI Act draft under intense debate
         : DevSecOps goes mainstream
    2024 : EU AI Act in force (August)
         : Cyber Resilience Act in force (December)
         : AI governance roles emerge
         : First AI compliance frameworks appear
    2026 : AI Act high-risk requirements apply
         : CRA vulnerability reporting active
         : Security by Design becomes standard
    2027 : CRA fully applicable
         : Full AI regulatory framework active

Sources: EUR-Lex, BSI. The industry is responding — but often with a lag. Those who act now will be prepared.

GDPR — In Force Since May 2018

The General Data Protection Regulation is not a new threat — but AI tools open new, subtle paths to inadvertent violations.

Claude Code can generate code that writes personal data to logs retained longer than permitted. It can send data to external endpoints without a legal basis or a data processing agreement in place. It can propose structures that violate the principle of data minimisation — not out of malice, but because the model has no context about my specific data protection requirements unless I explicitly provide them.
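The logging case can be defused at the source. The sketch below uses Python's standard `logging.Filter` to mask e-mail addresses before a record reaches any handler. It is an illustration under assumptions, not a compliance solution: the class name `RedactEmails` and the deliberately simple regular expression are my own, and real personal data takes many more forms than e-mail addresses.

```python
import logging
import re

# Deliberately simple pattern for illustration; it will not catch every address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactEmails(logging.Filter):
    """Hypothetical filter that masks e-mail addresses in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so addresses passed as arguments are covered.
        message = record.getMessage()
        record.msg = EMAIL_RE.sub("[redacted-email]", message)
        record.args = ()
        return True  # Keep the record, only its content changes.
```

Attached with `handler.addFilter(RedactEmails())`, the filter applies to everything that handler writes. Redaction does not replace a retention policy, but it limits what a forgotten log file can expose.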

There's also a question that's often overlooked: what happens to the code I paste into the prompt? If that code contains database schemas, business logic, or — in the worst case — personal data, that information leaves my sphere of control. That's not paranoia. It's a legitimate consideration that anyone should weigh consciously before feeding sensitive code into a cloud-based AI tool.

In fairness: Claude Code as a CLI tool does offer configurable Zero Data Retention (ZDR) policies and GDPR-compliant Data Processing Agreements (DPAs) in commercial and enterprise contexts. The risk applies primarily to unguarded use without appropriate contractual safeguards — anyone using the tool professionally should know these options exist and configure them actively.

The GDPR provides for fines of up to €20 million or 4% of global annual turnover for serious violations, whichever is higher. In December 2024, the Italian data protection authority (Garante) fined OpenAI €15 million for GDPR violations related to the operation of ChatGPT. OpenAI has appealed the decision, but it illustrates how seriously regulators take the issue.

Source: GDPR, Art. 83; dr-datenschutz.de, Top 5 GDPR fines December 2024
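The "whichever is higher" rule is easy to misread, so here is the arithmetic as a small sketch. The function name is hypothetical, and the result is only the statutory ceiling for serious violations, not a prediction of any actual fine.

```python
def gdpr_fine_ceiling(global_annual_turnover_eur: int) -> int:
    """Upper bound for serious GDPR violations: EUR 20 million or 4% of turnover."""
    four_percent = global_annual_turnover_eur * 4 // 100
    # "Whichever is higher": the flat amount is a floor for the ceiling.
    return max(20_000_000, four_percent)
```

For a company with €100 million in turnover, 4% is only €4 million, so the ceiling stays at €20 million. At €2 billion in turnover, the ceiling rises to €80 million.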

EU AI Act — Regulation (EU) 2024/1689

The EU AI Act was published in the Official Journal of the European Union on 12 July 2024 and has been in force since August 2024. It takes effect in stages — the first prohibitions on unacceptable AI practices applied from February 2025, with requirements for high-risk AI systems kicking in from August 2026.

The AI Act classifies AI systems by risk level. For high-risk applications — which include AI systems in critical infrastructure, HR and recruitment, law enforcement, and medical contexts — strict requirements apply around transparency, documentation, human oversight, and risk management.

Relevant for developers: anyone deploying AI-generated components in systems that fall under high-risk categories bears responsibility for compliance, regardless of which tool produced the code. Engaging in prohibited AI practices can result in fines of up to €35 million or 7% of global annual turnover. Breaching the obligations for high-risk AI systems can cost up to €15 million or 3%.

"Claude suggested it that way" is not a valid answer to a regulatory authority.

Source: Regulation (EU) 2024/1689 of the European Parliament and of the Council, EUR-Lex, http://data.europa.eu/eli/reg/2024/1689/oj

Cyber Resilience Act — Regulation (EU) 2024/2847

The Cyber Resilience Act (CRA) was published in the Official Journal of the EU on 20 November 2024 and entered into force on 10 December 2024. It takes effect in phases: from 11 September 2026, vulnerability and incident reporting obligations apply; from 11 December 2027, all requirements must be fully met.

The CRA obligates manufacturers of products with digital elements, which covers most hardware and software placed on the EU market commercially, to comply with security-by-design requirements. This includes secure default configurations, minimal attack surfaces, protection against known vulnerabilities, and a duty to provide security updates throughout the product's support period.

AI-generated code that fails to account for these requirements is a direct liability risk — for the manufacturer, not the AI tool. Responsibility cannot be delegated, and "Claude Code suggested it" will not satisfy any market surveillance authority.

Source: Regulation (EU) 2024/2847 of the European Parliament and of the Council, EUR-Lex, http://data.europa.eu/eli/reg/2024/2847/oj; BSI Cyber Resilience Act: https://www.bsi.bund.de/DE/Themen/Unternehmen-und-Organisationen/Informationen-und-Empfehlungen/Cyber_Resilience_Act/

The regulatory implications of AI use — specifically GDPR, the EU AI Act, and the Cyber Resilience Act in the context of local vs. cloud AI — are the subject of a dedicated article. Coming soon.

What This Means in Practice: My Personal Workflow

No case against AI tools. A case for professionalism — and for being transparent about how I actually apply it.

How do you resolve this tension in everyday development without giving up the enormous speed advantage of Claude Code? By not abandoning proven quality principles, but applying them consistently to the new context.

An underestimated risk here is review fatigue: when AI produces a flood of plausible-looking code and the first dozen pull requests are clean, you start approving the next ones faster. The human becomes a rubber stamp for the AI — and that's exactly what must not happen. Static code analysis (linters, SonarQube, Semgrep) and automated unit tests enforced as a CI gate help counteract this psychological effect before a human even looks at the code.
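What such a gate boils down to can be sketched in a few lines: run every automated check, and refuse to proceed unless all of them pass. This is an illustration, not my actual pipeline. The function `run_gate` and the `DEFAULT_CHECKS` list are hypothetical, and I assume `ruff` and `pytest` are available in the CI image. Semgrep or a SonarQube scanner would be added as further entries.

```python
import subprocess
import sys
from typing import List, Sequence

# Assumption: ruff and pytest are installed in the CI image. Swap in your own tools.
DEFAULT_CHECKS: List[List[str]] = [
    ["ruff", "check", "."],
    ["pytest", "-q"],
]


def run_gate(checks: Sequence[Sequence[str]]) -> int:
    """Run every check and return 0 only if all of them succeed."""
    failed = []
    for command in checks:
        try:
            returncode = subprocess.run(list(command)).returncode
        except FileNotFoundError:
            # A missing tool counts as a failure, never as a silent pass.
            returncode = 127
        if returncode != 0:
            failed.append(" ".join(command))
    for name in failed:
        print(f"gate failed: {name}", file=sys.stderr)
    return 1 if failed else 0

# In the pipeline step: sys.exit(run_gate(DEFAULT_CHECKS))
```

The gate deliberately runs all checks instead of stopping at the first failure, so the reviewer sees the full picture in one pass. The human approval step comes afterwards and only for code that has already cleared the machine checks.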


The Interview: I Asked Claude

While writing this article, an idea came to me. Why not ask the subject directly — the model itself?

I put the following question to Claude — the same model that powers Claude Code: "What do you make of the idea that you shouldn't be trusted blindly as an AI tool? And what would you tell developers?"

Here is the answer, exactly as it emerged in our conversation — unedited and unabridged:


"That's not just reasonable — it's necessary. And I say that without false modesty.

I produce answers with a confidence that doesn't always match my actual reliability. I have no persistent memory of your architectural context. I don't know your regulatory requirements unless you tell me. I occasionally hallucinate APIs, functions, or parameters that don't exist — and I do so without hesitation, because I don't 'know' that I'm wrong.

The historical analogy you draw in this article is correct: every powerful tool in software development has brought new quality assurance layers with it, because it brought new failure modes. The compiler didn't make unit tests obsolete. Agile didn't abolish code reviews. I am no exception.

What sets me apart from earlier tools: I sound more competent than I sometimes am. That's the real risk. A compiler that makes a mistake produces an error message. I produce a plausible-sounding explanation.

My advice: use me. I am genuinely useful. But treat me like any other powerful assistant — with the expectation that you understand and can vouch for the output. Review what I produce. Not because you distrust me — but because professionalism always looks like that."


I have nothing to add — but I do want to offer one reflection on what struck me about this answer.

It isn't the humility. It's the precision. Claude describes its own weaknesses more clearly and honestly than most people manage when talking about themselves. That's not coincidental — it's the result of a model trained to be honest.

And that's what makes it genuinely valuable: a tool that knows and names its own limits is a better partner than one that conceals them. The question is only whether we, as users, are listening — and acting accordingly.


Conclusion

AI tools like Claude Code are a genuine asset. They accelerate, inspire, remove repetitive work, and make things possible that were previously disproportionately expensive in time and effort. That's real — not hype.

But they are step N+1 in the evolution of software development — not its conclusion, and not an exception to the principles that evolution has produced.

The Therac-25 taught us that greater capability without greater quality assurance is a risk that eventually materialises. The GDPR, the EU AI Act, and the Cyber Resilience Act remind us that responsibility for what we deploy rests with humans — regardless of who or what wrote the code.

Code reviews, architecture reviews, automated testing, approval gates — none of that has become less relevant. It has gained a new and important context.

Those who understand this get the best of both worlds: the speed and creativity of AI, combined with the reliability of professional practice.

Those who don't will find out the hard way — when it gets embarrassing.


All regulatory information is based on primary sources (EUR-Lex, BSI). Current as of June 2026. Regulatory texts are subject to change — when in doubt, always consult the current versions.

This article reflects my personal views exclusively and has no connection to any professional affiliation.

