In late 2025, the security lab Escape.tech examined more than 5,600 AI-generated applications running in production and found over 2,000 vulnerabilities, more than 400 exposed secrets in the code and 175 cases of exposed personal data. These applications were not built in a research environment but in companies, where they serve customers and run business processes. They are the statistical trace of a working style that Andrej Karpathy named vibe coding on 2 February 2025: telling an AI in natural language what the software should do and accepting the generated code without checking it line by line. "Forget that the code even exists", Karpathy wrote in his original tweet.
For many teams, this has become serious practice. Working prototypes appear in hours instead of days. At the same time, reports are piling up in which exactly this speed produces follow-up costs of a kind that, in traditional software development, only materialise after years. For companies that do not build software for fun but do business with it, the question is therefore not whether vibe coding is "allowed". The question is under which conditions vibe coding produces software that lasts longer than 18 months, and under which conditions it does not.
What exactly vibe coding is, and what it is not
Vibe coding describes a specific way of working with coding agents such as Cursor, Claude Code or GitHub Copilot: the person at the keyboard describes the goal, accepts the proposed diff largely unchecked, pastes error messages straight back into the chat and lets the model iterate until the code runs. The code becomes a black box.
This should be distinguished from AI-assisted software development in the broader sense. Anyone who pairs with a model, reads every suggestion, demands tests, asks security questions and makes the architectural decisions themselves is not vibe coding. The difference is not academic: it determines which software ultimately runs in production.
What constitutes "good code" in enterprise software
"Works" is the lowest conceivable bar: necessary, but far from sufficient. In an enterprise environment, good code is measured by criteria that are barely visible in the first weeks after delivery but determine the profitability of a system in the years that follow.
Six characteristics carry most of the weight. Maintainability means that a developer who is not the original author can understand and change the code in a reasonable amount of time without breaking anything else. Testability means that functions can be tested in isolation and that regressions are caught automatically. Security requires, at a minimum, coverage of the OWASP Top 10, no secrets in the source code and validated inputs. Performance under real load depends on a sustainable data model, appropriate indexes and N+1 query patterns that are detected and removed. Observability provides the logs, metrics and traces needed to find errors in production before customers call. Finally, compliance readiness requires that the GDPR, the BFSG (German Accessibility Strengthening Act) and industry-specific requirements (ISO 27001, BaFin requirements, the MDR for medical devices) are built into the architecture and data flows rather than bolted on later.
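As a minimal illustration of two of the security criteria above, the Python sketch below reads a secret from the environment instead of hard-coding it and validates an input before it reaches business logic. The variable name `PAYMENT_API_KEY` and the customer-number format (a "C" followed by eight digits) are hypothetical; a real system would typically use a secret manager and its own validation rules. Because both functions are small and have no side effects beyond reading the environment, they can also be tested in isolation.

```python
import os
import re

# Hypothetical format for illustration only: "C" followed by exactly eight ASCII digits.
CUSTOMER_ID_PATTERN = re.compile(r"C[0-9]{8}")


def load_api_key(env_var: str = "PAYMENT_API_KEY") -> str:
    """Read a secret from the environment instead of hard-coding it in the source."""
    key = os.environ.get(env_var)
    if not key:
        # Fail loudly at startup rather than running with a missing credential.
        raise RuntimeError(f"{env_var} is not set")
    return key


def validate_customer_id(raw: str) -> str:
    """Normalise and validate untrusted input before it reaches business logic."""
    candidate = raw.strip().upper()
    if not CUSTOMER_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"invalid customer id: {raw!r}")
    return candidate


if __name__ == "__main__":
    # Isolated checks that need no database, no network and no real secret.
    assert validate_customer_id(" c12345678 ") == "C12345678"
    try:
        validate_customer_id("C123; DROP TABLE customers")
    except ValueError:
        pass
    else:
        raise AssertionError("malicious input was accepted")
```

Rejecting anything that does not match a narrow allow-list is usually more robust than trying to strip out "dangerous" characters afterwards.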
These criteria are not "nice to have". They are what transforms software from a functioning prototype into a resilient business asset.
Where vibe coding and good software diverge
Vibe coding is excellent at meeting the first criterion ("works"). It regularly fails to meet the other six, and the reason is structural.
The security data is now solid. The Escape.tech evaluation of more than 5,600 AI-generated applications in production mentioned above is only one piece of the evidence. The Veracode 2025 GenAI Code Security Report found that code co-written by AI contains on average 2.74 times as many security vulnerabilities as purely human-written code. The CodeRabbit report "State of AI vs Human Code Generation" (December 2025) found 75 per cent more logic and correctness errors in AI-generated code. And under the title "4x Velocity, 10x Vulnerabilities", the security company Apiiro documented that the number of vulnerabilities discovered each month at Fortune 50 companies rose from around 1,000 to more than 10,000 between December 2024 and June 2025. A tenfold increase in six months is not a statistical fluctuation; it is an order of magnitude.
The picture is shifting beyond security as well. GitClear's long-term analysis of 211 million changed lines of code from 2020 to 2024 shows a collapse in code maintenance: the share of changed lines attributable to deliberate refactoring fell from 24.1 per cent in 2020 to 9.5 per cent in 2024. Over the same period, the number of duplicated code blocks increased eightfold. In other words, more code is being written, but less code is being cleaned up. What looks like acceleration is often technical debt taken on in advance. The bill comes later.
The reason is simple: if you don't read code, you can't evaluate it. An experienced developer makes dozens of small architectural and design decisions in a single hour. She spots a database query inside a loop, realises that it will cripple the page once there are 50,000 records, and moves the query out of the loop before writing on. The model would have delivered code that works. Six months later, in the load test, that code would have brought the system down. Vibe coding does not replace these decisions. It skips them.
What this concretely means for enterprise software
Enterprise software differs from a weekend app in three dimensions: it runs longer, more people develop it further, and its failures have a price. These are precisely the dimensions in which the risk of vibe coding unfolds.
Lifespan. In insurance IT, commission settlement and policy administration systems regularly run for ten to fifteen years. The original developers are long gone. In this world, an MVP that was vibe-coded in three weeks and went into production untested will go through hundreds of changes made by changing teams. Without architecture documentation, without tests and without clear responsibilities in the code, each extension costs more than the one before. At some point a trivial adjustment costs several person-weeks, and the answer becomes "better to rebuild from scratch". This is the throwaway mentality returning at system level.
Scaling in the organisation. Vibe coding works as long as a single person holds the model, the context and the quirks in their head. As soon as a second team joins, they open the file and find three different authentication patterns side by side, six endpoints without tests and a comment along the lines of "// works, don't touch". New people cannot be onboarded onto code that does not explain itself. It can only be administered, at high risk.
Liability and compliance. Anyone working in a regulated environment (finance, healthcare, critical infrastructure) owes their supervisory authority evidence: who wrote, reviewed and approved which code, and when? Which risk analysis is an architecture decision based on? "The model proposed it this way" is not a valid entry in a software bill of materials (SBOM). A valid entry names the library, version, licence, known CVEs and the person responsible for bringing the dependency into the system. By its very structure, vibe coding does not produce this entry.
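What a complete entry might look like can be sketched as a simple record with a completeness check. This is a hypothetical in-house structure mirroring the fields listed above, not the schema of an SBOM standard such as CycloneDX or SPDX, which define their own formats; the library name, version and person in the example are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DependencyRecord:
    """Hypothetical record of one dependency, mirroring the fields named in the text."""
    library: str
    version: str
    licence: str
    # None means "never checked"; an empty list means "checked, no known CVEs".
    known_cves: Optional[List[str]]
    responsible: str  # person who brought the dependency into the system

    def missing_fields(self) -> List[str]:
        """Return the names of fields an auditor would find empty or unchecked."""
        missing = [
            name
            for name in ("library", "version", "licence", "responsible")
            if not getattr(self, name).strip()
        ]
        if self.known_cves is None:
            missing.append("known_cves")
        return missing


complete = DependencyRecord(
    library="example-http-client",  # invented name for illustration
    version="2.4.1",
    licence="Apache-2.0",
    known_cves=[],
    responsible="j.doe",
)
# An entry as it might look when nobody made a deliberate decision.
undocumented = DependencyRecord("example-http-client", "", "", None, "")

print(complete.missing_fields())      # []
print(undocumented.missing_fields())  # ['version', 'licence', 'responsible', 'known_cves']
```

Distinguishing "no known CVEs" from "never checked" matters in an audit: an empty list is a statement, a missing value is a gap.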
Where vibe coding is appropriate
Vibe coding is not the opposite of good software development. It is a tool with a clearly defined area of application.
Vibe coding makes sense for prototypes that are explicitly meant to be thrown away, for internal scripts and one-off tools with no external impact, for exploratory data analyses where the insight counts rather than the code, and for quick UI mockups that a front-end developer will rebuild later anyway. In all of these cases, most of the requirements above do not apply, because the software never reaches the phase in which it would have to be maintainable.
A second legitimate use is accelerating experienced developers. Anyone who can assess the code in seconds because they could have written it themselves is using the model as a keyboard, not as an architect. This is everyday practice in many engineering teams, and it is no longer vibe coding in the narrow sense.
What decision-makers should take away from this
First, the proportion of unread code in a delivery is a key figure that appears in no specification today, yet it predicts later maintenance costs better than any story point estimate. Ask for it. When a service provider or internal team delivers a new system, a legitimate question is: "What proportion of this code did a human read before it went into the repository?" Whether the honest answer is in the low single digits or the high double digits, it tells you more about the risk ahead than any demo date.
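One way to make this figure concrete is a line-weighted share computed from review data. The sketch assumes a hypothetical export with one record per merged change, holding the number of added lines and whether a human actually read it. Collecting that flag honestly is the hard part: approving a large diff is not the same as reading it.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MergedChange:
    """Hypothetical per-change record, e.g. exported from a code review tool."""
    lines_added: int
    read_by_human: bool


def unread_share(changes: Iterable[MergedChange]) -> float:
    """Share of added lines (0.0 to 1.0) that no human read before the merge."""
    total = 0
    unread = 0
    for change in changes:
        total += change.lines_added
        if not change.read_by_human:
            unread += change.lines_added
    # An empty delivery has no unread code by definition.
    return unread / total if total else 0.0


delivery = [
    MergedChange(lines_added=1200, read_by_human=False),
    MergedChange(lines_added=300, read_by_human=True),
    MergedChange(lines_added=500, read_by_human=False),
]
print(f"{unread_share(delivery):.0%} of added lines were never read")  # 85% ...
```

Weighting by lines rather than counting changes prevents one huge unread change from hiding behind many small reviewed ones.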
Second, require a picture of tests, security and architecture as part of the sign-off. The acceptance criterion is not "the app runs" but "the app runs, has a documented architecture, covers the core paths with automated tests and has been tested against the OWASP Top 10". This applies regardless of whether AI was involved.
Third, separate the playground from the production system organisationally. It makes sense for teams to experiment with vibe coding, build prototypes and run through rapid learning cycles. It does not make sense for the same code to move, without any further steps, into the system that will process customer data tomorrow. A clear promotion stage between prototype and production (separate repository, separate review obligation, separate security check) is the most effective measure you can put in place without an engineering background.
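The sign-off criteria and the promotion stage described in the last two points can be expressed as a gate that a pipeline or release checklist evaluates before code leaves the prototype stage. The field names below are hypothetical; in practice each flag would be fed by a real check (repository settings, review records, a security scan, test reports) rather than set by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromotionCandidate:
    """Hypothetical facts collected about a prototype before it may go to production."""
    in_production_repository: bool  # separate repository, not the playground
    human_review_completed: bool    # separate review obligation fulfilled
    security_check_passed: bool     # e.g. tested against the OWASP Top 10
    architecture_documented: bool
    core_paths_tested: bool         # automated tests for the core paths


def blocking_reasons(candidate: PromotionCandidate) -> List[str]:
    """Return every unmet criterion; an empty list means the gate is open."""
    checks = [
        (candidate.in_production_repository, "not moved to its own production repository"),
        (candidate.human_review_completed, "human review missing"),
        (candidate.security_check_passed, "security check not passed"),
        (candidate.architecture_documented, "architecture not documented"),
        (candidate.core_paths_tested, "core paths not covered by automated tests"),
    ]
    return [reason for ok, reason in checks if not ok]


prototype = PromotionCandidate(
    in_production_repository=False,
    human_review_completed=False,
    security_check_passed=True,
    architecture_documented=False,
    core_paths_tested=False,
)
for reason in blocking_reasons(prototype):
    print("blocked:", reason)
```

Returning all unmet criteria instead of a single yes/no tells the team exactly what is still missing before promotion.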
Software that lasts is still created by people who know what they are doing and why. AI shortens their paths, but does not replace them. Anyone who confuses this is buying speed on credit, and the interest is hidden in the source code.
If you are currently facing a major software decision and are unsure where the boundary lies between sensibly accelerated and recklessly hollowed-out development, it is worth getting an outside view before the code reaches the production system, not after. On our enterprise software page we describe how we set up enterprise software projects together with clients, and on our System architecture and design page how we plan a viable, documented architecture instead of building up technical debt.
Sources:
- Karpathy, A. (2 Feb 2025): Tweet, x.com/karpathy/status/1886192184808149383.
- Escape.tech (Oct 2025): Methodology post on the evaluation of 5,600+ AI-generated production apps (escape.tech).
- Veracode (2025): GenAI Code Security Report; 2.74× more security vulnerabilities in AI code.
- CodeRabbit (Dec 2025): State of AI vs Human Code Generation; 75% more logic and correctness errors.
- Apiiro (2025): "4x Velocity, 10x Vulnerabilities"; increase in monthly vulnerabilities at Fortune 50 companies, Dec 2024 to Jun 2025.
- GitClear (2025): AI Assistant Code Quality Research; 211 million changed lines of code, 2020 to 2024 (gitclear.com).