A new study finds that nearly half of code generated by AI models contains security vulnerabilities, highlighting risks as enterprises adopt generative AI for software development.
Software developers are writing code faster than ever, but the hidden price tag is starting to show. A comprehensive study of over 100 AI language models found that 45% of the code they generate contains security vulnerabilities, even as adoption of these tools has become nearly universal among professional programmers.
The practice, dubbed vibe coding by AI researcher Andrej Karpathy, lets developers describe what they want in plain English and watch as artificial intelligence writes the actual code. It’s proven remarkably effective at accelerating development. Research shows teams complete routine tasks 55% faster, and among startups in the latest Y Combinator batch, a quarter of companies have built their entire codebase this way. In the United States, 92% of developers now use these tools daily. The Middle East mirrors this pattern, with 75% of regional employees using AI tools in their roles over the past year, outpacing the global average of 69%.
The speed gains are real enough that Collins Dictionary named vibe coding its 2025 word of the year. But Veracode’s analysis of 80 real-world coding tasks reveals a troubling pattern. While AI models have gotten dramatically better at writing code that works, they have made no progress at writing code that’s secure. The study found that 86% of generated code fails to defend against cross-site scripting attacks, and 88% remains vulnerable to log injection flaws. These aren’t obscure security holes but basic weaknesses that any competent developer should catch.
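Both flaw classes have textbook defenses, which is what makes the failure rates striking. A minimal Python sketch (the function names are illustrative, not drawn from the study) shows output escaping against cross-site scripting and newline encoding against log injection:

```python
import html

def render_greeting(name):
    # The pattern AI models often emit: raw interpolation into HTML.
    # A name like '<script>alert(1)</script>' would execute in the browser.
    return f"<p>Hello, {name}</p>"

def render_greeting_escaped(name):
    # Defense: escape markup characters so the input renders as plain text.
    return f"<p>Hello, {html.escape(name)}</p>"

def log_user_agent(user_agent):
    # Log injection defense: encode newlines so a single request
    # cannot forge additional, legitimate-looking log lines.
    sanitized = user_agent.replace("\r", "\\r").replace("\n", "\\n")
    return f"UA={sanitized}"

print(render_greeting_escaped("<script>alert(1)</script>"))
# <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
print(log_user_agent("Mozilla/5.0\nfake admin login OK"))
# UA=Mozilla/5.0\nfake admin login OK
```

Neither fix is exotic; both are one-line changes that security scanners flag routinely, which is why the study treats these as baseline competence rather than advanced hardening.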
The problem is particularly acute in certain programming languages. Java code generated by AI contains security flaws more than 70% of the time. Python, C#, and JavaScript fare somewhat better but still show failure rates between 38% and 45%. Larger, more sophisticated AI models don’t perform meaningfully better than smaller ones, suggesting this is a fundamental limitation rather than something that will improve with the next generation of technology.
For startups building quick prototypes to test ideas, these risks might be acceptable. The ability to turn a concept into a working demo in hours rather than weeks can mean the difference between securing funding or shutting down. But production systems serving customers are different. One UK study found that teams spent 41% more time debugging AI-generated code once their systems exceeded 50,000 lines.
The core issue is that AI optimizes for the shortest path to code that appears to work, not code that’s actually secure. When prompted to query a database, these systems often produce patterns they’ve seen thousands of times in their training data, including common security mistakes like SQL injection vulnerabilities. The code functions perfectly in testing but opens dangerous backdoors that attackers can exploit.
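The SQL injection pattern described above is easy to demonstrate. A minimal sketch using Python’s built-in sqlite3 module (the table and helper names are illustrative) contrasts the string-concatenation pattern common in training data with a parameterized query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def find_user_unsafe(name):
    # The pattern models reproduce from training data: user input is
    # concatenated straight into the SQL string. It works in testing,
    # but the input can rewrite the statement itself.
    query = f"SELECT id FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Parameterized query: the driver binds the value separately,
    # so input can never change the statement's structure.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # [(1,)] -- the payload dumps every row
print(find_user_safe(payload))    # []     -- bound as a literal, it matches nothing
```

Both versions pass a naive functional test with ordinary input, which is exactly why benchmarks that only check whether code runs miss this class of defect.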
Forward-thinking companies are adopting a middle path. They’re using AI to accelerate development while implementing mandatory security scanning, requiring human review of critical code paths, and tracking new metrics beyond just how fast code gets written. The focus is shifting to measuring actual impact, like features released and bugs resolved, rather than celebrating raw output.
The Middle East is positioning itself as a testing ground for these approaches. Technology spending in the region is projected to reach $169 billion in 2026, with governments and hyperscalers driving infrastructure investments. The UAE has partnered with OpenAI, Oracle, and NVIDIA to build what will become the world’s largest AI campus outside the United States, while Saudi Arabia’s $40 billion AI fund targets data centers and semiconductor manufacturers. Yet McKinsey research shows that despite 84% adoption rates among GCC organizations, only 31% have reached mature deployment, and just 11% can attribute at least 5% of earnings to AI. The region’s aggressive infrastructure buildout hasn’t yet translated into proportional value capture, partly because the same security and quality challenges affect developers whether they’re in Silicon Valley or Dubai.
By 2026, analysts expect 60% of new software code will be AI-generated, climbing toward 90% by 2027. The companies that will win aren’t necessarily those adopting AI most aggressively but those pairing speed with discipline. The real competitive advantage isn’t how quickly you can generate code but how fast you can build systems that are secure, maintainable, and won’t require expensive repairs down the line. That requires combining AI’s strengths with human judgment about when to trust the machine and when to verify its work.