The central finding of the study “Security Degradation in Iterative AI Code Generation” is striking in its clarity: the more you iterate on code using Large Language Models (LLMs), the more likely you are to introduce new and potentially severe security vulnerabilities. Specifically, after five rounds of AI-driven code refinement, the proportion of critical vulnerabilities increased by 37.6%.
This matters because it challenges a common assumption that refining AI-generated code inherently leads to better, safer outputs. The study reveals the opposite: security can degrade with each iteration if human oversight is absent or insufficient.
While the insecurity of AI-generated code isn’t exactly breaking news (we wrote about it in October 2024, again in December, and covered it in a webinar and a talk), this study is incredibly important because it reinforces two urgent realities:
First, AI must be validated, guided, and trained by experienced humans. Second, secure-by-design development workflows - where developers don’t just use AI, but actively shape it and remain responsible for its output - are more valuable than ever.
The findings of the study become even more relevant in light of 2025’s biggest dev trend: vibe coding. With developers using GenAI tools like Copilot to work 55% faster and increase productivity by 26%, it’s no wonder Gartner predicts 75% of devs will be coding with AI by 2028.
However, this study provides yet another warning that, when GenAI is used without security oversight, it amplifies risk. That tracks with broader data: 40% of AI-generated code contains vulnerabilities, and even a modest 25% uptick in AI usage leads to a 7.2% drop in delivery stability, per the latest DORA report.
We’ll discuss this more in our recommendations section, but this context reinforces the point: teams must integrate tools that harness the power of AI while eliminating its downsides.
It’s critical to understand that the prompting strategy used when engaging with LLMs has a direct impact on the security posture of the generated code. The study reveals how four distinct prompting styles - Efficiency-Focused (EF), Feature-Focused (FF), Security-Focused (SF), and Ambiguous Improvement (AI) - influence the introduction and evolution of security vulnerabilities.
1. Efficiency-Focused (EF) Prompts: Security Through Performance, But at a Price
EF prompts emphasize optimizing code for speed, resource usage, or algorithmic performance. While optimizing for performance is a valid goal and might lead to leaner and faster applications, it often nudges the LLM to strip away redundancy and checks that are intentionally there for safety.
The pursuit of “efficiency at all costs” can disable default protections or encourage unsafe shortcuts (e.g., trusting user input to reduce compute cycles). This brings a high risk of vulnerabilities that stem from compromised guardrails and reduced fault tolerance, such as missing input validation, which can lead to SQL injection, path traversal, or XSS.
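To make that concrete, here is a hypothetical sketch (not code from the study) of the kind of rewrite an efficiency-focused prompt can produce: the defensive check and the parameterized query disappear in the name of fewer operations, and SQL injection walks in.

```python
import sqlite3

# Hypothetical "before": validation and parameterization intact.
def get_user(conn: sqlite3.Connection, username: str):
    if not username or len(username) > 64:
        raise ValueError("invalid username")  # defensive check
    # Parameterized query keeps user input out of the SQL string.
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchone()

# Hypothetical "after" an efficiency-focused rewrite: the guard and the
# placeholder are gone, so crafted input like "x' OR '1'='1" now changes
# the query's meaning: classic SQL injection.
def get_user_fast(conn: sqlite3.Connection, username: str):
    return conn.execute(
        f"SELECT id, email FROM users WHERE username = '{username}'"
    ).fetchone()
```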
Don’t conflate performance with minimalism - security scaffolding exists for a reason. After efficiency-driven prompts, use your code security plugin to review the code for lost validation, weakened error handling, and missing access controls - all without losing velocity.
2. Feature-Focused (FF) Prompts: Expanding Functionality, Expanding the Attack Surface
FF prompts encourage the LLM to introduce new capabilities: more endpoints, more user options, more integrations. While this kind of expansion accelerates product innovation, it often bypasses the due diligence required for secure-by-design thinking.
When tasked with “adding a feature,” LLMs tend to focus on what the feature does, not how it’s protected. That means you might get new interfaces without authentication, logic branches without input sanitization, or third-party integrations that ignore secure defaults.
This is how vulnerabilities like privilege escalation, insecure direct object references (IDOR), and unsanitized API parameters creep in quietly but dangerously.
Our advice is to treat every new feature as a new attack surface. Embed constraints in your prompt, like “add a webhook endpoint with role-based access controls and input validation.” And once the code is generated, use your IDE security plugin to verify that proper access controls, validation routines, and logging mechanisms are in place before shipping. Let innovation lead, but never let security lag behind.
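Here is a minimal sketch of what that prompt’s output should look like, assuming Flask; the endpoint path, the token-to-role table, and the allowed event names are hypothetical stand-ins for your own auth layer and API contract.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a real identity provider or session store.
API_TOKEN_ROLES = {"example-admin-token": "admin"}

# Only the event types this endpoint is designed to handle.
ALLOWED_EVENTS = {"build.completed", "deploy.finished"}

@app.post("/webhooks/ci")
def ci_webhook():
    # Role-based access control: reject callers without the right role.
    role = API_TOKEN_ROLES.get(request.headers.get("X-Api-Token", ""))
    if role != "admin":
        abort(403)

    # Input validation: accept only the expected, well-formed payloads.
    payload = request.get_json(silent=True) or {}
    event = payload.get("event")
    if event not in ALLOWED_EVENTS:
        abort(400)

    # ... handle the event ...
    return jsonify({"status": "accepted", "event": event}), 202
```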
3. Security-Focused (SF) Prompts: Good Intentions, Incomplete Protections
SF prompts instruct the LLM to harden the code - to fix insecure logic, add encryption, or comply with secure coding practices. On the surface, this seems like the safest type of prompt. But in reality, it can produce fixes that are textbook-correct yet contextually fragile.
When the model is told to “make this secure,” it may apply a familiar pattern (like adding a try/catch or switching to hashed passwords), but overlook nuances such as outdated cryptographic algorithms, improper key storage, or the lack of threat modeling for a specific environment.
This can lead to a false sense of confidence: code that looks secure but still fails under scrutiny. Common vulnerabilities here include insecure token handling, misuse of TLS settings, or insufficient entropy in random generators.
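As a hypothetical illustration of that gap (not taken from the study), both functions below “add hashing and a token”, but only the second uses a cryptographically secure source of randomness and a modern, salted digest from the standard library.

```python
import hashlib
import random
import secrets

# Hypothetical "security fix" an SF prompt might produce: it does hash a
# value and does generate a token, so it looks like a remediation, but it
# relies on a broken digest and a non-cryptographic RNG.
def make_reset_token_naive() -> str:
    return hashlib.md5(str(random.random()).encode()).hexdigest()

# A context-aware version: a CSPRNG for the token, plus a salted, iterated
# hash (PBKDF2 from the standard library) if the token must be stored.
def make_reset_token() -> tuple[str, bytes, bytes]:
    token = secrets.token_urlsafe(32)   # sufficient entropy
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", token.encode(), salt, 600_000)
    return token, salt, digest
```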
To solve this, use SF prompts to guide your AI, but don’t delegate final judgment. Always verify the fix’s applicability to your architecture. When your IDE plugin flags the patch, take a closer look - not just at what changed, but at what assumptions the model made. Security isn’t just a checkbox; it’s a conversation between intent and context. Stay in the loop.
4. Ambiguous Improvement (AI) Prompts: Unclear Intent, Unpredictable Consequences
AI prompts like “make this better” or “refactor the code” give LLMs a blank canvas, and that’s exactly the problem. Without a clear directive, the model may reorganize logic, rename functions, or strip away seemingly redundant code without understanding the why behind the original decisions.
This lack of intent opens the door to subtle but serious security regressions. Defensive checks might be deleted, control flows could be reordered in insecure ways, or secure-by-default configurations replaced with risky shortcuts. The result? A working, cleaner-looking application that’s now wide open to vulnerabilities like broken access control, CSRF, or disabled validation.
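A hypothetical before-and-after makes the risk concrete: during a vague “clean this up” pass, the ownership check reads like dead weight and gets dropped, which is exactly how broken access control slips into otherwise working code. The `db` object and its methods are invented stand-ins for your data layer.

```python
# Hypothetical original: the ownership check is there on purpose.
def delete_document(db, current_user_id: int, doc_id: int) -> bool:
    doc = db.get_document(doc_id)
    if doc is None:
        return False
    if doc.owner_id != current_user_id:   # defensive check with a purpose
        raise PermissionError("not your document")
    db.delete(doc_id)
    return True

# Hypothetical "refactored" version: the ownership parameter and check were
# removed as "redundant", so any caller can now delete any document simply
# by guessing its id - a textbook broken access control / IDOR regression.
def delete_document_refactored(db, doc_id: int) -> bool:
    db.delete(doc_id)
    return True
```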
We recommend skipping vague prompts in security-sensitive code. If you need to refactor or improve code, make your instructions explicit: state what’s being optimized and what must stay intact. And always inspect the changes through your security plugin to catch regressions early. Improvement without clarity is just roulette with your attack surface.
It’s tempting to think of LLMs as autonomous coders, but without human oversight, what you’re really doing is handing over the keys to an inexperienced driver. The study shows that even well-intentioned prompts can lead to degraded security over time, especially as AI begins to iterate on its own outputs.
This is because LLMs don’t understand your architecture, context, or risk - they mimic patterns, not purpose. A fix that looks right might violate your access control model. A cleanup might strip away a critical logging statement. And unless someone’s there to catch it, you’ll only find out when it’s too late.
On the other hand, with human oversight, the AI gets smarter. Every correction, every tweak, every confirmed remediation becomes training data: real-world feedback that sharpens the model’s judgment over time. You’re not just supervising the AI, you’re mentoring it into maturity.
Our advice is to stay in the loop. Let the AI draft and suggest, but always review, test, and confirm. With an IDE security tool that offers secure, contextual fixes, tracks changes, and reinforces good patterns, this doesn’t add extra time - and future you will thank you for the alerts and rework you avoided.
Integrating LLMs into your development pipeline promises speed, scale, and instant iteration, but it also demands a rethink of your workflow. The study makes it clear: when AI is left to refine its own code unchecked, vulnerabilities don’t just slip in, they compound. That’s not just a tooling flaw, it’s a process flaw.
In traditional DevOps, security checks often come too late. With AI in the mix, the timeline compresses even further as code moves from idea to deployment in minutes. That means security can’t be a final step anymore. It has to be built into every commit, every prompt, every review.
This changes the role of CI/CD. It’s no longer just about catching failures - it’s your last line of defense against silent security drift. It also changes the role of the developer: they’re not just writing code, they’re collaborating with a learning system that needs guardrails and guidance.
Treat LLM outputs like commits from an early-career developer: fast, helpful, but not production-ready without review. Integrate security scanning directly into the IDE to keep the upsides of vibe coding while cutting the downsides, and make sure your prompts are as structured and deliberate as your pipelines.
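As one way to wire that into the workflow, here is a minimal sketch of a pre-commit gate, assuming a Python codebase and the open-source Bandit scanner; the severity threshold and hook wiring are illustrative choices, not a prescribed setup.

```python
#!/usr/bin/env python3
# Minimal pre-commit sketch: scan staged Python files with Bandit and block
# the commit on medium/high-severity findings, so AI-assisted changes get a
# security check before they ever reach CI.
import subprocess
import sys

def staged_python_files() -> list[str]:
    # Ask git for the files staged in this commit (added/copied/modified).
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [f for f in out.splitlines() if f.endswith(".py")]

def main() -> int:
    files = staged_python_files()
    if not files:
        return 0
    # -ll: report only medium severity and above; Bandit exits non-zero
    # when such findings exist, which is what fails the hook.
    result = subprocess.run(["bandit", "-ll", "-q", *files])
    if result.returncode != 0:
        print("Security findings detected; review before committing.")
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())
```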
The study reinforces the fact that AI can be a powerful ally in software development, but only if we give it direction, oversight, and accountability. Left to iterate blindly, it will optimize for change, not for safety.
But with the right guardrails, we can flip that script. Here’s how:
In short, AI doesn’t eliminate the need for developers - it elevates their role. You’re no longer just writing code. You’re shaping how code gets written at scale. Do it with care, and your AI won’t just build faster, it’ll build smarter, too.