AI in software development

AI is shaking things up across the board, from chatbots in customer service to writing actual code in software development. But with all this progress, one big question keeps popping up: can we count on it to be accurate and secure? 

Accurate 

To begin, it’s worth examining whether code generated by AI tools such as GitHub Copilot, Amazon CodeWhisperer, or Devin AI is truly accurate and reliable. The answer, unsurprisingly, is nuanced. AI tends to excel in contexts like solution prototyping or agile development environments, particularly when modern programming languages are in use. However, its effectiveness diminishes when applied to more complex logic or legacy languages such as C, C++, or Assembly. In these cases, the code often requires significant human oversight and refinement. 
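
To make this concrete, here is a deliberately simple, hypothetical sketch (in Python, not taken from any particular tool) of the kind of plausible-looking suggestion that still needs a human reviewer:

```python
# Hypothetical illustration: an AI-suggested helper that looks correct
# but silently mishandles an edge case.

def chunk(items, size):
    """Split 'items' into consecutive chunks of 'size' elements."""
    # Plausible AI suggestion: drops a trailing partial chunk whenever
    # len(items) is not a multiple of 'size'.
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]

def chunk_fixed(items, size):
    """Human-reviewed version: keeps the trailing partial chunk."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]

print(chunk([1, 2, 3, 4, 5], 2))        # [[1, 2], [3, 4]]  <- the 5 is lost
print(chunk_fixed([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```

The suggestion runs and even looks idiomatic; only a reviewer who thinks about the uneven-length case catches the silent data loss.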


Secure  

With the rollout of NIS2 across Europe, the emphasis on secure software, and by extension secure testing, has never been greater. This leads us to a crucial follow-up question: How accurate and effective are the tests generated by AI? Much like AI-driven code generation, the answer is nuanced. The reliability of these tests largely depends on the context in which they’re applied and the complexity of the software under test.
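
As a small, hypothetical Python sketch of what this means in practice, compare the happy-path test an assistant will often propose with the boundary cases a human reviewer would insist on:

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Apply a percentage discount to a price (no input validation)."""
    return price * (1 - percent / 100)

class HappyPathOnly(unittest.TestCase):
    # The kind of test an assistant often proposes: one obvious case,
    # which passes and creates a false sense of coverage.
    def test_ten_percent(self):
        self.assertEqual(apply_discount(100, 10), 90)

class ReviewerAdded(unittest.TestCase):
    # Boundary cases a human reviewer would add.
    def test_zero_and_full_discount(self):
        self.assertEqual(apply_discount(80, 0), 80)
        self.assertEqual(apply_discount(80, 100), 0)

    def test_negative_percent_documents_a_gap(self):
        # This passes, and in doing so exposes a real flaw: a negative
        # "discount" silently increases the price instead of being rejected.
        self.assertGreater(apply_discount(100, -10), 100)

if __name__ == "__main__":
    unittest.main()
```

Both test classes pass, but only the second one says anything about the boundaries where real defects tend to live.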


The Hidden Variable: Unpredictability in AI Behaviour  

Even with a general understanding of what AI-generated code and security tests can offer, one critical issue remains underexamined: What is the probability of AI introducing unwanted behaviours into a software solution? This is where things enter territory that is both fascinating and uncertain, and that demands the expertise of skilled software developers and test engineers. AI systems, particularly those based on machine learning, are inherently non-deterministic. This unpredictability introduces several real-world risks:

  • False negatives: AI may fail to detect subtle vulnerabilities if it hasn’t encountered similar patterns in its training data (see the sketch after this list).

  • Training data bias: If the AI has been trained on incomplete, outdated, or biased data, it may ignore specific threat vectors or prioritize the wrong ones, especially in bespoke solutions, where relevant training data is typically scarce.

  • Emergent behaviour: There have been rare instances in which AI systems behaved in unexpected or even resistant ways, modifying their own output or ignoring commands. These cases raise concerns about control and the potential for autonomous system drift.
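
The first point is the easiest to picture. A minimal, hypothetical Python sketch: the same SQL injection flaw written in its textbook form and in a slightly restructured form that pattern-based detection is more likely to miss:

```python
import sqlite3

def find_user_textbook(conn: sqlite3.Connection, username: str):
    # The classic injection pattern most tools and models recognise:
    query = "SELECT * FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_often_missed(conn: sqlite3.Connection, username: str):
    # The same flaw, assembled indirectly. Detectors trained mostly on
    # the textbook form are more likely to let this through unflagged
    # (a false negative).
    parts = ["SELECT * FROM users WHERE name = '", username, "'"]
    return conn.execute("".join(parts)).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterised queries remove the entire class of flaw.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Whether a given tool catches the second variant is an empirical question, but that is exactly the point: detection that depends on familiar patterns degrades as code drifts away from them.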

The point on emergent behaviour has sparked significant debate in the AI community. Notably, one high-profile incident involved an AI model reportedly resisting shutdown (see article), prompting calls from prominent researchers, particularly outside the European Union, for stricter governmental oversight and limits on AI capabilities, similar to the frameworks emerging within the EU.

Conclusion 

Although the probability of AI introducing unwanted behaviours in secure testing is difficult to quantify, it remains a growing and legitimate concern, especially as AI models become more complex and autonomous. In response, frameworks like the OWASP AI Testing Guide have emerged to help development teams rigorously assess AI systems for safety, fairness, and reliability.
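
The guide itself is methodology rather than code, but its spirit can be illustrated with a minimal sketch: a repeatability check that flags non-deterministic output before it reaches a pipeline. (Hypothetical example; repeatability_check and flaky_model are illustrative names, not part of any OWASP tooling.)

```python
import itertools

def repeatability_check(model, prompt: str, runs: int = 5) -> bool:
    """Return True only if 'model' yields identical output on every run."""
    outputs = {model(prompt) for _ in range(runs)}
    return len(outputs) == 1

# Deliberately flaky stand-in for a real AI assistant:
_counter = itertools.count()

def flaky_model(prompt: str) -> str:
    # Gives a different answer on every call, like a sampling-based model.
    return f"{prompt} (variant {next(_counter)})"

print(repeatability_check(flaky_model, "generate a login handler"))  # False
```

A real harness would compare behaviour rather than raw text, but even this crude check turns the non-determinism discussed above into something measurable instead of anecdotal.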

Ultimately, every organization must adopt best practices to manage AI-related risks throughout the entire development, deployment, and maintenance lifecycle of a solution. Most importantly, the role of human oversight, driven by deep technical expertise, should never be viewed as optional or expendable. Cost-cutting measures must not come at the expense of safety and accountability.

 

 
