Leveraging LLMs for Efficient Unit Test Generation

Unlocking the Potential of Artificial Intelligence in Software Development

As software engineers, we're constantly seeking ways to optimize our workflow and reduce manual labour. One promising area of innovation is the application of Large Language Models (LLMs) to coding tasks, with the potential to significantly enhance developer productivity. Tools such as GitHub Copilot have already demonstrated the value of AI-assisted code generation, providing developers with intelligent suggestions and automating routine coding tasks.

Our goal is to go a step further: leveraging LLMs to automate entire areas of software development, such as unit testing, with greater independence from human programmers. Unit testing is a natural starting point because it offers a well-defined problem domain. We aim to identify the LLMs best suited to this task and unlock their potential to streamline unit test generation.

The LLM Ecosystem: A Brief Overview

The landscape of LLMs is diverse, comprising both open-source (OSS) and commercial options. OSS models offer the possibility of self-hosting, enabling secure processing of confidential data – an approach that we at TNG utilize with various models. Commercial LLMs, in contrast, have demonstrated superior output quality across various benchmarks [1][2][3][4].

Evaluating LLMs: Beyond Code Coverage

Evaluating the quality of generated tests is crucial to understanding how much we can benefit from the model that generated them. Code coverage alone is insufficient: it only measures the percentage of code executed during testing, without requiring any assertions or validation of the results. To address this limitation, we use mutation testing, in which the code under test is intentionally modified to introduce small errors ("mutants"), and the unit tests are run to see whether they detect these changes.
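
To make this concrete, consider the following minimal JUnit 5 sketch (the class and values are invented for illustration; for Java, tools such as PIT generate and run such mutants automatically). Both tests achieve full line coverage, but only the second one detects, or "kills", the mutant:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical class under test.
class PriceCalculator {
    double applyDiscount(double price, double rate) {
        return price * (1 - rate); // a mutation tool may flip '-' to '+'
    }
}

class PriceCalculatorTest {
    private final PriceCalculator calculator = new PriceCalculator();

    @Test
    void coversTheCodeButAssertsNothing() {
        // Executes every line (full coverage), yet the '+' mutant survives.
        calculator.applyDiscount(100.0, 0.2);
    }

    @Test
    void killsTheMutant() {
        // The mutated version returns 120.0 instead of 80.0, so this
        // assertion fails and the mutant is killed.
        assertEquals(80.0, calculator.applyDiscount(100.0, 0.2), 1e-9);
    }
}
```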

We also investigate the potential of using LLMs as judges to analyze tests and provide feedback on their quality. Specifically, we employ LLMs to score metrics such as simplicity and readability, the match between a test's name and its content, and test independence on a scale from 0 to 4, allowing for a more nuanced evaluation of test quality and effectiveness.
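
A rubric prompt for such a judge might look like the sketch below (the wording and JSON keys are hypothetical, not our exact production prompt):

```java
// Hypothetical rubric prompt for an LLM judge; the criteria mirror the
// metrics above, while the prompt wording and JSON keys are illustrative.
class JudgePrompt {
    static String build(String testSource) {
        return """
            You are reviewing a Java unit test. Score each criterion from 0 (poor) to 4 (excellent):
            1. Simplicity and readability
            2. Match between the test name and what the test actually verifies
            3. Independence from other tests (no ordering or shared-state assumptions)
            Respond with JSON only, e.g.
            {"simplicity_readability": 3, "name_matches_content": 4, "independence": 2}

            Test under review:
            """ + testSource;
    }
}
```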

By combining mutation testing with LLM-based evaluation, we can gain a more comprehensive understanding of our tests' strengths and weaknesses. 

Strategies for Effective LLM Utilization

To effectively leverage LLMs in unit testing, we employ a range of strategies, including:

  • Divide and Conquer: Breaking down complex tasks into smaller, manageable components until each can be addressed with a single LLM call.
  • One-Shot Prompting: Providing the model with a single worked example of the desired output (see the sketch below).
  • Explanation Requests: Asking the model to explain its output to gain deeper insight into its decision-making process.
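
As an illustration of one-shot prompting in this setting, a prompt might be assembled as follows (the example method, test, and helper names are invented):

```java
// Hypothetical one-shot prompt assembly: a single worked example (method
// plus matching test) is prepended before the method we want tested.
class OneShotPrompt {
    private static final String EXAMPLE = """
        // Example method:
        int add(int a, int b) { return a + b; }

        // Example test:
        @Test
        void addReturnsSumOfOperands() {
            assertEquals(5, new Calculator().add(2, 3));
        }
        """;

    static String build(String methodUnderTest) {
        return "Write a JUnit 5 test for the Java method below, "
             + "following the style of this example:\n\n" + EXAMPLE
             + "\nMethod under test:\n" + methodUnderTest;
    }
}
```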

By combining these strategies, we can get the most out of LLMs when generating unit tests in our experiments.

Our Experience with LLMs in Coding

In our exploration of LLMs for coding tasks, we've found it beneficial to split the work into a planning phase and a coding phase when working with models that have distinct strengths. This separation allows for more flexible task allocation and lets each model play to its strengths.
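
A minimal sketch of such a two-phase pipeline is shown below (the LlmClient interface and the prompts are assumptions for illustration, not a specific API):

```java
// Hypothetical two-phase pipeline: a "planner" model drafts test cases,
// and a "coder" model turns that plan into JUnit code.
interface LlmClient {
    String complete(String prompt);
}

class TwoPhaseTestGenerator {
    private final LlmClient planner; // e.g. a reasoning-strong model
    private final LlmClient coder;   // e.g. a code-strong model

    TwoPhaseTestGenerator(LlmClient planner, LlmClient coder) {
        this.planner = planner;
        this.coder = coder;
    }

    String generateTests(String classSource) {
        String plan = planner.complete(
            "List test cases (name, scenario, expected result) for this Java class:\n"
            + classSource);
        return coder.complete(
            "Implement these test cases as a JUnit 5 test class:\n" + plan
            + "\nClass under test:\n" + classSource);
    }
}
```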

Our experience, which currently focuses on generating unit tests for Java, has led us to favour particular combinations of planning and coding models, compared in Figure 1.

On the open-source front, DeepSeek-R1 has emerged as a promising contender, demonstrating notable potential in reasoning capabilities for both planning and coding tasks. However, a significant gap in output quality remains between commercial and open-source models, as illustrated in Figure 1:

Figure 1: Output quality for unit test generation across different LLM combinations, averaged over different tested classes. Code and mutation coverage percentages are scaled to a score between 0 and 4.

Notably, while metrics evaluated using an LLM as a judge (GPT-4o) show minimal variation between models, substantial differences are observed in code and mutation coverage, highlighting the existing disparities in model performance.

Key Findings

The key takeaway from our research is that LLMs can be incredibly powerful tools when used properly, but their potential often needs to be unlocked by getting them to "think" about the task at hand rather than just generate code. By creatively coaxing LLMs to understand and reflect on their output, we can tap into their full potential for automated testing, paving the way for more efficient, reliable, and innovative coding practices.

[1] https://towardsdatascience.com/llms-for-coding-in-2024-performance-pricing-and-the-battle-for-the-best-fba9a38597b6/

[2] https://livebench.ai/#/

[3] https://www.vellum.ai/llm-leaderboard

[4] https://www.keywordsai.co/blog/top-benchmarks-for-the-best-open-source-coding-llms
