Generating Code from Tests

While we agree that specification-driven development is currently the best way to generate code with AI, we also want to make a case for test-driven development as a way to generate code with AI. The two approaches can be combined: tests give the model concrete, checkable targets alongside the specification, which provides better grounding for very complex code generation.
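To illustrate the idea at toy scale, here is a minimal Python sketch of tests acting as the specification. The function `parse_version` and its expected behavior are hypothetical, invented for this example and not taken from the linked repository. The implementation below the tests is one possible answer a model could return when given only the tests.

```python
import unittest


# The tests come first: they are the specification handed to the model.
class TestParseVersion(unittest.TestCase):
    def test_parses_three_parts(self):
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

    def test_missing_parts_default_to_zero(self):
        self.assertEqual(parse_version("2"), (2, 0, 0))

    def test_rejects_non_numeric_input(self):
        with self.assertRaises(ValueError):
            parse_version("1.x")


# One implementation that satisfies the tests above (hypothetical example
# of the code a model might generate from the tests alone).
def parse_version(text: str) -> tuple:
    parts = text.strip().split(".")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected version string: {text!r}")
    numbers = [int(p) for p in parts]  # int() raises ValueError on "x"
    numbers += [0] * (3 - len(numbers))  # pad missing parts with zeros
    return tuple(numbers)


if __name__ == "__main__":
    # Run the suite without calling sys.exit, then report the outcome.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestParseVersion)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    print("all tests pass:", result.wasSuccessful())
```

The tests, not the prose, define what "correct" means: any implementation that makes them pass is acceptable, which is what makes a test file usable directly as a prompt.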

You can find a simple set of test cases here:

https://github.com/ansarmuhammad/generate_code_from_tests/blob/main/test_sonar_report_reader.py

We asked various GenAI websites to generate code that would pass all the tests. Claude.AI generated the following code, which passes all of them:

https://github.com/ansarmuhammad/generate_code_from_tests/blob/main/sonar_report_reader.py

This exercise was done manually, by simply pasting the tests into the prompt and asking the AI to generate code that passes them. If we instead used an agent (for example, one built with CrewAI) to generate code iteratively, rerunning the tests and feeding the failures back until all of them pass, we expect that much more complex code could be generated successfully against a test suite.
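The iterative approach can be sketched as a simple loop: generate code, run the tests, and feed any failures back into the next prompt until everything passes or an attempt budget runs out. This is a minimal, framework-agnostic sketch under stated assumptions, not CrewAI's API. `generate` is a hypothetical stand-in for whatever model or agent call you use, `run_tests` is a hypothetical stand-in for your test runner, and `generate_until_green`, `RunResult`, and `max_attempts` are names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunResult:
    passed: bool
    report: str  # e.g. captured test-runner output, fed back to the model


def generate_until_green(
    tests: str,
    generate: Callable[[str], str],  # hypothetical: prompt -> source code
    run_tests: Callable[[str], RunResult],  # hypothetical: code -> result
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Return (passing_code, attempts_used), or (None, max_attempts) on failure."""
    prompt = (
        "Write Python code that makes all of these tests pass. "
        "Return only the code.\n\n" + tests
    )
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        result = run_tests(code)
        if result.passed:
            return code, attempt
        # Feed the failing output back so the next attempt can correct it.
        prompt = (
            "Your previous code failed some tests. Return corrected code only.\n\n"
            f"Tests:\n{tests}\n\nYour code:\n{code}\n\n"
            f"Test output:\n{result.report}"
        )
    return None, max_attempts


if __name__ == "__main__":
    # Toy stand-ins: the "model" answers wrongly once, then correctly.
    answers = iter([
        "def add(a, b):\n    return a - b\n",
        "def add(a, b):\n    return a + b\n",
    ])

    def fake_generate(prompt: str) -> str:
        return next(answers)

    def exec_tests(code: str) -> RunResult:
        namespace: dict = {}
        exec(code, namespace)  # acceptable for this toy; sandbox real model output
        ok = namespace["add"](2, 3) == 5
        return RunResult(ok, "" if ok else "add(2, 3) did not return 5")

    code, attempts = generate_until_green("assert add(2, 3) == 5", fake_generate, exec_tests)
    print(f"passed after {attempts} attempt(s)")  # passed after 2 attempt(s)
```

Passing `generate` and `run_tests` in as parameters keeps the loop independent of any particular model API or agent framework. In practice, `run_tests` would write the code to a file and run the real test suite (for example with pytest) in an isolated environment.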

All of the popular GenAI websites we tried generated code that passed every test on the first attempt:

- claude.ai
- perplexity.ai
- chat.deepseek.com
- chat.qwen.ai
- kimi.com/chat
- chatgpt.com
- grok.com
- m365.cloud.microsoft
- gemini.google.com

A demo of the exercise with ChatGPT can be seen here: https://www.youtube.com/watch?v=zwWCBzHK8Qg

If you have a set of automated tests for which GenAI models (LLMs) fail to generate code that passes all the tests, I would love to discuss them. Such tests could become our own evaluation criteria, so we would not need to depend on popular benchmarks like HumanEval and MBPP. Because these benchmarks are so well known, some people fear that models may be trained to perform well on them specifically (overfitting), which would make their scores less meaningful.
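A private evaluation built from such tests could start very simply: record, for each model and each test suite, whether the model's first generated answer passed every test, then compute per-model pass rates and list the suites no model solved (the ones most worth discussing). The sketch below is a hypothetical illustration; the function names, model names, and suite names are invented, and the demo values are placeholders, not measurements.

```python
from typing import Dict, List

# results[model][suite] is True if that model's first generated answer
# passed every test in that suite. Producing these entries (prompting each
# model and running the tests) is left to the caller.
Results = Dict[str, Dict[str, bool]]


def pass_rate_by_model(results: Results) -> Dict[str, float]:
    """Fraction of test suites each model solved on the first attempt."""
    return {
        model: sum(by_suite.values()) / len(by_suite)
        for model, by_suite in results.items()
        if by_suite  # skip models with no recorded suites
    }


def unsolved_suites(results: Results) -> List[str]:
    """Suites that no model passed: candidates for a harder, private evaluation."""
    all_suites = sorted({s for by_suite in results.values() for s in by_suite})
    return [
        s for s in all_suites
        if not any(by_suite.get(s, False) for by_suite in results.values())
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only, not real measurements.
    demo = {
        "model_a": {"suite_1": True, "suite_2": False, "suite_3": True, "suite_4": True},
        "model_b": {"suite_1": True, "suite_2": False, "suite_3": False, "suite_4": True},
    }
    print(pass_rate_by_model(demo))  # {'model_a': 0.75, 'model_b': 0.5}
    print(unsolved_suites(demo))  # ['suite_2']
```

Keeping the test suites private is what guards against the overfitting concern: a model cannot have been tuned on suites it has never seen.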

Some recent posts that cover introductory concepts in applied AI:

  1. 5 AI Terms You Need to Know: https://www.garudax.id/pulse/5-ai-terms-you-need-know-ansar-muhammad-pmp-psm-1-mwupf/
  2. Simple chatbot versus an agentic chatbot: https://www.garudax.id/pulse/simple-chatbot-versus-agentic-ansar-muhammad-pmp-psm-1-bwkjf/
  3. Agentic Code Generation through CrewAI: https://www.garudax.id/pulse/agentic-code-generation-through-crewai-ansar-muhammad-pmp-psm-1-grkuf/
  4. Building a Multiagent system using Google’s agent development kit: https://www.garudax.id/pulse/building-multiagent-system-using-googles-agent-kit-ansar-ibkbf/
  5. What is Retrieval Augmented Generation (RAG): https://www.garudax.id/pulse/what-retrieval-augmented-generation-rag-ansar-muhammad-pmp-psm-1-wpdpf/
  6. Using custom instructions in ChatGPT to help with better / relevant responses: https://www.garudax.id/pulse/using-custom-instructions-chatgpt-ansar-muhammad-pmp-psm-1-molof/
  7. Powerful tools that save time (NotebookLM, etc.): https://www.garudax.id/pulse/powerful-tools-save-time-notebooklm-etc-ansar-muhammad-pmp-psm-1-mbcrf/?trackingId=UGZxe0bkReuxxe8om8X43A%3D%3D
  8. Improving your AI predictions through voting technique: https://www.garudax.id/pulse/improving-your-ai-predictions-through-voting-muhammad-pmp-psm-1-kjkzf/?trackingId=hAMUyZfISwGjziGF91wAMQ%3D%3D
  9. Automation with n8n.io: https://www.garudax.id/pulse/automation-n8nio-ansar-muhammad-pmp-psm-1-sgeaf/
  10. How to Generate Synthetic Data for your Testing: https://www.garudax.id/pulse/how-generate-synthetic-data-your-testing-ansar-muhammad-pmp-psm-1-k4mvf
  11. Using Microsoft Copilot to find any errors in your Excel calculations: https://www.garudax.id/pulse/using-microsoft-copilot-find-any-errors-your-excel-ansar-rqtxf/
  12. A prompt fit for an agent: https://www.garudax.id/pulse/prompt-fit-agent-ansar-muhammad-pmp-psm-1-2v5xf/
  13. Protecting your AI agents from Prompt Injection Attacks: https://www.garudax.id/pulse/protecting-your-ai-agents-from-prompt-injection-muhammad-pmp-psm-1-kaehf/

More articles by Ansar Muhammad, Azure AI, PMP, PSM 1