Generating Code from Tests
While we agree that specification-driven development is currently the best way to generate code using AI, we also want to make a case for test-driven development as a way to generate code with AI. The two can be combined, with tests providing better grounding when it comes to very complex code generation.
You will find a simple set of test cases here:
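To make the exercise concrete, here is a hypothetical example of the kind of small test suite one might paste into a prompt. These are illustrative pytest tests of my own, not the actual tests from the link above; the `fizzbuzz` function and `solution` module are placeholder names:

```python
# Hypothetical pytest suite: the prompt asks the model to implement
# a fizzbuzz(n) function in solution.py that makes all tests pass.
import pytest

from solution import fizzbuzz  # the model is asked to write solution.py


def test_returns_number_as_string():
    assert fizzbuzz(1) == "1"
    assert fizzbuzz(2) == "2"


def test_multiples_of_three():
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(9) == "Fizz"


def test_multiples_of_five():
    assert fizzbuzz(5) == "Buzz"
    assert fizzbuzz(20) == "Buzz"


def test_multiples_of_fifteen():
    assert fizzbuzz(15) == "FizzBuzz"


def test_rejects_non_positive_input():
    with pytest.raises(ValueError):
        fizzbuzz(0)
```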
We asked various GenAI websites to generate code that would pass all the tests. Claude.AI generated code that passes every test; you can see that code here:
This exercise was done manually, by pasting the tests into the prompt and asking the AI to generate code that passes them. Needless to say, if we used an agent (like Crew.AI) to iteratively regenerate the code until all tests pass, very complex code could be generated against test cases; a sketch of such a loop follows.
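As a rough illustration of that iterative approach, the loop below writes model-generated code to disk, runs pytest against it, and feeds the failure output back into the next prompt. This is a minimal sketch, not Crew.AI's actual API: `generate_code` is a placeholder for whatever model call you use, and the `tests/` path and `solution.py` filename are assumptions.

```python
# Sketch of a generate-run-repair loop. generate_code() is a placeholder
# for a call to an LLM; everything else uses only the standard library.
import subprocess
from pathlib import Path


def generate_code(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return Python source."""
    raise NotImplementedError


def run_tests() -> subprocess.CompletedProcess:
    # Run the test suite; pytest exits non-zero when any test fails.
    return subprocess.run(
        ["pytest", "tests/", "-q"], capture_output=True, text=True
    )


def generate_until_passing(tests: str, max_rounds: int = 5) -> bool:
    prompt = f"Write solution.py so that these tests pass:\n{tests}"
    for _ in range(max_rounds):
        Path("solution.py").write_text(generate_code(prompt))
        result = run_tests()
        if result.returncode == 0:
            return True  # all tests pass
        # Feed the failure output back to the model and try again.
        prompt = (
            f"The tests below still fail. Fix solution.py.\n"
            f"Tests:\n{tests}\n\nFailure output:\n{result.stdout}"
        )
    return False
```

The key design choice is that the test runner, not the model, decides when the loop stops, so the tests act as an objective acceptance criterion.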
All of the popular websites/companies generated code with all tests passing on the first attempt:
- grok.com
Demo of the exercise can be seen here for ChatGPT: https://www.youtube.com/watch?v=zwWCBzHK8Qg
If you have a set of automated tests for which GenAI models (LLMs) fail to generate passing code, I would love to discuss them. Such tests could become our own evaluation criteria, so we would not need to depend on popular evaluations like HumanEval, MBPP, etc. Because those are well-known evaluation benchmarks for large language models, some people fear that models can be trained to perform well on them (overfitting).