Implementing and Evaluating Gen-AI Coding Tools
Photo by https://unsplash.com/@cgower

For sixty years, we computer programmers have considered our labor something beyond what other people do. We're certainly not blue-collar workers, and we pride ourselves on a level of artistry and academic rigor that goes beyond white-collar work. So what happens when the first serious programming automation tools get introduced?

Chaos! Panic! Fanatical devotion! Reactions run the gamut, but seem to have one thing in common - strong emotion. But a few years into the gen-AI revolution, it might be time to step back and really evaluate how, and even if, these tools can be effectively integrated into our workflows.

The best perspective I've seen on gen-AI is that it's just another new technology. It's not the worst thing ever invented, not the best thing ever invented, just a new tool to add to our software engineer's toolbox. But what really separates it from most other time-saving tools we've built in the past is the element of randomness inherent to the technology. We've never had to just accept that something we rely on is fundamentally unreliable, no matter how we try to coach it into consistency. And that's OK! We deal with unpredictability all the time - just usually not in our tools.

Software engineers already know how to control chaotic actors

Picture an early design session for a public-facing API. One of the first considerations you'd have would be security (I hope). You'd consider what data you expose, and what external data you'll expose your systems to. You'd put walls up where you need to, and sanitize inputs when they might smell fishy.

Believe it or not, you can do this with code too - and you probably already do. You keep your important code under source control, keep your API keys private, and have senior engineers pay a bit more attention to a new hire's code! Keeping up the quality of generated code just takes a few more steps.

While SaaS companies like GitHub will try to sell you on automated checks, those checks can only go so far. Accept that you can't guarantee quality without reading the code; you can't guarantee safety without secure infrastructure (and still reading the code); and you can't guarantee efficiency at all (so don't let your chatbots jack up your AWS bill). Generated code is sold as a cheap route to fast development, but unless you take measures up front to keep it up to your standards, you're just shifting the workload from the beginning of the development lifecycle to the never-ending debugging and maintenance stage.
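To make "reading the code" concrete, here is a minimal sketch of a merge policy that gives generated code the same treatment as a new hire's code. Everything in it is an assumption made for illustration: the `ChangeRequest` fields, the `can_merge` function, and the idea that your team labels AI-assisted changes at all. It is not how any particular code host works.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed change (not a real code-host API)."""
    author: str
    ai_assisted: bool
    tests_passed: bool
    approvers: set = field(default_factory=set)


def can_merge(change, senior_engineers):
    """Automated checks are required, but a human still has to read the code."""
    if not change.tests_passed:
        return False
    # Authors approving their own change does not count as a review.
    human_reviewers = change.approvers - {change.author}
    if change.ai_assisted:
        # Generated code gets the "new hire" treatment:
        # at least one senior engineer must have approved it.
        return bool(human_reviewers & senior_engineers)
    return bool(human_reviewers)


seniors = {"dana"}
print(can_merge(ChangeRequest("sam", True, True, {"lee"}), seniors))   # False
print(can_merge(ChangeRequest("sam", True, True, {"dana"}), seniors))  # True
```

The point of the sketch is the ordering: passing tests is only the entry ticket, and the human review is the part that cannot be skipped.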

A new process doesn't change your KPIs

Implementation is only half of the equation - evaluating a tool's benefits to your workflow matters for gen-AI as much as for any other technology. You probably already have metrics in place for monitoring your team's performance. Some will be qualitative, some will be quantitative, but whatever they are, they shouldn't be changed to make your AI-assisted developers look more productive than they are.

No other type of automation would get away with a custom measurement made specifically to show off its strengths. You wouldn't buy a robot vacuum that brags about how rarely it runs over dog poop when vacuuming by hand never smears dog poop all over your house, right? Similarly, you probably wouldn't adopt a gen-AI coding tool that boasts a 10x increase in lines of code written if lines of code were never your goal and the tradeoff is an occasional loss of millions of customers' data.
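One way to hold yourself to that is to compare the same metrics you already track, before and after adoption, and nothing else. The sketch below assumes three made-up KPIs with invented numbers; the metric names, the data, and the `compare_kpis` helper are all hypothetical stand-ins for whatever your team actually measures. It also assumes every "before" average is non-zero.

```python
from statistics import mean

# Hypothetical KPIs a team might already track.
# True means a higher value is better; False means lower is better.
HIGHER_IS_BETTER = {
    "features_shipped_per_sprint": True,
    "escaped_defects_per_sprint": False,
    "review_hours_per_sprint": False,
}


def compare_kpis(before, after):
    """Percent change and a verdict for each KPI, using the same yardstick."""
    report = {}
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        old, new = mean(before[name]), mean(after[name])
        change = (new - old) / old * 100  # assumes old != 0
        improved = change != 0 and (change > 0) == higher_is_better
        verdict = "better" if improved else "worse or flat"
        report[name] = (round(change, 1), verdict)
    return report


# Invented numbers, three sprints each, purely for illustration.
before = {
    "features_shipped_per_sprint": [4, 5, 3],
    "escaped_defects_per_sprint": [2, 1, 3],
    "review_hours_per_sprint": [10, 12, 8],
}
after = {
    "features_shipped_per_sprint": [5, 6, 4],
    "escaped_defects_per_sprint": [4, 3, 5],
    "review_hours_per_sprint": [15, 15, 15],
}

report = compare_kpis(before, after)
for name, (change, verdict) in report.items():
    print(f"{name}: {change:+.1f}% ({verdict})")
```

With these invented numbers, features shipped go up 25% while escaped defects double and review hours rise by half. Note that lines of code does not appear anywhere, because it was never one of the team's goals.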

Experiment with implementing gen-AI tools. Try new things, listen to your coworkers' ideas, and research what other people are doing. But don't evaluate the outputs of the process through rose-colored lenses.

There will almost certainly be times when these tools are useful. In my own work at a fast-moving startup writing code for internal use, the benefits absolutely outweigh the costs (and of course it helps that my coworkers are smart people with lots of expertise). But if the cost of implementing guardrails outweighs the benefits of using gen-AI tools, do the hard thing and say so to whoever the decision maker on your team is. At best, you'll avoid a costly mistake; at worst, you'll get to be smug in a meeting six months from now.

-30-


By Justin Ebert