Impact of Careful Naming when Using GitHub Copilot

Today, I continue my investigation of how to better leverage tools such as GitHub Copilot, and of their impact on the work of software developers. I recently investigated how such tools can benefit from the Literate Programming methodology.

In this new post, I investigate the importance of carefully naming functions, parameters and variables, and its impact on the performance of the tool.

Naming Things

More than 20 years ago, David Thomas and Andrew Hunt wrote in The Pragmatic Programmer:

The beginning of wisdom is to call things by their proper name. — Confucius

What’s in a name? When we’re programming, the answer is “everything!”

We create names for applications, subsystems, modules, functions, variables — we’re constantly creating new things and bestowing names on them. And those names are very, very important, because they reveal a lot about your intent and belief.

We believe that things should be named according to the role they play in your code. This means that, whenever you create something, you need to pause and think “what is my motivation to create this?”

This is a powerful question, because it takes you out of the immediate problem-solving mindset and makes you look at the bigger picture. When you consider the role of a variable or function, you’re thinking about what is special about it, about what it can do, and what it interacts with. Often, we find ourselves realizing that what we were about to do made no sense, all because we couldn’t come up with an appropriate name.

Naming has always been a hard and important problem in computer science, and this reality won’t change any time soon. If anything, it will become even more important in the coming era of LLMs and Copilot-like systems and tools.

The premise we have lived with since roughly last Christmas is that software developers’ productivity will get a major boost from the new type of tooling that is becoming available, namely GitHub Copilot and its integration in VS Code. If the premise is true, and I have no indication at the time of this writing that it isn’t, then the next question becomes: how can we best use those tools to get the most pleasant and effective productivity boost?

Today, I am investigating the aspect of naming.

Meaningful vs. Meaningless

For this investigation, I will implement exercise #10 of chapter 5.0.0 of The Art of Computer Programming:

10. [15] You are given a tape containing one million words of data. How do you determine how many distinct words are present on the tape?

The implementation will be in Python and will only require a handful of functions. This goes against the intent of the exercise, but I lacked the imagination to find something else to code for this post.

For this experiment, I created two empty and distinct workspaces, and loaded each of them in a separate VS Code instance. The purpose is to make sure that Copilot doesn’t get any hint about my intent from elsewhere in the workspace.

Then, I purposely didn’t write any comments or text of any kind other than pure, uncommented Python code. The rough structure of the implementation is:

Use a book from the Gutenberg project as the source of tokens. In this case, we will use Marcel Proust’s translation of John Ruskin’s La Bible D’Amiens
Tokenize the book into words
Create a set of distinct words/tokens
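
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the code produced during the experiment: the function names, the regular expression, and the short inline sample (which stands in for the downloaded book so that the snippet runs without network access) are all assumptions of mine.

```python
import re

# Hypothetical sample standing in for the text of the downloaded book.
SAMPLE_TEXT = "The tape holds words. The words repeat; the tape repeats."


def tokenize_words(text: str) -> list[str]:
    """Split text into lowercase word tokens made of letters only."""
    # [^\W\d_]+ matches runs of letters (no digits, no underscores).
    return re.findall(r"[^\W\d_]+", text.lower())


def distinct_words(text: str) -> set[str]:
    """Return the set of distinct word tokens found in text."""
    return set(tokenize_words(text))


def count_distinct_words(text: str) -> int:
    """Count how many distinct words are present in text."""
    return len(distinct_words(text))


if __name__ == "__main__":
    print(count_distinct_words(SAMPLE_TEXT))
```

With a real book, the sample string would be replaced by the contents of the downloaded file. Holding every distinct word in an in-memory set is precisely the shortcut that goes against the intent of the original tape-based exercise.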

Hopefully Meaningful Naming

Read the full investigation here...

Conclusion

As we saw with those two examples, the proper naming of things is very important to get the most out of this new kind of tooling. I agree that the second example is extreme, but in my experience such names are not uncommon in code bases. I didn’t try to obfuscate every name; the names were just too general and a bit useless.
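
To make the contrast concrete, here is a hypothetical pair of functions with identical behavior. They are not the functions from the experiment, only an assumed illustration of what "too general" names look like next to descriptive ones.

```python
# Descriptive names: the signature alone states the intent.
def count_distinct_words(words):
    return len(set(words))


# Generic names: same behavior, but nothing in the signature
# hints at what is being computed or on what kind of data.
def process(data):
    return len(set(data))
```

A reader, or an assistant tool, that only sees the first line of each definition has far more to work with in the first case than in the second.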

When David and Andrew wrote twenty years ago:

those names are very, very important, because they reveal a lot about your intent and belief

They considered those names very, very important so that your intent and belief could be properly communicated to whoever reads your code in the future (including you a few months from now). Today, this assertion still holds, but its scope is broader. Names are very, very important because they also help assistant tools such as GitHub Copilot guess your intent and belief more easily, and so help you write better code faster.

What I personally like about this new family of tools such as GitHub Copilot is how I think they will shape the software developers of tomorrow: they will force them to be more careful about their writing and, in this case, their naming. The better the writing, and the more precise and unambiguous it is, the more power they will be able to harness from those LLMs.

I am starting to envision that the general productivity of software developers will get an important boost in the coming five to ten years, along with (and more importantly to me) an overall increase in the quality of the code and systems they produce. All this because the tool has become a huge incentive for them to care about those mundane non-code details, such as writing mundane human words.

Today, I feel that a lot of developers wonder if their job is at stake. In my next post, I will start to outline what I currently feel about those questions. I don’t think developers’ jobs are at stake, but the way they work will definitely have to change, and so will the way we train future generations of software developers.


Frederick Giasson
