Creating A Programming Language

A few weeks ago, I discovered a freeCodeCamp video about building a programming language in Python. After finishing the two-hour video, I had created a very simple scripting language. I found the whole process fascinating and wanted to learn more. Then I stumbled across the book Crafting Interpreters, which covers the topic in more detail.

First, let’s outline the process…

To create a programming language, you have to build an interpreter or a compiler to process your source code. This article covers interpreters, specifically a tree-walking interpreter. In the setup described here, one line of source code is entered and executed at a time, as in an interactive prompt (a REPL, or read-eval-print loop). Each line either produces an output or saves a variable to memory.
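The line-at-a-time loop described above can be sketched roughly as follows. This is only an illustration: the names repl and toy_execute, the "exit" command, and the "> " prompt are my own assumptions, and toy_execute is a deliberately tiny stand-in for the real lexer, parser, and evaluator.

```python
def repl(execute, read_line=input, write=print):
    """Read one line at a time, execute it, and show any result."""
    env = {}  # variables persist from one line to the next
    while True:
        try:
            line = read_line("> ")
        except EOFError:
            break  # end of input (for example Ctrl-D)
        if line.strip() == "exit":  # assumed quit command
            break
        if not line.strip():
            continue  # ignore blank lines
        result = execute(line, env)
        if result is not None:
            write(result)  # the line produced an output
    return env


def toy_execute(line, env):
    """Hypothetical stand-in for lexer + parser + evaluator.

    Understands only "var NAME = INTEGER" and a bare variable name.
    """
    words = line.split()
    if len(words) == 4 and words[0] == "var" and words[2] == "=":
        env[words[1]] = int(words[3])  # save a variable to memory
        return None
    return env[words[0]]  # look a variable up (KeyError if unknown)
```

Calling repl(toy_execute) starts an interactive prompt. The read_line and write parameters exist only so the loop can be driven from a script instead of the keyboard.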

Now let’s get into the details of how interpreters work.

The first step is to break the source code down into tokens. This is called lexical analysis, and the component that does it is the lexer. Take the input string “var a = 2”. The lexer scans this string character by character and groups the characters into “words” called lexemes. Our single string is now four strings: “var”, “a”, “=”, and “2”. Each lexeme is then combined with other data, such as its type (keyword, identifier, operator, or number), to create a token object. We now have the tokens: [var, a, =, 2].
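Here is a minimal sketch of such a lexer. The Token class, the kind names (KEYWORD, IDENT, NUMBER, OP), and the set of recognized characters are assumptions made for illustration, not the design from the video or the book.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str    # assumed categories: "KEYWORD", "IDENT", "NUMBER", "OP"
    lexeme: str  # the raw text taken from the source


KEYWORDS = {"var"}

# One alternative per kind of lexeme: integer, word, single-character operator.
TOKEN_PATTERN = re.compile(r"(\d+)|([A-Za-z_]\w*)|([=+\-*/()])")


def tokenize(source):
    """Scan the source string left to right and return a list of Tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # whitespace separates lexemes but is not a token
            continue
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        number, word, op = match.groups()
        if number is not None:
            tokens.append(Token("NUMBER", number))
        elif word is not None:
            kind = "KEYWORD" if word in KEYWORDS else "IDENT"
            tokens.append(Token(kind, word))
        else:
            tokens.append(Token("OP", op))
        pos = match.end()
    return tokens
```

For example, tokenize("var a = 2") yields four tokens whose lexemes are "var", "a", "=", and "2", with kinds KEYWORD, IDENT, OP, and NUMBER.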

Next is the parsing stage. The parser has two main jobs. If it receives a valid sequence of tokens, it builds an abstract syntax tree (for expressions made of two-operand operators like ours, it looks much like a binary tree). If it receives an invalid sequence of tokens, it reports a syntax error. Whether a sequence is valid is determined by the grammar rules of the language.
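To make the two outcomes concrete, here is a small recursive-descent parser sketch for arithmetic only. The grammar is an assumption I chose for the example: expression is term (("+" | "-") term)*, term is factor (("*" | "/") factor)*, and factor is a number or a parenthesized expression. It takes plain lexeme strings and returns nested tuples of the form (left, operator, right).

```python
def parse(tokens):
    """Parse a list of lexeme strings into a nested-tuple syntax tree.

    Raises SyntaxError when the token sequence does not fit the grammar.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        # "*" and "/" bind tighter, so they are handled one level below "+" and "-".
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (node, op, factor())
        return node

    def expression():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (node, op, term())
        return node

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

A valid sequence such as ["2", "+", "4", "*", "2", "-", "3"] comes back as ((2, "+", (4, "*", 2)), "-", 3), while an invalid one such as ["2", "+"] raises SyntaxError. Statements like “var a = 2” are left out to keep the sketch short.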

We’ll assume the parser received a valid sequence of tokens. The interpreter then walks the abstract syntax tree in post-order, evaluating each node’s children before the node itself (from the leaf nodes up to the root node), and either produces an output or saves a variable to memory. For this little language, a dictionary holds all the variables.
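A post-order walk over such a tree might look like the sketch below. The node shapes are assumptions for the example: a number is a leaf, a string is a variable name, (left, operator, right) is a binary operation, and ("var", name, value) is a declaration.

```python
import operator

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node, variables):
    """Evaluate a syntax tree node, reading and writing the variables dict."""
    if isinstance(node, (int, float)):
        return node  # leaf: a number literal
    if isinstance(node, str):
        if node not in variables:
            raise NameError(f"undefined variable {node!r}")
        return variables[node]  # leaf: a variable lookup
    if node[0] == "var":
        # Declaration node. "var" is a keyword, so it can never be
        # confused with a variable name on the left of an operator.
        _, name, value_node = node
        variables[name] = evaluate(value_node, variables)
        return None  # a declaration saves to memory and prints nothing
    left, op, right = node
    # Post-order: evaluate both children first, then apply this node's operator.
    left_value = evaluate(left, variables)
    right_value = evaluate(right, variables)
    return OPERATIONS[op](left_value, right_value)
```

With an empty dictionary, evaluate(("var", "a", 2), variables) stores 2 under "a" and returns None, after which evaluate(("a", "*", 3), variables) returns 6.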

Let's run through this whole process with an example:

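Input: 2 + 4 * 2 - 3 → Lexer: [2, +, 4, *, 2, -, 3] → Parser: [[2, +, [4, *, 2]], -, 3] → Output: 7

Because * binds more tightly than + and -, the parser groups 4 * 2 first. Evaluating from the leaves up gives 4 * 2 = 8, then 2 + 8 = 10, and finally 10 - 3 = 7.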
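One way to double-check the shape of that tree is to ask Python's own parser, through the standard ast module, how it groups the same expression. The helper to_nested below is my own addition; it only rewrites Python's tree into the bracket notation used above.

```python
import ast

# Map Python's operator node classes to the symbols used in this article.
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_nested(node):
    """Convert a Python expression AST into [left, operator, right] lists."""
    if isinstance(node, ast.BinOp):
        return [to_nested(node.left), SYMBOLS[type(node.op)], to_nested(node.right)]
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported node: {type(node).__name__}")


python_tree = ast.parse("2 + 4 * 2 - 3", mode="eval").body
nested = to_nested(python_tree)
print(nested)  # [[2, '+', [4, '*', 2]], '-', 3]
```

The root of the tree is the subtraction, its left child is the addition, and the multiplication sits deepest, which is why it is evaluated first.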

[Figure: Abstract Syntax Tree of Input]

I hope you found this post interesting and helpful. I’m excited to dive deeper into this subject and create more complex languages in the future.
