An Overview of the q language

'q' is a proprietary language built on top of the earlier 'k' language (developed by the ex-Morgan Stanley computer scientist Arthur Whitney) and the kdb+ database; q/kdb+ is free for non-commercial use. All three are proprietary products of Kx Systems, and they are probably better known in finance and investment banking, where they are arguably the de facto standard, than in any other industry. As such, q is aimed at time-series data, large volumes and high transfer rates. Very complex applications can be written in q with an extremely small footprint. A large number of customers of Panopticon, Altair's real-time visualisation software, connect to kdb+ databases and use q.

A recent exploration into the depths of the q language revealed a very sophisticated and expressive computational programming language. It is both like and unlike various other programming languages. Perhaps the first point to note is that vector programming lies at its core, yet the language does not seem to be widely known in the HPC community (or perhaps I did not look widely enough while I was in HPC; it happens). Vector supercomputers once ruled the HPC world, but as they approached extinction, their vector processing capability was reincarnated in commodity CPUs and now powers our PCs (since the days of Intel MMX, SSE, etc.). Around the same time, shortly after the Millennium, q was born. It did not come from vector supercomputers or vectorised CPU instructions; rather, it traces its origin back to APL (A Programming Language) from the 1960s, and so carries both vector and functional programming influences.

For reference, a free book is available at: https://code.kx.com/q4m3/0_Overview/

And a Tutorial series is at: https://www.youtube.com/playlist?list=PLypX5sYuDqvrwBD2EMWadIMiTqJZmVsqm

For those coming from a scientific, computational or HPC background, thinking in terms of vectors will ease the way into q. For general programmers who are familiar with C, Java, Python and the rest, the good advice from the YouTube series is to throw away your understanding of control flow, loops, if-else, threads, shared globals, objects and inheritance. The syntax itself is short, succinct and raw, yet it expresses beautifully the operations it performs. The following describes some of the noteworthy things about q that differ from today's popular programming languages.

'q' is similar to many scripting languages, i.e. it is interpreted (no compilation) and dynamically typed (no pre-declaration of data types). The two fundamental data structures are lists and dictionaries; again, no surprises here. However, it is the vector operations that are perhaps new to most modern-day programmers. For example, a vector 1 2 3 can be added directly to another vector 10 20 30. These vector expressions should be reminiscent of those in Matlab and Fortran, but the similarities don't extend much further. The syntax is quite different (though perhaps not too dissimilar to other functional programming languages). The items of a vector (list) are delimited by spaces, not commas. The mathematical operators all have equal precedence (no BODMAS rule), and expressions are evaluated from right to left. For example, 2*3+4 is actually 2*7, which is 14. This right-to-left, single-line way of writing code is perhaps the biggest change that a programmer coming from other languages needs to overcome.
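The points above can be tried directly at the q prompt. The following is a minimal sketch; the variable names a and d are made up for illustration, and the expected results are shown as trailing comments.

```q
/ element-wise addition of two vectors of equal length
1 2 3+10 20 30        / 11 22 33
/ a scalar is applied to every item of the vector
10*1 2 3              / 10 20 30
/ no operator precedence: evaluation runs right to left
2*3+4                 / 14
/ parentheses force a different order
(2*3)+4               / 10
/ dynamic typing: the same name can be rebound to a value of another type
a:42
a:"now a string"
/ a dictionary maps a list of keys to a list of values
d:`x`y`z!1 2 3
d`y                   / 2
```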

Control flow and loops go out the window because an operation acts on all the items in a vector. There is no point in having an explicit index to loop over. As such, an operation over 10 or 10 million items can be written in one line. In fact, entire functions or methods are written in one line, as shown in the examples later. In terms of data types, there are the familiar char, boolean, integer, float and so on. Strings are as in Fortran or old C: a string in q is actually an array (list) of characters. The most interesting data type I found is the date representation. Instead of some arbitrary representation with an initial value at 1970 or 1900, q's dates are actually integers counting days from 1 Jan 2000. This means date computations benefit from the speed of raw integer operations, while still allowing fancy date arithmetic.
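A short sketch of these data-type points, using only built-in q operations; the specific dates and strings are arbitrary examples.

```q
/ one line operates on ten million items, with no explicit loop
sum til 10000000      / 49999995000000
/ a string is a list of chars, so list operations apply to it
count "hello"         / 5
"hello" 0             / "h"
/ a date is stored as a count of days since 2000.01.01
`int$2000.01.01       / 0i
`int$2000.01.02       / 1i
/ date arithmetic is therefore integer arithmetic
2020.04.30-2020.04.01 / 29i
2020.04.01+30         / 2020.05.01
```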

List operations are powerful, since the list is one of the core data structures of q. Lists, or vectors, can be combined with another vector or a scalar, much like their mathematical counterparts. In addition to aggregate operations like 'max', 'min' and 'sum', there are the more interesting versions 'maxs', 'mins' and 'sums', which return a vector of the running (cumulative) aggregate. The highlight of q's expressiveness and power is its function definitions. Rather than describe them, consider the following one-liners and how such functions would need to be written in Python, Java, C, etc.:

  Algorithm for Newton Raphson (here converging to the square root of 2) with initial point as 1.0

{[xn]xn+(2-xn*xn)%2*xn}\[1.0]

  Algorithm for Fibonacci Sequence, initial points 1 1, for 10 iterations

{x,sum -2#x}/[10;1 1]

  Algorithm for Idealised Profit in Trading (sym is symbol column, px is price column of table t )

select max px - mins px from t where sym=`appl
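To see why the last one-liner gives an idealised profit, it helps to evaluate it step by step from right to left. The sketch below uses a made-up price vector and a small hypothetical table t with the same column names; it is an illustration, not real market data.

```q
/ hypothetical prices in time order
px:5 3 6 2 7f
/ running minimum: the cheapest price seen so far at each point
mins px               / 5 3 3 2 2f
/ profit at each point if bought at the running minimum
px-mins px            / 0 0 3 0 5f
/ the best of those: buy at 2, sell later at 7
max px-mins px        / 5f
/ the same expression in qSQL against a table
t:([] sym:5#`appl; px:px)
exec max px-mins px from t where sym=`appl   / 5f
```

Here exec is used instead of select so that the result is a plain number rather than a one-row table.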

As for data analytics and complex data processing, there are tables, qSQL and a few more useful functions. Tables are collections of columns, like arrays in Fortran/Matlab but the opposite of C-based languages. The items in a column generally share the same data type, which means a column is easily stored as a vector. For q, this means that operations on large datasets are extremely fast. Combining the single-line functional syntax with vector-based operations, very complex yet efficient programs can be written tersely. The qSQL syntax enables very SQL-like expressions to be written. This may be a comfort for SQL programmers, but it still pays to understand the underlying concepts of q, since there are fundamental differences that can lead to incorrect expectations. Native functions like 'xbar', which buckets values into fixed-width intervals, are extremely powerful when applied to time-series analysis. In one example with 10 million rows, a certain select operation took 150ms.
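A minimal sketch of tables, qSQL and xbar follows. The table t, its columns and its values are invented for illustration; only the built-in select syntax and xbar are assumed.

```q
/ xbar rounds each value down to the nearest multiple of its left argument
5 xbar 0 3 5 7 11 14          / 0 0 5 5 10 10
/ a table is a collection of named columns, each stored as a vector
t:([] time:09:30:00.000 09:31:40.000 09:36:10.000 09:38:50.000; sym:`a`b`a`b; px:10 11 12 13f)
/ a column can be pulled out as a plain vector
t`px                          / 10 11 12 13f
/ SQL-like filtering and grouping
select from t where sym=`a
select avg px by sym from t
/ average price in 5-minute buckets
select avg px by 5 xbar time.minute from t
```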

Most programming languages are capable of file I/O. q also has file I/O for both text and binary files (again there is a similarity with Fortran's binary file capability, rather than the stream-based file I/O of other languages). Not surprisingly by now, q's ways of reading and writing files are one-liners. There is no need to open or close a file, or for the many other lines of instructions needed by other languages. Besides file I/O, q is also capable of interprocess communication and asynchronous messaging. Together, these capabilities enable client-server applications to be written in q without using any external tools.
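The following sketch shows what these one-liners can look like. The file paths and port 5000 are assumptions made for illustration, and the IPC lines only work if another q process is already listening on that port (for example one started with q -p 5000).

```q
/ write a list of strings as the lines of a text file, then read them back
`:/tmp/notes.txt 0: ("first line";"second line")
read0 `:/tmp/notes.txt    / ("first line";"second line")
/ save any q value in binary form, then load it again
`:/tmp/prices set 10 11 12f
get `:/tmp/prices         / 10 11 12f
/ open a connection to a q process on localhost port 5000 (assumed to be running)
h:hopen `::5000
/ synchronous request: send an expression and wait for the result
h"1+1"                    / 2
/ asynchronous message: send and return immediately
(neg h)"a:42"
hclose h
```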

From this brief high-level exploration, q seems to be an interesting, powerful and useful programming language. It has the expressiveness of functional programming, the power of vector processing and the applicability to real high-volume computational needs. The fact that it was born within the financial trading sector by no means excludes it from being used in other industries. Wherever there are large numerical, time-based datasets that require computation, q may be a good choice. Other industries that may benefit include data science, scientific research, defence, telecommunications, etc. In terms of providing real-time visualisations of large and fast datasets, Altair's Panopticon with kdb+/q would make a formidable combination.

A transcript of the YouTube series, showing the syntax of q, can be found at https://xtechnotes.blogspot.com/2020/04/notes-q.html


I really like this language and want to build a whole backtesting and visualization system with the help of q, as it will be fast and smooth. I have developed some indicators from scratch, initially for learning, and I am learning more day by day.


"Instead of some arbitrary representation having an initial value at 1970 or 1900; q's date are actually integers starting from 1 Jan 2000. This means date computations benefit from the speed of raw integer operations, as well as being able to perform fancy date arithmetic" Pardon my Q naiveness, but how is this different from the Unix epoch time that revolves around 1970/1/1 instead of 2000/1/1?
