From lines of code to Davos
This week I have been briefly liberated from my ivory tower and granted the powers of the Davos Man. I tore myself away from vim (a text editor loved by hackers and opaque to ordinary humans) to do a bit of background reading on what I had just let myself in for. It was very exciting to discover that I had now become one of the gold-collared (I had better let payroll know they have been making a mistake) global elite who are "committed to the improvement of the state of the world". But I also appear to be a "citizen of nowhere", a migrant responsible for the decline of the national economy and, worst of all, a creator of technology that will result in joblessness and, surely, Armageddon. Alright, no pressure then...
I am one of the lucky researchers who will be giving an insight into their work in the WEF IdeasLab sessions. In a modern research environment where hyper-specialism is the norm, it can be difficult to explain your research to the academic in the next office, never mind in five minutes to a lay audience. However, the Fourth Industrial Revolution does at least provide the right framework for understanding the motivation and significance of what we are doing in the world of data and computational science.
It is hard to find an area of science and engineering that is not becoming increasingly digital. Imagine that it is possible to reconstruct accurate images kilometres into the Earth using an airgun, an array of microphones and a supercomputer - no digging or drilling required! Even though I work in this area, it still makes me think of Arthur C. Clarke's third law: "Any sufficiently advanced technology is indistinguishable from magic." This has only become possible in recent years because it involves collecting vast amounts of data (petabytes) and modern supercomputers - the oil and gas industry has some of the largest supercomputers in the private sector just for this one problem.
There is something very satisfying about supercomputers themselves: they are big, have lots of flashing lights, and often have their own purpose-built building - the kind of thing you can tie a ribbon around for a grand opening. What gets overlooked, however, is that without the software that does the actual work, a supercomputer is nothing but a hunk of metal and an overpriced light show.
While we regularly hear of kids writing apps in their bedrooms and promises that you can learn to program in an hour, what you need to understand about the kind of software used in science and engineering is that it can be staggeringly complex - requiring deep expertise within the application domain itself, as well as expertise in a wide range of fields across mathematics, computer science and computational science. As polymaths are rare, such software is, in the best cases, developed by teams of specialists.
Beyond its complexity, the sheer scale of science and engineering software is problematic. Codebases can quickly run into millions of lines, and their quality is highly variable because software development is not normally seen as a funding priority and training is severely limited. This brings a whole host of additional problems, including reproducibility of research, verification, reliability, software reuse, extensibility, commercialisation and code modernisation.
I think it is striking how many similarities there are between this crunch and the disruptive changes in computing in the 1950s. Up to that period all software was written in machine code - it was pretty painful to write even the simplest of algorithms. In 1952 Grace Hopper developed the first compiler - put simply, software that reads a high-level programming language, in which it is easier for humans to express algorithms, and automates the generation of machine code that you can then run on your computer. In 1957 Fortran was released by IBM, and it is still one of the programming languages most commonly used by scientists and engineers on supercomputers today. From that point, many new programming languages and compilers were developed with all kinds of features and abstractions, allowing us to develop the vast array of software that we see today. But for me the most striking paper from that time is Hopper's "Automatic Programming - Definitions" from 1954. The title says it all really - the big idea of that period was to bring automation to programming.
Back to today, I would argue that the biggest problem is that scientists and engineers are still trying to program computers at a layer of abstraction that was conceived in the 1950s. Higher-level languages such as Python and other interpreted languages have existed for some time, but they lack the raw performance that demanding science and engineering applications frequently require. Therefore we are usually presented with a choice between ease of use and performance, but never both.
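To see the tradeoff concretely, here is a toy example (my own illustration, not code from any particular application): the same five-point stencil, a building block of many simulation codes, written first as readable pure Python and then in a faster but more opaque vectorised form.

```python
import numpy as np

def stencil_python(u):
    """Five-point Laplacian stencil as plain Python loops: easy to read, slow to run."""
    out = np.zeros_like(u)
    n, m = u.shape
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i, j] = (u[i - 1, j] + u[i + 1, j] +
                         u[i, j - 1] + u[i, j + 1] - 4 * u[i, j])
    return out

def stencil_vectorised(u):
    """The same stencil using array slicing: much faster, further from the maths."""
    out = np.zeros_like(u)
    out[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1] +
                       u[1:-1, :-2] + u[1:-1, 2:] - 4 * u[1:-1, 1:-1])
    return out

u = np.random.rand(512, 512)
assert np.allclose(stencil_python(u), stencil_vectorised(u))
```

Neither version is satisfactory: the first is clear but orders of magnitude slower than compiled code, while the second recovers some speed at the cost of readability - and a production implementation in C or Fortran drifts further from the mathematics still.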
An alternative approach is to develop domain-specific languages. These are programming languages that have been designed to solve a particular kind of problem. When a problem is already well defined, for example within a mathematical framework, it becomes possible to express sophisticated algorithms both simply and concisely. Because a domain-specific language is bespoke to a specific purpose, it is relatively straightforward to write your own compiler that takes this high-level source code and automatically generates source code in a lower-level language such as C. Similarly, because the domain-specific language clearly defines a pattern, we know exactly which parallel programming patterns and optimisations can be applied. We can even manipulate the equations themselves and explore mathematically how the problem might be simplified. We have cases where, from a mathematical equation consisting of thousands of terms (i.e. it would take a few sheets of paper to write out a single equation by hand), code is generated that outperforms anything an expert human programmer could implement - and it happens in seconds and does not make typos.
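As a toy sketch of the idea (an illustration in Python using the SymPy library, far simpler than a production domain-specific language): the mathematics is stated once at a high level, the symbolic layer differentiates and optimises it, and valid C statements are emitted automatically.

```python
import sympy as sp

x, y = sp.symbols('x y')

# High-level mathematical statement of the problem: a function and its gradient.
f = sp.exp(-(x**2 + y**2)) * sp.sin(x * y)
grad = [sp.diff(f, s) for s in (x, y)]

# The symbolic layer can manipulate the mathematics itself - here, pulling
# out common subexpressions so each is computed only once, with no typos.
subexprs, reduced = sp.cse(grad)

# Emit equivalent low-level C statements automatically.
for name, val in subexprs:
    print(f"double {name} = {sp.ccode(val)};")
for s, g in zip('xy', reduced):
    print(f"double df_d{s} = {sp.ccode(g)};")
```

A real system layers far more on top of this - discretisation, parallel scheduling, architecture-specific optimisation - but the principle is the same: the human states the mathematics, and the machine writes the code.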
What we have found is that by increasing automation in software development, it not only becomes easier for application developers to build complex applications that run fast on parallel computers, but the layers of abstraction actually facilitate collaboration between application experts, numerical analysts and computer scientists in a way that just has not happened in traditional software projects. One of the reasons for this is that traditional software in this space might run to a million lines of code - developers in the midst of such software feel like the Greek Titan Atlas, condemned to hold up the sky for eternity. If you are an Atlas, you spend all your time just trying to stay on your feet - not innovating and creating! In contrast, code-generation platforms such as ours have only a few thousand lines of code in a high-level language such as Python, so even the idea of redesigning and rewriting from scratch is not scary - and indeed we have done just that a few times already as we refine our ideas.
Going back to the fourth industrial revolution, I see this approach as having three key ramifications. The obvious one is work efficiency. We are already able to reduce development time from years to hours. While at high-performance computing conferences we are asked only about how fast our code runs (and the answer is: very fast!), I think the real benefit is that we have made the human programmer thousands of times faster.
Secondly, we are creating smart tools that have great potential for making suggestions and helping to guide the human programmer (is there an intern out there who would like to help integrate Alexa this summer?). We always seem to worry that technology is going to make people jobless, but technology has always been about extending human capabilities - leading to greater discoveries and better outcomes for all.
Increasing automation in software also has a positive impact on skills and training. The skills crunch, and the worry that all the benefits of the fourth industrial revolution will be concentrated in a small group of tech-savvy elites, are both symptoms of the fact that we are still trying to develop technology at too low a level of abstraction - again, think back to the 1950s when everyone programmed in machine code, and how much worse things would be if we had continued like that. The fact is that, short of genetically engineering the whole human race, it is not possible to create a workforce of super-humans who are experts in all things. By creating smart software technologies that facilitate collaboration, we make knowledge and skills manageable again. This is not only important for our children as they make their way through the education system, but also for the existing workforce, who are increasingly under pressure to upskill.
While the Davos call for responsive and responsible leadership is clearly focused on the surge in populist politics, the theme is also critical to how software technologies and the fourth industrial revolution will evolve. In recent years there has been a growing understanding that open access to publicly funded research is vital to the health of the economy and the wellbeing of society. There are already strong policies in countries such as the UK regarding open access to journal articles, and indeed to data arising from publicly funded research. However, while open source software is a well-formed concept and clearly critical to the democratisation of technology, there are little or no policies regarding software arising from publicly funded research. Much of industry also struggles to understand the importance of software and is unwilling to invest in something it will not own exclusively. There are exceptions: for example, my team's research on automating software development for seismic imaging is fully funded through our Intel Parallel Computing Centre, where the only condition of our funding is that our technology is open source. But this needs to become the norm rather than the exception.
Additionally, while software forms the very fabric of the digital world, programming is still siloed within computing departments rather than being integrated into the wider curriculum across schools and universities. Indeed, it has been widely understood for years that it is nearly impossible for researchers whose primary output is software to have a successful academic career - scientists and engineers who are also proficient in software development exist in spite of the education system rather than because of it.
There is much that government and industry can do to create an ecosystem in which software technologies thrive and the fourth industrial revolution works for everyone. Indeed, it is a time for change.