Disruptive Technologies - When does Software write Software?
How Smart Technologies affect Software Development – a Perspective
Everything is getting “smart”. The trends of digital business, such as the Internet of Things, smart cities, intelligent traffic control and smart factories, all presuppose intelligent machines. A machine’s knowledge is its software. Software is the link between all participants of our increasingly interconnected and intelligent world: between the machines, and especially between humans and machines.
In households and in almost all industrial, service and manufacturing companies, software plays a central role in nearly every area of our daily lives. It optimizes processes, automates work and enhances productivity. “Smart” machines bring with them new business models as well as new socio-economic concepts.
Big Data and Deep Learning equip machines with more and more knowledge and thus enable them to make decisions independently. The machines analyze their environment and the behavior of their users, and with machine learning they become increasingly smarter; once these technologies are in place, they do so largely without human intervention.
Machine-processable knowledge about the things of the real world has long been an enterprise asset across all sectors: on the one hand as a valued central knowledge resource with the advantages of semantic search and anytime online availability, on the other hand as a marketable economic good in its own right.
Knowledge is not only the basis for assistance and recommendation systems; once standardized, it also facilitates knowledge exchange between partners, clients and suppliers, and between industrial companies themselves.
Knowledge about processes and procedures, the “know-how”, opens an even wider horizon: it allows machines not only to put their skills into action, actively changing things or states, but also to automatically optimize their actions according to diverse criteria.
Once a machine is given motivation, it finally acts not merely as an assistant but as a competent autonomous entity that can even develop its own strategies and team skills. Prominent examples are competing teams of soccer robots and the self-driving cars presented live at the last Frankfurt International Motor Show (IAA). They are based on high-performance CPUs and GPUs, but above all they run on software.
But what impact does this digital revolution have on software developers? Not only on the kind of software we will develop in the future, but in particular on how we will develop software itself?
To answer this question, let’s first have a look at the Semantic Web and at knowledge itself, and then at how knowledge is processed and maintained by machines.
Data, Information and Knowledge
To establish a common terminology, let’s briefly discuss the difference between data, information and actual knowledge (Fig. 1).
Figure 1: From Characters to Competence
Initially, data is merely a structured compilation of letters, numbers and symbols: it conforms to a specific syntax but carries no reference. Data as such follows no recognizable patterns and relates to no context, so pure data contains no discernible meaning for a machine. Take the number 15: it can be an age, a distance or a weight. Or take the word “break”: even for us humans this string is ambiguous, both in its word class (noun or verb) and in the objects it refers to.
Data obtains meaning only when, beyond its syntax, it is used in a certain context and serves a specific purpose. The following JSON record about the author of this article, for example, contains his age:
{ "name": "Alexander Schulze",
  "country": "Germany",
  "age": 47,
  "birthday": "01/26/1968",
  "phone": "+49-2407-902486" }
Listing 1: JSON record of the author
In the context of the field age, the number 47 becomes information. This allows our software to make decisions and to answer questions like who, what, where, when or how many; in a list of authors, for example, how many of them are older than 30.
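In JavaScript, such a query could look like the following minimal sketch; the authors array is a hypothetical list of records like the one in Listing 1:

// Hypothetical list of author records as in Listing 1
var authors = [
  { name: "Alexander Schulze", age: 47 },
  { name: "Jane Doe", age: 29 }
];

// Count the authors older than 30
var olderThan30 = authors.filter(function (author) {
  return author.age > 30;
}).length;

console.log(olderThan30); // 1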
To run this query, however, the software needs to know that the field age denotes a time span in years, and it knows that only because it is programmed explicitly in its source code. If the field is renamed in the database, the software stops working: it no longer knows the field’s meaning and cannot detect it either. More formally expressed, the software is the machine’s knowledge represented in code.
Although code analyzers for error detection and code optimization are already making promising progress (for example, the SQE project), knowledge is very difficult to extract again from source code, let alone from binary code. One could therefore regard the act of programming as a unidirectional knowledge transfer from human to machine.
A key problem is that the identifiers used are arbitrary, i.e. not subject to any formalism that would allow conclusions about their semantics. On the one hand, this of course grants us developers a certain freedom, at least within the syntax of the respective programming language. On the other hand, exactly this arbitrariness makes information from different sources difficult to compare and exchange, and especially hard to interpret and process.
So we are already in the midst of a new trend: the increased introduction of conventions, as in the eCl@ss project. Among other things, it standardizes product classification to facilitate the exchange of articles and services and their characteristics, which makes it particularly relevant to us web and mobile developers in the e-commerce sector.
One thing we have not even touched on yet: in the author record above, we merely implied that the number 47, in relation to a person’s age, is time information measured in years. For food, mayflies or atomic events, such a number would more likely be associated with units like days, hours or microseconds.
Let’s be honest with ourselves: how often do we find lines like the following in our code?
var timeout = 2000; /* timeout in milliseconds */
var expiration = 14; /* expiration in days */
var age = 47; /* age in years */
Listing 2: Value assignments without units
The unit is an essential part of any numerical information. Without a unit, a machine can make no qualified comparison and therefore no autonomous decision.
The unit not only implies a numeric data type but also provides meaning. A property A with the value 5 and the unit kg clearly describes a mass, regardless of its identifier. Another property B with the value 2000 g is then comparable with A if both units kg and g are linked to the base quantity mass, provided appropriate conversion factors are available.
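A minimal sketch of such a unit-aware comparison in JavaScript; the conversion table and the value/unit property format are assumptions for illustration:

// Hypothetical conversion table: factors to the base unit kilogram
var toKilogram = { kg: 1, g: 0.001, t: 1000 };

// Convert a { value, unit } property to the base unit
function toBase(property) {
  return property.value * toKilogram[property.unit];
}

var a = { value: 5, unit: "kg" };
var b = { value: 2000, unit: "g" };

// Both properties describe a mass, so they become comparable
console.log(toBase(a) > toBase(b)); // true: 5 kg > 2 kg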
The same applies to identifiers. If, for example, the birthday field of our author record is linked to a Date class, the system knows that the given string is a date. If StartDate is then a subclass of Date, the system knows that something starts on that date. And if the results of the functions today() and now() are in turn linked to the subclass CurrentDate, the machine can determine a person’s age autonomously, without this ever having been explicitly implemented in the program code.
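A minimal sketch of this idea in JavaScript; the metadata map, the class hierarchy and the derivation rule are assumptions for illustration, not an actual semantic framework:

// Hypothetical metadata linking field names to semantic classes,
// plus a class hierarchy in which StartDate is a subclass of Date
var fieldClasses = { birthday: "Date", projectStart: "StartDate" };
var subclassOf = { StartDate: "Date", CurrentDate: "Date" };

function isDate(cls) {
  return cls === "Date" || subclassOf[cls] === "Date";
}

// Generic rule: the span between now (a CurrentDate) and a Date yields an age
function deriveAgeInYears(record, field) {
  if (!isDate(fieldClasses[field])) return undefined; // not a date field
  var millisPerYear = 365.25 * 24 * 60 * 60 * 1000;
  var birth = new Date(record[field]);
  return Math.floor((Date.now() - birth.getTime()) / millisPerYear);
}

var author = { birthday: "01/26/1968" };
console.log(deriveAgeInYears(author, "birthday")); // 47 at the time of writing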
Knowledge, then, is created through the interconnection of information. It allows logical conclusions on the one hand and the recognition of contradictions and inconsistencies on the other. In the future, the quality of our data and the intelligibility of our code will improve accordingly.
Smart Knowledge Bases
Now the question arises: how are we going to manage this increasingly comprehensive knowledge in databases? Knowledge bases already differ significantly in concept from traditional databases, both from relational SQL table models as in MySQL and from NoSQL approaches as in MongoDB with its documents and collections.
Knowledge bases follow the concept of graph databases. A graph consists of nodes, the actual elements of the database, and edges, the connections between these elements. In addition, there are properties that describe both the elements themselves and their connections to one another: data properties describe concrete values of elements, while object properties describe the relationships between elements. Collectively, all of these are called resources.
The Semantic Web is a special derivative of the graph database, built from so-called ontologies. These manage their content in statements, which in turn are represented as triples of subject, predicate and object. An example:
Mozilla | isManufacturerOf | Firefox
Subject and predicate are always resources, i.e. nodes or properties. For object properties the object is also a resource, whereas for data properties it is a literal, i.e. a concrete value. Example:
Mozilla | was_founded | 2003
The ontologies of the Semantic Web have been standardized by the World Wide Web Consortium (W3C) and are expressed in the Web Ontology Language (OWL), whose version 2 is now known as OWL 2. A major strength of this language is its capability to classify objects. In ontologies, objects are called individuals, and each individual can be assigned to one or more classes. Statements, so-called axioms, could for instance be:
Firefox | is_a | Browser
Chrome | is_a | Browser
A subsequent axiom in the ontology, like:
Browser | supports | JavaScript
allows the following two axioms to be concluded logically:
Firefox | supports | JavaScript
Chrome | supports | JavaScript
This means that although these two axioms are not explicitly stated in the knowledge base, they can be queried as if they were.
The secret behind this is inference. So-called reasoners, the inference engines for an ontology, generate these additional statements at run time by following logical rules. Admittedly, depending on the size and complexity of an ontology, the reasoning process can take a while; however, reasoners of varying scope and power are available for the various purposes.
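To illustrate the principle, here is a minimal sketch of such an inference step in JavaScript; the triple format and the single hard-coded rule are gross simplifications of what a real reasoner does:

// Explicitly stated axioms as subject/predicate/object triples
var axioms = [
  ["Firefox", "is_a", "Browser"],
  ["Chrome", "is_a", "Browser"],
  ["Browser", "supports", "JavaScript"]
];

// Rule: if X is_a C and C supports Y, then X supports Y
function infer(triples) {
  var derived = [];
  triples.forEach(function (t1) {
    triples.forEach(function (t2) {
      if (t1[1] === "is_a" && t2[1] === "supports" && t1[2] === t2[0]) {
        derived.push([t1[0], "supports", t2[2]]);
      }
    });
  });
  return derived;
}

console.log(infer(axioms));
// [["Firefox","supports","JavaScript"], ["Chrome","supports","JavaScript"]]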
In this article I would first like to give you an overview of the new smart technologies and their impact on us as software developers. In a subsequent article we will go deeper into the features of ontologies and their efficient use in real applications. As a starting point, I recommend studying the Pizza Ontology as well as the documentation of Protégé, one of the leading free ontology editing tools.
Once machines possess such knowledge, future software development will become significantly simpler. Many things that we previously programmed tediously or defined explicitly and separately will in future be inferred logically from fewer, but interlinked, statements.
Unlike traditional databases, knowledge bases are designed as hierarchically and modularly organized ontologies. They contain not only data and information but also abstract concepts and classes as well as concrete facts and processes. Many ontologies are already freely available on the Internet, some with general, others with specialized content. The portfolio can be expected to widen here, and with the incipient standardization and centralization we will increasingly use existing knowledge rather than develop it ourselves.
The newly launched technology project “Enapso” follows exactly this trend. Based on a generic concept, machine-processable knowledge is provided centrally in an Internet portal: knowledge about hardware, operating systems, platforms, programming languages and their syntax, libraries and their APIs, about architectures, data models, processes, algorithms and best practices, as well as about performance, resource utilization and security criteria. In future, developers will be able to exchange and use this knowledge, both static knowledge and knowledge about processes and methodologies, either publicly or restricted to companies or projects.
Smart Requirements
So far we have discussed at length the expected influence of smart technologies on software development itself. Now let’s take a step back.
After the analysis of existing or desired business processes, a software life cycle typically begins with the specification of requirements, be it a classic specification document or a product backlog in an agile development environment. But whether waterfall model or Scrum, a significant drawback of all these specifications is that they are written in natural language.
This means that the requirements, too, can be understood and processed only by us humans. As with program code, the knowledge of the business analysts or sales colleagues is transferred unidirectionally from human to machine, and as free text it is difficult to extract again or even just to process mechanically. Not to mention the misunderstandings and omissions that naturally occur.
Imagine your system could “understand” your requirements. To get there, let us first define a requirement as the description of the target state of an object that reacts to specific events in a certain way, i.e. an object with certain modifiable properties and a certain behavior. These are the functional requirements. In addition, there are aspects such as performance, resource utilization, scalability and security: the non-functional requirements.
Certainly, the list of all possible properties and actions of an object is long, but it is finite: it corresponds to the scope of the platforms, programming languages and libraries used. Put simply, once a database provides the knowledge about terms and their meanings, requirements can be defined based on these terms and thus be understood by a machine.
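Expressed in the triple notation introduced above, such a semantically defined requirement could look like this; the identifiers and predicate names are hypothetical:

LoginService | is_a | Service
Requirement_1 | refers_to | LoginService
Requirement_1 | max_response_time | 200 ms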
An essential aspect of smart requirements management is therefore to manage requirements no longer as free text but within ontologies, with appropriate semantic support. The machine can then check software requirements for consistency and integrity, just as it checks its knowledge. Potential contradictions or gaps in the specification will in future be uncovered by the machine, something that used to be difficult and achievable only in often extensive dialogues between us and our customers.
In addition, requirements can in future be compared against existing knowledge, so that even complex feasibility statements can be made in real time. On the one hand, we will easily be able to verify whether certain requirements can be implemented at all; conversely, the machine will identify and report what knowledge needs to be added in order to create a solution for a given problem.
Ontologies as a basis for requirements management will therefore enable assistance systems that help us obtain self-consistent, satisfiable and measurable specifications, an essential prerequisite for reliable project calculations. If the underlying knowledge base also considers complexity and dependencies as well as security and performance aspects, it will even be possible to analyze the necessary resources and to make more accurate predictions of time and cost.
Company-specific knowledge bases can manage custom object templates and defaults. Even processes can be modeled in ontologies, for example through the integration of Business Process Model and Notation (BPMN). This know-how will make it possible to rapidly generate usable results even from merely coarse requirement specifications.
If, for example, it is specified that a class Address consists of certain fields and that the term Manager implies the capabilities to list, create, update and delete, then a simple request such as AddressManager, without further details, is enough to produce a first prototype. Rapid Application Development (RAD) cannot get any quicker. Of course, the option to refine the rough requirements within agile processes, and thereby to customize and perfect the solution, will always remain.
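A minimal sketch of such a generator in JavaScript; the knowledge entries and the convention that the suffix Manager implies the four capabilities are assumptions for illustration:

// Hypothetical knowledge: class fields and what the term "Manager" implies
var knowledge = {
  Address: ["street", "city", "zip"],
  managerCapabilities: ["list", "create", "update", "delete"]
};

// Generate a first prototype for a request like "AddressManager"
function generateManager(request) {
  var className = request.replace(/Manager$/, "");
  var manager = { managedClass: className, fields: knowledge[className] };
  knowledge.managerCapabilities.forEach(function (capability) {
    manager[capability] = function () {
      // Placeholder: a real generator would derive each implementation
      // from process knowledge in the ontology
      console.log(capability + " called on " + className);
    };
  });
  return manager;
}

var addressManager = generateManager("AddressManager");
addressManager.list(); // "list called on Address"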
Another advantage: the change requests that keep coming at us in real developer life can in future be reviewed in real time during their specification and checked for their impact on all aspects of the final product. Additional time and costs can then be reported immediately and transparently to the customer, whereas previously they were often difficult to predict, to measure and above all to justify. Here, too, smart technologies will support us massively in the future.
Overall, smart requirements management will help move strategic decisions, for example the choice of technologies, architectures and methodologies, further forward into the specification phase of software. Coordination processes are thereby accelerated, risks are recognized earlier, and software development as a whole becomes more transparent and more cost-effective for our customers.
Of course, this trend towards smart software development will also have an impact on our everyday developer life. More intelligent knowledge and requirements management can be expected to lead to a certain polarization of the developer community: one group will, for the reasons mentioned, dedicate itself to building, standardizing and centralizing developer knowledge, while the other will simply want to benefit from it and focus on the specification of business processes, ultimately on what the desired solution is rather than how it can be achieved. But how can this work?
Smart Agents and Machine Learning
Knowledge bases will classify real-world objects, including software. They describe how these objects are linked to one another, their dependencies, their skills and their behavior. Processes describe algorithms and established best practices, and machine learning helps the machines to gain experience independently and to learn from it.
Experience gained from measurements against non-functional requirements, such as CPU and memory usage, execution speed, network load or energy consumption, as well as from simulations and the processing of user feedback, leads to a continuous horizontal and vertical expansion and improvement of the knowledge base and its processes.
Knowledge and experience will ultimately allow qualified decisions about the optimal technology and architecture for solving given tasks. If these tasks are formulated semantically, they can be processed by machines and connected to the knowledge. Already today, autonomous software agents are equipped with intelligent algorithms for knowledge- and requirements-based planning and strategy development; mostly, they take on responsibility for finding solutions cooperatively (Fig. 2).
Figure 2: Smart Solutions
Certainly, we are still far away from fully automatic software generation; in particular, the necessary knowledge base first needs to be created. But code-parsing tools, Big Data analysis, Deep Learning technologies and Natural Language Processing (NLP) will drive this process forward continuously.
Conclusion
Even though much of this may still seem a distant vision, we are already working today with extremely useful code assistance and optimization systems that improve the security, performance and stability of our software and thus increase our competitiveness. In future, the gradual decoupling of definition and implementation processes will automate software development further, eliminating sources of error, making our results more transparent and improving their quality. We should begin to prepare ourselves and to participate actively in this trend.