A Philosophical Essay on the Important Differences Between Real Thinking and Doing a Great Impression of It
First, I should say up front that I am not, by any stretch of the imagination, a “data scientist”; I have simply had the remarkably good fortune of working closely with many of them. Until I left the company last year, I was a senior certified Managing Consultant and Master Inventor at IBM, where, among other things, I led the development of more than 20 SDLC-process-related patents. Those patents derive largely from an amazingly comprehensive, descriptive, and accurate (as good as 99+%) way of modeling, and then predicting, both the quantity and the qualitative aspects of the software defects that will occur in any future test phase and/or in production of any given software product or system.
This very detailed model, developed and carefully validated statistically over the course of decades across thousands of SDLC projects at IBM Research (long before AI/Watson was even a significant glint in IBM’s eye), was pioneered by a brilliant woman named Kathy Bassin, who aptly named it The Butterfly Model. The name was a nod to Chaos Theory: at the time, most of the testing industry considered it an impossible nut to crack to even attempt a comprehensive classification and analysis methodology capable of delivering these kinds of predictive results across any SDLC project, regardless of project size, complexity, or platform/technology. Sadly, The Butterfly Model never received the level of attention it deserved. Today, with none of the experts still at the company, it is little more than an interesting IT artifact there that no one remaining can practically use: a kind of real-world embodiment of the classic philosophical question, if a tree falls in the woods and there is no one to hear it, does it really make a sound?
But for about a decade before Kathy Bassin and I both left IBM, we worked relentlessly to champion it. That effort resulted in a number of very successful and unique services IBM continues to deliver globally that are derived from pieces of it, albeit in simpler and less dazzling forms (though still head and shoulders above anything else available in the industry in terms of descriptive analytics on defects). Examples include IBM’s “Defect Reduction Method,” the “Defect Analysis Starter,” and the “Test Process Optimization Workbench” modeling software. None of these offerings, however, truly replicated The Butterfly Model in terms of its predictive accuracy and descriptive power.
One of the major reasons for this was IBM’s difficulty in training practitioners and then maintaining the skill among them. Many people far smarter than I am seemed to have a great deal of trouble understanding the model at a big-picture level, and Kathy and I were always somewhat at a loss to understand why. She and I rarely had time to discuss our personal backgrounds in our working relationship, but many others at the company had noticed there was something special, even unusual, about the way the two of us were able to talk to each other about the concepts and how they could be applied.
Eventually, we stumbled across the answer when we finally talked about our educational backgrounds with each other one day. Given our positions at IBM, our degree paths did not (at least in any apparent way) lend themselves to such a discussion. On that day, I worked up the nerve to finally say out loud to her what I had been thinking for a long time: The Butterfly Model looks to me like a music-theory way of understanding software. As it turned out, I wasn’t crazy after all; we were, I learned that day, both very serious students of classical music.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It probably seems almost quaint today, in AI/data mining contexts like this one on LinkedIn, to talk about music as it relates to these topics. But it is worth noting that in the earliest days of AI exploration, classical (Western) music was one of the first areas in which publicly consumable AI experimentation using pattern recognition techniques began. Anyone studying this body of knowledge in depth can quickly understand why: music as it is defined in the West has an enormously complex and rich base of information that is organized *extremely* algorithmically at every single level of its construction, and yet the resulting possibilities, in the form of a “composition,” are effectively endless.
Furthermore, the algorithmic rules of music can be manipulated in ways that will reliably produce a composition that sounds (at least to the untrained ear) as if it were composed by J.S. Bach, or any other particular composer, genre, or era. These rules are largely captured by two enormous subareas of theory: the “harmonic” study of music, which can be thought of as the vertical rules and algorithms that apply at a single point in time, and the “melodic” study of music, technically encoded in what is called “16th-century species counterpoint,” which represents the horizontal rules and algorithms that operate across multiple points in time.
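To make the vertical/horizontal distinction concrete, here is a minimal sketch in Python. The rules themselves are deliberately oversimplified illustrations of my own choosing, not a faithful encoding of any theory text: the “harmonic” check inspects one sonority at a single instant, while the “melodic” (counterpoint-style) check inspects how two voices move between consecutive instants.

```python
# Toy illustration of "vertical" (harmonic) vs. "horizontal" (melodic) rules.
# Pitches are MIDI note numbers; a "sonority" is the set of notes sounding at one instant.

# Simplified consonance table (mod 12): unison/octave, thirds, fourth, fifth, sixths.
CONSONANT_INTERVALS = {0, 3, 4, 5, 7, 8, 9}

def is_consonant_sonority(sonority):
    """Vertical rule: every pair of simultaneous notes forms a consonant interval."""
    notes = sorted(sonority)
    return all(
        (high - low) % 12 in CONSONANT_INTERVALS
        for i, low in enumerate(notes)
        for high in notes[i + 1:]
    )

def has_parallel_fifths(voice_a, voice_b):
    """Horizontal rule (counterpoint): flag two voices moving in parallel perfect fifths."""
    for (a1, b1), (a2, b2) in zip(zip(voice_a, voice_b), zip(voice_a[1:], voice_b[1:])):
        if abs(a1 - b1) % 12 == 7 and abs(a2 - b2) % 12 == 7 and a1 != a2:
            return True
    return False

# Two voices over four time steps (soprano and bass, as MIDI numbers).
soprano = [72, 74, 76, 77]
bass    = [60, 55, 57, 53]

for t, sonority in enumerate(zip(soprano, bass)):
    print(f"t={t}: consonant sonority? {is_consonant_sonority(sonority)}")
print("parallel fifths anywhere?", has_parallel_fifths(soprano, bass))
```

Even this toy version shows the shape of the thing: the vertical rule is stateless at each time step, while the horizontal rule is inherently sequential, which is exactly the distinction the two bodies of theory formalize.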
As early as 1965, in a groundbreaking example, Ray Kurzweil dramatically illustrated the possibilities on Steve Allen’s “I’ve Got a Secret” television program by performing on the piano a composition that was created by a computer he designed. This work represented some of the earliest publicly accessible applications of pattern recognition techniques in computing.
So, the first idea I’d like to posit in this forum is that for robust AI/pattern recognition work today, in any context, when you think about resource diversity and the right mix of skills for building predictive modeling teams, it is a good idea to look at more than just people with very strong mathematical/statistical “data scientist” educational backgrounds. Obviously, data scientists have to represent the backbone of any AI endeavor today if ideas are to be operationalized realistically and accurately. But don’t overlook the contributions of people who come into this field from unusual places of origin, such as the serious, in-depth study of classical music. The algorithmic knowledge and deeply intuitive pattern recognition skills of these individuals bring a three-dimensionality to predictive modeling work that people without this kind of hands-on experience will typically find difficult to replicate.
Second, I’d like to suggest that failing to diversify AI/pattern recognition teams in terms of background and skills will typically produce models that fail to anticipate the kinds of things we often believe are discoverable only after trial-and-error hypothesis testing. Certainly, the scientific method cannot be minimized in this regard and should control overall. By the same token, we need to remain cognizant of the fact that trial and error in AI/pattern recognition and machine learning contexts does come with real-world dangers, dangers that more often than not play out as defects in software products that we as an industry (in my opinion) have become far too complacent about accepting.
An excellent recent example of this was the outcome of the NHTSA’s Tesla investigation, which can be seen as a stunning reversal of precedent with respect to consumer product safety law in many ways: the Autopilot’s failure to recognize and differentiate the side of a semi-truck from the surrounding sky was not found to be a “defect” in the product’s design or implementation.
Third, I think discussions among professionals working in the IT space at any level where software defects are a concern (and I am hard pressed to think of any area of IT where they are not) need to become more philosophical as we continue to rapidly break mind-bending barriers on all kinds of technological fronts, well represented by the rise of the IoT and “smart” technologies. We are in a place a wise person I once knew liked to call “getting ahead of our skis.”
In that respect, I’d like to summarize why I find myself firmly on John Searle’s side with respect to his now-famous Chinese Room thought experiment, excerpted below:
“Suppose that artificial intelligence research has succeeded in constructing a computer that behaves as if it understands Chinese. It takes Chinese characters as input and, by following the instructions of a computer program, produces other Chinese characters, which it presents as output. Suppose…that this computer performs its task so convincingly that it comfortably passes the Turing test: it convinces a human Chinese speaker that the program is itself a live Chinese speaker. To all of the questions that the person asks, it makes appropriate responses such that any Chinese speaker would be convinced that they are talking to another Chinese-speaking human being.
The question Searle wants to answer is this: does the machine literally “understand” Chinese? Or is it merely simulating the ability to understand Chinese?”
Although in our professional lives we don’t typically invest a lot of time talking about the implications of this thought experiment, those implications are already playing out in real-world applications like autonomous cars, where the distinction between “understanding” and “simulating understanding” becomes much more meaningfully clear.
I am particularly sensitive to these considerations because, in many respects and at the highest philosophical level, I deeply identify with IBM’s Watson, perhaps the world’s most recognizable AI entity in personified form today. Not with Watson’s creators, but with Watson itself. Because regardless of the abstract mathematical methodologies driving how AI works today and will work in the future, these systems are all still, arguably and at best, merely doing an impression of how to think, which in real-world application is not at all the same thing as thinking.
The reason is my very unusual background with respect to classical music training, which stems from the fact that I am completely tone deaf. This means I cannot hear, and therefore cannot really understand, the vertical “harmonic” information in the structure of music in any meaningful way as a practicing musician, and very little of the horizontal “melodic” information, even though I can understand all of it theoretically on paper when I am away from my instrument.
Most people who are found to be tone deaf quit studying classical music long before they reach the intensity with which I studied it (I began attending graduate-level music programs in the summers starting at age 13); there are simply too many requirements of playing an instrument competently that depend on the ability to recognize, and then understand the meaning of, “pitch.” Many of the best musicians, in fact, have an innate ability called “perfect pitch,” meaning they always know exactly the name of a given frequency of sound upon hearing it. That ability serves as a primer for understanding both the vertical and horizontal structures of music algorithmically, in a way I will never be able to comprehend because of my tone deafness. But that doesn’t mean I can’t very reliably mimic comprehension, at least on the surface.
There are a small number of individuals like me who persevere at an instrument even though we don’t really understand the information we are processing. Through sheer hard work, we can appear to overcome this deficit to anyone who observes us playing our instrument (which will invariably be a fixed-pitch instrument like the piano, rather than something like the violin, where tone deafness and the inability to precisely control pitch would reliably drive all listeners out of the room).
Some of us are even able to fool the best experts, and I was one such individual. After years of intensively studying the piano as a child and teenager (and being told by teacher after teacher that I should quit), I reached a level of proficiency by my late teens that would have met the technical entry requirements of any music performance program in the world. Ultimately, when I turned eighteen, I landed a spot in one of the best programs in the country, with a highly sought-after teacher who never realized at my audition that I had this kind of deafness.
But within about eight weeks of my trying and failing to keep up with people who did not have this defect in hearing (or any other musical defect, for that matter), this teacher gave me the musical equivalent of a Turing test, which I failed miserably. I was asked to leave the program, and the program itself changed its audition procedures so that no one like me could ever slip through its doors again.
While I meet the apparent definition of a virtuoso at the piano, in that there is no piece of music ever written, or that ever will be written, for that instrument that I could not teach myself to play competently given enough time and practice, I am still, in many fundamental ways, merely doing an impression of a virtuoso pianist, in exactly the sense John Searle posits in his Chinese Room thought experiment. We may on the surface appear identical to the thing we simulate, but we are far from truly functionally interchangeable, because under that surface veneer there remain crucial differences between us.
For example, I would not be able to respond to off-the-cuff questions about the harmonic structure of music I can nevertheless execute as if I understood it. If a teacher told me to start playing a piece “where it changes to D minor” or “right after the VII-I cadence,” I would have no idea what they were talking about relative to that piece of music (even though I know intellectually what those terms mean outside any specific musical context), whereas my non-tone-deaf counterpart would quickly and easily comply with such a request. My counterpart cannot understand a piece of music independent of its harmonic and melodic structure, which they cannot help but hear; my understanding of the piece is not (and cannot be) built upon any such logic, since I cannot identify it by sound.
The root of these differences is my counterpart’s ability to assimilate often complex sound information and correlate it directly, quickly, and correctly (in ways humans today still do not operationally understand at all, in truth) to its semantic representation, as illustrated by the examples “where it changes to D minor” and “right after the VII-I cadence.” Technically, I could study the pages of the musical composition and eventually locate and point to the positions that match these semantic labels, but I would never be able to locate and play from those positions by memory, no matter how much time I was given, as my counterpart would.
For someone like me to attempt this, it would be like trying to memorize a long passage of Shakespeare’s Hamlet by memorizing each sequential letter in each sequential word of the passage, and then using that information to locate “the part where Hamlet holds Yorick’s skull in his hand.” While one could devise a process to do this (and readily encode a machine to do it), it would in no way realistically map to the processes actually used in what humans call thinking relative to that task. It is instead a mechanical workaround that reaches the same result via a completely different process, along the lines of Searle’s Chinese Room.
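To illustrate just how mechanical that workaround is, here is a minimal sketch in Python (the excerpt is abridged, and the snippet is purely illustrative): a machine can locate a passage by matching a literal character sequence, but the semantic description of the scene gets it nowhere unless someone has already mapped that description onto an exact string.

```python
# A mechanical workaround for "finding the passage": pure letter-by-letter matching.
# Abridged excerpt from Hamlet, Act V, Scene 1.
HAMLET_EXCERPT = (
    "HAMLET: Let me see. Alas, poor Yorick! I knew him, Horatio: "
    "a fellow of infinite jest, of most excellent fancy."
)

def locate_by_letters(text: str, literal_query: str) -> int:
    """Return the character offset of an exact letter-for-letter match, or -1."""
    return text.find(literal_query)

# Succeeds, because we supplied the exact character sequence to match.
print(locate_by_letters(HAMLET_EXCERPT, "Alas, poor Yorick"))  # a non-negative offset

# Fails, because a semantic description has no letter-for-letter counterpart.
print(locate_by_letters(HAMLET_EXCERPT, "the part where Hamlet holds Yorick's skull"))  # -1
```

The machine and the human both end up at the same line of the play; only one of them got there by anything resembling understanding.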
That is not to say that in order to be good or useful, AI needs to replicate thinking in the same way humans do. But it is critical, in my opinion, to understand that there are important implications to falling short of that standard in any number of real-world contexts where AI is being used today to simulate human thinking.
Consider the fundamental difference between “intelligent” cars and the intelligent human occupant: the former understands, programmatically, all of the “rules of the road” such that it can perform discrete driving tasks presumably as well as, if not better than, its human occupant (or potentially better than any human occupant, which is a good part of the incentive to build intelligent cars in the first place).
But as evidenced recently by ethical hackers and Chrysler’s connected Jeep vehicles, there are security exposures that can readily result in rapid disaster. Despite Chrysler closing gaps discovered in earlier tests, hackers have demonstrated that they can still “take over” one of these vehicles by providing malicious inputs to the car’s decision-making model and processing, manipulating, for example, the steering and/or brakes inappropriately for conditions while the car is traveling at any speed. That scenario is obviously life-threatening for any occupant, as well as for anyone who has the misfortune of coming into contact with the vehicle under those circumstances.
Even though these cars are incredibly powerful today at pattern recognition tasks as they relate to environmental inputs, they still fail, profoundly in fact, to comprehend when circumstances have changed in a way that is inherently life-threatening for the human occupant. In contrast, the human, unlike the intelligent machine, immediately understands in this real-world Jeep example that the behavior being invoked by the hackers is malicious, and, if he could, he would take steps to eliminate or reverse the consequences of each malicious act as it unfolds in the car.
The intelligent car, however, makes no distinction between the “normal” inputs of the driving environment and those artificially fed to it from a malicious source. In fact, its design and implementation to date could not even contemplate the possibility of a malicious input source: it responded 100% compliantly to the hackers’ commands, with no attempt to correct, bypass, or otherwise discontinue their inputs, let alone question them.
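To make that design gap concrete, here is a minimal sketch (entirely hypothetical, not modeled on any actual vehicle’s architecture): a naive controller applies whatever command arrives, while an even slightly defensive one at least checks the command’s provenance and its plausibility against the vehicle’s current state before acting.

```python
# Hypothetical sketch: a controller that trusts every input vs. one that checks
# provenance and plausibility first. Not based on any real vehicle's design.
from dataclasses import dataclass

@dataclass
class Command:
    steering_angle_deg: float   # requested steering change
    brake_pct: float            # requested brake pressure, 0-100
    source_authenticated: bool  # arrived over a signed/authenticated channel?

def naive_apply(cmd: Command, speed_mph: float) -> str:
    # The failure mode described above: 100% compliance, no questions asked.
    return f"applied steer={cmd.steering_angle_deg} brake={cmd.brake_pct}"

def defensive_apply(cmd: Command, speed_mph: float) -> str:
    # Reject commands from unauthenticated sources outright.
    if not cmd.source_authenticated:
        return "rejected: unauthenticated input source"
    # Reject commands implausible for the current driving state,
    # e.g. a hard steering jerk at highway speed.
    if speed_mph > 30 and abs(cmd.steering_angle_deg) > 15:
        return "rejected: implausible steering input at speed"
    return f"applied steer={cmd.steering_angle_deg} brake={cmd.brake_pct}"

hack = Command(steering_angle_deg=45.0, brake_pct=100.0, source_authenticated=False)
print(naive_apply(hack, speed_mph=65))      # complies, exactly as described above
print(defensive_apply(hack, speed_mph=65))  # rejected: unauthenticated input source
```

The point of the sketch is not that two `if` statements solve vehicle security; it is that distinguishing legitimate from illegitimate inputs has to be an explicit design requirement, because nothing in the pattern recognition model supplies that distinction for free.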
Even though the car is clearly “intelligent” with respect to its ability to respond to a variety of inputs correctly, maybe even better than a human could under the same conditions, it still cannot think in a way that recognizes the difference between legitimate and illegitimate inputs, as even the worst human driver could in the same circumstances. In this example, a human could readily pay for that fine distinction with his or her life.
In my opinion (as an IT professional who cares deeply about software defect prevention), it is critical that our industry begin to hold the hard discussions around the limitations that come with these incredible AI capabilities. Specifically, we need to develop less tolerance for failure, designing weaknesses and exposures out of AI/pattern recognition models up front rather than relying on testing as the primary means to ferret them out and shore them up.