Static and Dynamic typing
From my experience, the best way to find bugs and defects in your software is obviously to have paying customers—a good customer can smoke out a brand new bug every other week. If you do not have customers, I've found that a four-year-old with an infinite supply of chocolate is an acceptable replacement.
In practice, detecting bugs is best left to the computer: before every push to production, at the very least, an automated process should run through the program and stop everything if a defect is found. This process is usually a mix of testing, which runs the code with various inputs and examines its behavior (unit tests, regression tests, randomized smoke tests, and so on), and static analysis, which examines the code and provides a mathematical proof that certain properties are always true (static type checking, abstract interpretation, and so on).
The Static versus Dynamic type-checking debate is mostly about where the slider should sit between the Testing and Static Analysis ends of this automation spectrum.
A common argument on the Dynamic side is that static analysis can only detect very simple errors—accessing a field that does not exist, or calling a function with the wrong arguments—and so unit testing is necessary to ensure, with reasonable certainty, that the more complex errors do not happen: does code react properly to unusual Unicode normalization? Is a list filtered and ordered as it should be? Does a running total take into account only the relevant data?
And since those unit tests must be present, the argument continues, any errors that are caught by the static type system would also be caught by the tests, simply because the broken code would be executed and either trigger a run-time error or fail to produce the expected result. You don't need to write "type checking" tests to catch these; catching them is a side effect of your usual unit tests.
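As a small illustration of that argument (the function and test below are hypothetical, written in Python), a perfectly ordinary behavioral test exercises the code thoroughly enough that a type-level mistake, such as reading a field that does not exist, would surface as a KeyError or AttributeError, even though the test never mentions types:

def top_scores(players, limit):
    # Keep only active players, highest score first.
    active = [p for p in players if p["active"]]
    return sorted(active, key=lambda p: p["score"], reverse=True)[:limit]

def test_top_scores_filters_and_orders():
    players = [
        {"name": "Alice", "score": 10, "active": True},
        {"name": "Bob", "score": 25, "active": False},
        {"name": "Carol", "score": 17, "active": True},
    ]
    # A purely behavioral assertion: it would also fail, with a run-time
    # error, if the implementation read a field that does not exist.
    assert [p["name"] for p in top_scores(players, 2)] == ["Carol", "Alice"]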
With sufficient test coverage, this argument is correct. On the other hand, the Static side argues that, with a good type system and sufficient type coverage, you can eliminate the need for most unit tests and even provide guarantees that automated testing could never achieve, such as the absence of data races.
In a real-world project, there is no such thing as "sufficient" coverage. It all boils down to the relative cost of improving test coverage and type coverage, and the expected improvement in terms of code correctness. And before you argue that everyone should have 99.99% code coverage—which is likely true—I must remind you of the difference between code coverage and path coverage:
def f(n):
    if n mod 3 == 0
        print("Fizz")
    if n mod 5 == 0
        print("Buzz")

assert.that({ f(3) }).prints("Fizz")
assert.that({ f(5) }).prints("Buzz")
These two tests provide 100% code coverage, for both definitions of code coverage: every line is executed at least once, and each branch (including the implicit empty else for each if) is executed at least once. But they provide only 50% path coverage, because there are two execution paths that are not tested:
assert.that({ f(15) }).prints("FizzBuzz")
assert.that({ f(11) }).prints("11") # Oops.
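For the record, here is a rough, runnable Python rendering of the same example (the pytest-style tests and the capsys fixture are my own assumptions, not part of the pseudocode above); a branch-coverage tool would report 100% for the first two tests, while the two remaining paths stay unexercised:

def f(n):
    if n % 3 == 0:
        print("Fizz")
    if n % 5 == 0:
        print("Buzz")

def test_fizz(capsys):
    f(3)
    assert capsys.readouterr().out == "Fizz\n"

def test_buzz(capsys):
    f(5)
    assert capsys.readouterr().out == "Buzz\n"

# Paths that remain untested despite 100% line and branch coverage:
#   f(15), where both branches fire, and f(11), where neither does
#   (which would also reveal that the plain number is never printed).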
The argument of the Dynamic side—that unit tests execute unsound code and therefore detect type errors—relies on path coverage, not on code coverage, because many such errors are path-dependent. The ability of unit tests to replace static type checking is determined by the distance between code coverage and path coverage, on which the structure of the code has a strong impact.
The closer your code is to a decision tree, the smaller the divergence between path coverage and code coverage. For a true decision tree, the two are equal:
def f(n):
    if n mod 3 == 0
        print("Fizz")
        if n mod 5 == 0
            print("Buzz")  # Path "15n"
        else
            nop            # Path "3n but not 15n"
    else
        if n mod 5 == 0
            print("Buzz")  # Path "5n but not 15n"
        else
            nop            # Path "not 3n or 5n"
Conversely, projects that rely heavily on factoring out common operations have a high path-to-code ratio, because the factored code can be called from a variety of places. This is a very cruel situation, where improving code quality makes code coverage a weaker measure of how well the code is actually tested:
def getProfileUrl(profile):
    url = "/profile/" + profile.id
    if profile.useCustomUrl
        url = url + "/" + profile.slug
    return url

def renderComment(comment):
    return commentTemplate(
        name: comment.author.name,
        url: getProfileUrl(comment.author),
        text: comment.text)

def renderPicture(profile):
    if profile.useCustomUrl && profile.slug == null
        # Some legacy profiles use an auto-generated slug
        profile.slug = slug(profile.name)
    return pictureTemplate(
        src: profile.pic,
        url: getProfileUrl(profile))
# Unit tests:
alice = {
    pic: "http://",
    id: 42,
    name: "Alice",
    useCustomUrl: false }

bob = {
    pic: "http://",
    id: 13,
    name: "Bob",
    slug: null, # Bob is a legacy user
    useCustomUrl: true }
renderComment({ text: "Test.", author: alice })
renderPicture(alice)
renderPicture(bob)
# Not covered:
renderComment({ text: "Test.", author: bob })
With the first three tests, code coverage is 100% (the tests for renderPicture provide full coverage for getProfileUrl), and yet the fourth test reveals an execution path that attempts to build a URL from a null slug. If you have been working with unit tests for any period of time, you have probably seen such a bug escape the notice of the automated testing suite—I've even seen this happen in C#, despite the static type-checking, because the type system cannot (yet) express that a reference is not null.
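Translated into plain Python (a sketch only: the template calls are replaced by dictionaries, and the extra legacy profile used to trigger the failure is hypothetical), the uncovered call fails with a TypeError at the string concatenation, a line the first three calls never reach with a null slug:

def get_profile_url(profile):
    url = "/profile/" + str(profile["id"])
    if profile["useCustomUrl"]:
        url = url + "/" + profile["slug"]  # TypeError if slug is None
    return url

def render_comment(comment):
    author = comment["author"]
    return {"name": author["name"],
            "url": get_profile_url(author),
            "text": comment["text"]}

def render_picture(profile):
    if profile["useCustomUrl"] and profile["slug"] is None:
        # Some legacy profiles use an auto-generated slug.
        profile["slug"] = profile["name"].lower()
    return {"src": profile["pic"], "url": get_profile_url(profile)}

alice = {"pic": "http://", "id": 42, "name": "Alice", "useCustomUrl": False}
bob = {"pic": "http://", "id": 13, "name": "Bob", "slug": None, "useCustomUrl": True}

render_comment({"text": "Test.", "author": alice})  # fine
render_picture(alice)                                # fine
render_picture(bob)                                  # fine, and repairs bob's slug as a side effect

# The uncovered path: a comment by a legacy profile whose picture was never
# rendered in the same run (a fresh profile, since render_picture(bob) above
# already repaired bob). This raises:
#   TypeError: can only concatenate str (not "NoneType") to str
carol = {"pic": "http://", "id": 7, "name": "Carol", "slug": None, "useCustomUrl": True}
render_comment({"text": "Test.", "author": carol})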
Constructs like pure map or reduce keep the path-to-code ratio low, while arbitrary loops or recursion increase it, especially when combined with complex or recursive data types.
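A hypothetical Python sketch of that contrast: the pipeline version exposes exactly one branch (the filter predicate), while the hand-written loop threads mutable state and extra conditions through the iteration, and every branch it adds multiplies the number of distinct paths a test suite would have to cover:

def total_paid_pipeline(orders):
    # A pure filter-and-reduce: the only branch is the predicate itself.
    return sum(o["amount"] for o in orders if o["paid"])

def total_paid_loop(orders):
    # The explicit loop accumulates state and invites extra branches;
    # the combinations of those branches are what tests must cover.
    total = 0
    for o in orders:
        if not o["paid"]:
            continue
        if o.get("currency", "USD") != "USD":
            raise ValueError("unexpected currency")
        total += o["amount"]
    return total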
Depending on the nature of a software project—or even of a single module—the path-to-code ratio might be heavily in favor of testing (a reasonable number of tests can cover enough execution paths to replace static type checking entirely), or heavily in favor of static analysis (which can guarantee properties across all execution paths, including paths that could not be covered without an absurd number of test cases).
For instance, applications with many independent code silos (such as the actions in an API server or web application), or with fairly homogeneous data (such as image or sound processing) can expect reasonable coverage from automated tests alone, without having to rely on static type-checking. Applications with complex data structures (such as compilers) or with complex multi-threading patterns (such as servers) will likely benefit from using static type-checking in addition to automated tests. This is tied to the architecture of the application, rather than its purpose. A compiler written without complex data structures or algorithms does not need static analysis—though I would argue that it is hard to write a truly useful compiler in such a way.
And although my initial introduction was tongue-in-cheek, it is acceptable in many cases to declare that entire categories of defects are too costly to detect in-house and should be left for the customers to discover. A compiler's resilience to obscure bugs in processor microcode firmware, for instance, is likely not worth the effort except for top-tier projects like GCC.
For disaster recovery, we design plans against a risk model, which describes what could happen and how—should we only plan for one data center going down, or for all of Amazon Web Services to go dark?
For security, we defend against an attacker model (or threat model). Encrypting passwords sent over the wire is enough if the attacker can only intercept network messages, but what about an attacker that can run code on the same physical machine as our application? Or even as the same user, on the same operating system as our application?
It makes sense that, for defect detection, we should first conceive a defect model that describes how the application can fail—a model initially designed from the team's experience on similar projects, then continuously refined as code is written, new defects are found, new features are requested, and people join or leave the team. This defect model should serve as a basis for picking the right defect-detection tools—both from the Dynamic toolbox and from the Static toolbox—but also for defect mitigation: fail-safes that prevent uncaught defects from breaking too many things, and audit logs that can trace the consequences of critical but hidden defects long after they have occurred.
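To make the idea a little more concrete, here is one possible shape for such a defect model, entirely hypothetical and kept as plain data so it can be refined as the project evolves:

# A hypothetical defect model for a small web application: each entry records
# how a class of defect is expected to appear, which toolbox is used to
# detect it, and how its impact is mitigated if it slips through.
DEFECT_MODEL = [
    {"defect": "null slug on legacy profiles",
     "detection": "unit tests with legacy-profile fixtures",
     "mitigation": "fall back to the numeric id in the URL"},
    {"defect": "data race on concurrent profile updates",
     "detection": "static analysis, where the type system can express it",
     "mitigation": "audit log of all profile mutations"},
    {"defect": "unusual Unicode normalization in user names",
     "detection": "randomized smoke tests over normalization forms",
     "mitigation": "normalize on input, reject on output"},
]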
Finally, since the ability of tests or static analysis to detect defects depends so much on how the code is structured, it is only natural for developers to evolve, over many years of experience, a way of writing code that suits their preferred method of defect detection. After years of Java or C#, a proponent of the Static approach who jumps into JavaScript or Python is likely unaware of the many techniques used on the Dynamic side of the river to compensate for the absence of type checking; someone going from Dynamic to Static would need to learn what kinds of code the type-checker can or cannot be convinced to accept. This sudden shift can only breed frustration for a developer switching sides, because they have the expectations and ambitions of an experienced professional and yet, in this specific case, the skills of a beginner: forced to reinvent common idioms, unable to recognize code smells, stepping blindly into undetected defects or obscure type errors.
And a frustrated craftsman blames his tools.