TDD Helping Algorithm Development
Extreme Programming (XP) introduced automated testing as a first-class part of software development in 1999, and it used testing both in development and maintenance. Tests help during development, later with enhancement, and finally in understanding the code. This article shows an example of applying testing practices during the initial construction phase of an algorithm. A previous companion article describes how a complete body of regression tests aids modifications for performance (viewTest Driven Development Really Works.) Please see my GitHub repository that hosts code this article (see reference #3.) Now let’s see how testing helps during the construction phase, and is called Test Driven Development (TDD.)
Benefits of Testing
Testing enhances building functionality through feature isolation and incremental delivery, preserves application stability through regression, and facilitates debugging. TDD offers "observability” into the operation of application components so the components can be tested and modified without an entire application “context” and deployment during development cycles.
Background
I attended an excellent lecture on Test Drive Development (TDD) given by Mark Shead (see resources #1), in which he used a “Arabic to Roman Numeral” conversion algorithm to demonstrate how TDD enabled exploring code and algorithm options. He of course started with the TDD mantra of:
- Write a failing test first.
- Write code that makes the test succeed.
- Repeat.
An important goal of this approach is to keep the software very close to working all the time. This highly incremental approach works well when the algorithm being implemented is straight-forward and does not require significant abstraction or research. Given the purpose of Mark’s lecture, which was to illustrate TDD in action, this orthodox approach was reasonable.
Often however, algorithms will require significant abstraction and research during their implementation. For example, suppose we are defining the notion of distance between two “points” a and b. We might start with a pair of one-dimensional points and compute the distance between them as:
- D = ABS(Ax – Bx)
We easily write the appropriate tests for one dimension. Next we solve the problem in two dimensions (the X-Y plane), using Euclidian distance:
- D = SQRT((Ax – Bx)^2 + (Ay – By)^2)
Again, we easily write our tests and proceed to three dimensions. Soon we realize that there is a general formula for the distance between two points (a and b), for N > 0 dimensions:
- D = SQRT( SUM( (a[i] – b[i])^2, i=1 .. N ) )
Now we have to refactor both our solution and our tests to take this more general approach. A little research in advance, revealing the potential abstraction simplifying the problem, would have enabled us to write less code, and require much less refactoring to take the algorithm to the next implementation stage.
The lessons learned from this initial example are:
- You may need some research into your algorithm prior to writing any code at all.
- Refine concepts and representations prior to writing code for tests and the implementation.
- Bear in mind that algorithm exploration will often result in refactoring tests.
The Roman Numerals TDD Demo
Borrowing from Mark’s presentation, we are developing a Roman Numeral Conversion utility and concentrating on converting dates, represented as binary integers, to a string representation using Roman Numerals. We review the requirements and proposed solution during our design review. From the review, we note that:
- The Wiki reference gives a good overview of Roman Numerals (reference #1.)
- Based on reviewing above, the conversion problem is related to the change-making problem (reference #2.)
- Our colleagues noted that this problem is also similar to radix conversion, as non-unity Roman Numerals are all multiples of five.
Our application of Roman Numerals is primary intended for dates, and we place an upper limit of 3,999 that we will convert. Negative integers are disallowed. Finally, we noted a nuance in representing numbers with Roman Numerals: there is an additive notation and a subtractive notation. For example, the number four has two potential representations: “IIII” (additive) and “IV” (subtractive). In general, the subtractive representation is preferred, but traditional clock representations sometimes improperly use the additive form only.
Create a Project
We begin by creating a Maven-based eclipse project. We layout the structure and initial classes in the project:
We initially entered a little more code than the orthodox TDD approach would dictate, but we will get good value from that initial entry. Defining the interface first allows us to control how users will see the converter, and it allows us to later modify the implementation in many ways by separating API from implementation.
Our Converter interface holds definitions of the problem that defines aspects of the solution algorithm:
We begin testing by checking preconditions and validations done by the initial implementation. We do this before beginning functionality development. Our first functional test is with a simple clock time representation, and traditionally it fails. Here is our initial test class:
Algorithm Development
Reviewing the common greedy algorithm approach for making change (money), we note that the “largest coins” are used first, with remaining amounts of change likewise decreased using the next largest coin, until the entire change amount is accumulated. We apply this approach to our converter to implement the additive notation for Roman numerals. Once we have a basic Roman numeral conversion algorithm, we refine it to use the subtractive notation as well.
Simple Additive Notation
Restating the first two of four rules from reference #4 defines the additive notation:
1) If one or more letters are placed after another letter of greater value, add that amount.
- VI = 6 (5 + 1 = 6)
- XXVII = 27 (10 + 10 + 5 + 1 + 1 = 27)
- MDC = 1,600 (1,000 + 500 + 100 = 1,600)
2) A letter cannot be repeated more than three times.
- 30 = XXX (10 + 10 + 10 = 30)
- 40 = XL (50 - 10 = 40) You cannot write 40 as XXXX
We ignore the second rule for now, and the simple additive algorithm is:
The corresponding unit tests are:
Subtractive Notation
We can further reduce the remaining Arabic value to be converted using subtractive notation. One can think of this leftover amount as representing the remainder of the Arabic number after the initial representation. Here are the final three rules from reference #4 outlining the subtractive notation:
3) If a letter is placed before another letter of greater value, subtract that amount.
- IX = 9 (10 - 1 = 9)
- XL = 40 (50 - 10 = 40)
- CML = 950 (900 + 50 = 950)
4) You can only subtract powers of 10 (I, X, C).
- 95 = XCV (100 - 10 + 5 = 95)
- You cannot write 95 as VC because V is not a power of 10.
5) You cannot subtract more than one number from another number.
- 18 = XVIII (10 + 5 + 1 + 1 + 1 = 18)
- You cannot write 18 as IIXX.
We refactor our algorithm to allow us to develop the subtractive notation, and we now incorporate the rule #2 above limiting consecutive symbols to three. Two clock unit tests need to be commented out because the simple additive notation can’t use more than three consecutive symbols. Our initial refactoring, combining additive and subtractive notation, is:
We also have specific tests for the subtractive notation:
Another Implementation Approach
I can just hear Mark complaining that the iterative solution we have developed is not a “functional” solution, and that he would like to see a functional implementation. We can easily create a new functional implementation and reuse our tests to validate that it is correct. Here is a recursive solution:
We leverage our previous iterative tests as follows:
Performance
While violating many unit-test dictums, it is fun to add a little performance comparison between our implementations to the unit test. Because of the stability offered by good regression test coverage, we are able to play around with optimizations. On my machine, we have:
- ConverterImpl required 11,872.8 us for 10,000 operations.
- ConverterImplRecursive required 18,173.1 us for 10,000 operations.
The recursive solution requires about 50% more computational effort. Good testing allows us to be courageous and try different solution approaches and improve our code. You may review the performance changes in my GitHub repository (reference #3.)
Resources
- Mark Shead (https://www.garudax.id/in/markshead/), president of Xeric Corporation (http://www.xeric.net/.)
- My previous article on TDD: https://www.garudax.id/pulse/test-driven-development-tdd-really-works-donald-trummell-1c/.
References
- Wikipedia discussion of Roman Numerals: https://en.wikipedia.org/wiki/Roman_numerals.
- Wikipedia discussion of the Change-Making Problem: https://en.wikipedia.org/wiki/Change-making_problem.
- My GitHub repository hosting this article: https://github.com/DonaldET/DemoDev/tree/master/dev-topics-algorithms/dev-topics-romannumerals.
- Rules for writing roman numerals: http://www.solano.edu/academic_success_center/forms/math/Roman%20Numerals.pdf.