Language Performance Comparison

Programming language comparisons are always interesting and rife with sentiment, as is this comparison, which is inspired by a Python and C++ comparison from author Naser Tamimi. We can learn a great deal from considering his article and following his recommendations to explore our language choice options. He wrote an interesting article about considering C++ for data science problems found at How fast is C++ compared to Python.

The author provided an algorithm with “identically” coded Python and C++ implementations for which he timed the results. After comparing execution times, it was no surprise that C++ was significantly faster than Python running the algorithm. I was moderately surprised that C++ was twenty-five times faster than Python in his article. I wanted to investigate his finding and develop my own observations and recommendations. I also wanted to add the Java programming language for consideration by those seeking additional performance in Data Science computing.

The complete Python and Java source code supporting this article are available from my GitHub repository at DemoDev K-MER Algorithm. This article shows only the main algorithm sections of the code examples.

Recommendations for Language Performance Evaluation

Changing an implementation language is a significant risk and should be undertaken with more rigor than emphasized in Naser’s article. These risks inform my modifications to his recommendations. I suggest:

A. Choose the test algorithm carefully, not just any simple algorithm will properly evaluate your language comparison.

B. Make sure the algorithm is “clean” and understandable to begin your language comparison.

C. Code the implementation of the algorithm for each language just as an experienced programmer in that language would code it (e.g., “Pythonic” and idiomatically for Python.)

D. Compare how well simple optimizations are used in each target language.

Hint “C” above suggests that we make use of well-known coding paradigms in each language to represent the algorithm in that language.

Spoiler Alert: we can easily achieve a 34X speed improvement over Python. Read on to see the process and qualifying conditions associated with this finding.

A. Language Performance Testing Algorithm

Naser chose to compute K-MERs, a concept from computational genetics that is a combinatorial generation problem. He does an excellent job of presenting the problem domain in plain language, but I will simplify it even further (see a detailed explanation at Wiki K-MER explanation.)

Compute all possible strings of length k where each character of the string is drawn from the sequence ‘A’, ‘C’, ‘G’, ‘T’, with each letter representing a base. For example, using k = 2, we have sixteen possible K-MER strings with two bases:

1. AA 5. CA 9. GA 13. TA

2. AC 6. CC 10. GC 14. TC

3. AG 7. CG 11. GG 15. TG

4. AT 8. CT 12. GT 16. TT

The algorithm Naser chose to test is basically a simple “odometer” that iterates through all combinations of bases. You can see the right-most position of the K-mer in the above example cycle through [‘A’, ‘C’, ‘G’, ‘T’]. It does this for each base of the left-most character in the generated string. We will use a K-mer of length 13 in our performance timing, as did Naser, so we can compare results.

Choose Wisely

Naser has made the algorithm choice for us because it is of interest to his readers, and we will stick with the K-mer domain problem as well. We will also use the “odometer” algorithm to compare Python and Java as was done in his article.

B. Make sure the algorithm is “clean” and understandable

Naser’s sample Python program was unfortunately flawed with unnecessary code, which we cleaned up as shown below. Here is the comparison of the inner portion of the code before and after cleanup.

The cleaned-up version is also 2.5% faster than the obfuscated original version as a bonus. We will next explore coding in an idiomatic manner to best use the features of each language.

C. Code the algorithm for each language idiomatically

The test code proposed by Naser used dynamic string slicing support in Python syntax, but there is no string slicing support in Java syntax. One must use methods of a String instance to accomplish substring concatenation. We refactor the K-mer list construction component in Python and Java to use similar (but idiomatic) coding patterns in both languages (see method build_by_append.)

Naser’s code uses a function call (convert) to cycle through each base in the nucleotide sequence. We replace this function with a dictionary/map instance that is often used to represent state transitions. The map key is the current base and the associated value is the next base to use (nucleotides_rotation/nucleotidesRotation.) Python uses a dictionary to implement a map.

While standardizing idioms in each language, the refactored Python above avoids concatenating empty strings at the beginning and end of each pass. These changes combine to make our Python implementation 1.7x faster than Naser’s published Python version. Idiomatic coding has performance benefits.

Now we are testing on the same basis as the Python/C++ test in Naser’s article with this minor refactoring. We note that Java is only 7x faster than Python at this point. This Java speedup result was suspicious to me. That is because more than two decades of Java Virtual Machine development have led to close parity with C++ performance, so I would have expected Java to be 25x faster than Python as well.

A closer examination of the C++ code in Naser’s article reveals that the C++ example in his article is not dynamically creating strings and concatenating them like the Python and Java examples here. Using a statically allocated array speeds up C++ significantly.

Remove dynamic string creation

We replace dynamic string creation with static arrays for both Python and C++ implementations and considerable improvements are seen.

With both Python and Java using the same static array approach, we see that Java is thirty four times faster than Python. It is interesting to note that this minor cleanup sped up the original Python code by a factor of 2.6x.

Make use of well-known coding paradigms to represent the algorithm in the target language

Static allocation is a common approach taken by experienced coders in either language. Based on both experience and testing, a dictionary/map lookup is faster than a function invocation full of “if-then” tests and is a common representation for state transitions.

D. Compare how well simple optimizations perform in each target language

Now we apply some simple C-like optimizations to both Python and Java code implementations. We can change the representation of a base to be {0, 1, 2, 3} instead of characters {‘A’, ‘C’, ‘G’, ‘T’}. With a representation change, we next replace the dictionary/map state transition with an array lookup, which is faster than a hash lookup.

The final optimizations show the Java version is 68x faster than the equivalent Python version. We have come a long way. This optimized Java version is more than 187x faster than the original Python version from Naser’s article. This is an invalid comparison with respect to pure language performance perhaps, but is still impressive.

Performance Analysis

Here are the collected performance statistics for our three stages of optimization and testing of Python versus Java. We begin by comparison of Python and Java at each stage of refactoring:

Our first observation is that Java responds well to optimization, and becomes increasing faster than Python as we continue to tune the code. Let’s look at the improvement each stage of optimization achieves compared to the prior un-optimized version:

Our second observation is both Python and Java benefited greatly from removing dynamic memory management and using static storage. Additionally, Java tended to respond better to performance optimizations than Python. Finally, our last observation is that dynamic memory allocation is fundamentally slow, which we see from our “idiomatic comparison” above, where Java is only 7x faster than Python.

Summary

Java is significantly faster than Python in many areas and responds to tuning better than Python. Comparing language implementations for a specific algorithm requires understanding the coding idioms of each language. Finally, significant improvements are often available by applying common coding optimizations and may increase performance enough to keep the existing programming language. I hope you found this both thoughtful and interesting,

Language Performance Comparison

Donald Trummell

Recommendations for Language Performance Evaluation

A. Language Performance Testing Algorithm

Choose Wisely

B. Make sure the algorithm is “clean” and understandable

C. Code the algorithm for each language idiomatically

Remove dynamic string creation

Make use of well-known coding paradigms to represent the algorithm in the target language

D. Compare how well simple optimizations perform in each target language

Performance Analysis

Summary

More articles by Donald Trummell

Explore content categories

Recommendations for Language Performance Evaluation

A. Language Performance Testing Algorithm

Choose Wisely

B. Make sure the algorithm is “clean” and understandable

C. Code the algorithm for each language idiomatically

Remove dynamic string creation

Make use of well-known coding paradigms to represent the algorithm in the target language

D. Compare how well simple optimizations perform in each target language

Performance Analysis

Summary

More articles by Donald Trummell

TDD Helping Algorithm Development

Setting up a Python Project with Virtual Environment, PyBuilder, and PyCharm

It Needs to be Really Fast!

Space verses Time Trade-offs and Algorithm Analysis

Apache Ant and DevOps Practices

Test Driven Development (TDD) Really Works

Pilot Error and Java Autoboxing

It Just Doesn't Add Up! - Part 1

Lies, Damn Lies, and Algorithm Analysis

Explore content categories