Writing "Better" Code

I cannot claim to be a hardcore programmer, or a hacker for that matter, but as a Data Scientist I write a lot of code to build and validate my models. For fast prototyping, the obvious languages of choice are R and Python, and at our workplace we mostly work with R. I find R very useful for writing code quickly and with fewer bugs. However, coming from a procedural-language background, I find that R code performs poorly when it is written in a procedural style, for example with explicit loops. R encourages a functional style, yet some tasks in our day-to-day work are either difficult to express functionally or have no built-in library or package in R.

"Necessity is the mother of invention", but without reinventing the wheel, we try our best to write good software. When I say that software is "good", I do not necessarily mean that it satisfies every software engineering principle laid down in the well-written SWE bibles. Describing how software should be written is one thing; writing it for real-world use is quite another. We do not write the procedural parts of our code in R, because they run slowly there, so we delegate that workload to C++ using the Rcpp package. Following are some day-to-day observations and approaches that I have found help me write "better" software:

  • Use efficient algorithms and data structures. Nothing new here: hash, hash and hash wherever possible. Do not repeat an operation on the same part or parts of the input; cache the result instead. Don't hesitate to use advanced data structures where they give a substantial speedup, say more than 3x to 4x.
  • Use built-in libraries wherever possible. They are optimized for speed and memory, e.g. std::sort for sorting or std::nth_element for finding the median.
  • Optimize your code as much as possible. In C++, pass large arguments to functions by const reference or const pointer rather than by value. This avoids copying the data and hence reduces memory consumption. I use C++11, which comes with lambda expressions. Combined with standard algorithms such as std::for_each and std::transform, lambdas are short and crisp, and they typically perform as well as hand-written for or while loops.
  • For matrix operations, use vectorized built-in functions. In R these are often orders of magnitude faster than explicit for or while loops. E.g. matrix multiplication in R: t(m) %*% m, or an outer product of two vectors where the OR operation replaces multiplication: outer(u, v, function(x, y) x | y).
  • Parallelize the code wherever possible. The benefit of functional programming is most visible in parallelism and concurrency, because it avoids shared state. If you are performing multiple independent operations on the input data, you can use mclapply in R or the Message Passing Interface (MPI) in C++ to distribute them across cores. Threading in C++ or Java can be useful, but in my experience it generally performs worse than multi-core distributed computing, so if the number of cores is not a limitation, definitely go for distributed computing.
  • When working with large data sets where approximate results do no harm, use approximate (probabilistic) algorithms instead of expensive exact algorithms with polynomial or exponential running time. Many probabilistic algorithms use multiple hash functions to divide the work into smaller subsets. The result on each subset has some error bound due to the approximation, but when the votes from multiple subsets are combined, the overall probability of error is quite low. These methods are much faster and well suited to real-time analytics. E.g. HyperLogLog counting, Count-Min Sketches, Min-Hashing, Reservoir Sampling etc.
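
The "hash and cache" advice can be illustrated with memoization. In this technique, results are stored in a hash table (std::unordered_map) so that each one is computed only once. This is a minimal sketch in which Fibonacci numbers stand in for an expensive, repeatedly called computation. The class name FibCache is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Memoized Fibonacci (hypothetical stand-in for any expensive pure function):
// each value is computed once, stored in a hash map and served from the cache
// on later calls, so the number of real computations grows only linearly in n.
class FibCache {
public:
    std::uint64_t get(unsigned n) {
        assert(n <= 93);  // F(94) no longer fits in 64 bits
        if (n < 2) return n;
        auto it = cache_.find(n);
        if (it != cache_.end()) return it->second;  // cache hit: no recomputation
        const std::uint64_t value = get(n - 1) + get(n - 2);
        cache_[n] = value;  // remember the result for later calls
        return value;
    }

    std::size_t cached_entries() const { return cache_.size(); }

private:
    std::unordered_map<unsigned, std::uint64_t> cache_;
};
```

Without the cache, the same recursion recomputes identical subproblems exponentially many times.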
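
For the built-in library point, here is a minimal sketch of median finding with std::nth_element. It partially reorders the data in average linear time instead of sorting all of it. The function name median is hypothetical, and so is the convention of averaging the two middle values for even-length input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median via std::nth_element: average O(n) selection instead of an
// O(n log n) full sort. The vector is taken by value because nth_element
// reorders its input.
double median(std::vector<double> v) {
    assert(!v.empty());
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    std::nth_element(v.begin(), mid, v.end());
    const double upper = *mid;
    if (v.size() % 2 == 1) return upper;
    // Even length: the other middle value is the largest element before mid.
    const double lower = *std::max_element(v.begin(), mid);
    return (lower + upper) / 2.0;
}
```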
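
The C++ points about argument passing and lambdas can be shown together in one small sketch. The input is passed by const reference, so no copy is made. std::for_each and std::transform then take lambdas in place of explicit loops. The function center, which subtracts the mean from every element, is a hypothetical example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: subtract the mean from every element.
// The input is passed by const reference, so the caller's data is neither
// copied nor modified; lambdas replace hand-written loops.
std::vector<double> center(const std::vector<double>& x) {
    assert(!x.empty());  // the mean of an empty vector is undefined
    double sum = 0.0;
    std::for_each(x.begin(), x.end(), [&sum](double xi) { sum += xi; });
    const double mean = sum / static_cast<double>(x.size());

    std::vector<double> out(x.size());
    std::transform(x.begin(), x.end(), out.begin(),
                   [mean](double xi) { return xi - mean; });
    return out;
}
```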
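
To make the outer product with OR concrete: in R, outer(u, v, function(x, y) x | y) returns a matrix whose entry (i, j) is u[i] | v[j]. The hypothetical C++ function outer_or below spells out that element-wise definition with explicit loops. It is meant only to show what the result contains. In R itself, the point is to let outer compute this in vectorized form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise meaning of R's outer(u, v, function(x, y) x | y):
// result[i][j] = u[i] || v[j]. The explicit loops only illustrate the
// definition; in R, outer evaluates it in vectorized form.
std::vector<std::vector<bool>> outer_or(const std::vector<bool>& u,
                                        const std::vector<bool>& v) {
    std::vector<std::vector<bool>> result(u.size(), std::vector<bool>(v.size()));
    for (std::size_t i = 0; i < u.size(); ++i) {
        for (std::size_t j = 0; j < v.size(); ++j) {
            result[i][j] = u[i] || v[j];
        }
    }
    return result;
}
```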
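
The parallelization point comes down to three steps. First, split the input into independent pieces that share no mutable state. Then process those pieces concurrently, and finally combine the partial results. MPI requires an external library and mclapply requires R. This sketch therefore swaps in std::async from the C++ standard library, which runs threads on a single machine, purely to show the pattern. parallel_sum and its chunking scheme are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Split the input into independent chunks, sum each chunk in its own task,
// then combine the partial results. The tasks only read `data` and each one
// returns its own result, so no mutable state is shared between them.
long long parallel_sum(const std::vector<int>& data, std::size_t n_tasks) {
    assert(n_tasks > 0);
    const std::size_t chunk = (data.size() + n_tasks - 1) / n_tasks;  // ceiling division
    std::vector<std::future<long long>> parts;
    for (std::size_t start = 0; start < data.size(); start += chunk) {
        const std::size_t stop = std::min(start + chunk, data.size());
        parts.push_back(std::async(std::launch::async, [&data, start, stop] {
            return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(start),
                                   data.begin() + static_cast<std::ptrdiff_t>(stop),
                                   0LL);
        }));
    }
    long long total = 0;
    for (auto& part : parts) total += part.get();  // wait for each task
    return total;
}
```

On older GCC/glibc toolchains, code that uses std::async may need to be compiled with -pthread.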
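
A minimal Count-Min sketch for approximate frequency counts illustrates the "multiple hash functions plus voting" idea. Each key updates one counter in every row. Collisions can only add to a counter, so taking the minimum over the rows gives an estimate that never undercounts. One part of this sketch is an illustrative assumption: the per-row hash is made by mixing std::hash with the row number. The formal error analysis instead assumes a pairwise-independent hash family.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Count-Min sketch: `depth` hash rows of `width` counters each. Every update
// increments one counter per row; a query returns the minimum over the rows.
class CountMinSketch {
public:
    CountMinSketch(std::size_t width, std::size_t depth)
        : width_(width), table_(depth, std::vector<std::uint64_t>(width, 0)) {
        assert(width > 0 && depth > 0);
    }

    void add(const std::string& key, std::uint64_t count = 1) {
        for (std::size_t row = 0; row < table_.size(); ++row)
            table_[row][index(key, row)] += count;
    }

    // Upper-bound estimate: collisions can only inflate a counter.
    std::uint64_t estimate(const std::string& key) const {
        std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
        for (std::size_t row = 0; row < table_.size(); ++row)
            best = std::min(best, table_[row][index(key, row)]);
        return best;
    }

private:
    // Illustrative per-row hash: std::hash mixed with the row number
    // (not a pairwise-independent family).
    std::size_t index(const std::string& key, std::size_t row) const {
        std::uint64_t h = std::hash<std::string>{}(key);
        h ^= (row + 1) * 0x9E3779B97F4A7C15ULL;
        h ^= h >> 33;
        h *= 0xFF51AFD7ED558CCDULL;
        h ^= h >> 33;
        return static_cast<std::size_t>(h % width_);
    }

    std::size_t width_;
    std::vector<std::vector<std::uint64_t>> table_;
};
```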
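
Reservoir Sampling, also listed among the examples, keeps a uniform random sample of k items from a stream of unknown length. It needs only one pass and O(k) memory. This is a sketch of the classic Algorithm R. The class name Reservoir is hypothetical, and the fixed seed is an assumption made for reproducibility.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Reservoir sampling (Algorithm R): after n items have been offered, each of
// them is in the sample with probability k / n.
template <typename T>
class Reservoir {
public:
    Reservoir(std::size_t k, unsigned seed) : k_(k), rng_(seed) { assert(k > 0); }

    void offer(const T& item) {
        ++seen_;
        if (sample_.size() < k_) {  // fill the reservoir first
            sample_.push_back(item);
            return;
        }
        // Keep the new item with probability k / seen_, replacing a random slot.
        std::uniform_int_distribution<std::size_t> pick(0, seen_ - 1);
        const std::size_t j = pick(rng_);
        if (j < k_) sample_[j] = item;
    }

    const std::vector<T>& sample() const { return sample_; }
    std::size_t seen() const { return seen_; }

private:
    std::size_t k_;
    std::size_t seen_ = 0;
    std::mt19937 rng_;
    std::vector<T> sample_;
};
```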

// Writing comments on your code is just a fad.

By Abhijit Mondal