The Magic of Azure Functions
One of the more interesting aspects of building solutions from different Azure components is that there are often multiple ways of solving the same problem. Take Event Hubs vs. Azure Functions vs. Event Grid: all three can sit at the heart of an event-driven pipeline, taking in data, processing it according to a set of rules, and spitting out a result. However, the more I compare the results from each of these options, the more I believe Azure Functions represent a truly revolutionary way of solving these kinds of problems.
What makes Azure Functions (AFs) so great? First, AFs are (theoretically) infinitely scalable when called asynchronously. A process can spin up as many AF instances as desired; once completed, the AFs return their data and shut down. You are charged simply for the number of calls you make and the compute each call consumes (memory multiplied by execution time). The second reason AFs are so great is that they are...well...cheap! The cost of processing is like nothing I have ever seen. Hundreds of thousands of records can be processed for fractions of a penny. In fact, the cost of a database feeding AFs often far exceeds the cost of the AFs themselves.
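To make that billing model concrete, here is a back-of-the-envelope cost estimator. The per-execution and per-GB-second rates below are illustrative placeholders I am assuming for the sketch, not published prices; substitute the current consumption-plan rates for a real estimate.

```python
# Back-of-the-envelope cost model for consumption-plan function calls.
# NOTE: these rates are illustrative placeholders, not real prices.
PRICE_PER_MILLION_EXECUTIONS = 0.20   # assumed $ per 1M calls
PRICE_PER_GB_SECOND = 0.000016        # assumed $ per GB-second

def estimate_cost(executions, memory_gb, seconds_per_call):
    """Estimate the charge for a batch of function executions."""
    execution_cost = executions / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS
    compute_cost = executions * memory_gb * seconds_per_call * PRICE_PER_GB_SECOND
    return execution_cost + compute_cost

# 500,000 records, one record per call, 128 MB used for 100 ms each:
print(f"${estimate_cost(500_000, 0.125, 0.1):.4f}")
```

Even before any monthly free grant is subtracted, half a million calls at these assumed rates lands around twenty cents, which is why the database feeding the functions usually dominates the bill.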
So why are AFs so inexpensive? My theory is that Microsoft charges less for AFs because they consume compute that is either older hardware nearing the end of its life (after all, Microsoft deploys pre-packaged containers of compute and storage that age and need to be refreshed), or "opportunistic" compute running on infrastructure with excess capacity that isn't easily monetized. Think of it as "scrap compute". Whatever the reason, AFs represent incredibly inexpensive compute that developers can leverage to build extremely inexpensive computational systems. As long as you can live with the time needed to spin up a function asynchronously (sometimes a second or more, the so-called cold start), stay within the memory constraints of a single function call (currently 1.5 GB), and tolerate variable response times (e.g., noisy-neighbor issues), you can be rewarded with the lowest-cost server compute infrastructure available on the market today.
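Those variable response times are worth planning for in client code. A minimal sketch of one common mitigation, retry with exponential backoff, where a plain Python callable stands in for the remote function invocation (`call_with_retries` and `flaky` are hypothetical names for this illustration, not an Azure SDK API):

```python
import time

def call_with_retries(fn, *args, retries=3, backoff=0.5):
    """Call fn, retrying on failure with exponential backoff.

    A generic client-side guard against cold starts and noisy-neighbor
    slowdowns; fn stands in for an HTTP call to a function endpoint.
    """
    delay = backoff
    for attempt in range(retries):
        try:
            return fn(*args)
        except Exception:
            if attempt == retries - 1:
                raise                  # out of attempts, surface the error
            time.sleep(delay)          # give the platform time to warm up
            delay *= 2                 # back off further each round

# Example: a flaky stand-in that fails twice, then succeeds.
state = {"calls": 0}
def flaky(x):
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient timeout")
    return x * 2

print(call_with_retries(flaky, 21, backoff=0.01))  # prints 42
```

The same wrapper works unchanged whether the slow call is a cold start, a congested host, or an ordinary network hiccup.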
Even better, the Azure Functions 2.0 runtime now runs on .NET Core and broadens your language options. Since the 2.0 runtime supports Python 3.6, this raises an interesting question: is it cheaper to run data science calculations in Azure Functions, or on more traditional Hadoop-esque Big Data infrastructure like HDInsight or Azure Databricks/Spark clusters? My next project will be to benchmark these options, but if the pattern holds from other benchmarks I have done, my guess is Azure Functions will win the day on price/performance. And this is in spite of the fact that HDInsight/Databricks likely run on GPU-optimized servers. No doubt each of those servers is much faster at processing data, but there is simply no substitute for distributing your job across numerous generic, ultra-low-cost servers.
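The distribution idea above can be sketched locally: split the job into small chunks, fan them out across many cheap workers, then aggregate. Here a thread pool merely stands in for parallel function instances, and `process_chunk` is a placeholder workload, not a real data science routine.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Stand-in for one function invocation: score a slice of records."""
    return sum(x * x for x in chunk)  # placeholder "data science" work

def fan_out(records, chunk_size=1000, workers=32):
    """Split the job into chunks and map them across many small workers."""
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(process_chunk, chunks)  # fan-out
    return sum(partials)                            # fan-in / aggregate

print(fan_out(list(range(10_000))))
```

In a real deployment, each chunk would become one asynchronous function invocation (via HTTP or a queue trigger) rather than a thread, but the fan-out/fan-in shape is the same.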
It will be an interesting bake-off!