.NET for Apache Spark Preview

That's right! You didn't misread it.

Shout out to the BI professionals and Data Scientists out there: have you ever considered using a Microsoft technology to interact with Spark?

Apache Spark is a unified analytics engine for large-scale data processing.

Up until now, it was only possible to interact with Spark through Scala, Java, Python and R.

Before going any further, I'd like to take a moment to recognise the great work Microsoft has been doing with .NET Core. It may sound clichéd or repetitive, but I'm always amazed by the possibilities .NET Core provides and, most importantly, by its interoperability. A while ago I even posted a case study about a project where I leveraged .NET Core, SQL Server and R: https://www.garudax.id/pulse/case-study-sql-server-r-rodrigo-romano/

That said, .NET for Apache Spark is compliant with .NET Standard, which means you can run it either on the full .NET Framework or on .NET Core (Windows, Linux and macOS).

Furthermore, in keeping with the collaborative, community-driven effort that is an integral part of .NET Core, the code is open source and available on GitHub at https://github.com/dotnet/spark. Microsoft has also created a JIRA item in the Apache Spark project that is just waiting for your input and collaboration.

It is worth remembering, though, that this is only the first preview of the tool; even so, its performance compared with Python and Scala is already pretty good.

[Chart: per-query performance of .NET for Apache Spark versus Python and Scala]
The chart above shows the per-query performance of .NET for Apache Spark versus Python and Scala. .NET for Apache Spark performs well against both. Furthermore, in cases where user-defined function (UDF) performance is critical, such as query 1, where 3B rows of non-string data are passed between the JVM and the CLR, .NET for Apache Spark is 2x faster than Python.

So, if you'd like to read more about this incredible announcement, follow this link: https://devblogs.microsoft.com/dotnet/introducing-net-for-apache-spark/

Exciting new times!

More articles by Rodrigo Romano

  • Case Study: SQL Server and R

    As a consultant, very often you find yourself in a position where you'd have to decide to take the "same" path as usual…
  • MVP Summit 2019

    The 2019 version of the MVP Summit was held in Microsoft headquarters from March 17th to 21st. In my opinion, the MVP…