#34 - Apache Iceberg, Big Data, Data Architecture and Open-source with Dipankar Mazumdar

Have you heard of Apache Iceberg?

I received a masterclass from Dipankar Mazumdar on the state of the art in data architectures, and much more.

Who is Dipankar?

Dipankar is a passionate Data Engineering and Data Science Advocate and a community-focused individual currently working at Dremio. He is also the author of “Apache Iceberg: The Definitive Guide”, published with O’Reilly. Prior to this, he led the R&D Advocacy team at Qlik, where he worked on key developer strategies and educated the worldwide developer community in the areas of Machine Learning/Visualization.

His mission?

His primary focus is to assist engineering teams in building and scaling robust data platforms using open-source solutions like Apache Iceberg, Apache Arrow & Project Nessie.

I truly enjoyed hearing him explain how he prioritises core concepts and business problems over technology.

I had an amazing time in this episode and I hope you will too! Please give me some feedback so that we can improve the show 😉

We discuss topics like:

  • How to find what truly drives you in the data field? Exploring different roles and fields can be a great way of finding out.
  • The learning curve is tough but worth it. Remember that you overestimate what you can do in a year and underestimate what you can do in 10 years.
  • How research and a master's degree can give you a deeper understanding of specific areas.
  • How participating in the community and sharing knowledge can help build a personal brand.
  • What a developer advocacy or data advocacy role is and how it brings together diverse skills and experiences.
  • Apache Iceberg takes advantage of a new architectural paradigm called the "lakehouse architecture".
  • Apache Iceberg is utilized by major analytical organizations, including Google and Snowflake.
  • The concept of a lakehouse combines the capabilities of a data warehouse with the scalability and cost advantages of a data lake.
  • Data warehouses can be limiting, typically because of their closed nature, hence the importance of exporting data to data lakes.
  • Open-source solutions in data, as used at companies like Netflix and Apple, are becoming more prevalent, and companies need to consider total cost of ownership, maintenance, upgrades, and infrastructure when choosing an option.

🔊 Listen to this episode now!

🎙️ Podcast 👉 http://smartlink.ausha.co/let-s-talk-ai/

📹 Youtube 👉 https://www.youtube.com/@lets-talk-ai

Keep learning, keep creating, keep building, and let's have a positive impact!

Warm regards,

Thomas

Dipankar Mazumdar 🥑 is definitely your go-to for all things Apache Iceberg! Great episode, Thomas Bustos 👏

Thank you so much for having me on your amazing podcast, Thomas. I truly enjoyed our genuine conversation on this array of topics ✌🏻
