#34 - Apache Iceberg, Big Data, Data Architecture and Open-source with Dipankar Mazumdar

Thomas Bustos

Published Aug 2, 2023

Have you heard of Apache Iceberg?

I received a masterclass from Dipankar Mazumdar on the state of the art in data architectures, and much more.

Who is Dipankar?

Dipankar is a passionate Data Engineering and Data Science Advocate and a community-focused individual currently working at Dremio. He is also the author of “Apache Iceberg: The Definitive Guide”, published with O’Reilly. Prior to this, he led the R&D Advocacy team at Qlik, where he worked on key developer strategies and educated the worldwide developer community in the areas of Machine Learning and Visualization.

His mission?

His primary focus is to assist engineering teams in building and scaling robust data platforms using open-source solutions like Apache Iceberg, Apache Arrow & Project Nessie.

I truly enjoyed hearing him explain how he prioritises core concepts and business problems over any specific technology.

I had an amazing time in this episode and I hope you will too! Please give me some feedback so that we can improve the show 😉

We discuss topics like:

How to find what truly drives you in the data field? Exploring different roles and fields can be a great way of finding out.
The learning curve is tough but worth it. Remember that you overestimate what you can do in a year and underestimate what you can do in 10 years.
The benefits of research and a master's degree for gaining a deeper understanding of specific areas.
How participating in the community and sharing knowledge can help you build a personal brand.
What a developer advocacy or data advocacy role is and how it brings together diverse skills and experiences.
Apache Iceberg takes advantage of a new architectural paradigm called the "Lakehouse Architecture".
Apache Iceberg is utilized by major analytical organizations, including Google and Snowflake.
The concept of a lakehouse combines the capabilities of a data warehouse with the scalability and cost advantages of a data lake.
Data warehouses, typically because of their closed nature, can be limiting, hence the importance of being able to export data to data lakes.
Open-source data solutions, as adopted by companies like Netflix and Apple, are becoming more prevalent, and companies need to consider total cost of ownership, maintenance, upgrades, and infrastructure when choosing an option.

🔊 Listen to this episode now!

🎙️ Podcast 👉 http://smartlink.ausha.co/let-s-talk-ai/

📹 Youtube 👉 https://www.youtube.com/@lets-talk-ai

Keep learning, keep creating, keep building, and let's have a positive impact!

Warm regards,

Thomas

Dremio 2y

Dipankar Mazumdar 🥑 is definitely your go-to for all things Apache Iceberg! Great episode, Thomas Bustos 👏

Fouad Touchene 2y

A must-listen

Dipankar Mazumdar 2y

Thank you so much for having me on your amazing podcast, Thomas. I truly enjoyed our genuine conversation on this array of topics ✌🏻

