Name: Spark Community Builds mapInArrow for Vectorized Processing
Uploaded: 2026-04-06T23:41:51.791Z
Duration: 43 s
Channel: Apache Spark
Description: Why did the Spark community build mapInArrow? 🤔 According to Apache Spark PMC member Hyukjin Kwon, the motivation was simple: enable vectorized processing of nested data without the overhead of Pandas conversion. 🔗 Watch the full breakdown of how Spark 3.3 introduced this shift: https://lnkd.in/ewpKz_tU #ApacheSpark #DataEngineering #ApacheArrow #Python

Transcript

The motivation behind it, to be absolutely honest, was to enable vectorized processing of nested data. There's another API called mapInPandas that can do most of the things that mapInArrow can do. Initially I thought that the need for dealing with this kind of nested data was not very large. And then in the end, yes, I was convinced: OK, nested data is actually pretty common, and in production it is pretty important. My impression is that Arrow is more like a developer API. So, OK, then why don't we just have one API, mapInArrow? Because in theory that will cover all the use cases, right? We added it, and then, surprisingly, the community really liked it. I was also surprised that it is used a lot in different ecosystems to integrate directly with Arrow.


