💡 𝐉𝐚𝐯𝐚 𝐒𝐭𝐫𝐞𝐚𝐦𝐬: 𝐦𝐨𝐫𝐞 𝐭𝐡𝐚𝐧 𝐣𝐮𝐬𝐭 𝐟𝐚𝐧𝐜𝐲 𝐥𝐨𝐨𝐩𝐬

If you're still using for loops everywhere, you're probably leaving readability (and sometimes performance) on the table. Java Streams bring a declarative approach to data processing — you describe what you want, not how to iterate.

🔹 𝐇𝐨𝐰 𝐢𝐭 𝐰𝐨𝐫𝐤𝐬
Streams process data in a pipeline:
Source → Collection, array, etc.
Intermediate ops → map, filter, sorted
Terminal ops → collect, forEach, reduce

🔹 𝐄𝐱𝐚𝐦𝐩𝐥𝐞
List<String> names = List.of("Ana", "Bruno", "Carlos", "Amanda");

List<String> result = names.stream()
        .filter(name -> name.startsWith("A"))
        .map(String::toUpperCase)
        .sorted()
        .toList();

🔹 𝐊𝐞𝐲 𝐦𝐞𝐭𝐡𝐨𝐝𝐬
filter() → select data
map() → transform data
flatMap() → flatten nested structures
reduce() → aggregate values
collect() → build results

🔹 𝐖𝐡𝐲 𝐢𝐭 𝐦𝐚𝐭𝐭𝐞𝐫𝐬
✔ Cleaner and more expressive code
✔ Easy parallelization with .parallelStream()
✔ Encourages immutability and functional style

⚠️ 𝐁𝐮𝐭 𝐛𝐞𝐰𝐚𝐫𝐞: Streams are powerful — not always faster. Overusing them in hot paths can hurt performance.

👉 𝐑𝐮𝐥𝐞 𝐨𝐟 𝐭𝐡𝐮𝐦𝐛: 𝐔𝐬𝐞 𝐒𝐭𝐫𝐞𝐚𝐦𝐬 𝐟𝐨𝐫 𝐜𝐥𝐚𝐫𝐢𝐭𝐲 𝐟𝐢𝐫𝐬𝐭, 𝐨𝐩𝐭𝐢𝐦𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐥𝐚𝐭𝐞𝐫.

#Java #SoftwareEngineering #CleanCode #TechTips #Backend
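To make the flatMap() and reduce() entries from the key-methods list concrete, here is a minimal sketch in the same style as the example above. The class name and the sample data are illustrative only, not taken from the original post (requires Java 16+ for .toList(), like the example above):

import java.util.List;

public class StreamExtrasDemo {
    public static void main(String[] args) {
        // flatMap: flatten a list of lists into a single stream of elements
        List<List<Integer>> nested = List.of(List.of(1, 2), List.of(3, 4), List.of(5));
        List<Integer> flat = nested.stream()
                .flatMap(List::stream)    // List<List<Integer>> -> Stream<Integer>
                .toList();                // [1, 2, 3, 4, 5]

        // reduce: aggregate values into a single result (a sum here)
        int sum = flat.stream()
                .reduce(0, Integer::sum); // 15

        System.out.println(flat + " -> sum = " + sum);
    }
}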
Java Streams for Cleaner Code and Performance
More Relevant Posts
-
🚀 Day 20/100: Data Types Deep Dive – Precision, Size & Memory 📊🧠

Today’s learning focused on the science behind data storage in Java. Writing efficient code is not just about logic—it’s about choosing the right data type to optimize memory usage and performance. Here’s a structured breakdown of what I explored:

🏗️ 1. Primitive Data Types – The Core Building Blocks
These are predefined types that store actual values directly in memory.

🔢 Numeric (Whole Numbers):
byte → 1 byte | Range: -128 to 127
short → 2 bytes | Range: -32,768 to 32,767
int → 4 bytes | Standard integer type
long → 8 bytes | Used for large values (L suffix)

🔢 Numeric (Floating-Point):
float → 4 bytes | Requires f suffix
double → 8 bytes | Default for decimal values

🔤 Non-Numeric:
char → 2 bytes | Stores a single Unicode character
boolean → JVM-dependent | Represents true or false

🏗️ 2. Non-Primitive Data Types – Reference Types
These types store references (memory addresses) rather than actual values:
String → Sequence of characters
Array → Collection of similar data types
Class & Interface → Blueprint for objects

💡 Unlike primitives, their default value is null, and they reside in Heap memory, with references stored in the Stack.

🧠 Key Insight:
Primitives → Store actual values (Stack memory)
Non-Primitives → Store references to objects (Heap memory)

⚙️ Why This Matters:
Choosing the correct data type improves:
✔️ Memory efficiency
✔️ Application performance
✔️ Code reliability at scale

📈 Today reinforced that strong fundamentals in data types are essential for writing optimized, production-ready Java applications.

#Day20 #100DaysOfCode #Java #Programming #MemoryManagement #DataTypes #SoftwareEngineering #CodingJourney #JavaDeveloper #10000Coders
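A small sketch to go with the table above, showing the literal suffixes in use and the per-type size constants the JDK wrapper classes expose (the variable names and values are illustrative, not from the original post):

public class DataTypeSizes {
    public static void main(String[] args) {
        long bigValue = 10_000_000_000L;   // L suffix needed beyond the int range
        float ratio = 0.75f;               // f suffix required for float literals
        double price = 19.99;              // double is the default for decimals
        char letter = 'A';                 // 2-byte Unicode character

        // Each wrapper class reports the size of its primitive in bytes
        System.out.println("byte:   " + Byte.BYTES + " byte");
        System.out.println("short:  " + Short.BYTES + " bytes");
        System.out.println("int:    " + Integer.BYTES + " bytes");
        System.out.println("long:   " + Long.BYTES + " bytes");
        System.out.println("float:  " + Float.BYTES + " bytes");
        System.out.println("double: " + Double.BYTES + " bytes");
        System.out.println("char:   " + Character.BYTES + " bytes");
        System.out.println(bigValue + " " + ratio + " " + price + " " + letter);
    }
}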
-
𝐂𝐚𝐩𝐲𝐌𝐎𝐀 0.13.x 𝐑𝐞𝐥𝐞𝐚𝐬𝐞 (𝐀𝐩𝐫𝐢𝐥 2026)

We’re pleased to share the latest release of CapyMOA! This release introduces new algorithms, datasets, and infrastructure improvements, alongside a number of fixes that improve reliability and usability.

𝐍𝐞𝐰 𝐒𝐭𝐫𝐞𝐚𝐦 𝐋𝐞𝐚𝐫𝐧𝐞𝐫𝐬
• PLASTIC [ECML-PKDD 2024]
• LAST (Local Adaptive Streaming Tree) [SAC 2024]
• DEMS [CIKM 2025]
• Feature Importance for Decision Trees and Ensembles of Trees [IEEE Big Data 2019]. See the tutorial here: https://lnkd.in/eHj_xgm4

𝐎𝐧𝐥𝐢𝐧𝐞 𝐂𝐨𝐧𝐭𝐢𝐧𝐮𝐚𝐥 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠
• Learning to Prompt (L2P) [CVPR 2022]
• New Domain Incremental Datasets: DomainCIFAR100, RotatedFashionMNIST, RotatedMNIST

𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤 𝐔𝐩𝐝𝐚𝐭𝐞𝐬
• TorchStream: renamed and extended to support both classification and regression
• Improved learner templates and tooling for easier extension

𝐈𝐦𝐩𝐫𝐨𝐯𝐞𝐦𝐞𝐧𝐭𝐬 𝐚𝐧𝐝 𝐅𝐢𝐱𝐞𝐬
• Addressed some issues across the library, including prediction and Java integration
• Added code coverage to strengthen testing and maintainability
• Updated MOA backend and refined internal components

𝐃𝐨𝐜𝐮𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 & 𝐃𝐞𝐬𝐢𝐠𝐧
• Improved documentation across key components and algorithms
• Updated homepage and guidance materials
• Introduced our new CapyMOA logo to the website!

A big thank you to everyone who contributed to this release and CapyMOA: Heitor Murilo Gomes, Anton Lee, Nuwan Gunasekara, Yibin Sun, Guilherme Cassales, Jia Liu, Marco Heyden, PhD, Vitor Cerqueira, Maroua Bahri, Yun Sing Koh, Bernhard Pfahringer, Albert Bifet, Sekar Minati.

More details: https://lnkd.in/ehbmp_dF
-
🚀 Day 17 – equals() and hashCode(): A Crucial Contract

Today I explored why "equals()" and "hashCode()" are so important—especially when using collections like "HashMap" or "HashSet".

---

👉 By default:
- "equals()" → compares object references
- "hashCode()" → generates a hash based on memory location
But in real applications, we override them.

---

💡 The contract I learned:
✔ If two objects are equal using "equals()", they must have the same "hashCode()"

---

⚠️ What happens if we break this?
- "HashMap" may fail to retrieve values
- Duplicate entries may appear in "HashSet"
- Leads to very tricky bugs

---

👉 Example scenario:
Two objects look identical (same data), but:
- "equals()" returns true
- "hashCode()" is different
👉 Result: Collections treat them as different objects 😬

---

💡 Real takeaway:
Whenever overriding "equals()", always override "hashCode()" properly. This is not just theory—it directly impacts how collections behave internally.

#Java #BackendDevelopment #HashMap #JavaInternals #LearningInPublic
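Here is a minimal sketch of that contract done right, with both methods derived from the same fields. The User class and its fields are illustrative only, not from the original post:

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class User {
    private final String email;
    private final String name;

    User(String email, String name) {
        this.email = email;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof User)) return false;
        User other = (User) o;
        return Objects.equals(email, other.email) && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Must be computed from the same fields used in equals()
        return Objects.hash(email, name);
    }
}

public class EqualsHashCodeDemo {
    public static void main(String[] args) {
        Set<User> users = new HashSet<>();
        users.add(new User("ana@example.com", "Ana"));
        users.add(new User("ana@example.com", "Ana"));
        // With both methods overridden consistently, the set holds 1 element.
        // Override only equals() and the set would happily keep 2 "duplicates".
        System.out.println(users.size());
    }
}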
-
🚀 Day 19 — Transactional Outbox, Dual-Write Safety, and Reliable Event Delivery

One of the easiest ways to break a distributed system is this:
✅ Database write succeeds
❌ Event publish fails

Or even worse:
✅ Event publish succeeds
❌ Database write fails

At first, both operations look simple. But together, they create one of the most dangerous problems in event-driven systems:
👉 dual-write inconsistency

That was my biggest takeaway from Day 19. Because in real systems, when business state changes, the event about that change also matters. And if those two do not stay aligned, downstream systems start operating on incomplete or incorrect reality.

Today’s focus was: Transactional Outbox, Dual-Write Safety, and Reliable Event Delivery

📦 What I covered today 📘
⚠️ Dual-write failure mode fundamentals
🧾 Transactional outbox pattern
🔁 Relay / poller based event publication
📤 At-least-once delivery and retry behavior
🪪 Idempotency implications for consumers
⚖️ Ordering and throughput trade-offs

What stood out to me
✅ Writing to the database and broker separately is risky because either side can fail independently
✅ Transactional outbox solves this by storing state change + event intent in the same local transaction
✅ If the transaction commits, both records exist; if it aborts, neither exists
✅ Relay failures should not delete events — they should leave them pending and retry
✅ At-least-once delivery means duplicates are possible, so idempotent consumers are mandatory
✅ Monitoring outbox backlog is important because delayed publishing becomes an operational signal, not just an implementation detail

I also implemented a small Transactional Outbox Simulation in Python and Java to make the concept more practical. 🛠️
➡️ Git: https://lnkd.in/dixudzYY

That helped me understand a simple but important idea:
📌 The hardest part is not publishing events
📌 The hardest part is publishing them reliably after state changes
📌 Good distributed design is often about removing small windows where inconsistency can happen

This is one of those topics that looks straightforward in diagrams, but becomes much more meaningful when you think about retries, crashes, duplicate publishes, and downstream services depending on those events to stay correct.

System Design is slowly becoming less about only changing state and more about making sure every important state change can be observed, published, and recovered reliably.

On to Day 20 📈

#SystemDesign #DistributedSystems #BackendEngineering #SoftwareEngineering #ScalableSystems #TransactionalOutbox #EventDrivenArchitecture #ReliableEventDelivery #DualWriteProblem #Microservices #DataConsistency #AtLeastOnceDelivery #Idempotency #OutboxPattern #BackendDevelopment #CloudComputing #TechLearning #EngineeringJourney #SystemArchitecture #Java #Python #Kafka #RabbitMQ #SoftwareArchitecture #DevelopersIndia
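For anyone who prefers code to diagrams, here is a minimal plain-JDBC sketch of the core idea: business row and outbox row committed in one local transaction. This is not the simulation from the linked repo; the orders/outbox tables and column names are hypothetical:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransactionalOutboxSketch {

    // Writes the business row and the outbox row in ONE local transaction,
    // so they either both exist or neither does (no dual-write window).
    public static void placeOrder(Connection conn, String orderId, String payloadJson) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement insertOrder = conn.prepareStatement(
                     "INSERT INTO orders (id, status) VALUES (?, 'PLACED')");
             PreparedStatement insertOutbox = conn.prepareStatement(
                     "INSERT INTO outbox (event_type, aggregate_id, payload, status) "
                   + "VALUES ('OrderPlaced', ?, ?, 'PENDING')")) {

            insertOrder.setString(1, orderId);
            insertOrder.executeUpdate();

            insertOutbox.setString(1, orderId);
            insertOutbox.setString(2, payloadJson);
            insertOutbox.executeUpdate();

            conn.commit();       // both rows become visible together
        } catch (SQLException e) {
            conn.rollback();     // neither row is persisted on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
    // A separate relay/poller would SELECT rows WHERE status = 'PENDING',
    // publish them to the broker, and mark them SENT only after the broker acks.
    // That yields at-least-once delivery, so consumers must be idempotent.
}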
-
Built a Spark pipeline for 3B rows. Took 4 days. Looked “fine” on paper. It wasn’t.

Here are the exact errors and signals that showed up, and what they were really trying to say:

1. "java.lang.OutOfMemoryError: Java heap space"
Translation: you tried to process too much data in one place.
Typical causes:
- Huge partitions
- pandas in the middle of a Spark pipeline
- reading large CSV shards into driver memory

2. "GC overhead limit exceeded"
Translation: JVM is spending all its time cleaning memory, not doing work.
Usually means:
- memory pressure is already critical
- you're close to a crash

3. Executor lost / container killed
Seen as:
- “ExecutorLostFailure”
- “Container killed by YARN / Kubernetes”
Translation:
- executor hit memory limit or got OOM-killed
- often caused by skew or massive shuffle

4. Driver crashes / notebook just dies
No clean error sometimes.
Translation:
- driver ran out of memory
- very common when using pandas ("read_csv") on large files

5. Slow jobs with low CPU usage
No explicit error. Just pain.
Translation:
- I/O bottleneck
- single-threaded processing hiding inside a “distributed” pipeline

6. Tiny files / too many stages / constant job triggers
Symptoms:
- many small writes
- frequent stage execution
Translation:
- poor batching strategy
- excessive overhead per operation

7. “Response too large” (BigQuery)
Translation:
- trying to pull data via API instead of exporting
- wrong data movement strategy

The real lesson:
None of these are “Spark problems”. They’re architecture problems.
- Distributed system → forced through single node
- Parallel pipeline → serialized with pandas
- Columnar systems → converted to CSV

The fix (what actually works):
- Keep everything distributed end-to-end
- Use Parquet, not CSV
- Avoid pandas in big data paths
- Repartition aggressively
- Let Spark read directly from storage

One line summary:
If your big data pipeline feels slow, somewhere in the middle, you probably turned it into a small data pipeline. And that’s where everything breaks.
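As a rough sketch of what "the fix" looks like in code, here is the keep-it-distributed shape using Spark's Java Dataset API. The paths, column names, and partition count are illustrative assumptions, not taken from the pipeline described above:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KeepItDistributed {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("keep-it-distributed")
                .getOrCreate();

        // Let Spark read columnar files directly from storage: no pandas,
        // no collecting raw data onto the driver.
        Dataset<Row> events = spark.read().parquet("gs://my-bucket/events/");  // illustrative path

        // Repartition to spread work evenly and avoid huge or skewed partitions.
        Dataset<Row> balanced = events.repartition(400);

        // Transformations stay distributed end-to-end.
        Dataset<Row> dailyCounts = balanced
                .groupBy("event_date", "event_type")
                .count();

        // Write Parquet back out; only coalesce if you genuinely need fewer files.
        dailyCounts.write().mode("overwrite").parquet("gs://my-bucket/daily_counts/");

        spark.stop();
    }
}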
-
Built a cloud-native ETL pipeline that actually scales (9B rows written in 1hr) - here’s the architecture in a nutshell:

→ BigQuery for high-performance querying
→ GCS as a staging layer
→ Spark for distributed transformations
→ Snowflake as the final warehouse

Key design principles:
• Each system is used for what it’s best at — no forced compromises
• No single-machine bottlenecks — everything is horizontally scalable
• Columnar, strongly-typed formats end-to-end for efficiency
• Clean authentication across both execution layers (Python driver + JVM executors)

Result: a pipeline that handles billions of rows reliably without falling apart under load.

This is what happens when architecture is intentional, not accidental.

#DataEngineering #BigData #ETL #CloudArchitecture #Spark #Snowflake #BigQuery
-
Problem :- Two Sum (LeetCode 1)

Problem Statement :- Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to the target. Assume exactly one solution exists, and you may not use the same element twice.

Approach 1 :- Brute Force => Nested Loop
i - Check every pair of elements
ii - If nums[i] + nums[j] == target => return indices
iii - Time Complexity : O(n²)

class Solution {
    public int[] twoSum(int[] nums, int target) {
        for (int i = 0; i < nums.length; i++) {
            for (int j = i + 1; j < nums.length; j++) {
                if (nums[i] + nums[j] == target) {
                    return new int[]{i, j};
                }
            }
        }
        return new int[]{};
    }
}

Approach 2 :- Optimal => HashMap
i - Store number and its index in a HashMap
ii - For each element, check if (target - current) exists
iii - Time Complexity : O(n)

class Solution {
    public int[] twoSum(int[] nums, int target) {
        HashMap<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            int complement = target - nums[i];
            if (map.containsKey(complement)) {
                return new int[]{map.get(complement), i};
            }
            map.put(nums[i], i);
        }
        return new int[]{};
    }
}

Key Takeaway :- Instead of checking every pair, we store previously seen elements and directly find the required complement efficiently.

#Java #DSA #LeetCode #CodingJourney #LearnInPublic #SoftwareEngineering #HashMap
-
🚀 Day 12 — Probabilistic Data Structures & Bloom Filters

Not every system needs a perfect answer. Sometimes, “probably yes” and “definitely no” is more than enough - especially when you’re dealing with massive scale.

That’s exactly what today was about: understanding how systems trade a bit of accuracy for huge gains in speed, memory, and cost efficiency.

📘 What I covered today
🧠 Probabilistic Data Structures
🌐 Bloom Filters & Membership Checks
⚖️ Memory vs Accuracy Trade-offs
🚫 False Positives (but no false negatives)
🛡️ Cache Penetration Protection
🔁 Duplicate Detection & Filtering
📊 Cardinality & Sketch-based Thinking

💡 What stood out to me
✅ Exact solutions don’t always scale — approximation is a design choice
✅ Bloom filters are powerful for filtering before hitting expensive systems
✅ “Definitely NOT present” is more valuable than “maybe present” in many cases
✅ False positives increase with usage — capacity planning matters
✅ These are used everywhere: caching, databases, search systems, distributed pipelines

🛠️ Built today
Implemented a simple Bloom Filter in Python & Java
📌 Add elements using multiple hash functions
📌 Check membership efficiently
📌 Estimate false positive probability
Simple structure… but insanely powerful in real-world systems.
➡️ Git : https://lnkd.in/dFYQeQD6

🔥 Realization
System Design is no longer just about correctness. It’s about making smart trade-offs:
👉 Speed vs Accuracy
👉 Memory vs Cost
👉 Precision vs Scalability
And today made that very clear.

📈 On to Day 13.

#SystemDesign #DistributedSystems #BackendEngineering #SoftwareEngineering #ScalableSystems #BloomFilter #DataStructures #ProbabilisticDataStructures #Caching #PerformanceEngineering #SystemArchitecture #TechLearning #EngineeringJourney #ComputerScience #BackendDeveloper #AIEngineering #DataEngineering #HighPerformanceSystems #LowLatency #BigData #InterviewPreparation #TechCareers #DevelopersIndia #LearningInPublic #100DaysOfCode 🚀
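In the same spirit as the simulation in the linked repo, here is a minimal Java sketch of a Bloom filter over a BitSet. The filter size, hash count, and the double-hashing scheme are illustrative choices, not taken from that repo:

import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive k positions from two base hashes (Kirsch-Mitzenmacher style double hashing).
    private int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = (h1 >>> 16) | 1;            // cheap second hash, forced odd
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(value, i));
        }
    }

    // false -> definitely not present; true -> probably present (false positives possible)
    public boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(value, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 4);
        filter.add("user:42");
        System.out.println(filter.mightContain("user:42"));   // true
        System.out.println(filter.mightContain("user:999"));  // almost certainly false
    }
}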
-
🚀 Day 23/30 – DSA Challenge

📌 LeetCode Problem – Remove Duplicates from Sorted List

📝 Problem Statement
Given the head of a sorted linked list, delete all duplicates such that each element appears only once. Return the modified linked list.

📌 Example
Input: 1 → 1 → 2 → 3 → 3
Output: 1 → 2 → 3

💡 Key Insight
Since the list is sorted, 👉 duplicates will always be adjacent. So we don’t need extra space or hashing.

🔥 Optimal Approach – Single Traversal

🧠 Idea
Traverse the list and compare the current node with the next node.
If they are equal → skip the next node
Else → move forward

🚀 Algorithm
1️⃣ Start from head
2️⃣ While current != null && current.next != null
3️⃣ If current.val == current.next.val 👉 skip the duplicate: current.next = current.next.next
4️⃣ Else: move to the next node

✅ Java Code (Optimal O(n))

class Solution {
    public ListNode deleteDuplicates(ListNode head) {
        ListNode current = head;
        while (current != null && current.next != null) {
            if (current.val == current.next.val) {
                current.next = current.next.next;
            } else {
                current = current.next;
            }
        }
        return head;
    }
}

⏱ Complexity
Time Complexity: O(n)
Space Complexity: O(1)

📚 Key Learnings – Day 23
✔ Sorted data simplifies problems
✔ Linked list manipulation requires careful pointer handling
✔ No extra space needed when duplicates are adjacent
✔ Always check current.next != null to avoid errors

Simple structure. Clean pointer logic. Efficient solution.

Day 23 completed. Consistency continues 💪🔥

#30DaysOfCode #DSA #Java #InterviewPreparation #ProblemSolving #CodingJourney #LinkedList #LeetCode
-
Ever tried building a “global filter API” by joining multiple datasets into a single response? Sounds simple… until it isn’t.

Recently, I worked on combining data from multiple sources into one API using native SQL joins. On paper, it looked efficient — one query, one response. Reality was different.

⚠️ Challenges I faced:
- LEFT JOIN created duplicate and bloated rows
- SELECT * caused column order mismatches during DTO mapping
- Handling array fields from DB to Java was tricky
- Inconsistent data types across sources (BigDecimal vs Double, Timestamp vs LocalDateTime)
- Trying to map everything into a single DTO led to tight coupling
- The biggest pain: splitting combined query results back into meaningful structures

💡 Key learnings:
- Avoid SELECT * in complex joins — always map explicitly
- Native queries + DTO mapping = order matters more than you think
- One “global” response is not always a good design
- Sometimes, separate APIs or structured responses are cleaner and scalable
- Debugging mapping issues can take more time than writing the query itself

In the end, what seemed like a query problem turned out to be a design problem.

How do you handle multi-source joins in your APIs? 🤔

#Java #SpringBoot #BackendDevelopment #SQL #DatabaseDesign #APIDesign #Microservices #SoftwareEngineering #CodingChallenges #Developers #TechLearning #CleanCode
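A minimal sketch of the "map explicitly, never SELECT *" takeaway, using Spring's JdbcTemplate (the query(sql, rowMapper, args) overload from Spring 5.3+). The tables, columns, and DTO are all hypothetical examples, not the actual schema behind the post:

import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

// One focused DTO per view of the data, instead of one "global" mega-DTO.
record OrderSummaryDto(long orderId, String customerName, BigDecimal total, LocalDateTime createdAt) {}

public class OrderQueryDao {
    private final JdbcTemplate jdbc;

    public OrderQueryDao(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    public List<OrderSummaryDto> findRecentOrders(int limit) {
        // Explicit, aliased column list: mapping by name never depends on column order.
        String sql = """
                SELECT o.id AS order_id, c.name AS customer_name,
                       o.total AS total, o.created_at AS created_at
                FROM orders o
                JOIN customers c ON c.id = o.customer_id
                ORDER BY o.created_at DESC
                LIMIT ?
                """;
        return jdbc.query(sql, (rs, rowNum) -> new OrderSummaryDto(
                rs.getLong("order_id"),
                rs.getString("customer_name"),
                rs.getBigDecimal("total"),
                rs.getTimestamp("created_at").toLocalDateTime()), limit);
    }
}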
-
Well explained. Streams shine when used for clarity and expressiveness — but like any abstraction, they should be applied thoughtfully, especially in performance-critical paths.