Building a Distributed File System with Java and TCP Sockets

💻 Diving deeper into Operating Systems & Distributed Systems

Lately, I’ve been exploring core OS concepts like process management, multithreading, and synchronization, and I realized the best way to truly understand them is to build something that depends on them. So, I built my own Distributed File System (DFS) from scratch using Java 🚀

Rather than relying on high-level frameworks, I focused on understanding how real systems handle data distribution, failures, and communication at a low level.

🔧 What’s happening behind the scenes?

⚡ Socket-Based Communication
Implemented direct TCP socket communication between nodes to enable fast and efficient data transfer without relying on REST APIs.

🧵 Concurrency & Thread Management
Designed the system to handle multiple client requests simultaneously using thread pools and concurrent data structures, ensuring safe and efficient execution.

🛡️ Fault Tolerance & Replication
Integrated a replication strategy to ensure data availability. Even if a node fails unexpectedly, the system can recover and continue serving requests seamlessly.

📡 Heartbeat Monitoring System
Built a mechanism for continuous health checks of nodes, allowing the system to detect failures in real time and respond accordingly.

📊 Interactive Monitoring Interface
Created a lightweight frontend dashboard to visualize file distribution and track node activity dynamically.

🧠 Key Takeaways
Working on this project helped me connect theoretical OS concepts with real-world system design challenges, especially around network communication, synchronization, and fault handling. It also gave me a deeper appreciation for how large-scale systems maintain reliability under unpredictable conditions.

🔗 Project Repository: https://lnkd.in/gP-DAtj2

I’d love to hear your thoughts or feedback!

#DistributedSystems #OperatingSystems #Java #BackendDevelopment #SystemDesign #ComputerScience #Networking

GitHub - niharika-1806/Distributed-File-System github.com
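The heartbeat mechanism boils down to tracking when each node last checked in. Below is a minimal Java sketch of a master-side failure detector, assuming nodes send periodic heartbeats and a node counts as failed once no heartbeat has arrived within a fixed timeout. HeartbeatMonitor and its methods are hypothetical names, not taken from the linked repository.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical master-side failure detector driven by heartbeats.
class HeartbeatMonitor {
    // nodeId -> time of the last heartbeat; safe to update from many socket-handler threads
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a heartbeat message arrives from a DataNode.
    void recordHeartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    // Called periodically by a checker thread; returns nodes whose heartbeat is overdue.
    List<String> deadNodes(long nowMillis) {
        return lastSeen.entrySet().stream()
                .filter(e -> nowMillis - e.getValue() > timeoutMillis)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    boolean isAlive(String nodeId, long nowMillis) {
        Long seen = lastSeen.get(nodeId);
        return seen != null && nowMillis - seen <= timeoutMillis;
    }
}
```

In a running master, recordHeartbeat would be called from the socket handler threads and deadNodes from a ScheduledExecutorService (for example via scheduleAtFixedRate) with System.currentTimeMillis() as the clock; passing the time in explicitly just keeps the sketch testable. The timeout is the main design knob: shorter means faster detection but more false alarms on a slow network.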

More Relevant Posts

Arnav Jain
4w
Lately, my teammates and I have been diving deep into the inner workings of Operating Systems: studying process states, concurrency, and thread scheduling. Instead of just reading about these concepts, we decided to put the theory into practice by building a system that actually relies on them.

I’m incredibly proud to share that our team just finished engineering a fault-tolerant Distributed File System (DFS) from scratch using Java! 🚀

Instead of relying on heavy abstractions or frameworks like Spring Boot, we wanted to understand the raw mechanics of how enterprise storage systems (like HDFS or AWS S3) manage data, network traffic, and hardware failures.

Here is what we built under the hood:

⚡ Custom TCP Protocol: Bypassed REST entirely for internal node communication, using raw TCP sockets for low-latency binary streaming.

🧠 Concurrent Memory Safety: Designed the Master Node around a ConcurrentHashMap and thread pools to handle asynchronous web requests with average constant-time metadata lookups and no data races on shared state.

🔄 Auto-Recovery & Fault Tolerance: Engineered a replication algorithm (replication factor of 3) with background heartbeat daemons. If a Data Node OS process is terminated mid-operation, the Master detects the failure through missed heartbeats and self-heals the download using backup replicas.

📊 Real-Time Visual Dashboard: Built a decoupled, asynchronous JavaScript/HTML/CSS frontend to map file chunks and monitor live node health.

Building this collaboratively forced us to navigate complex systems engineering challenges together, from breaking network socket buffer deadlocks to managing disk I/O with Java NIO. It was an incredible way to bridge the gap between OS theory and real-world distributed architecture, and I couldn't have asked for a better team to build this with! 🤝

A huge shoutout to Harkeerat Singh and Niharika Berry for the late-night debugging sessions and brilliant code contributions.

If you want to see the code or run a "Chaos Monkey" test on our cluster yourself, check out the repository here: https://lnkd.in/g7pJP7Zf

#SoftwareEngineering #Java #DistributedSystems #ComputerScience #Networking #BackendEngineering #Teamwork #SystemsDesign

GitHub - Harkeerat9406/Distributed-File-System github.com
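To make "replication factor of 3 plus a self-healing download" concrete, here is a hedged Java sketch: each chunk is placed on three distinct live nodes, and a read walks the replica list, skipping nodes reported dead and nodes whose fetch fails. ChunkDirectory, place, and read are hypothetical names; the placement rule (a hash-derived start index, then consecutive nodes) is an assumption, and fetchFromNode stands in for the actual TCP transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical master-side chunk directory with replication and read failover.
class ChunkDirectory {
    static final int REPLICATION_FACTOR = 3;
    // chunkId -> ordered list of DataNode ids holding a replica
    private final Map<String, List<String>> replicas = new ConcurrentHashMap<>();

    // Picks REPLICATION_FACTOR distinct nodes, starting at a hash-derived index to spread load.
    List<String> place(String chunkId, List<String> liveNodes) {
        if (liveNodes.size() < REPLICATION_FACTOR) {
            throw new IllegalStateException("need at least " + REPLICATION_FACTOR + " live nodes");
        }
        int start = Math.floorMod(chunkId.hashCode(), liveNodes.size());
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < REPLICATION_FACTOR; i++) {
            chosen.add(liveNodes.get((start + i) % liveNodes.size()));
        }
        replicas.put(chunkId, List.copyOf(chosen));
        return chosen;
    }

    // Tries each replica in order; a dead node or a failed fetch falls through to the next one.
    Optional<byte[]> read(String chunkId, Set<String> deadNodes,
                          Function<String, Optional<byte[]>> fetchFromNode) {
        for (String node : replicas.getOrDefault(chunkId, List.of())) {
            if (deadNodes.contains(node)) {
                continue;
            }
            Optional<byte[]> data = fetchFromNode.apply(node);
            if (data.isPresent()) {
                return data;
            }
        }
        return Optional.empty(); // every replica is gone
    }
}
```

The set of dead nodes would come from the heartbeat monitor; a real system would also re-replicate chunks whose replica count dropped below three, which this sketch leaves out.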
Harkeerat Singh
4w
I’ve been spending a lot of time recently looking under the hood of Operating Systems: studying process states, concurrency, and thread scheduling. Instead of just reading about these concepts, I wanted to put the theory into practice by building a system that actually relies on them.

I just finished engineering a fault-tolerant Distributed File System (DFS) from scratch using Java! 🚀

Instead of relying on heavy abstractions or frameworks like Spring Boot, I wanted to understand the raw mechanics of how enterprise storage systems (like HDFS or AWS S3) manage data, network traffic, and hardware failures.

Here is what I built under the hood:

⚡ Custom TCP Protocol: Bypassed REST entirely for internal node communication, using raw TCP sockets for low-latency binary streaming.

🧠 Concurrent Memory Safety: Designed the Master Node around a ConcurrentHashMap and thread pools to handle asynchronous web requests with average constant-time metadata lookups and no data races on shared state.

🔄 Auto-Recovery & Fault Tolerance: Engineered a replication algorithm (replication factor of 3) with background heartbeat daemons. If a Data Node OS process is terminated mid-operation, the Master detects the failure through missed heartbeats and self-heals the download using backup replicas.

📊 Real-Time Visual Dashboard: Built a decoupled, asynchronous JavaScript/HTML/CSS frontend to map file chunks and monitor live node health.

Building this forced me to navigate complex systems engineering challenges, from breaking network socket buffer deadlocks to managing disk I/O with Java NIO. It was an incredible way to bridge the gap between OS theory and real-world distributed architecture.

If you want to see the code or run a "Chaos Monkey" test on the cluster yourself, check out the repository here: https://lnkd.in/gdsA2Hwm

#SoftwareEngineering #Java #DistributedSystems #ComputerScience #Networking #WebDevelopment #BackendEngineering

GitHub - Harkeerat9406/Distributed-File-System github.com
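One common way to do binary streaming over raw TCP without REST is a length-prefixed frame: the receiver knows exactly how many bytes belong to each message instead of reading until the peer closes the socket. The sketch below assumes a made-up layout (1-byte command, 4-byte big-endian length, payload); the Frame class and its command codes are hypothetical, and the same methods work on a Socket's getInputStream() and getOutputStream().

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical wire format: [1-byte command][4-byte big-endian length][payload bytes]
final class Frame {
    static final byte CMD_PUT_CHUNK = 1;
    static final byte CMD_GET_CHUNK = 2;
    static final int MAX_PAYLOAD = 64 * 1024 * 1024; // reject absurd lengths from a corrupt stream

    final byte command;
    final byte[] payload;

    Frame(byte command, byte[] payload) {
        this.command = command;
        this.payload = payload;
    }

    static void write(OutputStream raw, Frame frame) throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        out.writeByte(frame.command);
        out.writeInt(frame.payload.length);
        out.write(frame.payload);
        out.flush(); // push the whole frame before waiting for a reply
    }

    static Frame read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw); // unbuffered, so re-wrapping loses no bytes
        byte command = in.readByte();
        int length = in.readInt();
        if (length < 0 || length > MAX_PAYLOAD) {
            throw new IOException("bad frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // keeps reading until all bytes arrive, or throws EOFException
        return new Frame(command, payload);
    }
}
```

readFully matters on sockets: a single read call may return only part of a payload, so looping until the declared length is reached is what keeps frames aligned.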
Rohith Kumar Karnati
1w
A recent issue reminded me that performance optimizations can sometimes become production problems.

We had an API that:
1️⃣ Fetches initial details
2️⃣ Extracts IDs from the response
3️⃣ Makes another database call to fetch larger secondary data

To speed up step 3, parallel processing was introduced using a fixed thread pool. Sounds reasonable, until load testing began.

Under heavy traffic, thread creation kept increasing across instances until limits were hit, leading to:
⚠️ OutOfMemoryError: "unable to create new native thread"

The interesting part? The optimization worked for individual requests. But at scale, the resource model didn’t. A request with a small number of IDs didn’t always need dedicated worker threads, yet threads were still being allocated repeatedly under concurrent load.

The fix was moving to a shared/reusable thread pool model with better resource control.

💡 My takeaway: Code that is fast in isolation may fail under concurrency.

When designing for performance, it’s important to ask:
- How does this behave at 1 request?
- How does this behave at 1000 requests?
- What resources grow with traffic?

Scalability is often less about speed and more about control.

#BackendEngineering #Java #PerformanceTesting #Scalability #Concurrency
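The post doesn't show code, so the sketch below is an assumption about the shape of the fix: one fixed-size pool created at startup and shared by every request, plus an inline path for requests with only a few IDs. SecondaryDataFetcher, inlineThreshold, and the loader function are hypothetical names; the commented-out line shows the per-request pool pattern that typically produces this kind of thread growth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical service for step 3: load secondary data for each ID.
final class SecondaryDataFetcher {
    // Anti-pattern this replaces (assumed, not confirmed by the post):
    //   ExecutorService pool = Executors.newFixedThreadPool(8); // created inside every request
    // Each concurrent request then starts its own threads, so thread count grows with traffic.

    private final ExecutorService sharedPool; // created once, shared by all requests
    private final int inlineThreshold;        // hypothetical cut-off for "small" requests

    SecondaryDataFetcher(int poolSize, int inlineThreshold) {
        this.sharedPool = Executors.newFixedThreadPool(poolSize);
        this.inlineThreshold = inlineThreshold;
    }

    List<String> fetchAll(List<Long> ids, Function<Long, String> loader)
            throws InterruptedException, ExecutionException {
        List<String> results = new ArrayList<>(ids.size());
        if (ids.size() <= inlineThreshold) {
            // Small request: run on the caller's thread, no hand-off at all.
            for (Long id : ids) {
                results.add(loader.apply(id));
            }
            return results;
        }
        // Larger request: fan out to the shared, bounded pool; excess tasks wait in its queue.
        List<Future<String>> futures = new ArrayList<>(ids.size());
        for (Long id : ids) {
            futures.add(sharedPool.submit(() -> loader.apply(id)));
        }
        for (Future<String> f : futures) {
            results.add(f.get()); // preserves the order of the input IDs
        }
        return results;
    }

    void shutdown() {
        sharedPool.shutdown();
    }
}
```

With a shared pool, the thread count stays at poolSize no matter how many requests arrive; extra work waits in the pool's queue instead of becoming new native threads. Note that newFixedThreadPool uses an unbounded queue, so under sustained overload a ThreadPoolExecutor with a bounded queue gives even tighter control.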
Ujith B
2w
🚀 Spring Boot Tip for Faster Applications

One simple improvement that can make a big difference in Spring Boot applications is database connection pooling.

Instead of opening a new database connection for every request, Spring Boot uses HikariCP by default to keep a pool of open connections and reuse them.

Why it matters:
⚡ Faster response times
📉 Reduced database overhead
🔁 Better handling of high traffic

A few useful configurations:
• maximumPoolSize – the maximum number of connections in the pool, idle and in use combined
• connectionTimeout – how long a request waits for a connection from the pool before failing
• idleTimeout – how long an unused connection may sit idle before it is closed (only applies when minimumIdle is below maximumPoolSize)

Optimizing database connections is a small change that can significantly improve application performance.

Sometimes performance improvements don’t come from complex architecture, but from tuning the fundamentals.

#SpringBoot #Java #BackendDevelopment #JavaDeveloper #Microservices #SoftwareEngineering #TechTips
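To show what maximumPoolSize and connectionTimeout actually control, here is a deliberately tiny pool in plain Java. It is not HikariCP and not its API, just a hypothetical ToyPool that reuses idle connections, opens at most maximumPoolSize of them, and makes a caller wait up to connectionTimeout for a release before failing; idleTimeout is left out to keep it short.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Toy illustration of two HikariCP settings. NOT HikariCP and not its API.
final class ToyPool<C> {
    private final BlockingQueue<C> idle;          // connections ready for reuse
    private final Supplier<C> factory;            // stands in for "open a new DB connection"
    private final int maximumPoolSize;            // cap on connections ever opened (idle + in use)
    private final long connectionTimeoutMillis;   // how long borrow() waits when the pool is exhausted
    private int created = 0;                      // guarded by 'this'

    ToyPool(Supplier<C> factory, int maximumPoolSize, long connectionTimeoutMillis) {
        this.idle = new ArrayBlockingQueue<>(maximumPoolSize);
        this.factory = factory;
        this.maximumPoolSize = maximumPoolSize;
        this.connectionTimeoutMillis = connectionTimeoutMillis;
    }

    C borrow() throws InterruptedException, TimeoutException {
        C conn = idle.poll();                     // 1. reuse an idle connection if there is one
        if (conn != null) {
            return conn;
        }
        synchronized (this) {                     // 2. otherwise open a new one, up to the cap
            if (created < maximumPoolSize) {
                created++;
                return factory.get();
            }
        }
        // 3. pool exhausted: wait for another caller to release a connection
        conn = idle.poll(connectionTimeoutMillis, TimeUnit.MILLISECONDS);
        if (conn == null) {
            throw new TimeoutException("no connection within " + connectionTimeoutMillis + " ms");
        }
        return conn;
    }

    void release(C conn) {
        idle.offer(conn);                         // never exceeds capacity: at most maximumPoolSize exist
    }

    synchronized int createdCount() {
        return created;
    }
}
```

In a real Spring Boot app you would not write this yourself; the same knobs are set on HikariCP, for example through spring.datasource.hikari.maximum-pool-size and the related spring.datasource.hikari.* properties.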
Iago Bertoletti Ribeiro
1w Edited
Are you still loading everything into memory?

Let me ask you something. How many times have you seen this in a codebase?

repository.findAll() ☠️

Looks harmless, right? Until it isn’t.

That single line can:
• Pull millions of records into memory
• Fill your Hibernate persistence context
• Trigger massive GC pressure
• And eventually… crash your application

This is not a performance issue. This is an architectural flaw.

Most systems don’t fail because of complexity. They fail because of unbounded data processing. You are not controlling how much data your system loads, processes, or returns. And memory… has limits.

So what’s the right approach?

Stop thinking: "How do I get all the data?"
Start thinking: "How do I guarantee I NEVER load too much?"

The solution is simple (but often ignored):
• Use pagination for APIs
• Use streaming for large exports
• Process data in controlled chunks
• Return DTOs instead of heavy entities
• Set hard limits

Golden rule: Never load, process, or serialize everything at once. Always paginate, stream, or limit.

Most production outages I’ve seen had one thing in common: someone assumed the data would always be small. It never is.

#SoftwareArchitecture #Java #SpringBoot #DistributedSystems #SoftwareEngineering #DesignPatterns
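"Process data in controlled chunks" can look like the following Java sketch of keyset pagination: fetch rows with an id greater than the last one seen, up to a page size, and hand each page to a handler before fetching the next. PageSource and ChunkedProcessor are hypothetical; in a real repository, fetchAfter would map to a query along the lines of WHERE id > ? ORDER BY id LIMIT ?.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

// Hypothetical data-access contract: up to `limit` rows whose id is greater than afterId, ordered by id.
interface PageSource<T> {
    List<T> fetchAfter(long afterId, int limit);
}

final class ChunkedProcessor {
    // Visits every row while holding at most pageSize rows in memory; returns the number of pages.
    static <T> int processAll(PageSource<T> source, int pageSize,
                              ToLongFunction<T> idOf, Consumer<List<T>> handler) {
        long lastId = Long.MIN_VALUE;             // keyset cursor: "everything after this id"
        int pages = 0;
        while (true) {
            List<T> page = source.fetchAfter(lastId, pageSize);
            if (page.isEmpty()) {
                break;
            }
            handler.accept(page);                 // process, then let the page be garbage-collected
            pages++;
            lastId = idOf.applyAsLong(page.get(page.size() - 1));
            if (page.size() < pageSize) {
                break;                            // a short page means there is nothing left
            }
        }
        return pages;
    }
}
```

Offset-based paging (Spring Data's Pageable, for instance) also bounds memory, but deep pages tend to get slower because the database still has to skip over the earlier rows; a keyset cursor avoids that.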
Khamdam Saparov
5d Edited
One of the most overlooked performance killers in backend systems: Excessive Logging

Many applications have clean architecture, optimized queries, and scalable infrastructure, yet still lose performance because of excessive logging in frequently executed flows.

Common examples:
• Logging inside loops processing thousands of records
• Debug logs with expensive string construction
• Serializing large objects only for logging
• Writing too many synchronous logs under load

Simple view of request processing time:
• Business Logic = 120 ms
• Database = 80 ms
• Logging Overhead = 95 ms
• Total = 295 ms

Better approach:
• Use parameterized logging (log.info("User {}", id))
• Avoid logs inside heavy loops
• Use async logging where appropriate
• Keep DEBUG logs disabled in production
• Log signals, not noise

Lesson: Sometimes the system is slow not because of the database or business logic, but because we are logging too much.

Good logging helps production. Bad logging becomes production load.

#Java #SpringBoot #BackendDevelopment #Performance #Logging #SeniorDeveloper #SoftwareEngineering
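The post's example uses SLF4J-style {} placeholders. The sketch below shows the same idea with java.util.logging from the JDK so it needs no dependencies: a Supplier-based call skips building an expensive message when the level is disabled, and a {0}-style parameter defers formatting until a record is actually published. OrderService and expensiveDump are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

final class OrderService {
    private static final Logger LOG = Logger.getLogger(OrderService.class.getName());
    static final AtomicInteger expensiveCalls = new AtomicInteger(); // counts message builds, for the demo

    // Hypothetical stand-in for "serializing a large object only for logging".
    static String expensiveDump(long orderId) {
        expensiveCalls.incrementAndGet();
        return "Order{id=" + orderId + ", lines=[...]}";
    }

    static void process(long orderId) {
        // Eager (avoid): the string is built even when FINE is disabled.
        // LOG.fine("Processing " + expensiveDump(orderId));

        // Lazy: the Supplier runs only if FINE is loggable for this logger.
        LOG.fine(() -> "Processing " + expensiveDump(orderId));

        // Parameterized: the {0} placeholder is formatted only if the record is actually published.
        LOG.log(Level.FINE, "Processed order {0}", orderId);

        // ... business logic ...
    }
}
```

With SLF4J, log.debug("User {}", id) likewise skips formatting when DEBUG is off, but arguments that are themselves expensive to compute are still evaluated before the call, which is why a supplier or an isDebugEnabled() guard matters for heavy payloads.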
Ana Lau Gutiérrez
3w
🧠 A small tip when designing REST APIs

When designing REST APIs, it’s very common to return entities directly from the database. However, this can create problems such as:
⚠️ Exposing internal fields
⚠️ Tight coupling between API and database models
⚠️ Serialization issues with lazy-loaded relationships

🚀 A better approach is using DTOs (Data Transfer Objects).

Example: instead of returning the entity Order, return a response model like OrderResponse.

This helps you:
✅ Control exactly what the API exposes
✅ Keep your domain model clean
✅ Avoid serialization problems

💬 Do you usually return entities directly, or do you prefer using DTOs?

#Java #SpringBoot #RESTAPI #BackendDevelopment #SoftwareEngineering
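A minimal Java version of the Order to OrderResponse idea: the entity keeps its internal fields, and a record exposes only what the API promises. The field names (customerEmail, internalFraudScore, itemCount) are hypothetical; in a JPA setup Order would be an @Entity, and the mapping would typically run in the service layer while lazy collections can still be loaded.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical entity: in a JPA app this would be an @Entity with a lazy @OneToMany for items.
class Order {
    final Long id;
    final String customerEmail;
    final String internalFraudScore; // internal field that should never leave the service
    final List<String> items;
    final BigDecimal total;

    Order(Long id, String customerEmail, String internalFraudScore, List<String> items, BigDecimal total) {
        this.id = id;
        this.customerEmail = customerEmail;
        this.internalFraudScore = internalFraudScore;
        this.items = items;
        this.total = total;
    }
}

// Response model: only the fields the API contract promises.
record OrderResponse(Long id, int itemCount, BigDecimal total) {
    static OrderResponse from(Order order) {
        // Read the collection here and expose only its size, never the entity graph itself.
        return new OrderResponse(order.id, order.items.size(), order.total);
    }
}
```

Because the controller returns OrderResponse, renaming or adding entity columns no longer changes the JSON the API emits.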
Aaron Barczewski
2w
I recently worked on a system that needed to run migrations across thousands of tenant databases.

My first instinct was to add multithreading. It didn’t help.

The issue wasn’t concurrency; it was where I was applying it. I was scaling threads, but not scaling the system.

That experience forced me to step back and rethink how I approach high-throughput systems, especially when it comes to identifying bottlenecks and understanding tradeoffs.

I ended up redesigning the system using a queue-based model with a worker pool, which significantly improved throughput. But more importantly, it changed how I think about system design.

I wrote a short breakdown of the problem, the mistake, and the principles I took away: https://lnkd.in/gQU5SP86

Curious how others approach throughput bottlenecks: what’s a system design mistake that changed how you think?

Designing High-Throughput Systems: Tradeoffs, Patterns, and Practical Lessons — Aaron Barczewski aaron-barczewski.com
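The linked write-up isn't reproduced here, so this is a generic sketch of a queue-based worker pool rather than Aaron's actual design: tenant IDs go into a queue, a fixed number of workers pull from it, and a failing tenant is recorded instead of aborting the run. TenantMigrationRunner and the migrate callback are hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class TenantMigrationRunner {
    record Result(int succeeded, List<String> failed) {}

    // Migrates every tenant with at most `workers` migrations running at the same time.
    static Result run(List<String> tenants, int workers, Consumer<String> migrate)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(tenants); // the work backlog
        ConcurrentLinkedQueue<String> failed = new ConcurrentLinkedQueue<>();
        AtomicInteger succeeded = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String tenant;
                while ((tenant = queue.poll()) != null) { // each worker pulls the next tenant
                    try {
                        migrate.accept(tenant);
                        succeeded.incrementAndGet();
                    } catch (RuntimeException e) {
                        failed.add(tenant);                // record the failure and keep going
                    }
                }
            });
        }
        pool.shutdown();                                   // no new tasks; workers drain the queue
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            pool.shutdownNow();
            throw new IllegalStateException("migrations did not finish in time");
        }
        return new Result(succeeded.get(), List.copyOf(failed));
    }
}
```

The worker count, not the tenant count, now bounds how many databases are migrated at once, which is usually what database servers and connection limits actually care about.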
Alex Chevrier
2w
I've been building a distributed messaging system from scratch in Java 25. This week I measured a 655ms GC pause in production-like conditions. Here's what I found and how I fixed it.

The problem
JFR showed 43 G1 evacuation pauses over 100 seconds, growing monotonically from 3ms to 655ms.

Two root causes:
1- ByteBuffer.allocate(4 + 8 + N) on every single append: short-lived allocations flooding Eden, triggering frequent evacuations
2- HashMap<Long, Long> holding every offset→position mapping on-heap: boxing overhead accumulating in Old gen, driving remembered set scan cost higher on every mixed GC

The fix
Panama's MemorySegment API:
- Pre-allocate a 12-byte header slab off-heap (Arena.ofConfined()). Reuse it on every append. Zero allocation per write.
- Replace the HashMap with a flat off-heap array: (maxSegmentSize / 12) * 16 bytes, pre-allocated once per segment. Entries are naturally sorted by offset, so lookup is binary search.
- Scatter-gather write via FileChannel.write(ByteBuffer[]): one syscall, header + payload, no copy.

The result (JMH SampleTime, 1.5M samples)
- p99.99: 306µs → 70µs (−77%)
- p100: 40ms → 12ms (−70%)
- Max GC pause: 655ms growing → 2ms flat

No GC tuning flags. No hardware change. Just moving objects off the heap that had no business being there.

-> Code available at: https://lnkd.in/gfANhFti

Step 3 (zero-copy reads via FileChannel.transferTo()) next.

#Java #DistributedSystems #PerformanceEngineering #JVM #OpenJDK

GitHub - alchevrier/distributed-messaging: From-scratch distributed messaging system — append-only log, binary TCP over NIO, MurmurHash3 partitioning, and hand-rolled Raft consensus with leader election, log replication, and integration tests. Phase 5: off-heap storage + JMH benchmarks on bare-metal Linux. github.com
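A hedged sketch of the three write-path ideas (reused header, flat sorted offset index with binary search, gathering write) in plain Java. It swaps Panama's MemorySegment for direct ByteBuffers (ByteBuffer.allocateDirect), which are also off-heap but are released by GC-driven cleanup rather than by closing an Arena, so treat it as an illustration, not the repository's code. The record layout (4-byte payload length, 8-byte offset, payload) is an assumption based on the 4 + 8 + N in the post; the index size (maxSegmentSize / 12) * 16 follows from every record carrying at least a 12-byte header and each index entry holding two longs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SegmentLog implements AutoCloseable {
    private static final int HEADER_BYTES = 4 + 8;      // assumed: int payload length + long offset
    private static final int INDEX_ENTRY_BYTES = 16;    // long offset + long file position

    private final FileChannel channel;
    private final int maxSegmentSize;
    private final ByteBuffer header = ByteBuffer.allocateDirect(HEADER_BYTES); // allocated once, reused
    private final ByteBuffer index;                     // flat off-heap array, sorted by offset
    private final ByteBuffer[] gather = new ByteBuffer[2];
    private long nextOffset = 0;
    private long position = 0;
    private int entries = 0;

    SegmentLog(Path file, int maxSegmentSize) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        this.maxSegmentSize = maxSegmentSize;
        // Every record needs at least HEADER_BYTES, which bounds how many entries can exist.
        this.index = ByteBuffer.allocateDirect((maxSegmentSize / HEADER_BYTES) * INDEX_ENTRY_BYTES);
    }

    // Appends one record with a gathering write; returns its logical offset.
    long append(ByteBuffer payload) throws IOException {
        if (position + HEADER_BYTES + payload.remaining() > maxSegmentSize) {
            throw new IllegalStateException("segment full");
        }
        long offset = nextOffset++;
        long recordPosition = position;
        header.clear();
        header.putInt(payload.remaining()).putLong(offset).flip();
        gather[0] = header;
        gather[1] = payload;
        while (header.hasRemaining() || payload.hasRemaining()) {
            position += channel.write(gather);           // header + payload in one gathering call
        }
        int slot = entries * INDEX_ENTRY_BYTES;
        index.putLong(slot, offset);                     // absolute puts: no buffer position bookkeeping
        index.putLong(slot + 8, recordPosition);
        entries++;
        return offset;
    }

    // Binary search over the sorted offsets; returns the record's file position or -1.
    long positionOf(long offset) {
        int lo = 0;
        int hi = entries - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long midOffset = index.getLong(mid * INDEX_ENTRY_BYTES);
            if (midOffset < offset) {
                lo = mid + 1;
            } else if (midOffset > offset) {
                hi = mid - 1;
            } else {
                return index.getLong(mid * INDEX_ENTRY_BYTES + 8);
            }
        }
        return -1;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Appending "a", "bb", and "ccc" places the records at file positions 0, 13, and 27 (12-byte header plus payload each), and no per-append object is allocated for the header or the index.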
Thrilokesh Lakshmisetti
1w
🚀 Deep Internal Flow of a REST API Call in Spring Boot

🧭 1. Entry Point: The Gatekeeper
DispatcherServlet is the front controller. Every HTTP request must pass through this single door.
FLOW: Client → Tomcat (Embedded Server) → DispatcherServlet

🗺️ 2. Handler Mapping: Finding the Target
DispatcherServlet asks: “Who can handle this request?”
It consults:
* RequestMappingHandlerMapping
This scans:
* @RestController
* @RequestMapping
FLOW: DispatcherServlet → HandlerMapping → Controller Method Found

⚙️ 3. Handler Adapter: Executing the Method
Once the method is found, Spring doesn’t call it directly. It uses:
* RequestMappingHandlerAdapter
Why? Because it handles:
* Parameter binding
* Validation
* Conversion
FLOW: HandlerMapping → HandlerAdapter → Controller Method Invocation

🧭 4. Request Flow (Forward)
Controller → Service Layer (business logic) → Repository Layer → Database

🔄 5. Response Processing: The Return Journey
Now the response travels back upward: Repository → Service → Controller → DispatcherServlet → Tomcat → Client. For a @RestController, the HandlerAdapter writes the return value into the response body through an HttpMessageConverter (for example, Jackson for JSON).

⚡ Hidden Magic (Senior-Level Insights)

🧵 Thread Handling
* Each request runs on a separate thread from Tomcat’s pool

🔒 Transaction Management
* Managed via @Transactional
* Proxy-based AOP behind the scenes

🎯 Dependency Injection
* Beans wired by the Spring IoC container

🧠 AOP (Cross-Cutting)
* Logging, security, transactions wrapped around methods

⚡ Performance Layers
* Caching (Spring Cache)
* Connection pooling (HikariCP)

🧠 The Real Insight
At junior level, I thought:
👉 “API call hits controller”
At senior level, I observe:
👉 “A chain of abstractions collaborates through well-defined contracts under the orchestration of DispatcherServlet”

#Java #SpringBoot #RestApi #FullStack #Developer #AI #ML #Foundations #Security
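To make the HandlerMapping / HandlerAdapter split concrete, here is a toy front controller in plain Java. MiniDispatcher is hypothetical and far simpler than Spring's classes: one step finds a handler for the path, a separate step converts the raw path segment to a long (rejecting bad input before the handler runs), and only then is the handler invoked.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Toy front controller: NOT Spring's API; the roles only loosely mirror Spring's classes.
final class MiniDispatcher {
    // "HandlerMapping" role: path prefix -> handler taking one bound path variable.
    private final Map<String, LongFunction<String>> handlerMapping = new LinkedHashMap<>();

    void register(String prefix, LongFunction<String> handler) {
        handlerMapping.put(prefix, handler);
    }

    // "DispatcherServlet" role: the single entry point for every request path.
    String dispatch(String path) {
        for (Map.Entry<String, LongFunction<String>> e : handlerMapping.entrySet()) {
            if (path.startsWith(e.getKey())) {
                return invoke(e.getValue(), path.substring(e.getKey().length()));
            }
        }
        return "404 Not Found";
    }

    // "HandlerAdapter" role: convert and bind the argument, then call the handler.
    private String invoke(LongFunction<String> handler, String rawArg) {
        long id;
        try {
            id = Long.parseLong(rawArg);        // conversion + binding
        } catch (NumberFormatException ex) {
            return "400 Bad Request";           // bad input never reaches the controller
        }
        return "200 " + handler.apply(id);      // stand-in for writing the response body
    }
}
```

The handler itself only ever sees a typed long, which is the point of keeping lookup and invocation in separate components.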
