Built a Python-based Directory Sync Tool to compare and synchronize files between two directories with reliability and control. Instead of relying only on file names or timestamps, the tool uses a combination of metadata and SHA-256 hashing to accurately detect new, modified, and missing files.

Key highlights:
• Recursive directory scanning with structured metadata (name, extension, size, hash)
• Efficient change detection using size-first filtering followed by hash comparison
• Memory-efficient hashing using chunk-based file reading (handles large files)
• Synchronization support with metadata preservation using shutil.copy2
• Safe cleanup by optionally removing extra files from the destination

While building this, I focused on moving beyond a basic script and treating it like a real tool: structuring the code into clear components, improving output readability, and adding validation and error handling to make it more reliable in real use.

GitHub: https://lnkd.in/gt-Ec3rF

#Python #CLI #GitHubProjects #SoftwareDevelopment #LearningByBuilding #SystemsThinking
Python Directory Sync Tool with SHA-256 Hashing and Metadata
More Relevant Posts
📁 Day 24: Mastering File Systems & Mail Merging

I just automated a tedious administrative task by building a Mail Merge program that handles file I/O and string manipulation with ease!

💡 Fun Fact: The concept of "Mail Merge" dates back to the 1970s with early word processors like WordStar, saving humans millions of hours by separating the content from the contact list.

🚀 Key Features:
• File Automation: Reads template files and recipient lists dynamically.
• String Cleaning: Uses .strip() and .replace() to personalize content.
• Scalable Output: Automatically generates and saves unique letters for every name.

Explore the automation logic here: 🔗 https://lnkd.in/gwdWmATu

#Python #Automation #FileHandling #PythonProjects #Coding #100DaysOfCode
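The core of a mail merge like this is small. A sketch of the .strip()/.replace() flow, with a hypothetical template and placeholder (the actual project's file names and placeholder token aren't shown in the post):

```python
# Hypothetical template; in the real project this would be read from a file.
TEMPLATE = "Dear [name],\n\nYou are invited to the party!\n"

def merge_letters(raw_names: list[str], template: str = TEMPLATE) -> dict[str, str]:
    """Return a personalized letter per recipient, keyed by cleaned name."""
    letters = {}
    for raw in raw_names:
        name = raw.strip()            # drop surrounding whitespace/newlines
        if not name:                  # skip blank lines from the input file
            continue
        letters[name] = template.replace("[name]", name)
    return letters
```

Each letter would then be written out with something like `open(f"letter_for_{name}.txt", "w")`.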
I’ve been focusing on strengthening fundamentals by building small, intentional utilities instead of relying on built-ins. Recently, I put together a Python toolkit that implements:

• average calculation from scratch
• max/min via iteration (no built-ins)
• occurrence counting with explicit control flow
• palindrome detection using a two-pointer approach
• a reporting function that composes results from multiple helpers

The goal wasn’t complexity; it was clarity. Writing the logic out step by step and debugging edge cases (like initialization bugs and boundary conditions) made a noticeable difference in how I approach problems. This kind of work is where patterns start to stick: loops, conditionals, and data flow all working together instead of in isolation.
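Two of these utilities sketched out, assuming the post's spirit (no built-ins, explicit control flow). Note the initialization point the post mentions: seeding `best` with the first element, not 0, is what keeps all-negative inputs correct.

```python
def max_value(items):
    """Find the maximum by explicit iteration (no max() built-in)."""
    if not items:
        raise ValueError("empty sequence")
    best = items[0]            # initialize from the data, not from 0
    for x in items[1:]:
        if x > best:
            best = x
    return best

def is_palindrome(s: str) -> bool:
    """Two-pointer check: walk inward from both ends, compare as we go."""
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True
```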
The DB connection leaked in production. The file handle stayed open. The row lock hung indefinitely.

These aren't edge cases; they are the inevitable result of making resource cleanup the caller's responsibility. In Python, context managers move that responsibility from the developer to the type itself. The resource becomes self-healing.

🔹 __exit__ is called even if an exception is raised; that is the safety guarantee.
🔹 @contextmanager lets you write the same protocol with 'yield', no class needed.
🔹 Any resource with an acquire/release lifecycle belongs in a context manager.

The 'with' statement isn't just syntactic sugar; it's a contract. The caller writes business logic; the object handles the cleanup.

#Python #SoftwareEngineering #BackendDevelopment #SoftwareArchitecture #CleanCode
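A minimal demonstration of both claims, using a toy log in place of a real DB handle: the `finally` clause inside a generator decorated with `@contextmanager` is what `__exit__` runs, and it runs even when the body raises.

```python
from contextlib import contextmanager

@contextmanager
def managed_resource(name: str):
    """Toy acquire/release lifecycle; 'log' stands in for a real resource."""
    log = [f"acquire {name}"]
    try:
        yield log                      # the with-block body runs here
    finally:
        log.append(f"release {name}")  # runs even if the body raises
```

Usage: `with managed_resource("db") as log: ...` guarantees the release line is appended no matter how the block exits.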
Spent 5 days chasing ghosts: DLL hell and ABI mismatches. I followed the agentic debugger down the wrong path as it hallucinated at the wrong layer, misreading WinError 1114 as a load-path issue rather than a missing export. The actual fix was two lines: I had used TORCH_LIBRARY when I needed PYBIND11_MODULE.

The Architecture Gap:
- TORCH_LIBRARY registers ops into the PyTorch C++ dispatcher (accessed via torch.ops). It fires static C++ constructors at DLL load time but does not create a PyInit_* function, so Python can't "see" it as a module.
- PYBIND11_MODULE generates the standard Python C extension entry point, the PyInit_{name} function Python needs to import the module.

The error was literal: "dynamic module does not define module export function." No PyInit_* existed because TORCH_LIBRARY isn't meant to be imported directly. (Just correcting the record.)

#CPP #PyTorch #SystemsProgramming #MachineLearning #barebones #3D
Worked on an Event Scheduler project in today’s CPSC 335 (Algorithms) class, implementing it in Python using a combination of data structures. I used a min-heap for efficient priority handling, a hash table for O(1) lookups, and a sorted list for time-based queries. This exercise was a great way to see how different data structures can be combined to balance performance and functionality, and how trade-offs play a key role in algorithm design.
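A sketch of how those structures might be combined; the class name and API here are my own illustration, not the actual class assignment's design. The heap orders events by time while the dict gives O(1) lookup by name.

```python
import heapq

class EventScheduler:
    """Min-heap for earliest-event priority, hash table for O(1) name lookup."""

    def __init__(self):
        self._heap = []       # (time, name) pairs, heap-ordered by time
        self._by_name = {}    # name -> scheduled time

    def add(self, name: str, time: float) -> None:
        self._by_name[name] = time
        heapq.heappush(self._heap, (time, name))

    def lookup(self, name: str):
        """O(1) lookup of an event's time by name."""
        return self._by_name.get(name)

    def pop_next(self):
        """Remove and return the earliest (time, name) event in O(log n)."""
        time, name = heapq.heappop(self._heap)
        del self._by_name[name]
        return time, name
```

A sorted list (e.g. maintained with `bisect`) would additionally support range queries like "all events between 9:00 and 11:00", which the heap alone cannot answer efficiently.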
Solved the Minimum Window Substring Problem in Python

Today, I worked on the Minimum Window Substring problem and implemented it using three different approaches to understand how we can optimize code step by step.

Approaches Used
• Brute Force Approach
• Better Approach using HashMap and Count
• Optimized Sliding Window Approach

Problem Statement
Given two strings:
S = "ADOBECODEBANC"
T = "ABC"
The goal is to find the smallest substring in S that contains all characters of T.
Output: "BANC"

Time Complexity Comparison
• Brute Force: O(n² × m)
• Better Approach: O(n²)
• Optimized Sliding Window: O(n)

Key Learnings
• How HashMaps help in tracking character frequencies
• Why repeated checking inside loops makes code slower
• How the Sliding Window technique improves performance
• How to think from brute force to optimized solutions

The Sliding Window approach was the best solution because it processes the string efficiently using left and right pointers.

GitHub Repository: https://lnkd.in/gPy8Kcam

#Python #DSA #Algorithms #SlidingWindow #Programming #CodingInterview #LeetCode #SoftwareDevelopment #PythonDeveloper
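One standard O(n) sliding-window implementation of this problem (not necessarily the repo's exact code): expand `right` until the window covers all of T, then shrink from `left` while it still does.

```python
from collections import Counter

def min_window(s: str, t: str) -> str:
    """Smallest substring of s containing every character of t (with counts)."""
    need = Counter(t)        # how many of each character we still require
    missing = len(t)         # total required characters not yet in the window
    best = (0, 0)            # half-open (start, end) of the best window found
    left = 0
    for right, ch in enumerate(s, 1):
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        if missing == 0:                     # window now covers all of t
            while need[s[left]] < 0:         # drop surplus chars from the left
                need[s[left]] += 1
                left += 1
            if best == (0, 0) or right - left < best[1] - best[0]:
                best = (left, right)
            # give up s[left] so the search continues for a smaller window
            need[s[left]] += 1
            missing += 1
            left += 1
    return s[best[0]:best[1]]
```

Each index enters and leaves the window at most once, which is where the O(n) bound comes from.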
The Python "Reference" Trap

I noticed something interesting today: I changed a value inside a function, and it reflected outside too. I didn't return anything. I didn't re-assign the variable.

➡️ The Catch: Python functions don't create a new "copy" of your data. Instead, they work with a reference to the original object.

▪️ Mutable objects (lists, dicts, sets) are modified in place. Any change inside the function affects the original data directly.
▪️ Immutable objects (integers, strings, tuples) are safe because they can't be changed in place.

➡️ The Takeaway: If you're working with lists or dictionaries and want to keep your original data safe, you must be explicit and pass a copy: update(my_list.copy())

Small detail, but missing it can lead to hours of debugging.

#Python #30DaysOfCode #SoftwareEngineering #LearningInPublic #Day19
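The trap in a few lines (function names are illustrative): mutating a list parameter changes the caller's object, rebinding an int parameter does not, and `.copy()` breaks the shared reference.

```python
def append_item(items: list, value) -> None:
    """Mutates the caller's list in place; no return needed for the change to 'leak'."""
    items.append(value)

def increment(n: int) -> int:
    """Rebinding a parameter never affects the caller's variable."""
    n = n + 1
    return n
```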
Refactored my Piotroski F-Score module into a fully standalone Python component today. I removed QuantConnect dependencies and redesigned the file so it can score stocks from either:
1. Local CSV fundamentals
2. Online financial statements (via yfinance)

What’s inside:
- A clean PiotroskiFactors dataclass for standardized inputs.
- Core PiotroskiScore logic across all 9 F-Score signals.
- Input adapters: compute_piotroski_from_csv(...) and compute_piotroski_from_online(...)

Why this matters:
- Portability: run it in any Python environment.
- Reusability: drop it into screening pipelines, notebooks, or APIs.
- Transparency: explicit factor construction and scoring logic.
- Extensibility: easy to plug into broader quant workflows.

This refactor is part of a broader effort to make my quant stack platform-agnostic, testable, and production-friendly. Next step: add a simple CLI and batch scoring across a universe of tickers.

If you’re working on fundamental factor models, I’d love to compare approaches for handling missing/dirty statement data across providers.

#QuantFinance #AlgorithmicTrading #Python #MachineLearning #TradingSystems #DataScience #RiskManagement #TimeSeries #SoftwareEngineering
I used to think tuples were just “lists with stricter rules”… but today showed me they have their own vibe.

🐍 Day 06 of my #30DaysOfPython journey was all about tuples, and this topic made one thing really clear: sometimes the best data structure is the one that stays put. A tuple is an ordered and unchangeable collection of different data types, created using round brackets ().

Today I explored:
1. Creating tuples with tuple()
2. Accessing items using positive and negative indexing
3. Slicing tuples with positive and negative indexes
4. Checking whether an item exists using in
5. Counting items with count()
6. Finding item positions with index()
7. Joining tuples using the + operator
8. Converting tuples to lists with list()
9. Deleting a whole tuple using del

What stood out to me today was how tuples are built for stability. They are not meant to be edited over and over again, and that actually makes them really useful when you want data to stay consistent.

One more day, one more topic, one more layer of Python making sense.

GitHub Link - https://lnkd.in/gHwugKTU

#Python #LearnPython #CodingJourney #30DaysOfPython #Programming #DeveloperJourney
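Most of the operations listed above fit in a few lines (a toy `fruits` tuple, not the repo's example). Note that `+` never edits a tuple; it builds a brand-new one, which is exactly the "stays put" stability the post describes.

```python
fruits = ("apple", "banana", "cherry", "banana")

first = fruits[0]                  # positive indexing -> "apple"
last = fruits[-1]                  # negative indexing -> "banana"
middle = fruits[1:3]               # slicing -> ("banana", "cherry")
has_cherry = "cherry" in fruits    # membership test -> True
bananas = fruits.count("banana")   # occurrences -> 2
cherry_pos = fruits.index("cherry")  # first position -> 2
combined = fruits + ("date",)      # + creates a NEW tuple; fruits is unchanged
as_list = list(fruits)             # mutable copy when you do need to edit
```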
A 40ms API became a 4ms API. Here's the only thing that changed.

We were making 3 separate DB queries to assemble a response. Each was fast in isolation. Together, they were sequential: each waited for the previous. The fix: run them concurrently.

In Python (asyncio), this went from:

result_a = await get_a()
result_b = await get_b()
result_c = await get_c()

To:

result_a, result_b, result_c = await asyncio.gather(get_a(), get_b(), get_c())

That's it. No caching, no infra change, no complex refactor.

The mental model that helps: always ask "are these operations actually dependent on each other?" before assuming they need to run in sequence. Most API latency problems aren't hard; they're just unexamined.

#BackendDevelopment #PythonAsyncio #APIOptimization #SoftwareEngineering
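A self-contained version of the before/after, with `asyncio.sleep` standing in for the DB calls so the timing difference is measurable: three 0.1s waits gathered concurrently finish in roughly 0.1s, not 0.3s.

```python
import asyncio
import time

async def fetch(label: str, delay: float) -> str:
    # Stand-in for an independent DB query; asyncio.sleep simulates I/O wait.
    await asyncio.sleep(delay)
    return label

async def main():
    start = time.perf_counter()
    # The three calls don't depend on each other, so run them concurrently.
    results = await asyncio.gather(
        fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1)
    )
    return results, time.perf_counter() - start
```

The caveat worth keeping in mind: `gather` only helps when the awaited operations are genuinely independent, which is exactly the question the post says to ask first.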