Git internals and SHA-1

LWN reminds us that Git still uses SHA-1 by default. Commit or tag signing is not a mitigation, and to understand why you need to know a little about Git’s internal structure.

Git internally looks rather like a content-addressable filesystem, with four object types: tags, commits, trees and blobs.

Content-addressable means changing the content of an object changes the way you address or reference it, and this is achieved using a cryptographic hash function. Here is an illustration of the internal structure of an example repository I created, containing two files (./foo.txt and ./bar/bar.txt) committed separately, and then tagged:
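To make the addressing concrete, here is a minimal Python sketch of how a blob's ID can be derived. It relies on Git's object format (the object type, a space, the content length in bytes, a NUL byte, then the content itself). The function name `git_object_id` and the sample file contents are my own choices for illustration, not part of any Git API.

```python
import hashlib


def git_object_id(obj_type, body):
    """Return the SHA-1 object ID for a Git object of the given type."""
    # Git hashes a short header followed by the raw content.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Changing a single byte of content yields a completely different address.
print(git_object_id("blob", b"hello world\n"))
print(git_object_id("blob", b"hello world!\n"))
```

For a file on disk, the first value should match what `git hash-object` reports for the same content.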

[Figure: an example Git internal structure featuring tags, commits, trees and blobs, and how these relate to each other.]

You can see how ‘trees’ represent directories, ‘blobs’ represent files, and so on. Because identical content always hashes to the same ID, Git stores files or directories which remain identical only once. The hash function also allows very efficient lookup of each object within Git’s on-disk storage.
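The way trees reference blobs and other trees by hash can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the file contents are hypothetical (only the file names come from my example repository), the helper names `object_id`, `tree_id` and `root_tree` are mine, and the entry sorting is simpler than Git's real rule.

```python
import hashlib


def object_id(obj_type, body):
    """Raw 20-byte SHA-1 ID of a Git object: hash of header plus body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).digest()


def tree_id(entries):
    """ID of a tree; entries maps name -> (mode, raw ID of the child object).

    Simplification: plain name sorting is used here. Git's real ordering
    treats directory names as if they ended in "/", which makes no
    difference for the names in this example.
    """
    body = b"".join(
        f"{mode} {name}".encode("utf-8") + b"\x00" + child
        for name, (mode, child) in sorted(entries.items())
    )
    return object_id("tree", body)


def root_tree(foo_content, bar_content):
    """Root tree for a repository holding ./foo.txt and ./bar/bar.txt."""
    bar_dir = tree_id({"bar.txt": ("100644", object_id("blob", bar_content))})
    root = tree_id({
        "foo.txt": ("100644", object_id("blob", foo_content)),
        "bar": ("40000", bar_dir),
    })
    return root, bar_dir


# Hypothetical file contents; only foo.txt differs between the two versions.
root_v1, bar_v1 = root_tree(b"foo\n", b"bar\n")
root_v2, bar_v2 = root_tree(b"foo, edited\n", b"bar\n")

print(bar_v1 == bar_v2)    # True: the untouched directory keeps its ID
print(root_v1 == root_v2)  # False: the root tree changes with foo.txt
```

The unchanged `bar` directory hashes to the same ID in both versions, which is why Git only needs to store it once.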

Tag and commit signatures do not directly sign the files in the repository; that is, the input to the signature function is the content of the tag/commit object, rather than the files themselves. This is analogous to the way that GPG signatures actually sign a cryptographic hash of your email, and there was a time when this too defaulted to SHA-1. An attacker who can break that hash function can bypass the guarantees of the signature function.
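A toy Python sketch may help show why a broken hash undermines the signature. Everything here is an assumption made for illustration: the hash is SHA-1 truncated to three hex digits so that a collision can be found by brute force in moments (this is not a real SHA-1 collision), an HMAC stands in for the GPG signature, and the tree and commit formats are heavily simplified. The names `toy_id`, `commit_text` and `sign` are hypothetical.

```python
import hashlib
import hmac
import itertools

TOY_HEX_DIGITS = 3  # deliberately tiny so a collision is found in moments
KEY = b"hypothetical signing key"


def toy_id(obj_type, body):
    """Git-style object ID, truncated to make it trivially breakable."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()[:TOY_HEX_DIGITS]


def commit_text(file_content):
    """Simplified commit object for a repository with a single file."""
    blob = toy_id("blob", file_content)
    tree = toy_id("tree", f"100644 foo.txt {blob}\n".encode("ascii"))
    return f"tree {tree}\nauthor A U Thor\n\nAdd foo.txt\n".encode("ascii")


def sign(text):
    """Stand-in for a GPG signature; it covers the commit object only."""
    return hmac.new(KEY, text, hashlib.sha256).hexdigest()


original = b"print('hello')\n"
signature = sign(commit_text(original))

# Attacker: search for different content whose blob ID matches the original.
target = toy_id("blob", original)
for n in itertools.count():
    forged = f"print('evil')  # {n}\n".encode("ascii")
    if toy_id("blob", forged) == target:
        break

# The verifier rebuilds the commit from the forged file. The blob ID is
# unchanged, so the tree and commit are unchanged, and the signature holds.
print(forged != original)
print(hmac.compare_digest(sign(commit_text(forged)), signature))
```

Both lines print `True`: the file content differs, yet the signature over the commit object still verifies, because the signed data only ever contained the hash.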

A motivated attacker might be able to replace a blob, commit or tree in a Git repository using a SHA-1 collision. Replacing a blob seems easier to me than replacing a commit or tree, because file content does not have to conform to any particular format.

There is one key technical mitigation to this in Git, which is the SHA-1DC algorithm; this aims to detect and prevent known collision attacks. However, I will have to leave the cryptanalysis of this to the cryptographers!

So, is this in your threat model? Do we need to lobby GitHub for SHA-256 support? Either way, I look forward to the future operational challenge of migrating the entire world’s Git repositories across to SHA-256.

This article first appeared on my personal blog at: https://retout.co.uk/2022/06/29/git-internals-and-sha1/
