Working and Understanding Git (Part 0)

Pratik Dam

Published Oct 14, 2015

In the beginning there was always a need to keep data safe. We used cp, sync, rsync and the like to do it, and then came CVS, RCS and VSS, which were centralized. The idea was mostly collaboration driven: multiple people could have their own versions. But most of the time this meant a lot of network activity. Enter Git.

It gives every user their own complete version control system. And at what cost? Many earlier version control systems used what is called delta storage, which means only the change made in each increment is stored, whereas Git uses snapshot storage, which means each commit records the entire state of the project. This may appear to be a huge overhead, but it is not, because of Git's data model.

Figure: how contemporary version control systems divide into delta storage and snapshot storage.

Git treats every file, irrespective of type, as a blob, and stores the blob prefixed with its size under a name that is the SHA-1 sum of that data. The blob is compressed with zlib before it is written. While Git is taking snapshots, it also packs blobs together from time to time and stores them as a pack. All of this happens under the hood, and what you see is a pack file. So it is not as bad on storage as you might have imagined.
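The naming scheme can be checked by hand. The sketch below assumes a Linux-like shell with GNU coreutils (`sha1sum`) and Git installed; the temporary directory and variable names are only illustrative.

```shell
# Recompute a blob ID by hand and compare it with Git's own answer.
workdir=$(mktemp -d)
printf 'hello world\n' > "$workdir/file.txt"

# A blob is hashed as: "blob <size in bytes>", a NUL byte, then the raw content.
size=$(wc -c < "$workdir/file.txt" | tr -d ' ')
manual_id=$({ printf 'blob %s\0' "$size"; cat "$workdir/file.txt"; } | sha1sum | cut -d' ' -f1)

# git hash-object computes the ID without writing anything to a repository.
git_id=$(git hash-object "$workdir/file.txt")

echo "manual: $manual_id"
echo "git:    $git_id"
rm -rf "$workdir"
```

Both lines should print 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, the same ID that `git ls-tree` reports for file.txt in the walkthrough.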

One of the main concepts in Git's data model is that it is composed of blobs (your stuff), trees (folders, which can nest), commits and tags. A commit is the snapshot recorded when you tell Git to write to its database, and a tag is a named pointer to a specific object, usually a commit. All of these objects are immutable, and that is what gives Git its power in branching and merging.
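As a rough illustration, the four object types can be seen side by side in a throwaway repository. The identity values and the tag name `v0` below are placeholders chosen for this sketch, not anything from the original example.

```shell
# Throwaway repository; the identity values are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial commit'
git tag -a v0 -m 'annotated tag'   # -a creates a real tag object

# Ask Git for the type of each kind of object.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:file.txt)
tag_type=$(git cat-file -t v0)
echo "$commit_type $tree_type $blob_type $tag_type"
```

This should print `commit tree blob tag`. A lightweight tag (created without `-a`) would not produce a tag object; it is only a reference to the commit.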

We will see that later.

Let's understand a bit about these entities with an example.

           $ mkdir test-project
           $ cd test-project
           $ git init
           Initialized empty Git repository in .git/
           $ echo 'hello world' > file.txt
           $ git add .
           $ git commit -a -m "initial commit"
           [master (root-commit) 54196cc] initial commit
            1 file changed, 1 insertion(+)
            create mode 100644 file.txt
           $ echo 'hello world!' >file.txt
           $ git commit -a -m "add emphasis"
           [master c4d59f3] add emphasis
            1 file changed, 1 insertion(+), 1 deletion(-)

       What are the 7 digits of hex that Git responded to the commit with? It turns out that every object in the Git history is stored under a 40-digit hex name. That name is the SHA-1 hash of the object's contents; among other things, this ensures that Git will never store the same data twice (since identical data is given an identical SHA-1 name), and that the contents of a Git object will never change (since that would change the object's name as well). The 7-character hex strings here are simply abbreviations of such 40-character strings. Abbreviations can be used everywhere the 40-character strings can be used, so long as they are unambiguous.
       It is expected that the content of the commit object you created while following the example above generates a different SHA-1 hash than the one shown above because the commit object records the time when it was created and the name of the person performing the commit.
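The dependence on author and time can be demonstrated by pinning both. In this sketch the helper `make_commit`, the identity values and the timestamps are all made up for illustration; the dates use Git's internal `<unix timestamp> <timezone offset>` format.

```shell
# Build the same one-file commit in a fresh repository and print its ID.
# Identity and dates are placeholders, pinned so nothing varies between runs.
make_commit() {
    repo=$(mktemp -d)
    (
        cd "$repo" || exit 1
        git init -q
        echo 'hello world' > file.txt
        git add file.txt
        GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com' \
        GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com' \
        GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
            git commit -q -m 'initial commit'
        git rev-parse HEAD
    )
    rm -rf "$repo"
}

first=$(make_commit '1444780800 +0000')
second=$(make_commit '1444780800 +0000')   # identical content and metadata
later=$(make_commit '1444780801 +0000')    # one second later

echo "first:  $first"
echo "second: $second"
echo "later:  $later"
```

The first two IDs should match, because every byte of the commit object is identical, while the third should differ even though the file content is the same.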
       We can ask Git about this particular object with the cat-file command.

           $ git cat-file -t 54196cc2
           commit
           $ git cat-file commit 54196cc2
           tree 92b8b694ffb1675e5975148e1121810081dbdffe
           initial commit

       A tree can refer to one or more "blob" objects, each corresponding to a file. In addition, a tree can also refer to other tree objects, thus creating a directory hierarchy. You can examine the contents of any tree using ls-tree (remember that a long enough initial portion of the SHA-1 will also work):

           $ git ls-tree 92b8b694
           100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad    file.txt
       Thus we see that this tree has one file in it. The SHA-1 hash is a reference to that file's data:

           $ git cat-file -t 3b18e512
           blob
       A "blob" is just file data, which we can also examine with cat-file:
           $ git cat-file blob 3b18e512
           hello world
       Note that this is the old file data: the tree object named by the initial commit is a snapshot of the directory state as it was recorded by that first commit.

       All of these objects are stored under their SHA-1 names inside the Git directory:
           $ find .git/objects

           .git/objects/info
           .git/objects/
           .git/objects/3b
           .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
           .git/objects/92
           .git/objects/92/b8b694ffb1675e5975148e1121810081dbdffe
           .git/objects/54
           .git/objects/54/196cc2703dc165cbd373a65a4dcf22d50ae7f7
           .git/objects/a0
           .git/objects/a0/423896973644771497bdc03eb99d5281615b51
           .git/objects/d0
           .git/objects/d0/492b368b66bdabf2ac1fd8c92b39d3db916e59
           .git/objects/c4
           .git/objects/c4/d59f390b9cfd4318117afde11d601c1085f241

       and the contents of these files are just the compressed data plus a header identifying their length and their type. The type is either a blob, a tree, a commit, or a tag.
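To connect an object ID to its file on disk, a small sketch (assuming a fresh repository in a temporary directory, where new objects are still stored loose rather than packed) might look like this:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# -w writes the blob into the object database and prints its ID.
id=$(printf 'hello world\n' | git hash-object -w --stdin)

# Loose objects live at .git/objects/<first 2 hex digits>/<remaining 38>.
dir=$(printf '%s' "$id" | cut -c1-2)
rest=$(printf '%s' "$id" | cut -c3-)
path=".git/objects/$dir/$rest"
ls -l "$path"

# The header fields (type and length) can be read back through Git itself,
# which avoids decompressing the file by hand.
obj_type=$(git cat-file -t "$id")
obj_size=$(git cat-file -s "$id")
echo "$obj_type $obj_size"
```

The last line should print `blob 12`: the type, and the length in bytes of `hello world` plus its trailing newline.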

       The simplest commit to find is the HEAD commit, which we can find from .git/HEAD:
           $ cat .git/HEAD
           ref: refs/heads/master

       As you can see, this tells us which branch we're currently on, and it tells us this by naming a file under the .git directory, which itself contains a SHA-1 name referring to a commit object, which we can examine with cat-file:

           $ cat .git/refs/heads/master
           c4d59f390b9cfd4318117afde11d601c1085f241
           $ git cat-file -t c4d59f39
           commit
           $ git cat-file commit c4d59f39
           tree d0492b368b66bdabf2ac1fd8c92b39d3db916e59
           parent 54196cc2703dc165cbd373a65a4dcf22d50ae7f7
           author 1143418702 -0500
           committer 1143418702 -0500

       The "tree" object here refers to the new state of the tree:
           $ git ls-tree d0492b36
           100644 blob a0423896973644771497bdc03eb99d5281615b51    file.txt
           $ git cat-file blob a0423896
           hello world!
       and the "parent" object refers to the previous commit:
           $ git cat-file commit 54196cc2
           tree 92b8b694ffb1675e5975148e1121810081dbdffe
           initial commit

       The tree object is the tree we examined first, and this commit is unusual in that it lacks any parent.
       Most commits have only one parent, but it is also common for a commit to have multiple parents. In that case the commit represents a merge, with the parent references pointing to the heads of the merged branches.
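A merge commit with two parents can be produced in a scratch repository as follows. The branch name `topic`, the file names and the identity values are hypothetical; the starting branch name is read from Git rather than assumed, since it may be `master` or `main` depending on configuration.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial commit'
base_branch=$(git symbolic-ref --short HEAD)

# Diverge: one new commit on a topic branch...
git checkout -q -b topic
echo 'topic work' > topic.txt
git add topic.txt
git commit -q -m 'work on topic'

# ...and one new commit on the original branch.
git checkout -q "$base_branch"
echo 'hello world!' > file.txt
git commit -q -a -m 'add emphasis'

# Both branches moved, so this creates a true merge commit.
git merge -q --no-edit topic

# One line: the merge commit ID followed by the IDs of its parents.
parents=$(git rev-list --parents -n 1 HEAD)
parent_count=$(( $(echo "$parents" | wc -w) - 1 ))
echo "$parent_count parents"
```

This should print `2 parents`. The first parent is the previous tip of the branch that was checked out, and the second is the tip of `topic`.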
       So now we know how Git uses the object database to represent a project's history:

"commit" objects refer to "tree" objects representing the snapshot of a directory tree at a particular point in the history, and refer to "parent" commits to show how they're connected into the project history.

"tree" objects represent the state of a single directory, associating directory names to "blob" objects containing file data and "tree" objects containing subdirectory information.
"blob" objects contain file data without any other structure.
References to commit objects at the head of each branch are stored in files under .git/refs/heads/.
The name of the current branch is stored in .git/HEAD.
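The last two points can be checked directly. This sketch assumes a fresh repository using Git's default file-based reference storage, where the branch tip is still a plain file; in older or busier repositories the same information may live in `.git/packed-refs` instead. The identity values are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial commit'

head_ref=$(git symbolic-ref HEAD)        # e.g. refs/heads/<branch>
head_file=$(cat .git/HEAD)               # "ref: refs/heads/<branch>"
tip_from_file=$(cat ".git/$head_ref")    # commit ID stored in the ref file
tip_from_git=$(git rev-parse HEAD)       # the same ID, resolved by Git

echo "$head_file"
echo "$tip_from_file"
echo "$tip_from_git"
```

The last two lines should show the same 40-character commit ID.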

This leads us to references, which are lightweight, movable and mutable pointers. They point to commits, and they are what let us branch and merge.

In the diagram, the gray box is the mutable pointer that we use for branching and merging. Now we start changing code; exclamation marks show a change in the code. Let's see what Git does.

Git makes new objects and simply moves the reference, since it is mutable.
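A minimal sketch of this behaviour, again in a throwaway repository with placeholder identity values: the branch reference moves to the new commit, while the old commit and its file data remain in the object database untouched.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

echo 'hello world' > file.txt
git add file.txt
git commit -q -m 'initial commit'
before=$(git rev-parse HEAD)     # where the branch points now

echo 'hello world!' > file.txt
git commit -q -a -m 'add emphasis'
after=$(git rev-parse HEAD)      # the branch has moved to a new commit

# The new commit records the old one as its parent...
parent=$(git rev-parse 'HEAD^')
# ...and the old snapshot is still readable, unchanged.
old_content=$(git cat-file blob "$before:file.txt")

echo "before: $before"
echo "after:  $after"
echo "old content: $old_content"
```

The two IDs should differ, and the old content should still read `hello world` without the exclamation mark.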

We will cover branching, merging and workflows in another post.

