Working and Understanding git (Part 0)

In the beginning there was always a requirement to keep data safe. We used cp, sync, rsync, etc. to do that, and then came RCS, CVS, and VSS, which were centralized. The idea was mostly collaboration-driven: multiple people could have their own versions. But most of the time there was a lot of network activity. Enter git.

It gives every user their own version control system. And at what cost? A lot of previous version control systems used what is called delta storage, which means only each incremental change is stored. Git uses snapshot storage, which means the entire content is recorded at every commit. This may appear to be a huge overhead, but it actually is not, because of git's data model.
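Here is a minimal Python sketch of why snapshots are cheaper than they sound. It assumes a toy in-memory store: the `store` dict and the `put` and `snapshot` helpers are hypothetical names for illustration, and the key is a plain SHA-1 of the content rather than git's real object format. Every snapshot lists every file, but content is keyed by its hash, so an unchanged file costs nothing extra.

```python
import hashlib

store = {}  # hypothetical content-addressed store: hash -> bytes


def put(content: bytes) -> str:
    """Store content under its SHA-1 and return the key."""
    key = hashlib.sha1(content).hexdigest()
    store[key] = content  # storing identical content again changes nothing
    return key


def snapshot(files: dict) -> dict:
    """Record a full snapshot: every file name mapped to a content key."""
    return {name: put(data) for name, data in files.items()}


v1 = snapshot({"a.txt": b"alpha\n", "b.txt": b"beta\n", "c.txt": b"gamma\n"})
v2 = snapshot({"a.txt": b"alpha\n", "b.txt": b"beta\n", "c.txt": b"gamma!\n"})

# Two full snapshots of three files, but only four distinct contents are stored.
print(len(store))
```

Only `c.txt` changed between the two snapshots, so the second snapshot adds a single new entry to the store.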

This screenshot shows which contemporary version control systems use delta storage and which use snapshot storage.

 

Git treats all files, irrespective of type, as blobs. It stores each blob's size and content and names the result by its SHA-1 sum. The blob is compressed with zlib and stored in git's object database. So while git keeps taking snapshots, it also packs the objects from time to time (for example during git gc) and stores them together as a pack. All of this happens under the hood, and what you see is a pack file. So it is not as bad on storage as you would have imagined.
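The naming scheme can be reproduced in a few lines of Python. This is a sketch, not git's own code: `blob_object` and `object_id` are hypothetical helper names. It relies on the blob layout being the word `blob`, a space, the content length, a NUL byte, and then the content.

```python
import hashlib
import zlib


def blob_object(content: bytes) -> bytes:
    """Build the bytes that get hashed for a blob: header, NUL byte, content."""
    header = b"blob " + str(len(content)).encode("ascii")
    return header + b"\x00" + content


def object_id(raw: bytes) -> str:
    """The object name is the SHA-1 of the uncompressed header plus content."""
    return hashlib.sha1(raw).hexdigest()


raw = blob_object(b"hello world\n")
compressed = zlib.compress(raw)  # the zlib-compressed form that is kept on disk

print(object_id(raw))
print(zlib.decompress(compressed) == raw)
```

For the content `hello world` followed by a newline, the printed id starts with `3b18e512`, which is the blob that shows up in the example later in this post.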

 

One of the main concepts in git's data model is that it is composed of blobs (your file contents), trees (folders), commits, and tags. A commit is the snapshot recorded when you tell git to write to its database, and it links back to the commits that came before it. A tag is a named marker for one specific object, usually a commit. All of these objects are immutable, and that is what gives git its power for branching and merging.

We will see that later.

Let's understand a bit about these entities with an example.

           $ mkdir test-project
           $ cd test-project
           $ git init
           Initialized empty Git repository in .git/
           $ echo 'hello world' > file.txt
           $ git add .
           $ git commit -a -m "initial commit"
           [master (root-commit) 54196cc] initial commit
            1 file changed, 1 insertion(+)
            create mode 100644 file.txt
           $ echo 'hello world!' >file.txt
           $ git commit -a -m "add emphasis"
           [master c4d59f3] add emphasis
            1 file changed, 1 insertion(+), 1 deletion(-)

       What are the 7 digits of hex that Git responded to the commit with? It turns out that every object in the Git history is stored under a 40-digit hex name. That name is the SHA-1 hash of the object's contents; among other things, this ensures that Git will never store the same data twice (since identical data is given an identical SHA-1 name), and that the contents of a Git object will never change (since that would change the object's name as well). The 7-character hex strings here are simply abbreviations of such 40-character strings. Abbreviations can be used everywhere the 40-character strings can be used, so long as they are unambiguous.
       It is expected that the content of the commit object you created while following the example above generates a different SHA-1 hash than the one shown above because the commit object records the time when it was created and the name of the person performing the commit.
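The "unambiguous abbreviation" rule can be sketched in Python as a prefix lookup. `resolve` is a hypothetical helper, and the sketch leaves out details of real git such as its minimum abbreviation length. The ids are the ones from this example repository.

```python
def resolve(prefix: str, object_ids: list) -> str:
    """Return the single full id that starts with prefix, else raise."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


ids = [
    "3b18e512dba79e4c8300dd08aeb37f8e728b8dad",
    "92b8b694ffb1675e5975148e1121810081dbdffe",
    "54196cc2703dc165cbd373a65a4dcf22d50ae7f7",
]

# A 7-character prefix is enough here because no other id shares it.
print(resolve("54196cc", ids))
```

A prefix shared by two objects, or by none, raises an error instead of guessing.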
       We can ask Git about this particular object with the cat-file command.


           $ git cat-file -t 54196cc2
           commit
           $ git cat-file commit 54196cc2
           tree 92b8b694ffb1675e5975148e1121810081dbdffe
           initial commit

       A tree can refer to one or more "blob" objects, each corresponding to a file. In addition, a tree can also refer to other tree objects, thus creating a directory hierarchy. You can examine the contents of any tree using ls-tree (remember that a long enough initial portion of the SHA-1 will also work):

           $ git ls-tree 92b8b694
           100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad    file.txt
       Thus we see that this tree has one file in it. The SHA-1 hash is a reference to that file's data:

           $ git cat-file -t 3b18e512
           blob
       A "blob" is just file data, which we can also examine with cat-file:
           $ git cat-file blob 3b18e512
           hello world
       Note that this is the old file data; the tree object referenced by the initial commit is a snapshot of the directory state that was recorded by the first commit.

       All of these objects are stored under their SHA-1 names inside the Git directory:
           $ find .git/objects
           .git/objects
           .git/objects/info
           .git/objects/3b
           .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
           .git/objects/92
           .git/objects/92/b8b694ffb1675e5975148e1121810081dbdffe
           .git/objects/54
           .git/objects/54/196cc2703dc165cbd373a65a4dcf22d50ae7f7
           .git/objects/a0
           .git/objects/a0/423896973644771497bdc03eb99d5281615b51
           .git/objects/d0
           .git/objects/d0/492b368b66bdabf2ac1fd8c92b39d3db916e59
           .git/objects/c4
           .git/objects/c4/d59f390b9cfd4318117afde11d601c1085f241

       The contents of these files are just the compressed data plus a header identifying their length and their type. The type is one of blob, tree, commit, or tag.
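The on-disk layout described above can be sketched in Python without touching a real repository. The helpers `write_loose` and `read_loose` are hypothetical, and a temporary directory stands in for `.git/objects`. The first two hex digits of the name become the directory and the remaining 38 become the file name.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose(objects_dir: str, obj_type: str, body: bytes) -> str:
    """Write one loose object and return its 40-digit name."""
    raw = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\x00" + body
    name = hashlib.sha1(raw).hexdigest()
    subdir = os.path.join(objects_dir, name[:2])  # first two hex digits name the directory
    os.makedirs(subdir, exist_ok=True)
    with open(os.path.join(subdir, name[2:]), "wb") as fh:
        fh.write(zlib.compress(raw))
    return name


def read_loose(objects_dir: str, name: str):
    """Read a loose object back and return (type, body)."""
    with open(os.path.join(objects_dir, name[:2], name[2:]), "rb") as fh:
        raw = zlib.decompress(fh.read())
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    assert int(size) == len(body)  # the header records the body length
    return obj_type, body


objects_dir = tempfile.mkdtemp()  # stand-in for .git/objects
name = write_loose(objects_dir, "blob", b"hello world\n")
print(name[:2], name[2:])
print(read_loose(objects_dir, name))
```

Reading the file back is roughly what `git cat-file -t` and `git cat-file blob` report for the same object.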

       The simplest commit to find is the HEAD commit, which we can find from .git/HEAD:
           $ cat .git/HEAD
           ref: refs/heads/master

       As you can see, this tells us which branch we're currently on, and it tells us this by naming a file under the .git directory, which itself contains a SHA-1 name referring to a  commit object, which we can examine with cat-file:

           $ cat .git/refs/heads/master
           c4d59f390b9cfd4318117afde11d601c1085f241
           $ git cat-file -t c4d59f39
           commit
           $ git cat-file commit c4d59f39
           tree d0492b368b66bdabf2ac1fd8c92b39d3db916e59
           parent 54196cc2703dc165cbd373a65a4dcf22d50ae7f7
           author 1143418702 -0500
           committer 1143418702 -0500

           add emphasis

       The "tree" object here refers to the new state of the tree:
           $ git ls-tree d0492b36
           100644 blob a0423896973644771497bdc03eb99d5281615b51    file.txt
           $ git cat-file blob a0423896
           hello world!
       and the "parent" object refers to the previous commit:
           $ git cat-file commit 54196cc2
           tree 92b8b694ffb1675e5975148e1121810081dbdffe
           initial commit

       The tree object is the tree we examined first, and this commit is unusual in that it lacks any parent.
       Most commits have only one parent, but it is also common for a commit to have multiple parents. In that case the commit represents a merge, with the parent references pointing to the heads of the merged branches.
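A commit object's text is simple enough to parse by hand. The following Python sketch uses a hypothetical `parse_commit` helper and shows how zero, one, or several `parent` lines are collected. The author name and email in the sample are made up for illustration; the hashes and timestamps are the ones from the example above.

```python
def parse_commit(text: str) -> dict:
    """Split a commit object's text into header fields and message."""
    head, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "message": message.strip()}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            commit["tree"] = value
        elif key == "parent":
            # none for a root commit, one for a normal commit, more for a merge
            commit["parents"].append(value)
        else:
            commit[key] = value  # author, committer, and so on
    return commit


sample = (
    "tree d0492b368b66bdabf2ac1fd8c92b39d3db916e59\n"
    "parent 54196cc2703dc165cbd373a65a4dcf22d50ae7f7\n"
    "author A U Thor <author@example.com> 1143418702 -0500\n"  # made-up identity
    "committer A U Thor <author@example.com> 1143418702 -0500\n"
    "\n"
    "add emphasis\n"
)

info = parse_commit(sample)
print(info["tree"], len(info["parents"]), info["message"])
```

A merge commit would simply carry more than one `parent` line, and the list would grow accordingly.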
       So now we know how Git uses the object database to represent a project's history:

  • "commit" objects refer to "tree" objects representing the snapshot of a directory tree at a particular point in the history, and refer to "parent" commits to show how they're connected into the project history.
  • "tree" objects represent the state of a single directory, associating directory names to "blob" objects containing file data and "tree" objects containing subdirectory information.
  • "blob" objects contain file data without any other structure.
  • References to commit objects at the head of each branch are stored in files under .git/refs/heads/.
  • The name of the current branch is stored in .git/HEAD
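The relationships in the list above can be sketched as a small in-memory graph in Python. Everything here is hypothetical: the short keys such as `c1` and `t1` stand in for real 40-digit names, and the `read_file` and `history` helpers are illustrative only.

```python
# Hypothetical object database keyed by made-up short names.
objects = {
    "c2": {"type": "commit", "tree": "t2", "parents": ["c1"]},
    "c1": {"type": "commit", "tree": "t1", "parents": []},
    "t2": {"type": "tree", "entries": {"file.txt": "b2"}},
    "t1": {"type": "tree", "entries": {"file.txt": "b1"}},
    "b2": {"type": "blob", "data": b"hello world!\n"},
    "b1": {"type": "blob", "data": b"hello world\n"},
}
refs = {"refs/heads/master": "c2"}  # branch head, like a file under .git/refs/heads/
HEAD = "refs/heads/master"          # current branch, like .git/HEAD


def read_file(commit_id: str, path: str) -> bytes:
    """Go from a commit to its tree, then to the blob for one file."""
    tree = objects[objects[commit_id]["tree"]]
    return objects[tree["entries"][path]]["data"]


def history(commit_id: str) -> list:
    """Follow first parents back to the root commit."""
    chain = []
    while commit_id is not None:
        chain.append(commit_id)
        parents = objects[commit_id]["parents"]
        commit_id = parents[0] if parents else None
    return chain


tip = refs[HEAD]
print(history(tip))
print(read_file(tip, "file.txt"))
```

Starting from HEAD, two lookups reach the branch tip, and from there every older snapshot is reachable through parent links.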

 

 

This leads us to references, which are lightweight, movable pointers. Unlike objects, they are mutable. They point to commits so that we can branch and merge.

  

So here the gray box is the mutable pointer, which we would use for branching and merging. Now we start changing code. Exclamation marks show a change in code. Let's see what git does.

Git makes new objects and just moves the reference, since the reference is mutable.
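A small Python sketch of that behaviour, under stated assumptions: the `objects` and `refs` dicts and the `commit` helper are hypothetical, and the commit id is a simplified hash rather than git's real commit format. Objects are only ever added, while a branch is one entry in `refs` that gets overwritten.

```python
import hashlib

objects = {}  # immutable objects: added once, never modified
refs = {}     # mutable branch pointers: branch name -> commit id


def commit(branch: str, content: str) -> str:
    """Create a new commit object and move the branch ref to it."""
    parent = refs.get(branch)
    payload = f"parent {parent}\n{content}"
    commit_id = hashlib.sha1(payload.encode()).hexdigest()
    objects[commit_id] = {"parent": parent, "content": content}
    refs[branch] = commit_id  # only the pointer changes
    return commit_id


first = commit("master", "hello world")
refs["feature"] = refs["master"]  # branching just copies a pointer
second = commit("feature", "hello world!")

print(refs["master"] == first, refs["feature"] == second)
```

Creating the branch stored no new object at all, and committing on it left `master` pointing at the first commit.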

We will cover branching, merging, and workflows in another post.
