“git - the stupid content tracker” – the description in man git, written by Linus Torvalds

What is under the hood?

git objects and some “pointers”

git basic objects relations

  • the ones in oval are objects

  • the ones in rectangles are pointers

  • HEAD points to the branch or commit you are currently on

  • a remote, an (annotated) tag, and a branch each point to a commit

  • a commit points to another commit (its parent if any) and a tree

  • a tree could point to multiple blobs and trees
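As a rough illustration of the “pointers” above, here is a minimal Python sketch that follows HEAD to a commit hash by reading plain files under .git. The function name resolve_head is hypothetical, and the sketch assumes loose refs only: it ignores .git/packed-refs, which a real repository may use instead.

```python
from pathlib import Path


def resolve_head(git_dir):
    """Follow HEAD to a commit hash (hypothetical helper, loose refs only)."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD points to a branch, e.g. "ref: refs/heads/master";
        # the branch file in turn holds the commit hash.
        ref_path = git_dir / head[len("ref: "):]
        return ref_path.read_text().strip()
    # Detached HEAD: the file holds the commit hash directly.
    return head
```

Calling resolve_head('.git') inside a repository should return the same hash as the one stored in the current branch file, as long as that ref has not been packed.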

decompress and compress the object

How do you see what’s inside a git object?

demo of decompress and compress using zlib and sha1

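You can use decompress(filename) to decompress a git object with zlib, and compress(content) to compute the sha1 hash of the resulting byte string. Despite its name, compress does not compress anything; it only hashes. The hash matches the object's path: its directory name followed by its file name.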

### compress_lib.py
import zlib  # A compression / decompression library
from hashlib import sha1  # SHA1 hash algorithm


def decompress(filename):
    # Read the raw object file and inflate it with zlib
    with open(filename, 'rb') as f:
        compressed_contents = f.read()
    return zlib.decompress(compressed_contents)


def compress(decompressed_contents):
    # Despite the name, this hashes the bytes; it does not compress them
    return sha1(decompressed_contents).hexdigest()
### interactive python
>>> from compress_lib import compress, decompress # the two helpers defined in compress_lib.py
>>> content = decompress('./.git/objects/ac/505a3830f82c59ecbe11064e33987102c96de0') # input a filename of a git object
>>> content 
b'blob 1251\x00\n\nEos dicta neque ut. Ut doloremque ducimus et autem commodi non incidunt et. At dolore expedita et error in ipsum voluptas et. Amet voluptatibus in voluptas. Quidem nam harum porro earum enim impedit. Totam et consequatur amet veniam cupiditate magni.\n\nDebitis magni repellendus sunt quos et. Sapiente voluptatem ut eaque quas beatae nobis facere. Qui a eveniet recusandae placeat omnis esse. Accusantium provident sit quia voluptatem. Voluptatum mollitia consequatur omnis dignissimos suscipit.\n\nEx fuga ad nam voluptatibus sapiente nobis aspernatur. Quasi fugit culpa libero veniam dolorem et dolorem quidem. Aut ratione hic magni. Dolor nam architecto at nihil. Sunt odio temporibus voluptatem et et atque labore. Vel delectus atque sed et.\n\nAut perferendis fugit exercitationem aut praesentium labore itaque facilis. Optio et ea error soluta sunt quia deleniti. Dignissimos aspernatur molestias tenetur hic debitis. Mollitia quisquam molestiae et doloribus. Facere culpa veniam minima. Autem vel dolor molestiae quia.\n\nQuis dolores quis a molestiae. Aspernatur laudantium at animi. Suscipit aut accusantium et delectus excepturi maxime. Aut cupiditate eos ab id possimus suscipit. Et enim consequatur hic culpa facilis. Fuga molestias quis aut.\n\n'
>>> compress(content) # input the byte string and get the sha1 
'ac505a3830f82c59ecbe11064e33987102c96de0' # the directory name 'ac' followed by the filename

So

  • a git object can be decompressed using a compression library such as zlib
  • a git object is stored so that the concatenation of its directory name (2 chars, 'ac') and its file name (38 chars, '505a3830f82c59ecbe11064e33987102c96de0') is the sha1 hash of the decompressed content, including the header with the object type (blob) and the number of bytes (1251) shown in the content variable above
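The two points above can be checked without touching a repository. The sketch below rebuilds the header by hand, hashes it together with the content, and splits the result into a directory and file name. The helper names git_object_id and loose_object_path are made up for this illustration.

```python
import hashlib


def git_object_id(obj_type, body):
    """Hash the header '<type> <size>' + NUL byte + body, as a loose object is named."""
    header = obj_type + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_path(object_id):
    """Split a 40-char id into a directory (2 chars) and a file name (38 chars)."""
    return ".git/objects/{}/{}".format(object_id[:2], object_id[2:])


blob_id = git_object_id(b"blob", b"a\n")
print(blob_id)                     # 78981922613b2afb6025042ff6bd878ac1994e85
print(loose_object_path(blob_id))  # .git/objects/78/981922613b2afb6025042ff6bd878ac1994e85
```

The content b"a\n" is a file holding the single letter a and a newline, so the hash is the one git reports for such a file.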

git cat-file

git cat-file is a builtin way to print the content of a git object, so you don't need the script above.

demo of git cat-file

  • git cat-file -t prints the type of the object
  • git cat-file -p pretty-prints the content of the object
  • see man git-cat-file for more

I will use git cat-file -p for the rest of the article, so you won’t see the object type or the size in bytes that the Python script above shows.

I will also omit them when describing the content of a git object.
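To make clear what is being left out, here is a small sketch that splits the decompressed bytes into the header fields and the body. The function name split_object is hypothetical, and the object is built in memory as a stand-in for a file under .git/objects.

```python
import zlib


def split_object(raw):
    """Split decompressed object bytes into (type, size, body)."""
    # The header ends at the first NUL byte; the body may contain more of them.
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    return obj_type.decode(), int(size), body


# Stand-in for a loose blob object holding the two bytes "a\n".
stored = zlib.compress(b"blob 2\x00a\n")
obj_type, size, body = split_object(zlib.decompress(stored))
print(obj_type)  # blob    (the part git cat-file -t reports)
print(size)      # 2       (the part git cat-file -s reports)
print(body)      # b'a\n'  (the part git cat-file -p prints for a blob)
```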

inside tag, commit, tree, blob objects

What is inside a tag object, a commit object, a tree object, and a blob object?

Let’s take a look at them.

demo of inside tag, commit, tree, blob

### a summary of demo
$ tree # show current directory (omit hidden files)
.
├── a.txt
└── directory
    ├── b.txt
    ├── c.txt
    └── d.txt 

$ cat .git/refs/tags/annotated_tag # annotated_tag is an object with hash below
f2b8234deeb7712edf24c03e8c700714ee1863d0

$ git cat-file -p f2b82 # what's inside the tag object
object fad735b6d6b1d5b90abfc97f3b332363c26bdf24
type commit
tag annotated_tag                              
tagger Alex Lai <contactme@alexlai.xyz> 1574626019 -0500

Yes, a annotated tag

$ git cat-file -p fad7 # what's inside the commit object
tree cf3ee3a86b13332bcc01f62b53cdcffe91b42785
author Alex Lai <contactme@alexlai.xyz> 1574626016 -0500
committer Alex Lai <contactme@alexlai.xyz> 1574626016 -0500

commit

$ git cat-file -p cf3ee # what's inside the tree object
100644 blob 78981922613b2afb6025042ff6bd878ac1994e85    a.txt
040000 tree 2edf42f04dba21ca928fdc9456b717bb0c6cf80f    directory

$ git cat-file -p 78981 # what's inside a blob object
a

$ git cat-file -p 2edf # what's inside another tree object
100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt
100644 blob f2ad6c76f0115a6ba5b00456a849810e7ec0af20    c.txt
100644 blob 4bcfe98e640c8284511312660fb8709b0afa888e    d.txt

Now that we know exactly what is in these objects, let’s examine them one by one.

blob object

$ git cat-file -p 78981 # what's inside a blob object
a  # ← file content

A blob stores the content of a file, but neither its filename nor its mode.

What do you think will happen to the .git/objects/ directory if we create 100 files with the same content?

demo of creating 100 same files

Answer: only one blob object is created, because a blob is named by the hash of its content and identical content yields the same hash.
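The same effect can be sketched in a few lines of Python, assuming the blob naming scheme shown earlier (sha1 over the header plus the content). The file names and the content are made up for this illustration.

```python
import hashlib


def blob_id(content):
    """Compute the id a blob with this content would get."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# 100 hypothetical files with different names but identical content.
files = {"file_{}.txt".format(i): b"same content\n" for i in range(100)}
object_ids = {blob_id(content) for content in files.values()}
print(len(files))       # 100
print(len(object_ids))  # 1
```

The file names never enter the hash, which is why they have to be stored somewhere else.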

tree object

$ git cat-file -p cf3ee # what's inside the tree object
100644 blob 78981922613b2afb6025042ff6bd878ac1994e85    a.txt
040000 tree 2edf42f04dba21ca928fdc9456b717bb0c6cf80f    directory
# ↑     ↑                 ↑                               ↑
#mode  type           sha1 hash                         filename

$ git cat-file -p 2edf # what's inside another tree object
100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt
100644 blob f2ad6c76f0115a6ba5b00456a849810e7ec0af20    c.txt
100644 blob 4bcfe98e640c8284511312660fb8709b0afa888e    d.txt

A tree stores a list of entries, one per blob or subtree, each with a mode, a name, and the hash of the object it refers to.

It is like a directory.
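Here is a sketch of how such a list could be serialized and hashed. It assumes the raw layout of a tree entry is the mode, a space, the name, a NUL byte, and then the 20-byte binary hash, with entries already sorted by name; git cat-file -p shows a friendlier rendering of the same data. The helper names are made up. If the assumed layout is right, the second print should reproduce the hash of the directory tree listed above (2edf42f...).

```python
import hashlib


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries into the raw body of a tree object."""
    body = b""
    for mode, name, hex_id in entries:
        # Each entry: "<mode> <name>", a NUL byte, then the 20-byte binary id.
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


def tree_id(entries):
    """Hash the tree body together with its 'tree <size>' header."""
    body = tree_body(entries)
    header = b"tree " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Entries copied from the directory tree listed above.
entries = [
    ("100644", "b.txt", "61780798228d17af2d34fce4cfbdf35556832472"),
    ("100644", "c.txt", "f2ad6c76f0115a6ba5b00456a849810e7ec0af20"),
    ("100644", "d.txt", "4bcfe98e640c8284511312660fb8709b0afa888e"),
]
print(len(tree_body(entries)))  # 99
print(tree_id(entries))
print(tree_id([]))              # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The last line is the hash of a tree with no entries at all.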

commit object

$ git cat-file -p fad7 # what's inside the commit object
tree cf3ee3a86b13332bcc01f62b53cdcffe91b42785    # ← sha1 hash of the tree object
author Alex Lai <contactme@alexlai.xyz> 1574626016 -0500
committer Alex Lai <contactme@alexlai.xyz> 1574626016 -0500
                                      #     ↑           ↑
                                      #    time      timezone
commit  # ← comment/message

A commit stores the hash of a tree, the author and committer (each with a time and timezone), the message, and the hash of its parent commit.

The first commit doesn’t have a parent, and a merge commit has more than one.

Here is an example of a commit with a parent.

$ git log --oneline
6ca1704 (HEAD -> master) child
7aef601 parent

$ git cat-file -p 6ca1
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904     # ← tree hash
parent 7aef6017a128d5c07bb71fff61a75f996c5c36f6   # ← parent commit hash
author Alex Lai <contactme@alexlai.xyz> 1574701198 -0500
committer Alex Lai <contactme@alexlai.xyz> 1574701198 -0500

child # ← comment/message
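The layout above is simple enough to parse by hand. The sketch below splits the text of a commit into its header lines and its message; parse_commit is a hypothetical helper, and it assumes single-line headers only, so it would not handle multi-line headers such as an embedded signature.

```python
def parse_commit(text):
    """Split the text of a commit object into (headers, message)."""
    # The first blank line separates the headers from the message.
    header_part, _, message = text.partition("\n\n")
    headers = {}
    for line in header_part.splitlines():
        key, _, value = line.partition(" ")
        # A merge commit has several "parent" lines, so keep a list per key.
        headers.setdefault(key, []).append(value)
    return headers, message


commit_text = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "parent 7aef6017a128d5c07bb71fff61a75f996c5c36f6\n"
    "author Alex Lai <contactme@alexlai.xyz> 1574701198 -0500\n"
    "committer Alex Lai <contactme@alexlai.xyz> 1574701198 -0500\n"
    "\n"
    "child\n"
)
headers, message = parse_commit(commit_text)
print(headers["parent"])  # ['7aef6017a128d5c07bb71fff61a75f996c5c36f6']
print(message.strip())    # child
```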

Let’s take a look at an example in the wild: the rust repository.

demo of rust commits

You can think of commits as a graph in which each child stores the hash of its parent or parents (a commit that stores none is a root with no history behind it).

You can imagine how messy the commits could be.
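To get a feel for how history is recovered from those parent hashes, here is a toy traversal. The mapping parents_of and the function ancestors are made up for this sketch; the short hashes 6ca1704 and 7aef601 come from the example above, while feature and merge are invented commits added to show a merge.

```python
def ancestors(commit, parents_of):
    """Yield the commit and every commit reachable through parent links, each once."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited through another path
        seen.add(current)
        yield current
        stack.extend(parents_of.get(current, []))


parents_of = {
    "7aef601": [],                    # root commit, no parent
    "6ca1704": ["7aef601"],           # child of the root
    "feature": ["7aef601"],           # invented commit on another branch
    "merge": ["6ca1704", "feature"],  # invented merge with two parents
}
print(list(ancestors("6ca1704", parents_of)))  # ['6ca1704', '7aef601']
print(sorted(ancestors("merge", parents_of)))  # ['6ca1704', '7aef601', 'feature', 'merge']
```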

tag object

$ git cat-file -p f2b82 # what's inside the tag object
object fad735b6d6b1d5b90abfc97f3b332363c26bdf24 # ← object hash
type commit  # ← object type
tag annotated_tag # ← tag name
tagger Alex Lai <contactme@alexlai.xyz> 1574626019 -0500

Yes, a annotated tag  # ← comment/message

A tag (in this case, an annotated tag) stores an object, a type, a tag name, a tagger, and a message.

It provides a permanent shorthand name for a particular commit, while a branch “pointer” can move around.

Let’s look at a tag, again in the rust example.

demo of rust tag 1.39.0

A tag (or commit) can also be signed with your pgp secret key, which lets others verify the integrity and origin of a release or a version.

Tools to interact with git

Here is a list of tools I often use to interact with git:

  • git commands in the shell, which I feel safest with
  • vim-fugitive (a very powerful and interactive plugin; it can replace the git commands in the shell with :Git functions in vim)
  • emacs magit (I am still new to it; it’s very interactive)

Here are some GUI tools I have tried:

  • gitk (not bad; I sometimes use it to visualize history)
  • git gui (well, I already have vim-fugitive)
  • gitkraken (very good, but not free for private repositories)

references and resources