Mastering Git
Master Git by learning advanced tools and techniques that can help you resolve tricky issues with the revision control system
Git is a distributed revision control system. We learned in Understanding Git (DZone) that Git stores different kinds of objects (commits, blobs, trees, and tags) in its repository, i.e., inside the .git folder. The repository is just one of the four areas that Git uses to store data. In this article, we'll explain these four areas and delve into each of them, uncovering how they help track changes to files, maintain a history of revisions, and support collaboration among developers. Understanding these areas lets you harness Git's capabilities to the fullest.
The Four Areas
Git stores data in the four areas illustrated below: the working area, the index, the repository, and the stash. Together they represent the flow of changes in a typical Git workflow.
Working Area
The working area is where you create, edit, and delete files as you work on your project. It holds the current state of the files and directories that make up your codebase. Keep in mind that modifications in the working area are not recorded in Git's history until they are committed.
Repository
The repository area is where all of the project's history, metadata, and versioned files are stored. In Git, the repository is the .git folder, which is typically located at the root of the project directory. Git objects, which are immutable, are stored in its objects subdirectory:
- .git/objects
  - blob
  - tree
  - commit
  - tag
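To see these object types in practice, the following is a minimal sketch, assuming a POSIX shell with git and mktemp available. It builds a throwaway repository (the file name, identity, and tag name v1 are hypothetical), creates one commit and one annotated tag, and asks git cat-file -t for the type of each resulting object.

```shell
#!/bin/sh
# Sketch: create one object of each type and ask Git what type it is.
# Assumes a POSIX shell, mktemp -d and git; all names below are made up.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Flash 9000" > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Add storage list"
git tag -a v1 -m "First release"   # an annotated tag is stored as its own object

blob=$(git rev-parse HEAD:storage_insights.txt)   # file contents
tree=$(git rev-parse 'HEAD^{tree}')               # directory listing
commit=$(git rev-parse HEAD)                      # snapshot plus metadata
tag=$(git rev-parse v1)                           # the tag object itself

for obj in "$blob" "$tree" "$commit" "$tag"; do
  echo "$obj $(git cat-file -t "$obj")"
done
```

Each printed line pairs an object ID with its type. A lightweight tag (git tag v1 without -a) would not create a tag object; it would simply be a reference pointing at the commit.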
In order to gain a deeper understanding of Git, we should be able to answer the following questions when issuing a Git command:
- How does this command move information across the four areas?
- How does this command change the Repository area?
We'll answer these questions using illustrative Git commands. First, let's review how Git maintains a project's history.
Project History
Git objects linked together represent a project's history. Each commit records a snapshot of the project's files at a certain point in time, along with a link to its parent commit(s).
Branches are entry points into history: a branch refers to a commit, and HEAD points to the current branch. Pictorially, this is represented as shown below.
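The chain from HEAD to a branch to a commit can be followed with Git's plumbing commands. This is a minimal sketch, assuming a POSIX shell and git, in a throwaway repository with hypothetical contents: git symbolic-ref shows which branch HEAD points to, and git rev-parse resolves the branch and its parent to commit IDs.

```shell
#!/bin/sh
# Sketch: follow HEAD -> branch -> commit -> parent in a throwaway repository.
# Assumes a POSIX shell, mktemp -d and git; file contents and messages are made up.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Flash 9000" > storage_insights.txt
git add storage_insights.txt
git commit -q -m "First commit"
echo "Storwize" >> storage_insights.txt
git commit -q -am "Second commit"

branch=$(git symbolic-ref HEAD)    # HEAD names a branch, e.g. refs/heads/<name>
tip=$(git rev-parse "$branch")     # the branch names a commit
parent=$(git rev-parse "$tip^")    # each commit links back to its parent

echo "HEAD   -> $branch"
echo "branch -> $tip"
echo "parent -> $parent"
```

The branch name depends on your init.defaultBranch setting, which is why the script reads it from HEAD instead of assuming a particular name.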
This brings us to the third area in Git - the Index.
Index
The Index, also known as the Staging Area, is an intermediate step between the Working Area and the Repository. It helps in preparing and organizing changes before they are committed to the repository.
Let's see a basic Git workflow that touches the three areas that we've encountered so far - the working area, the index, and the repository.
Basic Workflow
Files in the working directory can be in different states, including untracked, modified, or staged for commit. We use the git add command to copy changes from the working directory to the staging area. The git commit command then saves the staged changes in the repository by creating new Git objects (a commit and the tree objects it points to) and moving the current branch to the new commit. This is illustrated below.
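A sketch of this flow (assuming a POSIX shell and git, with a throwaway repository and hypothetical file contents) shows which objects each step creates: git add already writes the file's blob into .git/objects and records it in the index, while git commit adds a tree and a commit and moves the branch.

```shell
#!/bin/sh
# Sketch: which objects do git add and git commit create?
# Assumes a POSIX shell, mktemp -d and git; names and contents are made up.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Flash 9000" > storage_insights.txt
git add storage_insights.txt                   # working area -> index
blob=$(git rev-parse :storage_insights.txt)    # ":path" names the blob staged in the index
type_before_commit=$(git cat-file -t "$blob")  # the blob already exists in .git/objects

git commit -q -m "Add storage list"            # index -> repository
commit=$(git rev-parse HEAD)                   # new commit; the branch now points here
tree=$(git rev-parse 'HEAD^{tree}')            # tree recorded by that commit

echo "blob   $blob ($type_before_commit, created by git add)"
echo "tree   $tree"
echo "commit $commit"
```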
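To see the differences between these areas, we can use the git diff command as shown below. Without options, it compares the working area with the index; with --cached, it compares the index with the last commit.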
$ echo "DS 8000" >> storage_insights.txt
$ git diff
diff --git a/storage_insights.txt b/storage_insights.txt
index 972d3ae..210df65 100644
--- a/storage_insights.txt
+++ b/storage_insights.txt
@@ -2,3 +2,4 @@ Flash 9000
Storwize
XIV
SVC
+DS 8000
$ git add storage_insights.txt
$ git diff --cached
diff --git a/storage_insights.txt b/storage_insights.txt
index 972d3ae..210df65 100644
--- a/storage_insights.txt
+++ b/storage_insights.txt
@@ -2,3 +2,4 @@ Flash 9000
Storwize
XIV
SVC
+DS 8000
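The same comparison can be reproduced end to end. This is a sketch assuming a POSIX shell and git, using a throwaway repository seeded with the storage list from the transcript above; it captures file names with --name-only instead of full patches so the results are easy to check.

```shell
#!/bin/sh
# Sketch: git diff (working area vs index) versus git diff --cached (index vs HEAD).
# Assumes a POSIX shell, mktemp -d and git; the identity is a placeholder.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\nStorwize\nXIV\nSVC\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Initial list"

echo "DS 8000" >> storage_insights.txt
unstaged=$(git diff --name-only)          # working area differs from the index
git add storage_insights.txt
after_add=$(git diff --name-only)         # now empty: index matches working area
staged=$(git diff --cached --name-only)   # index differs from HEAD

echo "unstaged:  $unstaged"
echo "after add: ${after_add:-<none>}"
echo "staged:    $staged"
```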
The git checkout command, which is used to switch to a specific branch, copies data from the repository area to both the working area and the index.
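A minimal sketch of checkout's data movement, assuming a POSIX shell and git; the branch name feature and the file contents are hypothetical. After switching back, the working area and the index both match the target branch's commit again.

```shell
#!/bin/sh
# Sketch: git checkout <branch> copies the branch's snapshot into index and working area.
# Assumes a POSIX shell, mktemp -d and git; "feature" is a made-up branch name.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Initial list"
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b feature
echo "XIV" >> storage_insights.txt
git commit -q -am "Add XIV on feature"

git checkout -q "$main"     # repository -> index and working area
cat storage_insights.txt    # only "Flash 9000" again
git status --short          # nothing listed: working area, index and HEAD agree
```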
Remove File
The git rm command removes files from both the working directory and the index, so that Git records their removal and the change can be committed. The --cached option removes the files from the index but leaves them in the working directory, effectively untracking them without deleting them locally.
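A minimal sketch contrasting the two forms, assuming a POSIX shell and git; notes.txt is a hypothetical second file in a throwaway repository.

```shell
#!/bin/sh
# Sketch: git rm versus git rm --cached.
# Assumes a POSIX shell, mktemp -d and git; notes.txt is a made-up file.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
printf 'draft notes\n' > notes.txt
git add storage_insights.txt notes.txt
git commit -q -m "Two files"

git rm -q storage_insights.txt   # removed from the index AND the working area
git rm -q --cached notes.txt     # removed from the index only; stays on disk

git status --short
```

After both commands, git status lists both files as staged deletions, while notes.txt also shows up as untracked because it is still on disk.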
Rename File
To rename a file, we can use the git mv command. It renames the file in both the working area and the index; it does not touch the repository until the change is committed.
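A minimal sketch, assuming a POSIX shell and git, showing that git mv changes the working area and the index while the last commit still records the old name; storage.txt is a hypothetical new name.

```shell
#!/bin/sh
# Sketch: git mv renames in the working area and index; the last commit is untouched.
# Assumes a POSIX shell, mktemp -d and git; storage.txt is a made-up name.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Initial list"

git mv storage_insights.txt storage.txt   # the rename is already staged
git ls-files                              # index: storage.txt
git ls-tree --name-only HEAD              # last commit: still storage_insights.txt
```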
The git reset Command
The git reset command moves the current branch (and therefore HEAD) to a specific commit, effectively rewinding the state of the repository to an earlier point in its history.
The options are:
- Soft reset (--soft): moves HEAD and the branch reference to a different commit but leaves the index and the working directory untouched, so the changes from the undone commits remain staged. It does not move any data between areas.
- Mixed reset (--mixed, the default): moves HEAD and the branch reference and also resets the index to match the target commit, unstaging the changes. The changes remain in your working directory.
- Hard reset (--hard): moves HEAD and the branch reference and resets both the index and the working directory to match the target commit, discarding all uncommitted changes. Commits that were only reachable from the old position drop off the branch, although they can usually still be recovered through the reflog for a while.
Thus, a reset moves the current branch, and optionally copies data from the repository area to the other areas as illustrated above.
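The three modes can be compared side by side. This sketch assumes a POSIX shell and git and uses a throwaway two-commit history with hypothetical contents; before each mode it returns the branch to the second commit with a hard reset, then records what is left in the index and the working area.

```shell
#!/bin/sh
# Sketch: what --soft, --mixed and --hard leave behind after "reset HEAD~1".
# Assumes a POSIX shell, mktemp -d and git; contents and messages are made up.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "First"
echo "Storwize" >> storage_insights.txt
git commit -q -am "Second"
second=$(git rev-parse HEAD)

git reset -q --soft HEAD~1           # branch moves; index and working area untouched
soft_staged=$(git diff --cached --name-only)

git reset -q --hard "$second"        # back to the second commit for the next try
git reset -q --mixed HEAD~1          # branch moves; index reset; working area untouched
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)

git reset -q --hard "$second"
git reset -q --hard HEAD~1           # branch, index and working area all reset
hard_content=$(cat storage_insights.txt)

echo "soft:  staged=$soft_staged"
echo "mixed: staged=${mixed_staged:-<none>} unstaged=$mixed_unstaged"
echo "hard:  file now contains: $hard_content"
```

The second commit object is not deleted by the hard reset; it is simply no longer on the branch, which is why it can still be reached by its ID or through the reflog.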
Now that we've covered the three areas and illustrated by way of examples how Git moves data between these areas, it is time to introduce the fourth area - the Stash.
Stash (Clipboard)
git stash is a handy Git command that temporarily saves the changes in your working directory and index without committing them. This is useful when you need to switch to a different branch, work on something else, or pull changes from a remote repository while preserving your current work. The stashed changes can later be reapplied or discarded as needed.
git stash --include-untracked
This command moves all local changes from the working area and the index, including untracked files (but not ignored ones), to the stash, and then resets both areas to match the current commit.
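A minimal sketch, assuming a POSIX shell and git (2.13 or later for git stash push); notes.txt is a hypothetical untracked file. It stashes a staged edit together with the untracked file, checks that both areas are clean, and then reapplies the stash with git stash pop.

```shell
#!/bin/sh
# Sketch: git stash --include-untracked, then git stash pop.
# Assumes a POSIX shell, mktemp -d and git; notes.txt is a made-up file.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Initial list"

echo "XIV" >> storage_insights.txt   # tracked change...
git add storage_insights.txt         # ...now staged in the index
echo "draft" > notes.txt             # untracked file

git stash push -q --include-untracked
status_after_stash=$(git status --porcelain)   # empty: both areas match HEAD again
if [ -e notes.txt ]; then notes_after_stash=present; else notes_after_stash=gone; fi

git stash pop -q                     # reapply the stashed changes and drop the entry
```

Without the --index option, git stash pop restores the previously staged edit as an unstaged change, so you may need to run git add again.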
Working With Paths
A commit typically includes changes to multiple files, and so far we've worked with whole commits. It is also possible to operate at a more granular level, such as a single file. The following examples show how to work with individual files rather than commits.
To restore a file from the repository to the index, we use the git reset command with a file path.
$ git reset HEAD storage_insights.txt
Unstaged changes after reset:
M storage_insights.txt
This command copies the committed (HEAD) version of storage_insights.txt from the repository to the index, unstaging its changes while leaving the working area untouched. At the file level, the --hard option is not supported.
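A sketch of the file-level reset, assuming a POSIX shell and git, in a throwaway repository with hypothetical contents. It also checks that Git rejects --hard when a path is given.

```shell
#!/bin/sh
# Sketch: git reset HEAD <path> copies the committed version into the index only.
# Assumes a POSIX shell, mktemp -d and git; contents are made up.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Initial list"

echo "DS 8000" >> storage_insights.txt
git add storage_insights.txt
git reset -q HEAD storage_insights.txt   # repository -> index for this one file
staged=$(git diff --cached --name-only)  # empty: nothing staged any more
unstaged=$(git diff --name-only)         # the edit is still in the working area

# --hard is refused when a path is given
if git reset --hard HEAD storage_insights.txt 2>/dev/null; then
  hard_with_path=accepted
else
  hard_with_path=rejected
fi
echo "staged=${staged:-<none>} unstaged=$unstaged --hard with path: $hard_with_path"
```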
To restore a file from the repository to both the working area and the index, we use the git checkout command with a file path.
$ git checkout HEAD storage_insights.txt
Updated 1 path from 43134cc
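A minimal sketch, assuming a POSIX shell and git, in which the file has both a staged and an unstaged edit; git checkout HEAD <path> overwrites both areas with the committed version. The contents are hypothetical.

```shell
#!/bin/sh
# Sketch: git checkout HEAD <path> copies the committed file into index AND working area.
# Assumes a POSIX shell, mktemp -d and git; contents are made up.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Initial list"

echo "DS 8000" >> storage_insights.txt
git add storage_insights.txt              # staged edit
echo "XIV" >> storage_insights.txt        # plus an unstaged edit

git checkout -q HEAD storage_insights.txt # both edits are discarded
cat storage_insights.txt
```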
Parts of a File
Git can operate on things that are smaller than a file. The --patch option starts an interactive mode that lets you select individual changes within a file, or even specific lines of code, giving you fine-grained control over which changes a command acts on. It's commonly used with commands like:
- git add
- git reset
- git checkout
- git stash
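We'll illustrate this option with the git add command. The file has two local changes, and we want to stage only one of them while leaving the other unstaged. Because Git initially presents both changes as a single hunk (a contiguous block of changes), we first answer s to split it into two hunks.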
$ git add --patch storage_insights.txt
diff --git a/storage_insights.txt b/storage_insights.txt
index 972d3ae..16a4557 100644
--- a/storage_insights.txt
+++ b/storage_insights.txt
@@ -1,4 +1,6 @@
Flash 9000
+DS 8000
Storwize
XIV
SVC
+Spectrum Scale
(1/1) Stage this hunk [y,n,q,a,d,s,e,?]? s
Split into 2 hunks.
@@ -1,4 +1,5 @@
Flash 9000
+DS 8000
Storwize
XIV
SVC
(1/2) Stage this hunk [y,n,q,a,d,j,J,g,/,e,?]? y
@@ -2,3 +3,4 @@
Storwize
XIV
SVC
+Spectrum Scale
(2/2) Stage this hunk [y,n,q,a,d,K,g,/,e,?]? n
$ git diff --cached
diff --git a/storage_insights.txt b/storage_insights.txt
index 972d3ae..30280ea 100644
--- a/storage_insights.txt
+++ b/storage_insights.txt
@@ -1,4 +1,5 @@
Flash 9000
+DS 8000
Storwize
XIV
SVC
The --patch option allows us to process changes hunk by hunk rather than file by file.
Switch and Restore
The git switch and git restore commands, introduced in Git 2.23, take over two jobs of git checkout: switch changes branches, and restore restores files to a specific state.
The following command will move to a different branch.
git switch <branch-name>
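A minimal sketch, assuming a POSIX shell and Git 2.23 or later; the branch name feature is hypothetical. git switch -c creates a branch and switches to it, and git switch <branch-name> moves back.

```shell
#!/bin/sh
# Sketch: create a branch with git switch -c, then switch back.
# Assumes a POSIX shell, mktemp -d and Git >= 2.23; "feature" is a made-up name.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Initial list"
main=$(git symbolic-ref --short HEAD)    # the default branch, whatever its name

git switch -q -c feature                 # create "feature" and move HEAD to it
on_feature=$(git symbolic-ref --short HEAD)
git switch -q "$main"                    # move HEAD back
echo "was on $on_feature, now on $(git symbolic-ref --short HEAD)"
```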
git restore restores files in the working directory (and, with --staged, in the index) to a specified state. It's used to undo changes, either by discarding modifications made to files or by restoring them from a previous commit.
The following command replaces the modified file in the working area with the committed version.
git restore --source=HEAD storage_insights.txt
The following command removes the changes from the staging area while keeping them in the working directory, effectively unstaging them.
git restore --staged storage_insights.txt
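A minimal sketch of both restore forms, assuming a POSIX shell and Git 2.23 or later, in a throwaway repository with hypothetical contents: --staged unstages the edit without touching the file, and --source=HEAD then discards the edit from the working area.

```shell
#!/bin/sh
# Sketch: git restore --staged (index <- HEAD) and --source=HEAD (working area <- HEAD).
# Assumes a POSIX shell, mktemp -d and Git >= 2.23; contents are made up.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'Flash 9000\n' > storage_insights.txt
git add storage_insights.txt
git commit -q -m "Initial list"

echo "DS 8000" >> storage_insights.txt
git add storage_insights.txt

git restore --staged storage_insights.txt       # unstage; the file keeps the edit
after_unstage=$(cat storage_insights.txt)
git restore --source=HEAD storage_insights.txt  # discard the edit in the working area
after_restore=$(cat storage_insights.txt)
echo "after --staged: $after_unstage"
echo "after --source=HEAD: $after_restore"
```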
Summary
Understanding the four areas between which Git moves data is key to using Git effectively as a revision control system. In this article, we reviewed the four areas and illustrated data movements with examples of different Git commands. If we can explain how each Git command we use moves data through these areas, we gain a deeper understanding of Git workflows.