Darek Kay

Git explained: Rewriting history

One of Git's core features is "rewriting history", i.e., "altering" existing commits. I'm using quotation marks because, despite appearances, the Git history is immutable. By design, regular Git commands cannot modify or delete an existing commit.

And yet, among many Git users, there exists this fear of losing some changes or creating a mess. In reality, any committed change can be considered safe, i.e., it can be retrieved even after "altering" the history.

What does rewriting mean?

Why would you ever want to alter Git history anyway? Let's imagine you've made a typo in the most recent commit:

* 5c7a782 - (HEAD -> main) Fix deefct 47
* fb2546f - Add checkout page

Usually, you can use commands like reset or rebase (-i) to "rewrite" the Git history. However, correcting the last commit is fairly common, so there is an easier alternative:

git commit --amend -m "Fix defect 47"

Another look into the Git history shows the correct message:

* 2b049ea - (HEAD -> main) Fix defect 47
* fb2546f - Add checkout page

It appears as if we have just altered the last commit. However, the hash has changed from 5c7a782 to 2b049ea, which means that we have created a new commit. This is the reason:

For every commit, Git calculates a unique hash and assigns it to the commit. The hash is based on the commit message, content, author, timestamps, parent commit, etc. Changing any of those properties leads to a new hash and hence a new commit.

Since we've just changed the commit message, Git created a new commit with a different hash. The same thing happens when "moving" (e.g. rebasing) commits, because the parent commit changes.
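
The effect can be reproduced in a throwaway repository. The following is a minimal sketch, not part of the original walkthrough: the file names, the temporary directory and the placeholder identity are assumptions made for illustration, and the hashes you get will differ from the ones shown in this article.

```shell
set -eu

# Throwaway repository; the identity values are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo "checkout" > page.txt
git add page.txt
git commit -q -m "Add checkout page"

echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix deefct 47"
before=$(git rev-parse HEAD)

# Only the message changes, yet Git writes a new commit object.
git commit -q --amend -m "Fix defect 47"
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"

# Both commits share the same parent and the same tree (content).
git rev-parse "$before^" "$after^"
git rev-parse "$before^{tree}" "$after^{tree}"
```

The two commits differ only in their message, which is enough to produce a different hash. The temporary directory in $repo can be deleted afterwards.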

Unreachable commits

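But where did the other commit go? By default, git log only shows commits that are reachable from its starting point (HEAD, unless you name a branch or pass --all). Commits that no branch, tag or other reference leads to are called unreachable (or dangling). As long as the reflog still remembers them, they can be displayed with the --reflog flag: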

git log --graph --oneline --reflog
* 2b049ea - (HEAD -> main) Fix defect 47
| * 5c7a782 - Fix deefct 47
|/
* fb2546f - Add checkout page

Alternatively, you can use git reflog to find unreachable commits:

git reflog
2b049ea (HEAD -> main) HEAD@{0}: commit (amend): Fix defect 47
5c7a782 HEAD@{1}: commit: Fix deefct 47
fb2546f HEAD@{2}: commit: Add checkout page

As we can see, "altering" history is nothing more than creating new commits and moving the HEAD and main pointers. Hence, the term "alternative history" is more suitable. It also means that we can undo our "destructive" commit --amend command if required:

git reset --hard 5c7a782
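
If you don't have the old hash at hand, the reflog syntax HEAD@{1} ("where HEAD pointed one step ago") can stand in for it. The sketch below assumes a fresh throwaway repository with made-up file names and a placeholder identity; it is one way to walk through the recovery, not the only one.

```shell
set -eu

# Throwaway repository; the identity values are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo "checkout" > page.txt
git add page.txt
git commit -q -m "Add checkout page"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix deefct 47"
git commit -q --amend -m "Fix defect 47"

# HEAD@{1} is the commit HEAD pointed to before the amend.
old=$(git rev-parse 'HEAD@{1}')
git log -1 --format='%h %s' "$old"

# The old commit is no longer part of main's history ...
if git merge-base --is-ancestor "$old" main; then
  reachable=yes
else
  reachable=no
fi
echo "old commit reachable from main: $reachable"

# ... but it still exists, so main can be moved back to it.
git reset -q --hard "$old"
restored=$(git log -1 --format=%s)
echo "main now points at: $restored"
```

Keep in mind that reset --hard also discards uncommitted changes in the working tree, so it is best run on a clean checkout.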

Author date vs. commit date

Git keeps two timestamps for each commit:

  • Author date: The date of the original commit.
  • Commit date: The date the commit object itself was created, i.e., the date of the (last) "alteration".

When creating a new commit, both timestamps will be equal. However, they will differ after "altering" an existing commit. When using commands like show or log, Git displays only the author date by default. To view both timestamps, use the fuller format:

git show --format=fuller 2b049ea
commit 2b049eadac74e183e48b918e377e41765fca2a99
Author:     Darek Kay
AuthorDate: Thu Mar 31 19:18:02 2022
Commit:     Darek Kay
CommitDate: Fri May 6 18:26:49 2022

If you want to sync the commit date with the original (author) date when "altering" Git history, use the --committer-date-is-author-date flag:

git rebase -i --committer-date-is-author-date
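
To see the two timestamps drift apart, you can pin them with the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables. This is a minimal sketch under stated assumptions: the repository, file name and identity are placeholders, and the two Unix timestamps were chosen to mirror the dates in the output above.

```shell
set -eu

# Throwaway repository; the identity values are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo "fix" > fix.txt
git add fix.txt

# Original commit: both dates set to 2022-03-31 19:18:02 UTC.
GIT_AUTHOR_DATE="1648754282 +0000" GIT_COMMITTER_DATE="1648754282 +0000" \
  git commit -q -m "Fix deefct 47"
author_before=$(git log -1 --format=%at)

# Amend on 2022-05-06 18:26:49 UTC: only the committer date moves.
GIT_COMMITTER_DATE="1651861609 +0000" \
  git commit -q --amend -m "Fix defect 47"
author_after=$(git log -1 --format=%at)
commit_after=$(git log -1 --format=%ct)

echo "author date (unix): $author_after"
echo "commit date (unix): $commit_after"

# The fuller format prints both dates side by side.
git show -s --format=fuller HEAD
```

The %at and %ct placeholders print the author and committer dates as Unix timestamps, which makes them easy to compare in scripts.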

Garbage collection

Earlier, I claimed that all commits can be considered safe. However, there is a limitation:

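The Git garbage collector will eventually remove unreachable commits automatically. By default, the reflog entries that still point to them expire after 30 days, and the commits can be pruned once nothing references them anymore.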

Especially in a rebase flow, you will create and copy a lot of commits. The garbage collector does some housekeeping and removes all abandoned commits after a certain time. In my daily work, I rarely want to keep them. If you do, assign a branch to a dangling commit:

git branch my-branch 5c7a782
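
The difference a branch makes can be shown by forcing the cleanup instead of waiting for it. The sketch below is an illustration under assumptions: it uses a throwaway repository with a placeholder identity, and it swaps the normal waiting period for git reflog expire --expire=now and git gc --prune=now, which are not part of the article's workflow and should not be run on a repository you care about.

```shell
set -eu

# Throwaway repository; the identity values are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo "checkout" > page.txt
git add page.txt
git commit -q -m "Add checkout page"
echo "fix" > fix.txt
git add fix.txt

# Two amends leave two abandoned commits behind.
git commit -q -m "Fix deefct 47"
first_try=$(git rev-parse HEAD)
git commit -q --amend -m "Fix defetc 47"
second_try=$(git rev-parse HEAD)
git commit -q --amend -m "Fix defect 47"

# Keep only the first attempt by pointing a branch at it.
git branch my-branch "$first_try"

# Fast-forward time: drop every reflog entry and prune right away.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$first_try" 2>/dev/null; then first=kept; else first=gone; fi
if git cat-file -e "$second_try" 2>/dev/null; then second=kept; else second=gone; fi
echo "first attempt (has a branch): $first"
echo "second attempt (no branch):   $second"
```

The commit with a branch survives the cleanup, while the one that was only held by the reflog is removed.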

Altering public history

As long as we are "altering" commits that are not public (i.e., they were never pushed to a remote repository), we're free to do whatever we want. Things get tricky when we want to move a public branch around.

A common Git best practice states:

Do not alter public history.

I think this is great advice for Git beginners, but it can be limiting if you and your colleagues know what you're doing.

Let's first see the consequences of rewriting public Git history. Let's assume the flawed commit from the first section had already been pushed to the origin remote. After running git commit --amend, the log would look like this:

* 2b049ea - (HEAD -> main) Fix defect 47
| * 5c7a782 - (origin/main) Fix deefct 47
|/
* fb2546f - Add checkout page

If we try to push our corrected commit to origin, we'll get an error:

git push
To ../origin/
 ! [rejected]        main -> main (non-fast-forward)
error: failed to push some refs to '../origin/'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

This is the expected behavior, because by default a git push is only allowed to add new commits onto the latest "tip" of the remote (origin). Git suggests a simple solution: "integrating the remote changes". In our case, git pull would fetch the remote branch and then run the equivalent of git merge origin/main, which is not what we want. Instead, we want to replace the remote commit. This can be achieved with a force push:

git push -f
git push --force-with-lease  # safer version

Now everything looks fine:

* 2b049ea - (HEAD -> main, origin/main) Fix defect 47
* fb2546f - Add checkout page
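
The whole sequence can be rehearsed locally by using a bare repository as a stand-in for the remote. This is a sketch under assumptions: the directory layout, file names and identity are invented for the example, and a real remote may additionally reject force pushes through its own settings.

```shell
set -eu

# A bare repository plays the role of the remote; paths are placeholders.
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q --bare "$work/origin.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"

echo "checkout" > page.txt
git add page.txt
git commit -q -m "Add checkout page"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix deefct 47"
git push -q -u origin main

# Amend after publishing: local main and origin/main now diverge.
git commit -q --amend -m "Fix defect 47"

# A plain push is rejected as a non-fast-forward update.
if git push -q origin main 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
echo "plain push: $plain_push"

# --force-with-lease overwrites the remote branch only if it still
# matches our remote-tracking ref (origin/main).
git push -q --force-with-lease origin main

remote_subject=$(git --git-dir="$work/origin.git" log -1 --format=%s main)
echo "remote tip: $remote_subject"
```

The lease is what makes the second variant safer: if someone else had pushed to main in the meantime, origin/main would be out of date and the push would be refused instead of silently discarding their work.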

But now Janet and Steve and all your other colleagues who work on the main branch will get the same issue as you've had before. That's why projects often disallow a force push for shared branches (e.g. main, develop).

How do we deal with this situation?

When in doubt, follow the best practice "Do not alter public history". A typo looks bad, but do you want to go through all the hassle to fix it?

If you do have to force-push a shared public branch, the first and most important step is communication. All your colleagues working on the same branch should know to expect an issue when interacting with the shared repository. They should also know how to fix this situation. For amended changes, here's a solution that covers most use cases:

git pull --rebase

This command will fetch the remote branch and rebase all local commits on top of it. Amended commits will be resolved automatically (so the typo commit will be skipped).
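
Here is how that plays out from a colleague's point of view, again as a sketch under assumptions: a local bare repository stands in for the remote, the clone names "you" and "colleague", the file names and the "Add cart page" commit are all invented for the example.

```shell
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-in remote whose default branch is main.
git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/main

# You: publish the commit with the typo.
git init -q "$work/you"
cd "$work/you"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"
echo "checkout" > page.txt
git add page.txt
git commit -q -m "Add checkout page"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix deefct 47"
git push -q -u origin main

# Colleague: clone and commit on top of the typo commit.
git clone -q "$work/origin.git" "$work/colleague"
cd "$work/colleague"
echo "cart" > cart.txt
git add cart.txt
git commit -q -m "Add cart page"

# You: amend the published commit and force-push it.
cd "$work/you"
git commit -q --amend -m "Fix defect 47"
git push -q --force-with-lease origin main

# Colleague: rebase local work onto the rewritten branch.
cd "$work/colleague"
git pull -q --rebase
git log --format=%s
```

After the pull, the colleague's history contains the corrected commit followed by their own "Add cart page" commit. The typo commit is no longer part of it.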

Another solution might be to discard any local changes and reset the local main branch to origin/main:

git reset --hard origin/main

Other use cases may be harder to fix, including (interactive) rebasing and cherry-picking. Always consider the trade-offs before force-pushing.

What if you're working alone on a public branch? Then you're usually free to force-push as much as you like. Again, communication is key here. I would rephrase the Git best practice from above:

In general, do not alter public history on branches that multiple people work on.

Conclusion

Version control is an underrated skill. Most software engineers use it daily, and yet, many are not willing to invest more than necessary to learn it. That's fine, but knowing more than commit/push/pull will at least make you more efficient. It will also help you solve issues you (and your colleagues) may encounter.

I hope this article explains Git's default "safety net" behavior and motivates you to try out some advanced features.

