Ampelofilosofies

The mind is like a parachute. If it doesn't open, you're meat.

Disaster-resistant git

18 Jun 2019

Over the years I have developed a very disaster-resistant git workflow:

  • Changes to open-ended branches (master, develop, releases etc.) always go through pull requests.
  • Force push is blocked on all open branches.
  • All changes happen in a branch, no matter how small.
  • A branch is always rebased onto its PR target; the target is never merged into the branch.
  • The merge strategy for a PR is usually rebase followed by a --no-ff merge.

That last one is more of a cosmetic preference: in projects with big teams and complex structures/systems it is easier to track the order in which changes were introduced via the merge nodes. For the rest of this post, 'branch' means a branch that will eventually be merged into another, as opposed to an "open-ended" branch like master.
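The last two points of the workflow can be sketched with plain git commands. This is a minimal local illustration, not how a hosting platform implements its PR merge button: the helper name land_branch and the default target "master" are assumptions made for the example.

```shell
# Hypothetical helper: land a finished branch on its target.
# Usage: land_branch <branch> [target]   (target defaults to "master")
land_branch() {
    branch="$1"
    target="${2:-master}"

    # Replay the branch's commits on top of the current target.
    # If a conflict occurs, the chain stops here with the rebase still in
    # progress, so the conflict can be resolved by hand.
    git checkout "$branch" &&
    git rebase "$target" &&

    # --no-ff forces a merge commit even though a fast-forward is possible,
    # which leaves a visible merge node in the history.
    git checkout "$target" &&
    git merge --no-ff --no-edit "$branch"
}
```

Calling land_branch my-change master leaves master with one merge commit whose second parent is the rebased tip of my-change. In the workflow described here the merge itself would normally happen through the pull request rather than on a local checkout.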

The problem is merging

There is a beast at the end of most branches and it is called Merge. It strikes fear in the hearts of all programmers and only a special few dare face it.

Its most powerful weapon, the razor-sharp Merge Conflicts, will shred your intellect to pieces in a single stroke, leaving you wandering aimlessly in a sea of Code.

The beast grows more powerful with Time. Enough Time passes and it will be invincible.

It feeds on Code. Give it enough and it will be invincible.

Dealing with merge conflicts

The fear of merges and merge conflicts usually develops when someone attempts to merge a branch that contains too many changes and has been growing for a long time (anywhere from days to weeks, depending on your team's velocity; I once had to merge one that was almost a year old, in svn no less. True story, not fun).

When encountering a merge conflict, the human algorithm is as follows:

  • Have I changed the file?
    • Yes: I know how to resolve the conflict because I know what I wanted to do.
    • No: Take the other version.

After finishing the merge, always check the resulting diff, since the automatic merge algorithm will sometimes produce changes that are logically faulty (i.e. you can get buggy code or nonsensical content that merges cleanly).
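One way to do that check, sketched here under the assumption that the merge produced a merge commit at HEAD, is to compare the result against its first parent, which is the target branch as it was before the merge. The helper name review_merge is made up for this example.

```shell
# Hypothetical helper: show what the merge commit at HEAD brought into the
# branch it was merged into.
review_merge() {
    # Refuse to run unless HEAD really is a merge commit (has a second parent).
    git rev-parse --verify -q HEAD^2 >/dev/null || {
        echo "HEAD is not a merge commit" >&2
        return 1
    }
    git diff --stat HEAD^1 HEAD   # which files changed, and by how much
    git diff HEAD^1 HEAD          # the full change, to read for logic errors
}
```

The --stat summary gives a quick sense of whether the diff is small enough to read at all, which is the concern the following paragraphs raise.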

In a long-running branch with a lot of changes I simply do not know enough to resolve the conflict, either because multiple people are involved or because I have forgotten over time. Now I have to reconstruct the context while merging or quit.

Also, in a long chain of commits I really have no idea how to resolve a conflict in a commit in the middle of the chain. I have often had to resolve a conflict in a change that a later commit in the same chain reverted anyway.

Never mind checking the diff when it is 50 files and a couple of thousand LOCs.

This is reason number one against feature branches - too much, left for too long.

So there are guidelines:

  • Branches need to be short-lived
  • Keep the diff small
  • Keep the number of commits small to very small

As always, one-line guidelines require elaboration.

Branch lifetime

Short-lived, when we define it in time, is unfortunately a very fuzzy term. You will find advice like "no more than a day" and other equally arbitrary terms.

This guideline is the one I see teams ignoring most often and, if we want to be honest, with little or no consequence as long as the rest are followed.

As long as you remember what the change is - the context and the goal as well - the branch is not old.

Time alone is not a good criterion. Measuring distance from your target branch in LOCs (the size of the diff) or number of commits is a better indicator of staleness and your chances of encountering the Merge Beast.
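Both measures can be read straight from git. The following is a small sketch; the helper name branch_distance and the default target "master" are assumptions for the example, and what counts as "too far" is left to the team.

```shell
# Hypothetical helper: how far is the current branch from its target?
# Usage: branch_distance [target]   (target defaults to "master")
branch_distance() {
    target="${1:-master}"

    # Commits on this branch that the target does not have ...
    ahead=$(git rev-list --count "$target"..HEAD)
    # ... and commits on the target that this branch does not have yet.
    behind=$(git rev-list --count HEAD.."$target")
    echo "ahead=$ahead behind=$behind"

    # Size of the change the branch introduces, measured from the point
    # where it forked from the target (three-dot form uses the merge base).
    git diff --shortstat "$target"...HEAD
}
```

A high "behind" count suggests the branch is due for a rebase; a high "ahead" count or a large shortstat suggests the change itself has grown too big.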

Diff size

IM(NSH)O this is the really critical factor. In order to satisfy the conflict resolution algorithm I need to know about the change. The bigger it is, the more difficult the task.

Another factor is senseless diff bloat like whitespace changes or stylistic changes. Treat them like a self-contained change. Do not mix them with logic changes. If you need to change the style, the indentation size, tabs to spaces etc. then do it in a separate branch and PR.

Better yet, auto-format on commit (a man can have wishes!).

In practice, keep the list of changed files to a minimum and the LOC size per file small. In other words, keep your PR diff view clean.

Number of commits

For a long time I was of the 'small commits, discrete steps towards the solution' school. Each commit did one thing, and my commits were structured and commented in such a way as to present the reviewer with my path towards the solution.

This is a brilliant approach if you are creating a workshop or you are teaching.

In the real world we are interested in the solution. We're a team of software developers, we have trust in our abilities to grasp small to medium globs of code. And in practice we use the pull request forms to do the work. So at work the unit of change is the pull request, not the commit.

The unspoken assumption is that I have broken my user stories down to manageable pieces and do not improvise a redesign while writing the code. Keep the number of commits low (single digits); ideally bring it down to 1.
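Getting down to a single commit does not require an interactive rebase. A minimal sketch, assuming the branch was forked from the target and that rewriting the branch's own history is acceptable (the helper name squash_branch is hypothetical):

```shell
# Hypothetical helper: collapse every commit on the current branch into one.
# Usage: squash_branch <target> <commit message>
squash_branch() {
    target="$1"
    message="$2"

    # The commit where the current branch forked from the target.
    base=$(git merge-base "$target" HEAD) || return 1

    # Move the branch pointer back to the fork point but keep all changes
    # staged, then record them as a single commit.
    git reset --soft "$base" &&
    git commit -q -m "$message"
}
```

Because this rewrites history, a branch that was already pushed has to be force-pushed afterwards (git push --force-with-lease is the more careful variant). That is compatible with the workflow above, since force push is blocked only on the open-ended branches.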

How many commits away you are from master is a good indicator of how stale your branch is. This is easily solved with a routine of regular rebasing. Regular as in at least once a day.

How many commits you have in your branch is a good indicator of the amount of trouble you will be in when merging. And since rebasing is the same as merging in this respect, it also indicates how difficult your daily rebase routine will be.

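Enabling rerere in the git configuration simplifies repeated rebases, because git records how you resolved a conflict and replays that resolution the next time the same conflict appears. It is a setting that has to be used with caution: a wrong resolution gets replayed just as faithfully as a correct one.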
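A cautious way to switch it on might look like the sketch below. It is wrapped in a hypothetical helper, enable_rerere, so that it applies to the current repository only rather than globally.

```shell
# Hypothetical helper: enable rerere for the current repository only.
enable_rerere() {
    # Record conflict resolutions and reuse them when the same conflict recurs.
    git config rerere.enabled true &&
    # Do not stage replayed resolutions automatically, so each one still has
    # to be looked at and added by hand.
    git config rerere.autoUpdate false
}
```

If a recorded resolution turns out to be wrong, git rerere forget followed by the path of the conflicted file discards it while the conflict is present, so the conflict can be resolved again by hand.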

Side notes

There are some points that might be obvious to some:

Doing all changes in branches localizes your conflict resolution algorithm to the branch. It provides a necessary restriction of context that removes a lot of overhead. It is your change you need to care about.

This means that branch constructs like the one used by git flow, where develop is merged into master, cannot easily be supported by this workflow.

A PR workflow following the guidelines faithfully is like a discrete time simulation of trunk-based development with the steps being the merge nodes. The smaller the blocks (i.e. number of commits), the closer we are to trunk-based. Remove the --no-ff from the merge strategy and the distinction disappears in the end result.

Larger changes (of which the most common type we call 'features') require conscious design. In my experience, teams fall prey to the Merge Beast because they lack ways to incrementally introduce changes or have been given user stories that are too vague or too big. This is not a git-related subject; don't blame the tool.
