Git and Folder (Non)Tracking

We talk about Git a lot, because it’s simply the de facto standard for collaborative engineering today. However, aside from the complexities of using git as a source code version control tool (you can read this post about it), there are other aspects of the way git works that still require workarounds and hacks, even though they have been known issues for many years.

One such pain point is the lack of folder tracking in git. There are plenty of good posts on best practices for tracking folders (such as using .gitkeep or .gitignore files to track empty folders), so in this post, I just want to outline why this matters at all.

The Git Tracking Oversight

Git is a system built to track files only, but files, as we all know, are hosted in folders. Git has no record of a folder as such, only of the paths of the files inside it. This means that an operation as simple as renaming or moving a folder affects every file hosted in that folder: git sees a path change for each file under the folder (and its subfolders), when really only one folder name or path has changed.

Why does this matter?

There are a few reasons this is something you should pay attention to if you are just getting started with git, and we’ll dive into them below.

Real-World Scenarios and Recommended Workarounds

Let’s start with tools you may be using in your repository. Some of these tools (take dbt as an example), when integrating with your repository, will build a directory tree that includes folders the tool requires when it runs. Upon initialization, though, these folders start empty, and some of them will still be empty at the time of a new commit. Because dbt is a great tool, it follows the aforementioned practice of creating dummy files (.gitkeep) inside the empty folders, in order to make the folders “trackable.”

In the event that folders are created in a repository with no such dummy file, they won’t be tracked and pushed to the remote, so if someone chooses to fork or clone your repository as-is, these untracked folders will not be present in the cloned repository. If a tool requires certain folders to exist to properly run and store future files, this tool’s operation will fail.
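To make this concrete, here is a minimal sketch that reproduces the problem in a throwaway directory. The repository layout and the `logs` folder name are hypothetical, chosen only for illustration, and the clone is made from a local path rather than a real remote.

```shell
set -eu
work="$(mktemp -d)"
# Throwaway identity so the commit works without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/origin"
cd "$work/origin"
mkdir logs                    # an empty folder: git has nothing to track here
echo "hello" > README.md
git add .
git commit -q -m "Initial commit"

# Clone the repository the way a teammate would.
git clone -q "$work/origin" "$work/copy"
ls "$work/copy"               # README.md is there, logs/ is not
```

Any tool that expects `logs/` to exist would have to create it itself after the clone, or fail.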

Tip #1: If you require a certain hierarchy in your repository, whether it’s for specific tools that need certain directories or for automation processes in your CI/CD, make sure to create a dummy file inside these folders so they aren’t lost when collaborating and sharing with teammates.
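A minimal sketch of this tip, assuming two hypothetical folders named `seeds` and `snapshots` that a tool needs at runtime. Note that `.gitkeep` is only a naming convention, not a git feature: any file inside the folder would have the same effect.

```shell
set -eu
work="$(mktemp -d)"
# Throwaway identity so the commit works without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/origin"
cd "$work/origin"

# Create the required (still empty) folders and drop a dummy file in each.
for dir in seeds snapshots; do
  mkdir -p "$dir"
  touch "$dir/.gitkeep"
done

git add .
git commit -q -m "Add required folder structure"

# The folders now survive a clone because each one contains a tracked file.
git clone -q "$work/origin" "$work/copy"
ls -a "$work/copy/seeds"
```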

Another common scenario where tracking only files can cause a lot of pain is moving or renaming folders. Content changes are often made in the same commit as the rename or move (refactoring). Git does not record renames; it infers them afterwards by comparing file contents. So when both the path and the contents of a file change substantially in one commit, git will treat the file as a deletion plus a completely new file, and the link to its earlier change history is effectively lost. On the flip side, if a teammate has not yet pulled or updated their local copy, and they happen to update any of those files before the move or name change reaches them, this can create the type of merge conflicts from our worst nightmares.
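The following sketch illustrates the failure mode under simplified assumptions: a single hypothetical file, `src/data.txt`, is moved to `lib/` and completely rewritten in the same commit. Real refactorings are rarely this extreme, and git's similarity threshold (50% by default for `-M`) decides each file individually.

```shell
set -eu
work="$(mktemp -d)"
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
mkdir src
printf 'alpha\nbeta\ngamma\n' > src/data.txt
git add .
git commit -q -m "Add src/data.txt"

# Rename the folder AND rewrite the file in the same commit.
git mv src lib
printf 'one\ntwo\nthree\n' > lib/data.txt
git add -A
git commit -q -m "Rename src to lib and rewrite data.txt"

# Ask git to detect renames between the two commits.
status="$(git diff --name-status -M HEAD~1 HEAD)"
printf '%s\n' "$status"       # an added file and a deleted file, no rename
```

Because the old and new contents share nothing, git reports `lib/data.txt` as added and `src/data.txt` as deleted rather than as one renamed file.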

This one is particularly painful, as it requires the entire engineering organization to hold its breath until this update is propagated to the entire engineering team. Depending on the internal processes, a pull request like this might require the same checks and balances as any other code changes, with peer reviews and merges, and the entire engineering team will be blocked from continuing work until this change is merged.

Tip #2: A good practice when planning to move or rename a folder is to complete this operation as a standalone commit, separate from any other code or file content changes. Push and then merge this change.

Once you have done so, be sure to notify the entire engineering organization of the change and have them rebase their local copies before making any further code or file content changes. (Yes, you literally have to do all this to avoid breaking changes caused by the lack of folder tracking in git.)
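As a rough sketch of Tip #2, the sequence below performs the move in one commit and the content edit in a second one. The folder and file names are hypothetical, and the push and merge steps are left out because the sketch has no remote.

```shell
set -eu
work="$(mktemp -d)"
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
mkdir src
printf 'alpha\nbeta\ngamma\n' > src/data.txt
git add .
git commit -q -m "Add src/data.txt"

# Commit 1: the move only, with no content changes.
git mv src lib
git commit -q -m "Rename src to lib (no content changes)"
rename_status="$(git diff --name-status -M HEAD~1 HEAD)"

# Commit 2: content edits, made only after the move is committed.
printf 'delta\n' >> lib/data.txt
git commit -q -am "Update lib/data.txt"

printf '%s\n' "$rename_status"   # R100 src/data.txt lib/data.txt
```

The `R100` status means git matched the old and new paths as a rename with 100% identical content, which is what keeps the file's history connected across the move.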

Important caveat: If engineers from the team are working on their own branches, they will not automatically sync with the main branch, so be sure to have them pull the main branch into their working branches.
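Here is a minimal sketch of that caveat, assuming a hypothetical `feature` branch that was created before the rename and edits a file under the old path. It uses a local `git merge main` in place of pulling from a remote, since the sketch has no remote.

```shell
set -eu
work="$(mktemp -d)"
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
mkdir src
printf 'alpha\nbeta\ngamma\n' > src/data.txt
git add .
git commit -q -m "Add src/data.txt"

# A teammate branches off before the rename and edits the file.
git checkout -q -b feature
printf 'delta\n' >> src/data.txt
git commit -q -am "Feature work on src/data.txt"

# Meanwhile, the standalone rename lands on main.
git checkout -q main
git mv src lib
git commit -q -m "Rename src to lib (no content changes)"

# The teammate brings main into the working branch.
git checkout -q feature
git merge -q --no-edit main

cat lib/data.txt              # the feature edit now lives under the new path
```

Because the rename on main carried no content changes, git can match the old and new paths during the merge and apply the teammate's edit to `lib/data.txt`.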

The good news, however, is that if you do follow this process, the history of your files stays traceable across the move or rename, so make sure to follow it closely.
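One way to check that the history survived is `git log --follow`, which continues past a rename for a single file. The sketch below uses hypothetical names and counts how many commits git reports for the file with and without that flag.

```shell
set -eu
work="$(mktemp -d)"
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
mkdir src
printf 'alpha\nbeta\ngamma\n' > src/data.txt
git add .
git commit -q -m "Add src/data.txt"

git mv src lib
git commit -q -m "Rename src to lib (no content changes)"

printf 'delta\n' >> lib/data.txt
git commit -q -am "Update lib/data.txt"

# Without --follow, the log stops at the commit that introduced the new path.
plain="$(git log --format=%h -- lib/data.txt | grep -c .)"
# With --follow, git continues through the rename to the original commit.
followed="$(git log --format=%h --follow -- lib/data.txt | grep -c .)"

echo "without --follow: $plain commits, with --follow: $followed commits"
```

Note that `--follow` works on one file at a time, so it is a spot check rather than a repository-wide guarantee.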

Knowledge is Power with Git

When it comes to working with our developer tools, knowledge is power, and it can help us avoid painful mistakes like losing our entire folder or file history because we weren’t aware of the underlying fundamentals of our tools. Before you complete any potentially irreversible operation (rm -rf anyone?), make sure to read the fine print and know what you’re doing.

We need our tools to support our work and the pace of our engineering. Sometimes, when we take the time to do things in the recommended order, we actually move faster, because we don’t have to clean up unnecessary messes that only slow us down.

We’ll continue to share good practices with version control, and how to work through its challenges, so stay tuned.
