A simple way to detect unused files in a project using git

After discovering a few images that were checked into our project’s repository but not referenced anywhere in the project, I wanted a script that could quickly tell us whether there were any other unused assets.

This was a one-off script, so it probably won’t suit everyone’s needs, but here’s how we approached the problem:

First, we needed to get a list of the files that git was tracking in our image directory. While you could use ls for that, I wanted to be sure that we weren’t going to list any files that git was ignoring, so we started with git ls-files, whose output will look something like this if called as git ls-files ./img:

img/foo.png
img/bar.png

(For the sake of the example, we’ll assume that foo.png is referenced in the project and bar.png isn’t.)
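To see the difference between ls and git ls-files concretely, here is a small sketch that builds a throwaway repository. The file names and the *.tmp ignore rule are made up for the demonstration and are not from our project.

```shell
# Throwaway repository so the demo does not touch a real project.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

mkdir img
touch img/foo.png img/bar.png img/scratch.tmp
echo '*.tmp' > .gitignore      # hypothetical ignore rule
git add .                      # stages everything except ignored files

ls img                         # lists all three files, including scratch.tmp
git ls-files ./img             # lists only the two tracked .png files
```

Only the second command leaves out scratch.tmp, which is why we built the loop on git ls-files.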

The next thing we want to do is see whether those filenames are referenced anywhere in the code. At this point, I wasn’t sure if they would be referenced by a relative or an absolute path, so I decided to search for just the filename, e.g. foo.png. I like to check my work with an intermediate command, so the next command we tried out was:

for FILE in $(git ls-files ./img); do
    basename "$FILE"
done

(basename gives you everything after the last slash of its input — in this case, just the raw filename.) And when we ran that command, we saw the expected output:

foo.png
bar.png
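As an aside, the shell can do the same trimming itself. A minimal sketch, assuming a POSIX-style shell, that compares basename with the ${FILE##*/} parameter expansion (the sample path is just the one from the example above):

```shell
FILE="img/foo.png"

# External command: prints everything after the last slash.
with_basename=$(basename "$FILE")

# Parameter expansion: strip the longest prefix that matches "*/".
with_expansion=${FILE##*/}

echo "$with_basename"
echo "$with_expansion"
```

Both lines print foo.png. The expansion avoids starting a new process on every iteration, which only matters for very large file lists.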

Now that we know we are correctly extracting the desired part of the path, we can check whether that filename is referenced anywhere in the code. git grep works much like regular grep, but it only searches tracked files in the working tree (if you call it without a commit-ish), so we don’t have to worry about excluding the .git directory or .gitignored files.

If we call git grep foo.png manually, we will see some output like

src/index.html: <img src="../img/foo.png"/>

and git grep bar.png will have no output. What we care about is not the output but the exit status: git grep returns non-zero when it finds no matches. So let’s run our loop again and verify that only the expected files would be removed:

for FILE in $(git ls-files ./img); do
    git grep "$(basename "$FILE")" || echo "would remove $FILE"
done

Oops, the output still includes the matches printed by git grep:

src/index.html: <img src="../img/foo.png"/>
would remove img/bar.png

Let’s redirect git grep’s output to /dev/null and move along.

for FILE in $(git ls-files ./img); do
    git grep "$(basename "$FILE")" > /dev/null || echo "would remove $FILE"
done
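git grep also has flags that cover this step directly. Here is a hedged alternative to the redirect, wrapped in two helpers whose names (is_referenced and list_unused) are invented for this sketch and were not part of our script. -q suppresses the output, -F treats the filename as a literal string so the dot in foo.png is not read as a regex wildcard, and -e marks the next argument as the pattern.

```shell
# is_referenced NAME (hypothetical helper)
# Succeeds if NAME appears in any tracked file in the working tree.
is_referenced() {
    # -q  print nothing; only the exit status matters
    # -F  match NAME as a fixed string rather than a regular expression
    # -e  the next argument is the pattern, even if it starts with a dash
    git grep -q -F -e "$1"
}

# list_unused DIR (hypothetical helper)
# Dry run: prints the tracked files under DIR whose names are never mentioned.
list_unused() {
    for FILE in $(git ls-files "$1"); do
        is_referenced "$(basename "$FILE")" || echo "would remove $FILE"
    done
}
```

Calling list_unused ./img from the repository root does the same dry run as the loop above. Either form prints only the "would remove" lines.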

The output looks correct! The last thing to do is to actually remove the file:

for FILE in $(git ls-files ./img); do
    git grep "$(basename "$FILE")" > /dev/null || git rm "$FILE"
done

After that, run your tests again to make sure that nothing was broken by removing these assumed-to-be-unused files. If you’re green, you’re probably good to commit!
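One caveat before running the removal for real: || fires on any non-zero exit status, not only on "no match", so a git grep that fails for some other reason would also trigger git rm. Here is a cautious sketch, assuming (as git grep behaves in practice) that status 1 means no match and anything higher is an error. The function name remove_if_unused is invented for this example.

```shell
# remove_if_unused FILE (hypothetical helper)
# Runs git rm only when git grep positively reports "no match".
remove_if_unused() {
    local file=$1 status=0
    git grep "$(basename "$file")" > /dev/null || status=$?
    case $status in
        0) ;;                      # referenced somewhere: keep the file
        1) git rm -q "$file" ;;    # no match: treat as unused and remove
        *) echo "git grep failed for $file (status $status)" >&2
           return "$status" ;;
    esac
}
```

It drops into the same loop as before: for FILE in $(git ls-files ./img); do remove_if_unused "$FILE"; done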

I think these are the two biggest limitations of this roughly-one-liner:

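  • You might be globbing your resources (for example, loading everything that matches img/*.png), in which case the individual filenames never appear in the code and git grep can’t find the references.
  • If you have spaces in your filenames or directory names, you’ll have to set IFS so that the loop reads whole lines from the output of git ls-files instead of splitting on spaces.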

But if those don’t affect you, then you can probably use that mini-script to get rid of some “noise” files from your working tree.
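For the second limitation, this is roughly what setting IFS could look like. It is a sketch rather than something we ran on our project, and the function name list_unused_spaces is made up. The function body is a subshell, so the changed IFS and the disabled globbing do not leak into the calling shell.

```shell
# list_unused_spaces DIR (hypothetical helper)
# Dry run that copes with spaces in file and directory names.
list_unused_spaces() (
    # Split the output of git ls-files on newlines only.
    IFS='
'
    # Keep the unquoted $(...) below from expanding glob characters.
    set -f
    for FILE in $(git ls-files "$1"); do
        git grep "$(basename "$FILE")" > /dev/null || echo "would remove $FILE"
    done
)
```

Calling list_unused_spaces ./img then treats a path such as "img/my pic.png" as a single entry. Paths that contain newlines would still need a different approach, such as the NUL-separated output of git ls-files -z.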
