Two pieces of good news: file I/O is unit-testable, and it is surprisingly easy to do. Let’s see how it works!
A piece of software no one asked for
First, we need a piece of software that deals with files and that has to be unit-tested. The TestableIO project does the following:
- given a directory, enumerate all the lossless audio files (e.g. FLAC ones) and all the lossy audio files (e.g. MP3 ones),
- if an audio file with the same name is present in both lossless and lossy formats, delete the lossy file,
- repeat for the subfolders of the given directory
As you can see, this is just a toy program to showcase the testing part. The class DuplicateAudioFileDeleter implements the requested behaviour:
    using System.Collections.Generic;
    using System.IO;
    using System.IO.Abstractions;

    public class DuplicateAudioFileDeleter
    {
        private readonly IFileSystem fileSystem;
        private readonly HashSet<string> lossyFileExtension = new HashSet<string>() { ".MP3", ".MP4", ".AAC", ".MPC" };
        private readonly HashSet<string> losslessFileExtension = new HashSet<string>() { ".FLAC", ".APE", ".WAV" };

        public DuplicateAudioFileDeleter(IFileSystem fileSystem)
        {
            this.fileSystem = fileSystem;
        }

        public DuplicateAudioFileDeleter() : this(new FileSystem())
        {
        }

        /// <summary>
        /// Checks the given path for files with the same name but different formats,
        /// and deletes lossy files if a lossless one exists.
        /// </summary>
        /// <param name="path">the top level directory to start searching</param>
        /// <remarks>this method will search in all subfolders of the given directory</remarks>
        public void CleanupDirectory(string path)
        {
            HashSet<string> losslessFiles = new HashSet<string>(); // full file names, without extension, of lossless files
            List<string> lossyFiles = new List<string>();          // full file names of lossy files

            // get all files from the given path and all subfolders
            var allFiles = fileSystem.Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);

            // build the lists of lossy and lossless files
            foreach (var currentFile in allFiles)
            {
                var currentFileExtension = fileSystem.Path.GetExtension(currentFile).ToUpper();
                if (lossyFileExtension.Contains(currentFileExtension))
                {
                    // lossy file found
                    lossyFiles.Add(currentFile);
                }
                else if (losslessFileExtension.Contains(currentFileExtension))
                {
                    // lossless file found
                    var currentFileWithoutExtension = fileSystem.Path.Combine(
                        fileSystem.Path.GetDirectoryName(currentFile),
                        fileSystem.Path.GetFileNameWithoutExtension(currentFile));
                    losslessFiles.Add(currentFileWithoutExtension);
                }
                // otherwise: not an audio file, ignore it
            }

            // delete lossy files if a lossless file with the same name exists
            foreach (var currentLossyFile in lossyFiles)
            {
                var currentLossyFileWithoutExtension = fileSystem.Path.Combine(
                    fileSystem.Path.GetDirectoryName(currentLossyFile),
                    fileSystem.Path.GetFileNameWithoutExtension(currentLossyFile));
                if (losslessFiles.Contains(currentLossyFileWithoutExtension))
                {
                    // duplicate file found
                    fileSystem.File.Delete(currentLossyFile);
                }
            }
        }
    }
The key point is to use an IFileSystem instance instead of the usual System.IO classes. After adding the System.IO.Abstractions NuGet package by Tatham Oddie to the project, we call IFileSystem.File.Delete instead of the static File.Delete.
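Stripped of the audio logic, the pattern might look like the minimal sketch below. The class name LogFileRemover and its Remove method are hypothetical and exist only for illustration; IFileSystem and its File property come from the System.IO.Abstractions package.

```csharp
using System.IO.Abstractions; // NuGet package: System.IO.Abstractions

// Hypothetical class, for illustration only.
public class LogFileRemover
{
    private readonly IFileSystem fileSystem;

    // The file system is injected, so the caller decides which one is used.
    public LogFileRemover(IFileSystem fileSystem)
    {
        this.fileSystem = fileSystem;
    }

    public void Remove(string path)
    {
        // A direct call would always touch the real disk:
        // System.IO.File.Delete(path);

        // The same operation, routed through the injected abstraction:
        if (fileSystem.File.Exists(path))
        {
            fileSystem.File.Delete(path);
        }
    }
}
```

Production code would pass new FileSystem(), which forwards to the real System.IO classes, while a test can pass an in-memory fake instead.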
As described, the algorithm is recursive: the code should process the root folder, eliminating duplicate files, then repeat the procedure for each subfolder, and so on. But since Directory.GetFiles can enumerate all files in subdirectories recursively, let’s take a shortcut and get the whole set of files in a single call:
var allFiles = fileSystem.Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);
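To see what SearchOption.AllDirectories changes, here is a small sketch that uses the plain System.IO classes against a throwaway folder under the temp directory. The class name, method name, and file names are made up for the demonstration.

```csharp
using System;
using System.IO;

public static class RecursiveListingDemo
{
    // Returns how many files are found without and with recursion.
    public static (int TopOnly, int All) CountFiles()
    {
        // Build a throwaway tree: root/a.flac and root/album/b.mp3
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string sub = Path.Combine(root, "album");
        Directory.CreateDirectory(sub);
        File.WriteAllText(Path.Combine(root, "a.flac"), "");
        File.WriteAllText(Path.Combine(sub, "b.mp3"), "");

        try
        {
            // Only the files directly inside root.
            string[] topOnly = Directory.GetFiles(root, "*.*", SearchOption.TopDirectoryOnly);
            // Files in root and in every subfolder, returned as one flat array.
            string[] all = Directory.GetFiles(root, "*.*", SearchOption.AllDirectories);
            return (topOnly.Length, all.Length);
        }
        finally
        {
            // Clean up the temporary tree.
            Directory.Delete(root, recursive: true);
        }
    }

    public static void Main()
    {
        var (topOnly, all) = CountFiles();
        Console.WriteLine(topOnly); // 1
        Console.WriteLine(all);     // 2
    }
}
```

Because the recursive call returns full paths in a single flat array, the rest of the code never needs to walk the folder tree itself.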
With the full set of files in hand, we build two lists of audio files: one of lossy files and another of lossless files. Finally, we iterate over the list of lossy files, searching for a lossless audio file with the same name; if one is found, we delete the lossy file.
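The "same name" comparison boils down to a key made of the directory plus the file name without its extension. The sketch below isolates that step with the plain System.IO.Path class; the helper MatchKey and the sample paths are hypothetical, introduced only to illustrate the idea.

```csharp
using System;
using System.IO;

public static class MatchKeyDemo
{
    // Directory plus file name without extension: the value both lists are compared on.
    public static string MatchKey(string filePath)
    {
        string directory = Path.GetDirectoryName(filePath) ?? string.Empty;
        return Path.Combine(directory, Path.GetFileNameWithoutExtension(filePath));
    }

    public static void Main()
    {
        string flac = Path.Combine("music", "album", "track01.flac");
        string mp3 = Path.Combine("music", "album", "track01.mp3");
        string other = Path.Combine("music", "live", "track01.mp3");

        // Same folder, same name, different extension: treated as duplicates.
        Console.WriteLine(MatchKey(flac) == MatchKey(mp3));   // True
        // Same name in a different folder: not a duplicate.
        Console.WriteLine(MatchKey(flac) == MatchKey(other)); // False
    }
}
```

Keeping the directory in the key is what prevents a lossy file from being deleted just because a lossless file with the same name exists in some other folder.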
Done! So let’s go testing.