## Finding duplicate copies in msbuild logs

April 1, 2012

I have recently needed to track down some build races due to duplicate file copies. The build in question is MSBuild-based.

I found this tool provided by co-worker Buck Hodges to be very helpful.

I ran into one issue with the tool – it does not handle node names that contain colons, such as “152:3”. One cause of such node names is when the same project is built multiple times. So, I made some changes to the code to support these node names.

I also rewrote the output logic to group the violations according to the file being copied. This is very handy when you have more than 2 projects racing on the same file since the original tool would report a race for projects A & B and B & C, but not A & C.

Here is the updated source code with my modifications. Big thanks to Buck Hodges for the original!

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

/*
* Original code taken from:
* http://blogs.msdn.com/b/buckh/archive/2012/01/21/a-tool-to-find-duplicate-copies-in-a-build.aspx
*
* Modified to support node names that contain colons and to group the output by race.
*
* This tool finds cases where more than one file is copied to the same target.  This will cause
* build breaks when msbuild executes the copies in parallel because the copies are independent
* (there are no dependencies).  This typically occurs in incremental builds because incremental
* builds do a lot less work (not nearly as much to build), resulting in the copies being a much
* higher percentage of the build activities and more likely to collide.  Gated checkin,
* continuous integration, and developer/tester builds are almost always incremental, not clean.
* These issues are still possible in regular clean builds, such as done nightly by the build lab.
*
* These race conditions are difficult to debug manually.  Since msbuild records all of the copies
* made via the copy task, we can use the log file to identify cases where the same destination
* path is used in more than one copy.
*
* Use the *.normal.* logs from a clean build with this tool.
*
* The best thing to do is to ensure that each file copy to a particular destination is done by
* one and only one project.  When that is the case, you are guaranteed not to have problems
* with two copies colliding and breaking your build.
*
* Here's example output from buildr.suitesrc.normal.log that shows a copy failure.  Here two
* copies were executed in parallel and the second one failed, causing the build to fail.
*

55>_CopyOutOfDateSourceItemsToOutputDirectory:
Copying file from "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\ReleaseManifest.xml" to "D:\a1\binaries.x86ret\bin\i386\ReleaseManifest.xml".

53>_CopyOutOfDateSourceItemsToOutputDirectory:
Copying file from "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\ReleaseManifest.xml" to "D:\a1\binaries.x86ret\bin\i386\ReleaseManifest.xml".

*
* Note that there may be multiple copies in a sequence.
*
291>_CopyOutOfDateSourceItemsToOutputDirectoryAlways:
Copying file from "D:\a1\dd\suitesrc\TFS\common\deploytools\httpcfg.exe" to "D:\a1\binaries.x86ret\SuiteBin\i386\TFS\Tests\httpcfg.exe".
Copying file from "D:\a1\dd\suitesrc\TFS\common\deploytools\makecert.exe" to "D:\a1\binaries.x86ret\SuiteBin\i386\TFS\Tests\makecert.exe".
Copying file from "D:\a1\dd\suitesrc\TFS\common\deploytools\winhttpcertcfg.exe" to "D:\a1\binaries.x86ret\SuiteBin\i386\TFS\Tests\winhttpcertcfg.exe".
CopyFilesToOutputDirectory:
Copying file from "D:\int\641\194\suitesrc\tfshttpsconfig.csproj_80399372\objr\x86\TfsHttpsConfig.exe" to "D:\a1\binaries.x86ret\SuiteBin\i386\TFS\Tests\TfsHttpsConfig.exe".

* Nodes are reused by msbuild.  The result is that a given node may process many projects, so it's not
* possible to scan and pair up all of the nodes and project files at once.  In the code below,
* you will see that it always tracks the most recent node for that reason.
*
*/

{
class Program
{
static void Main(string[] args)
{
if (args.Length < 1)
{
Console.WriteLine(
@"This tool scans a build log, such as buildr.suitesrc.normal.log, and produces a
list of file paths that are the targets of more than one copy and shows which
project files initiated each copy.  These redundant file copies are prone to
fail periodically in incremental builds, such as gated check ins and CI builds,
because copies are a higher percentage of the operations in the build, making
it more likely that two collide.");

return;
}

ProcessFile(args[0]);
}

private static void ProcessFile(String fileName)
{
Dictionary<String, String> nodeTable = new Dictionary<String, String>(1000, StringComparer.InvariantCultureIgnoreCase);
Dictionary<String, String> pathTable = new Dictionary<String, String>(1000, StringComparer.InvariantCultureIgnoreCase);
Dictionary<String, HashSet<String>> duplicateTable = new Dictionary<String, HashSet<String>>(StringComparer.InvariantCultureIgnoreCase);
String previousLine;

// Process all of the lines in the file, skipping the first line (we need the previous line,
// and the first line in the file isn't important to this tool).
string lastNode = String.Empty;
for (int i = 1; i < text.Length; i++)
{
previousLine = text[i - 1];

// Record most recent node.  The text that appears with it can be different
// (see sample log data).
string prevLinePattern = @"([0-9]+\:?[0-9]*)[>]";
Match match = Regex.Match(previousLine, prevLinePattern, RegexOptions.IgnoreCase);
if (match.Success)
{
lastNode = match.Groups[1].Value;
}

// If the line is recording the start of a project, add it to the table.
string pattern = @"([0-9]+\:?[0-9]*)[>]Project ""[^""]+"" $$[0-9]+\:?[0-9]*$$ is building ""([^""]+)"" $$([0-9]+\:?[0-9]*)$$";
match = Regex.Match(text[i], pattern, RegexOptions.IgnoreCase);
if (match.Success)
{
string node = match.Groups[3].Value;
String projectPath = Path.GetFullPath(match.Groups[2].Value);

// Because nodes are reused, we are only keeping the project path for the most recent use
// of a given node.
nodeTable[node] = projectPath;

// If we matched a project line, it can't be a copy line.
continue;
}

// If the line is one that records a copy, see if there was an earlier copy made to
// the same target path.  First, try the output of a copying task.
string copyingPattern = @"Copying file from ""[^""]+"" to ""([^""]+)""";
match = Regex.Match(text[i], copyingPattern, RegexOptions.IgnoreCase);
if (match.Success)
{
String targetPath = null;
try
{
targetPath = Path.GetFullPath(match.Groups[1].Value);
}
catch (Exception)
{
// There is a file in the test tree that uses non-English chars that causes
// GetFullPath() to throw (TODO: understand why), so we keep the raw text.
targetPath = match.Groups[1].Value;
}

// If we have already seen the target path,
// then we have a duplicate copy path target to report.
String otherNode;
if (pathTable.TryGetValue(targetPath, out otherNode))
{
HashSet<String> otherNodes;
if (duplicateTable.TryGetValue(targetPath, out otherNodes))
{
}
else
{
HashSet<String> nodeList = new HashSe<String>();
duplicateTable[targetPath] = nodeList;
}
}
pathTable[targetPath] = lastNode;
}
}

foreach(KeyValuePair<String, HashSet<String>> kvp in duplicateTable)
{
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine("Duplicate file: {0}", kvp.Key);
Console.ResetColor();

foreach(String project in kvp.Value)
{
Console.WriteLine("      Project: {0}", project);
}
if (kvp.Value.Count == 1)
{
Console.WriteLine("      (A single project listing means that this project performed multiple copies of the same item.)");
}
Console.WriteLine();
}
}
}
}