Skip to main content

C# Tutorial - Deep Cloning A Connected Graph Of Objects [Intermediate]


Every once in a while when programming, you might end up having to make a deep copy of a bunch of objects. Depending on the complexity of the data structure, this can be really easy or a real pain. What we are going to take a look at today is a pattern for making deep copying (or deep cloning) easy to do in C#. The techniques I am going to show can be applied to any object oriented language (there is nothing real specific for C#), so hopefully this can help you out even if you never use C#.

First off, you might be wondering what I mean by "deep copy" or "deep clone." Well, the common form of copy in a language is shallow copy - in fact, C# has a standard interface for it, called ICloneable. The main difference between shallow and deep copy is that shallow copy does not make a duplicate of anything underneath the top level object.

For instance, say you have an object 'A' which has a reference to an object 'B'. If you made a shallow copy of A, you would end up with new object 'A1' with a reference to B. But if you made a deep copy of A, you would end up with that new object 'A1', but instead of a reference to ', it would have a reference to a new object 'B1' (which would be a copy of B).

With a simple case like that, deep copy is not that hard to implement. But say you have a big interconnected graph of objects like the one below. The lines and arrows in the picture represent references - for instance, object C has references to A and I, and object I has references to H, G, and F. How would you make a deep copy of that and make sure that everyone ended up with the correct references in the end? Well, you could make a shallow copy of each object individually, and then try and reconnect the graph. But besides requiring multiple passes over the set of objects (and therefore being slow), it would be really easy to make a mistake.

Object Graph

So instead, the tact that we are going to take is to use an object dictionary. The overall idea is pretty simple - as we travel through the graph, making copies and reconnecting references, we will carry a dictionary. The keys of this dictionary will be the old objects, and the values will be the new objects. This way, whenever we run across an object that we need to make a copy of, we can check in the dictionary. If it is in the dictionary, someone else already made a copy of it, and we can use the new value stored there. If it isn't in the dictionary, we make a copy of it, and store the new copy in the dictionary (with the original as the key).

That paragraph may or may not have made much sense, so lets look at some code that accomplishes this. Below we have an abstract class called DeepClonable which holds much of what makes this possible:
public abstract class DeepCloneable
{
  protected DeepCloneable() { }

  protected DeepCloneable(DeepCloneable srcObj, 
      IDictionary<DeepCloneable, DeepCloneable> mapDict)
  {
    mapDict[srcObj] = this;
  }

  public DeepCloneable DeepClone()
  { 
    return DeepClone(new Dictionary<DeepCloneable, DeepCloneable>()); 
  }

  protected DeepCloneable DeepClone(
      IDictionary<DeepCloneable, DeepCloneable> mapDict)
  {
    if (mapDict == null)
      throw new ArgumentException("mapDict");

    return mapDict.ContainsKey(this) ? mapDict[this] : DeepCloneHelper(mapDict);
  }

  ///Classes should override this method 
  ///with the following code:
  /// return new NameOfMyClass(this, mapDict);
  protected abstract DeepCloneable DeepCloneHelper(
      IDictionary<DeepCloneable, DeepCloneable> mapDict);
}
 
There isn't much here, but it can be a little difficult to understand. Essentially, this code does most of what was described two paragraphs above. If you want to make a deep clone of an object, you would call DeepClone(). This would create a new empty object mapping dictionary, and call DeepClone with that empty dictionary. Here we have a simple error check - mapDict should never be null at this point. Then we check the dictionary - does it contain the current object we are trying to copy? In this case it won't - you just started the copy. But if it did, it would return the new object right then. Since it doesn't, it calls DeepCloneHelper with the mapDict as the argument.

DeepCloneHelper is an abstract method, so it needs to be implemented by every class that extends DeepCloneable. So lets take a look at the simplest DeepClonable object:
public class MyCloneableObj : DeepCloneable
{
  private MyCloneableObj(MyCloneableObj srcObj, 
        IDictionary<DeepCloneable, DeepCloneable> mapDict)
    : base(srcObj, mapDict)
  { }

  protected override DeepCloneable DeepCloneHelper(
      IDictionary<DeepCloneable, DeepCloneable> mapDict)
  { 
    return new MyCloneableObj(this, mapDict); 
  }
} 
 
As you can see, all DeepCloneHelper does is take that dictionary and make a new instance of the current object type (using a special constructor). It is in this constructor that all the actual copying should happen - it just happens to be that with this object, there is no data to actually copy. But as you might notice, this constructor calls a constructor on the base DeepCloneable class - and it is in that constructor (which is in the previous code block) that this new object gets added to the map dictionary.

So lets take a look at a more complex example. Here we have a class with some actual data that would need to to be copied:
public class MyClass : DeepCloneable
{
  private List<MyClass> _References = new List<MyClass>();
  private string _SomeOtherInfo;

  public MyClass(string a)
  { _SomeOtherInfo = a; }

  private MyClass(MyClass srcObj, 
        IDictionary<DeepCloneable, DeepCloneable> mapDict)
    : base(srcObj, mapDict)
  {
    foreach (MyClass mc in srcObj._References)
      _References.Add(mc.DeepClone(mapDict) as MyClass);

    _SomeOtherInfo = srcObj._SomeOtherInfo;
  }

  public void AddRef(MyClass r)
  { 
    _References.Add(r);
  }

  public string SomeOtherInfo
  {
    get { return _SomeOtherInfo; }
    set { _SomeOtherInfo = value; }
  }

  protected override DeepCloneable DeepCloneHelper(
      IDictionary<DeepCloneable, DeepCloneable> mapDict)
  { 
    return new MyClass(this, mapDict);
  }
}
 
In the copy constructor for this class, some actual copying occurs. It calls DeepClone on anything stored in the _References (always passing in the mapping dictionary, of course), and it copies the contents of _SomeOtherInfo. So now lets make a graph of these objects to clone - below is the code that makes a graph identical to the picture earlier in the article:
private MyClass MakeCrazyGraph()
{
  MyClass objA = new MyClass("A");
  MyClass objB = new MyClass("B");
  MyClass objC = new MyClass("C");
  MyClass objD = new MyClass("D");
  MyClass objE = new MyClass("E");
  MyClass objF = new MyClass("F");
  MyClass objG = new MyClass("G");
  MyClass objH = new MyClass("H");
  MyClass objI = new MyClass("I");

  objA.AddRef(objB);
  objA.AddRef(objC);
  objA.AddRef(objD);

  objB.AddRef(objA);
  objB.AddRef(objH);

  objC.AddRef(objA);
  objC.AddRef(objI);

  objD.AddRef(objA);
  objD.AddRef(objE);

  objE.AddRef(objC);
  objE.AddRef(objG);

  objF.AddRef(objD);
  objF.AddRef(objI);

  objG.AddRef(objE);
  objG.AddRef(objI);

  objH.AddRef(objB);
  objH.AddRef(objI);

  objI.AddRef(objF);
  objI.AddRef(objG);
  objI.AddRef(objH);

  return objA;
}
 
Yeah, I know, its a very contrived example. But hey, it works to demonstrate this concept. And now, lets make a copy of this graph of objects:
private void DoGraphStuff()
{
  MyClass objA = MakeCrazyGraph();

  MyClass objACopy = objA.DeepClone() as MyClass;
}
 
So what exactly happens here when DeepClone is called on objA? Well, first, it creates an empty object dictionary, and calls the DeepClone with that empty dictionary. Since there is nothing in the dictionary, it calls DeepCloneHelper, which makes a new instance of MyClass, passing in objA as the source and the empty object dictionary. The first thing that happens is that the new instance is added to the dictionary, and objA is the key. Now we go about copying the rest of objA.

For objA, the _References list holds three objects, objB, objC, and objD. We now call DeepClone on each one of these, passing in the current object dictionary. Remember, objA is already in the dictionary - so when the time comes during the deep cloning of B, C, and D to make a copy of A again, they will just get the copy that already exits out of the dictionary.

You can see the path this copying takes in the image below. The red lines represent the actual copying of the object the arrow points to (and the source of the red line represents what object initiated that copying). Places that have black lines and no red lines end up being connected during the clone by a lookup in the object dictionary. One thing to note - the path that copying takes is very dependent on how the structure was created in the first place - for instance, had I done the initial connecting (in the MakeCrazyGraph function) in a different order, the order in which things are copied here would be different. But that order doesn't matter for the end result - the end result will always be copied and connected in the correct way.

Object Graph Path

Well, that is it for making a deep copy of a highly interconnected data structure. Here is a zipfile containing a Visual Studio project with all the code shown above, so that you can play around with it yourself. If you have any questions or comments (or better solutions), please leave them below in the comments.

Source Files:

Comments

Popular posts from this blog

C# Snippet - Shuffling a Dictionary [Beginner]

Randomizing something can be a daunting task, especially with all the algorithms out there. However, sometimes you just need to shuffle things up, in a simple, yet effective manner. Today we are going to take a quick look at an easy and simple way to randomize a dictionary, which is most likely something that you may be using in a complex application. The tricky thing about ordering dictionaries is that...well they are not ordered to begin with. Typically they are a chaotic collection of key/value pairs. There is no first element or last element, just elements. This is why it is a little tricky to randomize them. Before we get started, we need to build a quick dictionary. For this tutorial, we will be doing an extremely simple string/int dictionary, but rest assured the steps we take can be used for any kind of dictionary you can come up with, no matter what object types you use. Dictionary < String , int > origin = new Dictionary < string , int >();

C# Snippet - The Many Uses Of The Using Keyword [Beginner]

What is the first thing that pops into your mind when you think of the using keyword for C#? Probably those lines that always appear at the top of C# code files - the lines that import types from other namespaces into your code. But while that is the most common use of the using keyword, it is not the only one. Today we are going to take a look at the different uses of the using keyword and what they are useful for. The Using Directive There are two main categories of use for the using keyword - as a "Using Directive" and as a "Using Statement". The lines at the top of a C# file are directives, but that is not the only place they can go. They can also go inside of a namespace block, but they have to be before any other elements declared in the namespace (i.e., you can't add a using statement after a class declaration). Namespace Importing This is by far the most common use of the keyword - it is rare that you see a C# file that does not h

C# WPF Printing Part 2 - Pagination [Intermediate]

About two weeks ago, we had a tutorial here at SOTC on the basics of printing in WPF . It covered the standard stuff, like popping the print dialog, and what you needed to do to print visuals (both created in XAML and on the fly). But really, that's barely scratching the surface - any decent printing system in pretty much any application needs to be able to do a lot more than that. So today, we are going to take one more baby step forward into the world of printing - we are going to take a look at pagination. The main class that we will need to do pagination is the DocumentPaginator . I mentioned this class very briefly in the previous tutorial, but only in the context of the printing methods on PrintDialog , PrintVisual (which we focused on last time) and PrintDocument (which we will be focusing on today). This PrintDocument function takes a DocumentPaginator to print - and this is why we need to create one. Unfortunately, making a DocumentPaginator is not as easy as