Every once in a while when programming, you might end up having to make
a deep copy of a bunch of objects. Depending on the complexity of the
data structure, this can be really easy or a real pain. What we are
going to take a look at today is a pattern for making deep copying (or
deep cloning) easy to do in C#. The techniques I am going to show can
be applied to any object oriented language (there is nothing real
specific for C#), so hopefully this can help you out even if you never
use C#.
First off, you might be wondering what I mean by "deep copy" or "deep
clone." Well, the common form of copy in a language is shallow copy - in
fact, C# has a standard interface for it, called
ICloneable
. The main
difference between shallow and deep copy is that shallow copy does not
make a duplicate of anything underneath the top level object.
For instance, say you have an object 'A' which has a reference to an
object 'B'. If you made a shallow copy of A, you would end up with new
object 'A1' with a reference to B. But if you made a deep copy of A, you
would end up with that new object 'A1', but instead of a reference to ',
it would have a reference to a new object 'B1' (which would be a copy of
B).
With a simple case like that, deep copy is not that hard to implement.
But say you have a big interconnected graph of objects like the one
below. The lines and arrows in the picture represent references - for
instance, object C has references to A and I, and object I has
references to H, G, and F. How would you make a deep copy of that and
make sure that everyone ended up with the correct references in the end?
Well, you could make a shallow copy of each object individually, and
then try and reconnect the graph. But besides requiring multiple passes
over the set of objects (and therefore being slow), it would be really
easy to make a mistake.
So instead, the tact that we are going to take is to use an object
dictionary. The overall idea is pretty simple - as we travel through the
graph, making copies and reconnecting references, we will carry a
dictionary. The keys of this dictionary will be the old objects, and the
values will be the new objects. This way, whenever we run across an
object that we need to make a copy of, we can check in the dictionary.
If it is in the dictionary, someone else already made a copy of it, and
we can use the new value stored there. If it isn't in the dictionary, we
make a copy of it, and store the new copy in the dictionary (with the
original as the key).
That paragraph may or may not have made much sense, so lets look at some
code that accomplishes this. Below we have an abstract class called
DeepClonable
which holds much of what makes this possible:public abstract class DeepCloneable
{
protected DeepCloneable() { }
protected DeepCloneable(DeepCloneable srcObj,
IDictionary<DeepCloneable, DeepCloneable> mapDict)
{
mapDict[srcObj] = this;
}
public DeepCloneable DeepClone()
{
return DeepClone(new Dictionary<DeepCloneable, DeepCloneable>());
}
protected DeepCloneable DeepClone(
IDictionary<DeepCloneable, DeepCloneable> mapDict)
{
if (mapDict == null)
throw new ArgumentException("mapDict");
return mapDict.ContainsKey(this) ? mapDict[this] : DeepCloneHelper(mapDict);
}
///Classes should override this method
///with the following code:
/// return new NameOfMyClass(this, mapDict);
protected abstract DeepCloneable DeepCloneHelper(
IDictionary<DeepCloneable, DeepCloneable> mapDict);
}
There isn't much here, but it can be a little difficult to understand.
Essentially, this code does most of what was described two paragraphs
above. If you want to make a deep clone of an object, you would call
DeepClone()
. This would create a new empty object mapping dictionary,
and call DeepClone
with that empty dictionary. Here we have a simple
error check - mapDict
should never be null at this point. Then we
check the dictionary - does it contain the current object we are trying
to copy? In this case it won't - you just started the copy. But if it
did, it would return the new object right then. Since it doesn't, it
calls DeepCloneHelper
with the mapDict
as the argument.DeepCloneHelper
is an abstract method, so it needs to be implemented
by every class that extends DeepCloneable
. So lets take a look at the
simplest DeepClonable
object:public class MyCloneableObj : DeepCloneable
{
private MyCloneableObj(MyCloneableObj srcObj,
IDictionary<DeepCloneable, DeepCloneable> mapDict)
: base(srcObj, mapDict)
{ }
protected override DeepCloneable DeepCloneHelper(
IDictionary<DeepCloneable, DeepCloneable> mapDict)
{
return new MyCloneableObj(this, mapDict);
}
}
As you can see, all
DeepCloneHelper
does is take that dictionary and
make a new instance of the current object type (using a special
constructor). It is in this constructor that all the actual copying
should happen - it just happens to be that with this object, there is no
data to actually copy. But as you might notice, this constructor calls a
constructor on the base DeepCloneable
class - and it is in that
constructor (which is in the previous code block) that this new object
gets added to the map dictionary.
So lets take a look at a more complex example. Here we have a class with
some actual data that would need to to be copied:
public class MyClass : DeepCloneable
{
private List<MyClass> _References = new List<MyClass>();
private string _SomeOtherInfo;
public MyClass(string a)
{ _SomeOtherInfo = a; }
private MyClass(MyClass srcObj,
IDictionary<DeepCloneable, DeepCloneable> mapDict)
: base(srcObj, mapDict)
{
foreach (MyClass mc in srcObj._References)
_References.Add(mc.DeepClone(mapDict) as MyClass);
_SomeOtherInfo = srcObj._SomeOtherInfo;
}
public void AddRef(MyClass r)
{
_References.Add(r);
}
public string SomeOtherInfo
{
get { return _SomeOtherInfo; }
set { _SomeOtherInfo = value; }
}
protected override DeepCloneable DeepCloneHelper(
IDictionary<DeepCloneable, DeepCloneable> mapDict)
{
return new MyClass(this, mapDict);
}
}
In the copy constructor for this class, some actual copying occurs. It
calls
DeepClone
on anything stored in the _References
(always
passing in the mapping dictionary, of course), and it copies the
contents of _SomeOtherInfo
. So now lets make a graph of these objects
to clone - below is the code that makes a graph identical to the picture
earlier in the article:private MyClass MakeCrazyGraph()
{
MyClass objA = new MyClass("A");
MyClass objB = new MyClass("B");
MyClass objC = new MyClass("C");
MyClass objD = new MyClass("D");
MyClass objE = new MyClass("E");
MyClass objF = new MyClass("F");
MyClass objG = new MyClass("G");
MyClass objH = new MyClass("H");
MyClass objI = new MyClass("I");
objA.AddRef(objB);
objA.AddRef(objC);
objA.AddRef(objD);
objB.AddRef(objA);
objB.AddRef(objH);
objC.AddRef(objA);
objC.AddRef(objI);
objD.AddRef(objA);
objD.AddRef(objE);
objE.AddRef(objC);
objE.AddRef(objG);
objF.AddRef(objD);
objF.AddRef(objI);
objG.AddRef(objE);
objG.AddRef(objI);
objH.AddRef(objB);
objH.AddRef(objI);
objI.AddRef(objF);
objI.AddRef(objG);
objI.AddRef(objH);
return objA;
}
Yeah, I know, its a very contrived example. But hey, it works to
demonstrate this concept. And now, lets make a copy of this graph of
objects:
private void DoGraphStuff()
{
MyClass objA = MakeCrazyGraph();
MyClass objACopy = objA.DeepClone() as MyClass;
}
So what exactly happens here when
DeepClone
is called on objA
? Well,
first, it creates an empty object dictionary, and calls the DeepClone
with that empty dictionary. Since there is nothing in the dictionary, it
calls DeepCloneHelper
, which makes a new instance of MyClass
,
passing in objA
as the source and the empty object dictionary. The
first thing that happens is that the new instance is added to the
dictionary, and objA
is the key. Now we go about copying the rest of
objA
.
For
objA
, the _References
list holds three objects, objB
, objC
,
and objD
. We now call DeepClone
on each one of these, passing in the
current object dictionary. Remember, objA
is already in the dictionary
- so when the time comes during the deep cloning of B, C, and D to make
a copy of A again, they will just get the copy that already exits out of
the dictionary.
You can see the path this copying takes in the image below. The red
lines represent the actual copying of the object the arrow points to
(and the source of the red line represents what object initiated that
copying). Places that have black lines and no red lines end up being
connected during the clone by a lookup in the object dictionary. One
thing to note - the path that copying takes is very dependent on how the
structure was created in the first place - for instance, had I done the
initial connecting (in the
MakeCrazyGraph
function) in a different
order, the order in which things are copied here would be different. But
that order doesn't matter for the end result - the end result will
always be copied and connected in the correct way.
Well, that is it for making a deep copy of a highly interconnected data
structure. Here is a zipfile
containing a Visual Studio project with all the code shown above, so
that you can play around with it yourself. If you have any questions or
comments (or better solutions), please leave them below in the comments.
Source Files:
Comments
Post a Comment