Tak Auyeung, Ph.D.
Dynamic memory allocation is performed when the memory requirements of a program cannot be determined at compile time. As a result, the program needs to allocate memory as necessary during run-time. In addition, because the allocation of memory may depend on the logic and specific run-time parameters, it is not possible allocate memory in the same pattern in general.
Many programs, such as compilers, cannot determine their memory requirements at compile time. Consequently, dynamically allocated memory is a required feature for most programming languages.
Because pointers are such low-level constructs, they provide certain benefits, such as code size efficiency as well as run-time efficiency. At the same time, because pointers are low-level constructs, any misuse can also lead to disastrous results.
There are three main categories of disasters.
Combined with dynamic memory allocation, the use of low-level pointers can lead to memory leak. In other words, the only pointer pointing to a dynamically allocated memory object can be overwritten. Once this pointer is overwritten, the dynamically allocated object is no longer accessible. This also means that the object can no longer be deallocated.
The second category of disaster can be detected fairly easily. A program may attempt to delete the same object twice. This can be detected because the library subroutines can check if a chunk of memory is already deallocated.
The third category of disaster is just memory corruption. Because pointers can be modified in many ways (in C and in C++), they can be modified so that they point to memory locations that should be out of reach. Low-level pointers do not understand object boundaries, and therefore it is impossible to even check for memory corruption problems.
``Where is'' is analogous to pointers. It indicates where an object is. ``This is a'' is analogous to a dynamically allocated object, but with some extra goodies.
WhereIs
object act like a pointer to
a ThisIsA
object, but with the proper exposure of the actual
payload type. This can be accomplished by the use class templates.
Our first attempt is illustrated by listing :autoreflstlisting1.
[language=C++, numbers=left, caption=WhereIs.01, label=listing:whereis.01]whereis01.cc
We define ThisIsA
as a member class of WhereIs<T>
. This is
because we will be using WhereIs<T>
more often. In fact,
the member class ThisIsA
is private, which means we do not have
explicit access to the class! This implies that we cannot explicitly
create objects of the type WhereIs<T>::ThisIsA
.
All operations are WhereIs
centric. Here is a short explanation of
the methods of WhereIs
:
WhereIs(void)
: default constructor, the pointer is reset to
NULL.
WhereIs(WhereIs<T> &other)
: copy constructor.
operator*(void)
: dereference, this is how we get to the
object of type T
.
nuke
: okay, this is a bad name. However, it is clear that
this method deallocates the dynamically allocated object!
allocate(void)
: perform dynamic allocation and make
ptr
point to the allocated location.
allocate(const T &original)
: allocate and also invoke
the copy constructor of the payload object type.
becomesAliasOf(WhereIs<T> &other)
: a long name, but it
describes exactly what it does! The pointer is copied, so this
is the shallowest kind of copy. I do not use the assignment
operator because =
may mean deep copy (of the allocated
object), or shallow copy (of the pointer).
The test
subroutine illustrates how some of the methods are
used.
This first trial is neat, but all it does is to introduce a lot of extra typing with no real benefit.
One reason why this is a common problem is that we use the assignment operator to alter what a pointer points to without thinking about the consequences. This is actually why operator overloading can be confusing. The same operator may be harmless in some situations, where very harmful in others.
Even our WhereIs
class has this problem. When we make a
WhereIs
an alias of another WhereIs
, what happens to
whatever it was pointing at first? To make the programmer aware of
such problems, we change the definition of WhereIs
as in
listing :autoreflstlisting2.
[language=C++, numbers=left, caption=WhereIs.02, label=listing:whereis.02]whereis02.cc
This revision adds a new method called detach. All detach
does
is to reset the pointer to 0. We also add checking logic to see if a
pointer is NULL before changing it. If a pointer is not NULL, and we
change it, the code throws an exception called CheckMemLeak
.
In other words, in order to change a WhereIs
to point to another
object, the detach
method should be invoked first. Otherwise,
an exception is thrown at run-time.
The introduction of the ``detachment'' concept is critical. This makes
it obvious to the programmer that whatever a WhereIs
was pointing
at may be lost.
Note that even the destructor checks if ptr
is NULL. This is
because the destruction of a WhereIs
can potentially lead to
loss of the last link to dynamically allocated object.
Even with the detach
method, it does not prevent memory leak.
It simply makes a programmer call detach
first to expose the
possibility of memory leak. However, a programmer may mistakenly
call detach
instead of nuke
and lead to memory leak.
Note that even the destructor of WhereIs
checks if ptr
is 0. This forces a programmer to explicitly detach a WhereIs
before it is destroyed.
ThisIsA
so that it knows how many WhereIs
are pointing at it. When a
ThisIsA
is not longer pointed to by any WhereIs
, then
it can be deallocated.
Listing :autoreflstlisting3 is the revised code to maintain and make use of the referenc count.
[language=C++, numbers=left, caption=WhereIs.03, label=listing:whereis.03]whereis03.cc
First, nuke
becomes private. This is because there should not be
any need to explicitly deallocate a ThisIsA
. Next, detach
is enhaneced to first decrement the reference count of a ThisIsA
.
If the reference count decrements to 0, then the ThisIsA
is
deallocated by calling nuke
.
WhereTo
can go where a pointer is needed, and ThisIsA
is
never used explicitly.
Listing :autoreflstlisting4 is an example of how we can use
the WhereIs
type.
escapeinside=/*%%*/ [language=C++, numbers=left, caption=Linked List.01, label=listing:linkedlist.01]linkedlist.cc
Node
is a member class of List
in the private section. This means a user cannot directly access
Node
. On line :autoreflstlisting10, the actual type
T
is used to create the storage to store an object of
the user specified type. On line :autoreflstlisting11, the
next ``pointer'' is defined, only that we use
WhereIs<Node>
as the type. Note that T
is nowhere
to seen. This is okay, because Node
is already specialized
to handle a payload of type T
.
On line :autoreflstlisting15, we first detach next from whatever it was referring to, then make it an alias of the parameter.
first
. Note that on line :autoreflstlisting33, we
detach the first
reference before allocating a new node
as the new head of the linked list. This is okay, because the
line before it first made an alias of this node as oldFirst
.
This means that even when as lose the last external (named) reference to
the data structure, it will not trigger the deallocation. In this case,
the nuke
method may be useful as it forces the deallocation
regardless of the reference count.
Nonetheless, even with nuke
nothing is automatic anymore.
Instead, the concept of a handle is introduced. A handle differs from a pointer in many ways.
First of all, a programming language that supports handles generates code to track handles. In other words, as execution enters a scope that defines a handle variable, the variable is ``registered''. As execution leaves the scope, the variable (of handle type) is ``unregistered''. This permits the language (interpreter or virtual machine) track active ``reachable'' handles from the program.
Next, a handle does not directly point to physical memory. There is one additional level of indirection. Consequently, it is possible to relocate objects allocated in dynamically allocate memory, and the reallocation does not break the linkage between the handles of the reallocated objects and the objects.
A handle can be implemented as an index into a large universal array of pointers. As a result, two handles referring to the same object are the same index into the same entry in the array of pointers. This also permits the change of the actual pointer in the array without impacting the handles.
A special requirement is that any class containing handles must place the handles in a special segment. This permits the interpreter or virtual machine inspect an object and track the objects that are referred to by this object.
Furthermore, each chunk of allocated memory also knows its own index in the universal table of pointers.
When a new chunk of memory is allocated, the interpreter or virtual machine finds an unused pointer in the universal table of pointers, and make that entry associated with the chunk of memory.
Next, the interpreter or VM starts with the list of registered active handles. These handles correspond to (named) variables. As a handle is processed, it is first marked as used in the universal array of pointers (or a flag in the allocated object itself).
Then, the interpreter or VM locates the special segment of handles in the object being referenced, and perform the same operation recursively with those handles. This process ends with a handle that is already marked used, or when an object that contains no handles to other objects.
As this process terminates, all the handles in the universal array of pointers are either marked as used or reset as unused. The objects referenced by all the unused handles are then deleted.
Note that this will eliminate circular data structures that are not reachable by the program!
In terms of implementation, a min-heap can be used to maintain pointers to all the dynamically allocated objects. This way, we can remove items from the heap (as the root), move the item to the next lowest available location and repeat the process until the heap is empty.
WhereIs
for
variables in a problem. Of course, it still relies on the programmer
to use the subclass only for variables, and use the base class
for members in classes.
However, the most difficult part is tracking handles (WhereIs
es)
as data members of classes. Without this capability, it is not possible to
mark all allocated memory objects that are in use.
This document was generated using the LaTeX2HTML translator Version 2008 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 -no_navigation module.latex
The translation was initiated by Tak Auyeung on 2010-01-07