Module 0026: Basic data structure mechanisms

Variables can be allocated statically. Any storage that is allocated statically has a lifespan of the execution of a program. This means such variables start to exist as soon as a program is loaded, and they only cease to exist when a program exits.

Do not confuse the scope of a variable with its lifespan. In other words, even when a static variable is not in view, it continues to retain its state.
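The difference between scope and lifespan can be sketched with a static variable declared inside a subroutine. This is only an illustration; the name nextTicket is hypothetical and is not part of the module.

```c
/* Hypothetical helper: counts how many times it has been called. */
int nextTicket(void)
{
  /* count is in scope only inside this subroutine, but it is allocated
     statically, so it keeps its value between calls. */
  static int count = 0;

  count = count + 1;
  return count;
}
```

Successive calls return 1, 2, 3 and so on, even though count is never in view of the caller.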

One major disadvantage of static variables is that the size allocated cannot change as a program executes. If an array has a size of n integers, it can only have n elements throughout the execution of a program. The size of a static array can neither grow nor shrink.

2.2 Auto

Auto variables in C are allocated upon the entry of a block, and deallocated upon the exit of a block. As a result, auto variables are only suitable for localized use that does not require a variable to retain its value across the exit and reentry of a block.
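A minimal sketch of this behavior follows. The name countOnce is hypothetical; the point is only that the variable is created again at every entry of the block.

```c
/* Hypothetical helper: count is an auto variable. */
int countOnce(void)
{
  int count = 0;   /* allocated and initialized at every entry of the block */

  count = count + 1;
  return count;    /* the storage of count is released when the block exits */
}
```

Every call returns 1, because nothing is retained from the previous call.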

The main advantage of auto variables is that they are allocated on-demand. This means that there is little waste of resources. However, for the purposes of data structures, auto variables are not particularly useful.

2.3 Dynamic

Variables cannot be allocated dynamically. However, objects without names can be allocated dynamically. Dynamically allocated objects can only be accessed by pointers (also known as handles).

A dynamically allocated object exists as soon as it is allocated. It ceases to exist only when the object is explicitly deallocated (freed or deleted).

Like auto variables, dynamically allocated objects have the advantage of being allocated on-demand. However, they are better than auto variables because the lifespan of a dynamically allocated object does not depend on blocks at all.
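The following sketch, which assumes the standard malloc and free calls, shows an object that is created inside one subroutine and continues to exist after that subroutine returns. The names makeInt and useInt are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a pointer to a dynamically allocated int,
   or NULL if the allocation fails. */
int *makeInt(int value)
{
  int *p = malloc(sizeof *p);  /* the object itself has no name */

  if (p != NULL)
    *p = value;
  return p;                    /* the object outlives this block */
}

/* Hypothetical caller: uses the object, then explicitly deallocates it. */
int useInt(void)
{
  int *p = makeInt(42);
  int copy;

  if (p == NULL)
    return -1;                 /* allocation failed */
  copy = *p;                   /* the object is reachable only through p */
  free(p);                     /* the object ceases to exist here */
  return copy;
}
```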

Dynamic allocation also has a cost. Although it is an attribute of the implementation of dynamically allocated objects, it is important. The very nature of allocating storage dynamically leads to problems like memory fragmentation (for example, when using the malloc and free calls in C). In some other implementations, memory fragmentation is resolved by free space collection (garbage collection). However, this leads to other problems.

3 Pointers

In the topic of data structures, pointers are extremely important. This is because they permit a structure to be linked to another structure.

3.1 Basic pointers

A pointer does not necessarily point to any location that is accessible. In fact, when a pointer variable is first created, there is no guarantee that it points to any accessible location. A pointer needs to be initialized in order to be useful.

The most basic method to initialize a pointer is to assign the address of a variable (or elements in an array) to it. In our example, we can do the following:
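The original listing is not reproduced here, so the following is only a sketch of the idea. The variables i and a and the name pointerInitDemo are hypothetical.

```c
/* Hypothetical demonstration of initializing a pointer with an address. */
int pointerInitDemo(void)
{
  int i = 5;
  int a[3] = { 10, 20, 30 };
  int *p;          /* not yet initialized: must not be dereferenced */
  int sum;

  p = &i;          /* p now points to variable i */
  sum = *p;        /* reads i through p: 5 */

  p = &a[1];       /* p now points to the second element of a */
  sum = sum + *p;  /* reads a[1] through p: 20 */

  return sum;      /* 25 */
}
```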

We can also “initialize” a pointer by passing the address of a variable to a pointer parameter. The following example illustrates this. It also illustrates how a pointer can be dereferenced.
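A minimal sketch of this, with hypothetical names setToZero and setToZeroDemo:

```c
/* Hypothetical subroutine: pValue receives the address of a caller's int. */
void setToZero(int *pValue)
{
  *pValue = 0;     /* dereference: store 0 in whatever pValue points to */
}

/* Hypothetical caller. */
int setToZeroDemo(void)
{
  int n = 7;

  setToZero(&n);   /* pValue is "initialized" with the address of n */
  return n;        /* n is now 0 */
}
```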

3.2 Pointers as a field

Just as a field in a structure can be of any type, it can also be of a pointer type. The following is an example:
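As a sketch only (struct Reading and its fields are hypothetical names, not taken from the module), a structure with a pointer field might look like this:

```c
/* Hypothetical structure with one ordinary field and one pointer field. */
struct Reading {
  int id;
  float *pValue;   /* a field of pointer type */
};

/* Hypothetical demonstration: change a float through the pointer field. */
float readingDemo(void)
{
  float v = 1.5f;
  struct Reading r;

  r.id = 1;
  r.pValue = &v;     /* the field points to v */
  *r.pValue = 2.5f;  /* changes v through the field */
  return v;          /* 2.5 */
}
```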

3.3 Self pointers?

This is the interesting part. The field of a structure can point to something of the containing type. It is better to illustrate this with an example:
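A sketch of such a definition follows. The names struct X and pX are the ones used in the discussion of this section; the field value and the subroutine selfPointerDemo are hypothetical additions for illustration.

```c
#include <stddef.h>

struct X {
  int value;       /* hypothetical payload */
  struct X *pX;    /* points to another struct X, or to nothing (NULL) */
};

/* Hypothetical demonstration: link one struct X to another. */
int selfPointerDemo(void)
{
  struct X second = { 2, NULL };
  struct X first  = { 1, &second };

  return first.value + first.pX->value;   /* 1 + 2 */
}
```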

It may look confusing. How can a pointer point to a structure that is not even fully defined yet? The definition may seem recursive. The key to understanding this is to look at things from the perspective of the compiler.

As it turns out, on most platforms all pointers to data are implemented the same way: as an address. To a processor, there is no difference between a pointer to an integer, a character or a struct X. The “typing” of pointers is only there so that a compiler can perform type checking. In other words, a compiler checks to make sure the way a pointer is used is consistent throughout a program.

From this perspective, there is nothing special about the structure definition. All we are saying is that the field pX can only point to a structure of type struct X. It doesn’t matter if we are in the process of defining struct X.

We’ll be using a lot of structure definitions that contain pointers to their own type.

4 Pass-by-value vs Pass-by-reference

This is an interesting topic. In C, there is no pass-by-reference, period. However, in C++, pass-by-reference is supported. Because CISP430 only has a prerequisite of CISP360, I cannot assume C++ knowledge.

4.1 Pass-by-value

“Pass-by-value” refers to the mechanism of “give a subroutine a copy [of a value] to play with”. Let us consider the following example:
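The original listing is not reproduced here. A sketch consistent with the description, using the names sub1 and x from the text, could be:

```c
#include <stdio.h>

/* x is a copy of whatever value the caller supplies. */
void sub1(int x)
{
  printf("%d\n", x);
}
```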

This is a fairly simple example that prints the value of parameter x to the standard output. Now, let us consider an invocation (call) of the subroutine:

In this invocation, the expression z is evaluated, then a copy of this value is given to subroutine sub1. Because we are only working with a copy of the expression, we can use any expression that evaluates to an integer value, such as the following:
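The invocations discussed above can be sketched together. The caller callSub1 and the particular expressions are hypothetical; sub1 is repeated so that the sketch is complete on its own.

```c
#include <stdio.h>

void sub1(int x)
{
  printf("%d\n", x);
}

/* Hypothetical caller. */
int callSub1(void)
{
  int z = 5;

  sub1(z);          /* a copy of the value of z is passed */
  sub1(z + 1);      /* any expression that yields an int works */
  sub1(2 * z - 3);
  return z;         /* z itself is untouched: still 5 */
}
```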

You can consider pass-by-value as a safe method to provide extra information to a subroutine. This is because the subroutine cannot change the value of any variable or object that belongs to the caller. Let us consider another example:
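A sketch of such a subroutine, using the names inc and x from the discussion that follows:

```c
/* Attempts to increment its argument, but x is only a local copy. */
void inc(int x)
{
  x = x + 1;   /* changes the copy; the caller never sees this */
}
```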

A good compiler should give you a warning. This is because what we do to parameter x remains local to the subroutine. We are only incrementing a copy of whatever expression is used to specify the argument. For example, we may have the following invocation:
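A sketch of such an invocation, with a hypothetical caller named callInc (inc is repeated so that the sketch is complete on its own):

```c
void inc(int x)
{
  x = x + 1;   /* increments only the local copy */
}

/* Hypothetical caller. */
int callInc(void)
{
  int z = 3;

  inc(z);      /* passes the current value of z, not z itself */
  return z;    /* still 3 */
}
```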

What is provided to subroutine inc is not a method to find variable z, but rather just the current value of z. Parameter x in subroutine inc works with a copy of the value of z. The incremented value is not related to variable z at all.

In other words, if you have an object or variable to be altered by a subroutine, pass-by-value should not be used.

4.2 Pass-by-reference, C++

Note that the & symbol does not mean address-of. It means that “x is a reference to an integer”. Consequently, when we see x in subroutine inc, it means “whatever x is referring to.”

In the statement x = x + 1;, the right hand side says: “evaluate the sum of 1 and the value referred to by x.” The left hand side says “store the value in whatever storage is referred to by x.”

The value of z (of the caller) will increment in this case. This is because the subroutine inc is not given the value (or a snapshot thereof) of argument z; it is given the method to find variable z. As a result, whatever we do to parameter x is done to argument z.

It only makes sense that the argument used to specify parameter x must be the storage of an integer. This means that the following invocation will result in a compiler error:

This results in an error because the expression z+1 only specifies a value, but not a place to store an integer.

4.3 Pass-by-reference, C

C has no pass-by-reference mechanism. When a subroutine needs to modify an object or variable that does not belong to it, it needs to resort to pointers. In our example, this is what a C subroutine looks like:
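A sketch using the names inc and pInt from the surrounding discussion; the caller callIncPtr is hypothetical.

```c
/* pInt holds the address of the integer to increment. */
void inc(int *pInt)
{
  *pInt = *pInt + 1;   /* read and write the integer pInt points to */
}

/* Hypothetical caller. */
int callIncPtr(void)
{
  int z = 3;

  inc(&z);     /* pass the address of z, not its value */
  return z;    /* now 4 */
}
```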

pInt is a pointer to an integer. The value of pInt is not an integer, but the address of the storage of an integer. In the statement in the subroutine, *pInt refers to “whatever integer pInt points to.” On the left hand side of the assignment, *pInt specifies where to store the result of the right hand side.

Although this can accomplish the same as the C++ code in section 4.2, it is also more dangerous. Let us consider the following incorrect code:
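The original listing is not reproduced here. A sketch of the mistake described below follows; the name incWrong is hypothetical and is chosen only to keep it apart from the correct version.

```c
/* Incorrect: the dereference operator is missing. */
void incWrong(int *pInt)
{
  pInt = pInt + 1;   /* moves the local pointer; the integer is never touched */
}
```

Calling incWrong(&z) leaves z unchanged, because only the subroutine's own copy of the address is modified.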

This code compiles just fine, although it may generate a warning message. However, it does not do what the earlier code does. pInt refers to the parameter itself. The assignment statement increments the pointer itself, but it does not do anything to what the pointer points to. There is no dereference operator!

5 Data protection

Data protection has different meanings in different contexts. In the context of data structures, it means that code that should not access the internal structure of objects is unable to do so.

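#include <stdlib.h>
#include "complex.h"

/* The layout of a complex number is known only inside this file. */
struct complex {
  float r, i;
};

/* Store the product of *x and *y in *product. */
void multComplex(void *x, void *y, void *product)
{
  struct complex *cx = (struct complex *)x;
  struct complex *cy = (struct complex *)y;
  struct complex *cp = (struct complex *)product;
  /* Compute both parts first, so that product may be the same object as x or y. */
  float r = cx->r * cy->r - cx->i * cy->i;
  float i = cx->i * cy->r + cx->r * cy->i;

  cp->r = r;
  cp->i = i;
}

/* Allocate a complex number; returns NULL if the allocation fails. */
void *newComplex(float r, float i)
{
  struct complex *pC;

  pC = (struct complex *)malloc(sizeof(struct complex));
  if (pC != NULL) {
    pC->r = r;
    pC->i = i;
  }

  return pC;
}

/* Release a complex number created by newComplex. */
void delComplex(void *x)
{
  free(x);
}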

Note that complex.c is the only file that knows the structure of a struct complex type. It has three subroutines defined, one for creating a complex (newComplex), one for deleting a complex (delComplex), and one for multiplying two complex numbers (multComplex).

The header file does not mention anything about struct complex, at all. It only uses void pointers.
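The header listing is not reproduced here. Based on the three subroutines defined in complex.c, it could be sketched as follows; the include guard name COMPLEX_H is an assumption.

```c
/* complex.h (sketch): only void pointers appear, never struct complex. */
#ifndef COMPLEX_H   /* guard name is an assumption */
#define COMPLEX_H

void *newComplex(float r, float i);
void delComplex(void *x);
void multComplex(void *x, void *y, void *product);

#endif
```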

Because the user only includes complex.h, it does not have any knowledge of the inside of a struct complex. This makes it impossible for the main program to modify a complex number because it doesn’t know how!

But why do we want to go through all the trouble to protect a data type like struct complex? This permits the author of the complex number module to stay completely insulated from the users of the complex number type, and vice versa. In fact, this mechanism allows the author of the complex type to distribute just complex.h and complex.o (the object file of complex.c). There is no need to distribute the source code related to struct complex at all!