Module 0078: Local var., by value, or by reference?
Tak Auyeung, Ph.D.
November 8, 2018
1 About this module
- Prerequisites: 0076, 0073
- Objectives: This module discusses how we can determine whether a name used in a subroutine should be a local
variable, a by-value parameter, or a by-reference parameter.
2 Abstracting a subroutine
As much as we want to practice top-down design (module 0049), it is often not possible to design a program completely using the
top-down method. Chances are that we will write code that is too long, too tedious, and too repetitive. As a result, we need to
know how to restructure programs to make programs easier to maintain.
Let us begin with an example in listing 1.
Listing 1: tedious
1  // listing 1
2  // tedious code to accept numbers with range checking
3  local
4  local
5  repeat
6    print "enter a number between 1 and 100"
7    read
8    if then
9      print "error: the number is not between 1 and 100"
10   end if
11 until
12 repeat
13   print "enter a number between " " and 100"
14   read
15   if then
16     print "error: the number is not between " " and 100"
17   end if
18 until
19 local
20
21 while do
22   print
23
24 end while
The code in listing 1 prompts the user to enter two numbers. The first number is the lower bound, and the second number
is the upper bound. Both numbers should be between 1 and 100, with the additional requirement that the upper bound
cannot be less than the lower bound (they can be the same). The last loop (the while loop) prints the range of numbers
from the lower bound to the upper bound.
2.1 Recognizing similarities
Referring back to listing 1, the post-checking loop starting on line 5 is structurally similar to the post-checking loop
starting on line 12:
- Both loops are post-checking.
- Both loops prompt the user with the range.
- Both loops accept an input.
- Both loops check the range.
- Both loops print an error message if the number is out of range.
Because there are so many similarities, we can consider abstracting both post-checking loops as a single subroutine.
2.2 Identify differences
Our next step is to identify the differences between the two post-checking loops. Listing 2 is a “template” that fits both
post-checking loops. When there is a difference, we replace the disagreement with “?”.
Listing 2: template
// listing 2
// a template with differences marked as ?
repeat
  print "enter a number between " ? " and 100"
  read ?
  if (? < ?) or (? > 100) then
    print "error: the number is not between " ? " and 100"
  end if
until (? >= ?) and (? <= 100)
2.3 Assign names
Once we identify the differences, we also have to find consistent uses of the same name at the “?” points. If both
post-checking loops consistently use the same name within the post-checking loop, we use a unique name in the template
code.
For example, wherever the first post-checking loop uses its input variable (lines 7, 8 and 11), the second post-checking
loop consistently uses its own input variable (lines 14, 15 and 18). As a result, we can replace the “?” marks at those
points with a consistent name (n) in the template, yielding the code in listing 3.
Listing 3: template
// listing 3
// template with a name "n"
repeat
  print "enter a number between " ? " and 100"
  read n
  if (n < ?) or (n > 100) then
    print "error: the number is not between " ? " and 100"
  end if
until (n >= ?) and (n <= 100)
Using the same method, we also identify that wherever “1” appears in the first loop, the second loop consistently uses
a variable (the lower bound entered in the first loop). As a result, we can generalize that placeholder as a new name b.
This yields the code in listing 4.
Listing 4: template
// listing 4
// template with names "n" and "b"
repeat
  print "enter a number between " b " and 100"
  read n
  if (n < b) or (n > 100) then
    print "error: the number is not between " b " and 100"
  end if
until (n >= b) and (n <= 100)
2.4 Get rid of literal numbers
Generally speaking, it is a good idea to remove literal constant terms from an algorithm. In our example, this is the “100”
that appears consistently in both loops. We proceed to replace all occurrences of “100” with the name e. This
yields the code in listing 5.
Listing 5: template
1 // listing 5
2 // template with names "n", "b" and "e"
3 repeat
4   print "enter a number between " b " and " e
5   read n
6   if (n < b) or (n > e) then
7     print "error: the number is not between " b " and " e
8   end if
9 until (n >= b) and (n <= e)
At this point, we have a template that is quite general. We use n to represent the number to be entered (by a user),
b to specify the lower bound of the number, and e to specify the upper bound of the number.
3 Attributes of a name
In this section, we try to identify what name(s) in an algorithm should be a local variable, a by-value parameter, or a by-reference
parameter. Some of these are rules of thumb, while others are more strict.
The following rules should be evaluated in the specified order. In other words, if a name fits an earlier rule, its type should be
determined by the earlier rule. Evaluate more rules only if a name does not fit earlier rules.
3.1 A read-only name
A name that is read-only in an algorithm should be a by-value parameter.
The rationale is as follows:
- A local variable is not initialized. As a result, a local variable must be initialized (written) first, then be read. This
means that a name that is only read in an algorithm should not be a local variable.
- A by-reference parameter has the advantage of being able to make changes to variables that do not belong to the
subroutine itself. However, this implies that to make a parameter worthwhile to be passed by reference, there should
be at least one write operation to the name.
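As a minimal sketch of this rule in C++ (the function name inRange is hypothetical and not part of this module), all three names below are only read, so all three are by-value parameters. The assertion on the bounds is an added assumption about how the invoker calls it.

```cpp
#include <cassert>

// Hypothetical helper: n, b and e are only read in the body,
// so each of them is a by-value parameter.
bool inRange(int n, int b, int e) {
    assert(b <= e);  // assumption: the invoker passes a non-empty range
    return b <= n && n <= e;
}
```

An invocation such as inRange(50, 1, 100) can pass literal numbers, which is only possible because none of the parameters needs to refer to a variable of the invoker.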
3.2 A read-first name
A name that has a read access as the first access cannot be a local variable. This is because a local variable does not have any
specific initial value, therefore it makes no sense to read a local variable as the first access.
3.3 A write-only name
A name that is only written to should be a by-reference parameter.
The rationale is as follows:
- A local variable is destroyed when a subroutine returns, therefore losing its value. Furthermore, the scope of a local
variable is limited to the containing subroutine. As a result, it makes no sense to only write to a local variable since
no one else can utilize the value.
- A by-value parameter is like a local variable, but with its value initialized by the invoking statement. If a name is
only written to in an algorithm, it does not make any sense to make it a by-value parameter, as the initial value is
overwritten without being used. Furthermore, since the value of a by-value parameter is also lost when a subroutine
returns, it follows the same logic as that to rule out a local variable.
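The following C++ sketch illustrates a write-only name. The subroutine minMax and its parameter names are hypothetical. The names lo and hi are only written, so they are by-reference parameters, while x and y are only read, so they are by-value parameters.

```cpp
#include <cassert>

// Hypothetical subroutine: x and y are read-only (by value),
// lo and hi are write-only (by reference).
void minMax(int x, int y, int& lo, int& hi) {
    if (x <= y) {
        lo = x;  // the only accesses of lo and hi are writes
        hi = y;
    } else {
        lo = y;
        hi = x;
    }
}
```

After an invocation such as minMax(7, 3, lo, hi), the invoker's own variables hold the results. If lo and hi were local variables or by-value parameters, the results would be lost when the subroutine returns.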
3.4 A write-first name
A name that is written to as the first operation is not a by-value parameter. This is because the benefit of a by-value parameter is
that the initial value is determined by the invoking code. If the code overwrites this value before ever reading it, then it does not
make sense to make the name a by-value parameter.
3.5 A write-last name
A name that is written to as the last operation is usually a by-reference parameter.
Here is the rationale:
- Both local variables and by-value parameters lose their values when a subroutine returns. As a result, it makes little
sense to write to a local variable or by-value parameter without reading it again before a subroutine returns.
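To see why a final write to a by-value parameter is wasted, consider the following C++ sketch. Both subroutine names are hypothetical. The two bodies are identical, and only the way count is passed differs.

```cpp
#include <cassert>

// count is a by-value parameter: the last operation writes to a copy,
// and the copy is destroyed when the subroutine returns.
void tallyByValue(int count) {
    count = count + 1;
}

// count is a by-reference parameter: the last operation writes to the
// invoker's variable, so the new value survives the return.
void tallyByReference(int& count) {
    count = count + 1;
}
```

If the invoker starts with a variable equal to 0, it is still 0 after tallyByValue and becomes 1 after tallyByReference.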
3.6 A name that ends with a constant value
A name that always ends with the same constant value when a subroutine returns is usually a local variable. However, if the name
is already ruled out as a local variable, it is likely to be a by-value parameter.
Here is the rationale:
- Most of the time, an invoker does not need a subroutine to set one of its variables to a constant value. The benefit
of a by-reference parameter is that a subroutine can determine the value of a variable that belongs to its invoker.
If the value is always a constant, there is no need to invoke a subroutine to figure it out.
4 Examples
4.1 Bound-checked number input
Let us return to the code in listing 5, and determine the nature of each name.
- b: the only access of b is a read, therefore it fits the rule in section 3.1. b must be a by-value parameter.
- e: same logic as b. It is a read-only name, therefore it is also a by-value parameter.
- n: the first access of n is a write access on line 5. n is read in other parts of the code. The final value of n is a
value between b and e inclusive (not a constant). We can determine that it is not a by-value parameter using the rule
in section 3.4. We cannot determine that it is a by-reference parameter using any rule. However, if n is a local
variable, then the whole code serves no purpose. As a result, n is chosen to be a by-reference parameter.
As a result, the template code in listing 5 becomes a subroutine in listing 6.
Listing 6: sub
// listing 6
// subroutine of listing 5
define sub readnum
  by value b
  by value e
  by reference n
  repeat
    print "enter a number between " b " and " e
    read n
    if (n < b) or (n > e) then
      print "error: the number is not between " b " and " e
    end if
  until (n >= b) and (n <= e)
end define sub
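For readers who want to try the subroutine, the following is one possible C++ rendering of readnum, a sketch under stated assumptions rather than the module's own code. The stream parameters in and out are additions so that the sketch can be exercised without a keyboard. The early return on failed input and the assertion on the bounds are also assumptions that the pseudocode does not address.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// b and e are read-only, so they are by-value parameters.
// n is written first and the invoker needs its final value, so it is by-reference.
// in and out are hypothetical additions; they default to the keyboard and screen.
void readnum(int b, int e, int& n,
             std::istream& in = std::cin, std::ostream& out = std::cout) {
    assert(b <= e);  // assumption: the invoker passes a non-empty range
    do {
        out << "enter a number between " << b << " and " << e << "\n";
        if (!(in >> n)) {
            return;  // assumption: give up if input ends or is not a number
        }
        if (n < b || n > e) {
            out << "error: the number is not between " << b << " and " << e << "\n";
        }
    } while (n < b || n > e);  // post-checking loop: repeat while out of range
}
```

With the input text "0 250 42" and bounds 1 and 100, the sketch rejects 0 and 250 and leaves 42 in the invoker's variable.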
Once we have subroutine “readnum”, we can proceed to change the code in listing 1 to utilize the subroutine. The cleaned up code
is in listing 7.
Listing 7: clean
// listing 7
// cleaned up code using a subroutine call to replace listing 1
local
local
invoke readnum
invoke readnum
local

while do
  print

end while
4.2 Factoring code
Let us revisit the code to factor a number from module 0049. It is relisted here as listing 8.
Listing 8: factoring
repeat
  invoke findfactor
  print

until
Let us analyze the use of each name:
- The name holding the number to factor is first read in the code because it is passed as a by-value parameter to
subroutine “findfactor”. This means it cannot be a local variable by the rule in section 3.2. However, we also note that
it always ends up with a value of 1 when the subroutine returns. This means it is unlikely to be a by-reference
parameter. We choose it to be a by-value parameter by the rule in section 3.6.
- The name passed by reference to subroutine “findfactor” is a hard one to determine by rules. The first access is
actually ambiguous because the name is passed by reference to subroutine “findfactor”. Unless you read the subroutine
“findfactor”, you don’t know that the first access is a write access. Knowing that the first access is a write access,
the name is not a by-value parameter. From the context of the subroutine, however, we can determine that it is a local
variable because the invoking code is not likely to want to know its value when the subroutine completes.
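The analysis above can be sketched in C++ as follows. The names factorize, x and f are hypothetical, and the behavior of findFactor (writing the smallest factor greater than 1), the division step, and the choice to collect the factors in a vector instead of printing them are all assumptions, since this module does not show those details.

```cpp
#include <cassert>
#include <vector>

// Assumed behavior: write the smallest factor of x that is greater than 1 to f.
// x is read-only (by value); f is write-first and needed by the invoker (by reference).
void findFactor(int x, int& f) {
    assert(x >= 2);  // assumption: there is a factor to find
    f = 2;
    while (x % f != 0) {
        f = f + 1;
    }
}

// x is read first and always ends at 1, so it is a by-value parameter.
// f is written first (inside findFactor) and is of no use to the invoker
// afterward, so it is a local variable.
std::vector<int> factorize(int x) {
    assert(x >= 2);
    std::vector<int> factors;  // stands in for "print" so the result can be checked
    int f;                     // local variable
    do {
        findFactor(x, f);
        factors.push_back(f);
        x = x / f;
    } while (x != 1);          // post-checking loop: until x is 1
    return factors;
}
```

Because x is passed by value, a variable that the invoker passes to factorize keeps its original value even though the copy inside the subroutine is reduced to 1. For an input of 12, the sketch collects the factors 2, 2 and 3.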