Pointers (References) And GoTo Statements
By Aaron Penner
CU-Denver Department of Computer Science and Engineering
CSC 5535
Fundamental Concepts in Programming Languages
Spring 1999
Instructor: Jim Schatzmam
Was the introduction of pointers and goto statements into programming
languages a major flaw or setback? Robert W. Sebesta’s book, "Concepts
Of Programming Languages," states that pointers and goto statements
can be compared to each other. The book also quotes Hoare as saying, "Their
(pointer) introduction into high level languages has been a step backward
from which we may never recover" (Sebesta, 247). The discussion to follow
examines this controversial subject and draws some conclusions.
Introduction
Goto statements have long been a popular subject of debate, but once
pointers were introduced into languages a whole new area of discussion
opened up.
Sebesta’s book compares goto statements and pointers with this quote:
"The goto statement widens the range of statements that can be executed
next. Pointer variables widen the range of memory that can be referenced
by a variable." Another work that is useful for this discussion is "Go To
Statement Considered Harmful," by Edsger W. Dijkstra, which focuses mainly
on the readability of code that uses the goto statement. The final document
considered here is "Hints On Programming Language Design," by C.A.R. Hoare,
which is an overall guide to programming language design. Both goto
statements and pointers raise the common programming questions of
readability, simplicity, flexibility, execution speed, efficiency, and
alternatives.
Compare & Contrast: pointers and goto statements
The first way to compare goto and pointer statements is to look at them
literally.

Goto statement:

    Goto End;
    …
    End:
    Print end;

Pointer statements (C):

    int *pointer;
    int array[10] = {0};
    pointer = &array[2];
    printf("value at index 2: %d\n", *pointer);
    (*person).name or person->name

The goto and pointer statements do not look syntactically alike, but
their meaning is similar: a goto statement jumps to a different part of
the code, whereas a pointer jumps to a different part of memory or address
space. The full quote from the paper "Hints On Programming Language Design,"
by C. A. R. Hoare says, "References are like jumps, leading wildly from
one part of a data structure to another. Their introduction into high level
languages has been a step backward from which we may never recover."
Pointers can also be used to reference a function, and because of this,
they can be used to jump around in code execution in much the same manner
as a goto statement.
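The point about function pointers can be made concrete with a minimal C
sketch. The names add, sub, binary_op, and dispatch are hypothetical and
do not come from any of the cited works; the sketch assumes only standard C.

```c
/* Two hypothetical handlers; the names are illustrative only. */
static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

/* A pointer to a function: the code that runs is chosen at run time. */
typedef int (*binary_op)(int, int);

/* Select a handler by index, then "jump" to it through the pointer. */
int dispatch(int which, int a, int b)
{
    static const binary_op table[] = { add, sub };

    if (which < 0 || which > 1)
        return 0;                 /* unknown selector: call nothing */
    return table[which](a, b);    /* indirect call through the pointer */
}
```

Calling dispatch(0, 2, 3) runs add and yields 5, while dispatch(1, 2, 3)
runs sub and yields -1. The call site itself does not say which code
executes next, which is the sense in which the pointer behaves like a jump.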
Beyond the literal comparison, the two statements are even more alike,
because both are thought to be dangerous. One reason the two statements
are feared is that they are very powerful and, if used recklessly, can
create very bad situations. Attitudes toward the goto statement have been
debated for years, as the paper by Dijkstra shows. Because of the dislike
for the goto statement, many new control structures were created and
incorporated into programming languages to replace it. The goto statement
was once an essential part of programming languages, but over time
programmers became apt to use it where good code design would not have
permitted. The case statement, or conditional branch statement, is one
example of a structure that was added to programming languages to implement
a once-legitimate use of the goto statement.
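As a rough illustration of that replacement, the following C sketch writes
the same multi-way branch twice, once with goto and labels and once with
a switch (C's case statement). The function names and the status codes 0
and 1 are assumptions made up for this example.

```c
/* Multi-way branch written with goto and labels. */
const char *describe_goto(int code)
{
    const char *text;

    if (code == 0) goto ok;
    if (code == 1) goto warn;
    goto fail;
ok:
    text = "ok";
    goto done;
warn:
    text = "warning";
    goto done;
fail:
    text = "error";
done:
    return text;
}

/* The same branch written with a case (switch) statement. */
const char *describe_switch(int code)
{
    switch (code) {
    case 0:  return "ok";
    case 1:  return "warning";
    default: return "error";
    }
}
```

Both functions return the same string for every input, but the switch
version needs no labels and can be read from top to bottom.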
In the remainder of the discussion, the terms pointer and reference will
be used interchangeably, because even at a low level a C pointer is just
a reference to an address that holds a variable or structure. Even though
C has been criticized for implementing a reference as a direct address,
there is no need to dismiss the C designers' original solution to the
problem of accessing the actual storage location of a variable. In C,
pointers are implemented as direct addresses into memory, which, if not
handled properly, can lead to memory errors of many varieties. Pointers
are also feared because they are powerful tools for representing large
blocks of structured data, which can be passed around by reference. To
complicate things further, the C programming language offers a number of
different ways to refer to a memory location and to the value stored there.
These issues with C pointers are the reason people have placed goto and
pointer statements in the same category of bad programming language design.
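A small sketch of what "a number of different ways" can mean in practice:
the function below (a hypothetical name, written only for this
illustration) spells the third element of an array in several equivalent
ways and reports whether they all agree.

```c
/* Returns 1 when the different spellings of "third element" agree. */
int same_location(void)
{
    int array[5] = { 10, 20, 30, 40, 50 };
    int *p = array;                 /* array decays to &array[0] */

    return &array[2] == p + 2       /* address: index vs. pointer arithmetic */
        && array[2] == *(p + 2)     /* value: index vs. explicit dereference */
        && p[2] == *(array + 2)     /* indexing a pointer, offsetting an array */
        && 2[array] == array[2];    /* legal, because a[i] means *(a + i) */
}
```

All four comparisons hold, so the function returns 1. A reader has to
know every one of these notations to follow where a piece of C code is
reading or writing.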
Discuss the use or appearance of pointers in modern programming Languages
Among the modern languages are Fortran, C, C++, ANSI C, and Java. I have
chosen to expand upon these because they are not only modern languages
but also fit into the category of do-anything languages. The following
table compares the languages:
| Language       | GOTO | BREAK | CONTINUE | EXIT | Reference and implementation |
| Fortran 90     | YES  | NO    | YES      | YES  | Pointer, implemented as an alias rather than as a memory address |
| C, C++, ANSI C | YES  | YES   | YES      | YES  | Pointer, as a memory address |
| Java           | NO   | YES   | YES      | YES  | Referencing and dereferencing of objects handled automatically, with garbage collection |
The table above shows how the different languages handle control structures
and pointers. The trend is that the goto statement has been phased out
and that different methods are being used to represent references to
objects or structures. As a historical note, the goto statement was just
starting to get a bad reputation at the point when pointers started showing
up in new programming languages or were being phased into new releases of
older ones. In other words, the goto statement landed on the bad list
just as people were beginning to find the pitfalls of the new concept of
pointers, which made pointers a likely candidate for resentment as well.
The main point is that the concept of pointers is flawed not in conception
but in implementation, and even there, good code is still at the discretion
of the programmer. The C pointer may be a bad model for other languages
to follow when implementing a reference, but the introduction of references
is certainly not a step back in programming language technology, as Hoare
contends.
Discuss the reasons for which pointers are necessary and also the
alternatives to pointer use
The issue is whether pointers should be replaced in somewhat the same
manner as the goto statement was phased out and replaced by new control
structures. The idea of getting rid of pointers should really be focused
on getting rid of the way C implements pointers. Embedded references,
however, are something that can probably never be totally removed from
languages. Some new languages are built entirely on a system of references
to objects. For this issue to be fully argued, we need to understand why
C implements references as pointers, and what the designers of Java have
done to remove the memory leaks that have plagued C.
In defense of the C programming language, C is a do-anything language
that can even be used to program device drivers, a task which Java, a fairly
new language, cannot touch. C was designed to handle anything that assembly
can do, just at a higher level above the hardware. Because of this
flexibility, C can do the same kind of address manipulation that assembly
language can. One such use of addresses in C is adding an offset to the
address of a variable and referencing the data at the resulting address.
One of the benefits of C is that it has very fast compilers, which can
also be considered a weakness: the compilers may be faster, but they do
limited checking. The C programming community has addressed this problem,
to some degree, by adding tools such as Lint to its repertoire for use
during development. C programs also benefit from a runtime memory checker,
which verifies that dynamic memory is not overwritten or accessed without
being allocated. The real problem is that C has been around since 1971,
and its compilers have seen little evolution during this time toward
better type checking.
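The kind of address manipulation described above can be sketched in a few
lines of C. The function name sum_range is hypothetical; the sketch walks
a block of integers purely by pointer arithmetic, and nothing in the
language checks that the count passed in matches the real size of the
array.

```c
#include <stddef.h>

/* Sum n ints starting at base, moving through memory by address. */
long sum_range(const int *base, size_t n)
{
    const int *end = base + n;    /* address plus offset: one past the last element */
    long total = 0;

    for (const int *p = base; p != end; ++p)
        total += *p;              /* read whatever is stored at address p */
    return total;
}
```

With int a[4] = { 1, 2, 3, 4 }, the call sum_range(a, 4) returns 10 and
sum_range(a + 1, 2) returns 5. A call such as sum_range(a, 40) would
typically compile just as readily, which is why tools such as Lint and
runtime memory checkers are needed.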
The alternative to C pointers is a reference handled in a way that
does not give the programmer access to memory locations. The Java programming
language is an excellent example of such a language, and I will let the
Java enthusiasts defend themselves with the following passages from "Java
In A Nutshell," by David Flanagan.
"One of the things that makes Java simple is its lack of pointers and
pointer arithmetic. This feature also increases the robustness of Java
programs by abolishing an entire class of pointer-related bugs (pg. 6)."
"The referencing and dereferencing of objects is handled for you automatically
by Java. Java does not allow you to manipulate pointers or memory addresses
of any kind:
- It does not allow you to cast object or array references into integers
  or vice-versa.
- It does not allow you to do pointer arithmetic.
- It does not allow you to compute the size in bytes of any primitive type
  or object.
There are two reasons for these restrictions:
- Pointers are a notorious source of bugs. Eliminating them simplifies the
  language and eliminates many potential bugs.
- Pointers and pointer arithmetic could be used to sidestep Java’s run-time
  checks and security mechanisms. Removing pointers allows Java to provide
  the security guarantees that it does (pg. 26)."
For the most part, references to structures or objects have persisted
throughout new programming languages; the only thing that has changed is
the way they are implemented. In fact, the new programming languages have
embraced the idea and use references even more often. New languages such
as Java have taken the idea so far as to use instantiation all over the
place, but Java has changed the implementation to avoid the pitfalls of
C pointers. In the long run, choosing the right programming language for
a project is very important; therefore one should think about using a more
abstract language such as Java, unless the project is to write device
drivers or other software for a specific piece of hardware.
Summary
Having examined the issues relating to references, we find no reason
that they should be written out of new languages. The real issue is what
to do about the pointers in C. The reason for keeping C around, or other
languages which can access memory directly, is that writing device drivers
and software for specific hardware is easier in a high-level language
than in assembly.
The reason that some projects should use a language like Java could
be that security is a greater concern and no direct low-level interaction
with hardware is needed. The problem found with Java is that it lacks
the performance that C gives, although Java has opened the door for whatever
the next programming language technologies will be.
The idea that goto and pointer statements are alike can be traced
back to the observation that both statements are very powerful and have
led to many different programming errors. The fact is that programmers
need to take more of the responsibility for pointer errors being so prevalent.
When combining the views of both Dijkstra and Hoare, there is no doubt
that there are similarities between the goto and pointer statements. Both
do a type of jumping which can cause errors; the only real difference is
in the type of errors produced. The goto statement makes code less readable
because it makes tracing through a program in a linear style very difficult.
Pointer statements, on the other hand, create memory problems through
leaks, overwrites, and access to uninitialized memory.
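One way a programmer can take that responsibility is to wrap raw pointers
in a few small routines. The sketch below is only an illustration under
assumed names (int_buffer, buffer_init, buffer_set, buffer_free); it
addresses each of the three problems in turn using standard library calls.

```c
#include <stdlib.h>

/* A pointer kept together with the length of the block it refers to. */
typedef struct {
    int   *data;
    size_t len;
} int_buffer;

/* calloc zero-fills the block, so no element starts out uninitialized. */
int buffer_init(int_buffer *b, size_t len)
{
    b->data = calloc(len, sizeof *b->data);
    b->len  = b->data ? len : 0;
    return b->data != NULL;
}

/* Refuse writes outside the block instead of overwriting other memory. */
int buffer_set(int_buffer *b, size_t i, int value)
{
    if (i >= b->len)
        return 0;
    b->data[i] = value;
    return 1;
}

/* Release the block and clear the pointer so it cannot be reused by accident. */
void buffer_free(int_buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len  = 0;
}
```

Nothing in C forces a program to go through these routines, so the
discipline remains at the discretion of the programmer.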
Conclusion
The similarities in the way programmers have viewed goto and pointer
statements are undeniable. The real intrigue is why they are both viewed
as dangerous. Most things that are feared are feared because they are
very powerful; thus it is with pointers and goto statements. Furthermore,
both of the statements discussed in this paper make the programming
languages they appear in very flexible, and with flexibility comes the
ability to create problems when assumptions are made and logic is not
carefully coded. From another aspect, pointers have become a necessary
evil. That is, if you want to code a device driver or other specialty software
for hardware in some language other than assembly, you need a high-level
language with an equal amount of power as its foundation. There is also
the point that if there were total knowledge of a subject, and compilers
and tools could be written to catch all coding problems, then the
perfect answer to the programming problem could be found. Until then,
we must choose a language that fits our own comfort zone on the sliding
scale between performance and security. The truth about software design
and the languages which we can use to implement a design can still be
summed up in the words of Brooks's article, "No Silver Bullet."
"There is no single development, in either technology or management
technique, which by itself promises even one order of magnitude improvement
in productivity, in reliability, in simplicity (Brooks, 181)."
"The most a high-level language can do is to furnish all the constructs
the programmer imagines in the abstract program. To be sure, the level
of our sophistication in thinking about data structures, data types, and
operations is steadily rising, but at an ever-decreasing rate (Brooks,
186)."
Brooks, Frederick P., Jr. The Mythical Man-Month. Addison-Wesley:
England, 1995.
Sebesta, Robert W. Concepts Of Programming Languages. Addison-Wesley:
England, 1999.
(http://www.acm.org/classics/oct95/)
Go To Statement Considered Harmful
Edsger W. Dijkstra
Reprinted from Communications of the ACM,
Vol. 11, No. 3, March 1968, pp. 147-148.
Copyright (c) 1968, Association for
Computing Machinery, Inc.
This is a digitized copy derived from an ACM
copyrighted work. It is not guaranteed to be an
accurate copy of the author's original work.
Key Words and Phrases:
go to statement, jump instruction, branch
instruction, conditional clause, alternative
clause, repetitive clause, program intelligibility,
program sequencing
Editor:
For a number of years I have been familiar with
the observation that the quality of programmers
is a decreasing function of the density of go to
statements in the programs they produce.
More recently I discovered why the use of the
go to statement has such disastrous effects, and
I became convinced that the go to statement
should be abolished from all "higher level"
programming languages (i.e. everything except,
perhaps, plain machine code). At that time I
did not attach too much importance to this
discovery; I now submit my considerations for
publication because in very recent discussions
in which the subject turned up, I have been
urged to do so.
My first remark is that, although the
programmer's activity ends when he has
constructed a correct program, the process
taking place under control of his program is the
true subject matter of his activity, for it is this
process that has to accomplish the desired
effect; it is this process that in its dynamic
behavior has to satisfy the desired
specifications. Yet, once the program has been
made, the "making" of the corresponding
process is delegated to the machine.
My second remark is that our intellectual
powers are rather geared to master static
relations and that our powers to visualize
processes evolving in time are relatively poorly
developed. For that reason we should do (as
wise programmers aware of our limitations) our
utmost to shorten the conceptual gap between
the static program and the dynamic process, to
make the correspondence between the
program (spread out in text space) and the
process (spread out in time) as trivial as
possible.
Let us now consider how we can characterize
the progress of a process. (You may think
about this question in a very concrete manner:
suppose that a process, considered as a time
succession of actions, is stopped after an
arbitrary action, what data do we have to fix in
order that we can redo the process until the
very same point?) If the program text is a pure
concatenation of, say, assignment statements
(for the purpose of this discussion regarded as
the descriptions of single actions) it is sufficient
to point in the program text to a point between
two successive action descriptions. (In the
absence of go to statements I can permit
myself the syntactic ambiguity in the last three
words of the previous sentence: if we parse
them as "successive (action descriptions)" we
mean successive in text space; if we parse as
"(successive action) descriptions" we mean
successive in time.) Let us call such a pointer to
a suitable place in the text a "textual index."
When we include conditional clauses (if B then
A), alternative clauses (if B then A1 else A2),
choice clauses as introduced by C. A. R.
Hoare (case[i] of (A1, A2, ..., An)), or
conditional expressions as introduced by J.
McCarthy (B1 -> E1, B2 -> E2, ..., Bn ->
En), the fact remains that the progress of the
process remains characterized by a single
textual index.
As soon as we include in our language
procedures we must admit that a single textual
index is no longer sufficient. In the case that a
textual index points to the interior of a
procedure body the dynamic progress is only
characterized when we also give to which call
of the procedure we refer. With the inclusion of
procedures we can characterize the progress
of the process via a sequence of textual
indices, the length of this sequence being equal
to the dynamic depth of procedure calling.
Let us now consider repetition clauses (like,
while B repeat A or repeat A until B).
Logically speaking, such clauses are now
superfluous, because we can express repetition
with the aid of recursive procedures. For
reasons of realism I don't wish to exclude them:
on the one hand, repetition clauses can be
implemented quite comfortably with present
day finite equipment; on the other hand, the
reasoning pattern known as "induction" makes
us well equipped to retain our intellectual grasp
on the processes generated by repetition
clauses. With the inclusion of the repetition
clauses textual indices are no longer sufficient
to describe the dynamic progress of the
process. With each entry into a repetition
clause, however, we can associate a so-called
"dynamic index," inexorably counting the
ordinal number of the corresponding current
repetition. As repetition clauses (just as
procedure calls) may be applied nestedly, we
find that now the progress of the process can
always be uniquely characterized by a (mixed)
sequence of textual and/or dynamic indices.
The main point is that the values of these
indices are outside programmer's control; they
are generated (either by the write-up of his
program or by the dynamic evolution of the
process) whether he wishes or not. They
provide independent coordinates in which to
describe the progress of the process.
Why do we need such independent
coordinates? The reason is - and this seems to
be inherent to sequential processes - that we
can interpret the value of a variable only with
respect to the progress of the process. If we
wish to count the number, n say, of people in
an initially empty room, we can achieve this by
increasing n by one whenever we see someone
entering the room. In the in-between moment
that we have observed someone entering the
room but have not yet performed the
subsequent increase of n, its value equals the
number of people in the room minus one!
The unbridled use of the go to statement has an
immediate consequence that it becomes terribly
hard to find a meaningful set of coordinates in
which to describe the process progress.
Usually, people take into account as well the
values of some well chosen variables, but this is
out of the question because it is relative to the
progress that the meaning of these values is to
be understood! With the go to statement one
can, of course, still describe the progress
uniquely by a counter counting the number of
actions performed since program start (viz. a
kind of normalized clock). The difficulty is that
such a coordinate, although unique, is utterly
unhelpful. In such a coordinate system it
becomes an extremely complicated affair to
define all those points of progress where, say,
n equals the number of persons in the room
minus one!
The go to statement as it stands is just too
primitive; it is too much an invitation to make a
mess of one's program. One can regard and
appreciate the clauses considered as bridling its
use. I do not claim that the clauses mentioned
are exhaustive in the sense that they will satisfy
all needs, but whatever clauses are suggested
(e.g. abortion clauses) they should satisfy the
requirement that a programmer independent
coordinate system can be maintained to
describe the process in a helpful and
manageable way.
It is hard to end this with a fair
acknowledgment. Am I to judge by whom my
thinking has been influenced? It is fairly obvious
that I am not uninfluenced by Peter Landin and
Christopher Strachey. Finally I should like to
record (as I remember it quite distinctly) how
Heinz Zemanek at the pre-ALGOL meeting in
early 1959 in Copenhagen quite explicitly
expressed his doubts whether the go to
statement should be treated on equal syntactic
footing with the assignment statement. To a
modest extent I blame myself for not having
then drawn the consequences of his remark.
The remark about the undesirability of the go
to statement is far from new. I remember
having read the explicit recommendation to
restrict the use of the go to statement to alarm
exits, but I have not been able to trace it;
presumably, it has been made by C. A. R.
Hoare. In [1, Sec. 3.2.1.] Wirth and Hoare
together make a remark in the same direction
in motivating the case construction: "Like the
conditional, it mirrors the dynamic structure of
a program more clearly than go to statements
and switches, and it eliminates the need for
introducing a large number of labels in the
program."
In [2] Guiseppe Jacopini seems to have proved
the (logical) superfluousness of the go to
statement. The exercise to translate an arbitrary
flow diagram more or less mechanically into a
jump-less one, however, is not to be
recommended. Then the resulting flow diagram
cannot be expected to be more transparent
than the original one.
References:
1. Wirth, Niklaus, and Hoare C. A. R. A
contribution to the development of ALGOL.
Comm. ACM 9 (June 1966), 413-432.
2. Böhm, Corrado, and Jacopini Guiseppe.
Flow diagrams, Turing machines and languages
with only two formation rules. Comm. ACM 9
(May 1966), 366-371.
Edsger W. Dijkstra
Technological University
Eindhoven, The Netherlands
STAN-CS-73-403
HINTS ON PROGRAMMING LANGUAGE DESIGN
BY C. A. R. HOARE
SUPPORTED BY
ADVANCED RESEARCH PROJECTS AGENCY
ARPA ORDER NO. 24y4
PROJECT CODE 3D30
DECEMBER 1973
COMPUTER SCIENCE DEPARTMENT
School of Humanities and Science
STANFORD UNIVERSITY
(http://elib.stanford.edu/)
……..
8. Variables
One of the most powerful and most dangerous aspects of machine code
programming is that each individual instruction of the code can change the
content of any register, any location of store, and alter the condition of
any peripheral: it can even change its neighboring instructions or itself.
Worse still, the identity of the location changed is not always apparent
from the written form of the instruction; it cannot be determined until
run time, when the values of base registers, index registers, and indirect
addresses are known. This does not matter if the program is correct, but
if there is the slightest error, even only in a single bit, there is no
limit to the damage which may be done, and no limit to the difficulty of
tracing the cause of the damage. In summary, the interface between
every two consecutive instructions in a machine code program consists of
the state of the entire machine -- registers, mainstore, backing stores
and all peripheral equipment.
In a high level language, the programmer is deprived of the dangerous
power to update his own program while it is running. Even more valuable,
he has the power to split his machine into a number of separate variables,
arrays, files, etc.; and when he wishes to update any of these, he must
quote its name explicitly on the left of the assignment so that the identity
of the part of the machine subject to change is immediately apparent; and
finally, a high level language can guarantee that all variables are disjoint,
and that updating any one of them cannot possibly have any effect on any
other.
Unfortunately, many of these advantages are not maintained in the
design of procedures and parameters in ALGOL 60 and other languages.
But instead of mending these minor faults, many language designers have
preferred to extend them throughout the whole language by introducing
the concept of reference, pointer, or indirect address into the language
as an assignable item of data. This immediately gives rise in a high
level language to one of the most notorious confusions of machine code,
namely that between an address and its contents. Some languages attempt
to solve this by even more confusing automatic coercion rules. Worse
still, an indirect assignment through a pointer, just as in machine code,
can update any store location whatsoever, and the damage is no longer
confined to the variable explicitly named as the target of assignment.
For example, in ALGOL 68, the assignment
x := y;
always changes x, but the assignment
x := y+1;
if x is a reference variable may change any other variable (of appropriate
type) in the whole machine. One variable it can never change is x !
Unlike all other values (integers, strings, arrays, files, etc.) references
have no meaning independent of a particular run of a program. They cannot
be input as data, and they cannot be output as results. If either data
or references to data have to be stored on files or backing stores, the
problems are immense. And on many machines they have a surprising
overhead on performance, for example they will clog up instruction
pipe-lines, data lookahead, slave stores, and even paging systems.
References are like jumps, leading wildly from one part of a data
structure to another. Their introduction into high level languages has
been a step backward from which we may never recover.
…….