RockCairn - Paper on Pointer and Goto

Pointers (References) And GoTo Statements
By Aaron Penner

CU-Denver Department of Computer Science and Engineering
CSC 5535
Fundamental Concepts in Programming Languages
Spring 1999
Instructor: Jim Schatzmam

Was the introduction of pointers and goto statements into programming languages a major flaw or setback? Robert W. Sebesta's book, "Concepts of Programming Languages," states that pointers and goto statements can be compared to each other. The book also quotes Hoare as saying, "Their (pointer) introduction into high level languages has been a step backward from which we may never recover" (Sebesta, 247). The discussion that follows looks into this controversial subject and draws some conclusions.

Introduction

The goto statement has long been a popular subject of debate, but once pointers were introduced into languages a whole new area of discussion opened up.

Sebesta's book compares goto statements and pointers with this quote: "The goto statement widens the range of statements that can be executed next. Pointer variables widen the range of memory that can be referenced by a variable." Another work that is useful for this discussion is "Go To Statement Considered Harmful," by Edsger W. Dijkstra, which focuses mainly on the readability of code that uses the goto statement. The final document considered here is "Hints On Programming Language Design," by C.A.R. Hoare, which is an overall guide to programming language design. Both constructs raise the common programming questions of readability, simplicity, flexibility, execution speed, efficiency, and alternatives.

Compare & Contrast: pointers and goto statements

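The first way to compare goto and pointer statements is to look at them literally.

A goto statement (pseudocode):

    goto End;
    ...
    End:
    print "end";

A pointer in C:

    int array[10] = {0};
    int *pointer;
    pointer = &array[2];
    printf("value at index 2: %d\n", *pointer);

Access to a structure member through a pointer:

    (*person).name or person->name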

The goto and pointer statements do not look syntactically alike, but their meaning is similar: the goto statement jumps to a different part of the code, whereas a pointer jumps to a different part of memory or address space. The full quote from the paper "Hints On Programming Language Design," by C. A. R. Hoare, says, "References are like jumps, leading wildly from one part of a data structure to another. Their introduction into high level languages has been a step backward from which we may never recover." Pointers can also reference a function, and because of this they can be used to jump around in code execution in much the same manner as a goto statement.
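As a minimal sketch of that last point, the C fragment below calls code through a pointer to a function. The names add_one, twice, dispatch, and run_selected are hypothetical and chosen only for illustration; what matters is that the call site inside dispatch does not say which code will run next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers; the names are illustrative only. */
int add_one(int x) { return x + 1; }
int twice(int x)   { return x * 2; }

/* A pointer-to-function type: it holds the address of code, not data. */
typedef int (*handler_fn)(int);

/* Calls whichever function the pointer currently refers to. The call
   site does not reveal which code runs next, much like a computed jump. */
int dispatch(handler_fn handler, int value)
{
    assert(handler != NULL);   /* calling through a null pointer is undefined */
    return handler(value);
}

/* Selects a handler at run time from a table indexed by 'which' (0 or 1). */
int run_selected(int which, int value)
{
    static const handler_fn table[] = { add_one, twice };
    assert(which >= 0 && which < 2);
    return dispatch(table[which], value);
}
```

Calling run_selected(0, 5) goes through add_one, while run_selected(1, 5) goes through twice, although both calls pass through the same line of dispatch.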

Non-literally, the two statements are more alike, because both are thought to be dangerous. One reason the two statements are feared is that they are very powerful and, if used recklessly, can create very bad situations. Attitudes about goto statements have been discussed for years, as the paper by Dijkstra shows. Because of the dislike for the goto statement, many new control structures were created and incorporated into programming languages to replace it. The goto statement was once an essential part of programming languages, but over time programmers became apt to use it where good code design would not have permitted it. The case statement, or multi-way conditional branch, is an example of a statement that was added to programming languages to implement a once legitimate use of the goto statement.
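The replacement of goto by a case statement can be sketched in C as follows. Both functions are hypothetical examples written for this comparison; they make the same three-way decision, first with labels and jumps and then with a switch.

```c
#include <assert.h>

/* Classifies a code using explicit jumps; the labels are hypothetical. */
int classify_goto(int code)
{
    int result;
    if (code == 1) goto low;
    if (code == 2) goto high;
    goto other;
low:
    result = 10;
    goto done;
high:
    result = 20;
    goto done;
other:
    result = -1;
done:
    return result;
}

/* The same decision expressed as a multi-way branch. */
int classify_switch(int code)
{
    switch (code) {
    case 1:  return 10;
    case 2:  return 20;
    default: return -1;
    }
}
```

The two versions return the same value for every input, but in the switch version each outcome sits next to the condition that selects it.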

In the remainder of the discussion, the terms pointer and reference will be used interchangeably, because even at a low level a C pointer is just a reference to an address that holds a variable or structure. Even though C's reputation has been tarnished for implementing references directly as addresses, there is no need to overlook the C designers' original solution to the problem of accessing the actual storage location of a variable. In C, pointers are implemented as direct addresses into memory, which, if not handled properly, can lead to memory errors of many varieties. Pointers are also feared because they are powerful tools for representing large blocks of structured data that can be passed around by reference. To make things even more complicated, C offers a number of different ways to denote a memory location and the value stored there. These issues with C pointers are the reason people have placed goto and pointer statements in the same category of bad programming language design.
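Two of the points above can be illustrated with a short C sketch: a write through a pointer changes whatever variable the pointer happens to refer to, and the same array element can be denoted in several ways. The function names store_through and spellings_agree are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Writes through a pointer: the variable that changes is whatever
   'target' refers to, not 'target' itself. */
void store_through(int *target, int value)
{
    assert(target != NULL);
    *target = value;
}

/* Returns 1 if the common spellings of "element i of a" agree.
   The caller must ensure that i is a valid index into a. */
int spellings_agree(int *a, int i)
{
    int *p = a + i;              /* address of element i              */
    return a[i] == *(a + i)      /* index form vs. pointer arithmetic */
        && a[i] == *p            /* dereferencing a stored address    */
        && &a[i] == p;           /* the address itself                */
}
```

After store_through(&x, 99), the variable x holds 99 even though x never appears on the left of an assignment, which is the kind of indirect update Hoare objects to.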

Discuss the use or appearance of pointers in modern programming Languages

The modern languages considered here are Fortran, C, C++, ANSI C, and Java. I have chosen to expand upon these because they are not only modern languages but also fit into the category of do-anything languages. The following table compares them:

Language	GOTO	BREAK	CONTINUE	EXIT	Reference and implementation
Fortran 90	YES	NO	YES	YES	Pointer, implemented as an alias rather than a memory address
C, C++, ANSI C	YES	YES	YES	YES	Pointer, implemented as a memory address
Java	NO	YES	YES	YES	Object references; referencing and dereferencing are automatic, with garbage collection

The chart above shows how the different languages handle control structures and pointers. The trend is that the goto statement has been phased out and that different methods are being used to represent references to objects or structures. As a historical note, the goto statement was just starting to get a bad reputation at the point when pointers started showing up in new programming languages or were being phased into new releases of older ones. Because of this timing, the goto statement was on the bad list just as people were beginning to find the pitfalls of the new concept of pointers, which made pointers a likely candidate for resentment as well. The main point is that the concept of pointers is flawed not in conception but in implementation, and even then, good code is still at the discretion of the programmer. The C pointer may be a bad example of a reference for other languages to follow, but the introduction of references is certainly not a step back in programming language technology, as Hoare contends.
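To make the BREAK and CONTINUE columns of the table concrete, here is a small C sketch of one loop written twice. The function names and the task (summing positive values up to the first zero) are hypothetical; the goto version marks which jump plays the role of break and which plays the role of continue.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the positive values in data[0..n-1], stopping at the first zero. */
int sum_until_zero_goto(const int *data, size_t n)
{
    int total = 0;
    size_t i = 0;
top:
    if (i >= n) goto finished;
    if (data[i] == 0) goto finished;   /* plays the role of break    */
    if (data[i] < 0) goto next;        /* plays the role of continue */
    total += data[i];
next:
    i++;
    goto top;
finished:
    return total;
}

/* The same loop using the structured statements. */
int sum_until_zero_structured(const int *data, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] == 0) break;
        if (data[i] < 0) continue;
        total += data[i];
    }
    return total;
}
```

Both functions compute the same result; the structured version needs no labels, and its exits from the loop body can only go to two well-defined places.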

Discuss the reasons for which pointers are necessary and also the alternatives to pointer use

The issue is whether pointers should be replaced in somewhat the same manner as the goto statement was phased out by new control structures. The idea of getting rid of pointers should really be focused on getting rid of the way C implements pointers. The embedding of references in languages, however, is something that can probably never be totally removed; some new languages are built entirely on a system of references to objects. For this issue to be fully argued, we need to understand why C implements references as raw pointers, and what Java has done in its design to remove the memory leaks that C has been plagued with.

In defense of the C programming language, C is a do-anything language that can even be used to program device drivers, a task which Java, a fairly new language, cannot touch. C was designed to handle anything that assembly can do, but at a higher level than the hardware. Because of this flexibility, C can do the address manipulation that assembly language can do. One such use of addresses in C is pointer arithmetic: adding an offset to the address of a variable and referencing the data at the resulting address. One of the benefits of C is that it has very fast compilers, which can also be considered a weakness, because the compilers may be fast but they do limited checking. The C programming community has addressed this problem, to some degree, by adding tools such as Lint to its repertoire for use during development. Another tool that C programs need is a run-time memory checker to verify that dynamic memory is not overwritten or accessed without being allocated. The real problem lies in the fact that C has been around since the early 1970s, and that C compilers have seen little evolution during this time toward doing a better job of type checking.
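The address manipulation and the missing checks described above can be sketched as follows. read_at and read_at_checked are hypothetical helpers, not part of any library; the second one shows, under the assumption that the caller knows the array length, the kind of test that the language itself does not perform.

```c
#include <assert.h>
#include <stddef.h>

/* Reads the element 'offset' positions past 'base' using pointer
   arithmetic. Nothing here checks that the result stays inside the
   array; the caller is trusted, as in most C code. */
int read_at(const int *base, size_t offset)
{
    return *(base + offset);
}

/* A checked variant: the caller supplies the array length, and an
   out-of-range offset is reported instead of being dereferenced.
   Returns 1 on success and stores the value in *out, 0 otherwise. */
int read_at_checked(const int *base, size_t length, size_t offset, int *out)
{
    assert(base != NULL && out != NULL);
    if (offset >= length)
        return 0;
    *out = base[offset];
    return 1;
}
```

With an array of four elements, read_at(a, 4) would compile without complaint and read past the end, whereas read_at_checked(a, 4, 4, &v) returns 0 and leaves v untouched.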

The alternative to C pointers is a reference handled in a way that does not give the programmer access to memory locations. The Java programming language is an excellent example of such a language, and I will let the Java enthusiasts defend themselves with the following passages from "Java in a Nutshell," by David Flanagan.

"One of the things that makes Java simple is its lack of pointer and pointer arithmetic. This feature also increases the robustness of Java programs by abolishing an entire class of pointer-related bugs (pg. 6)."

"The referencing and dereferencing of objects is handled for you automatically by Java. Java does not allow you to manipulate pointers or memory addresses of any kind:

It does not allow you to cast object or array references into integers or vice-versa.
It does not allow you to do pointer arithmetic.
It does not allow you to compute the size in bytes of any primitive type or object.

There are two reasons for these restrictions:

Pointers are a notorious source of bugs. Eliminating them simplifies the language and eliminates many potential bugs.
Pointer and pointer arithmetic could be used to sidestep Java's run-time check and security mechanisms. Removing pointer allows Java to provide the security guarantees that it does (pg. 26)."

For the most part, references to structures or objects have persisted throughout new programming languages; the only thing that has changed is the way they are implemented. In fact, the new programming languages have embraced the idea and use references even more often. New languages such as Java have extended the idea so far as to use object instantiation everywhere, but Java has changed the implementation to avoid the pitfalls of C pointers. In the long run, choosing the right programming language for a project is very important; therefore one should think about using a more abstract language such as Java, unless the project is to write device drivers or other software for a specific piece of hardware.
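The idea of a reference that never exposes a memory address can be approximated even in C. The sketch below is only an analogy and is not how Java is implemented: it has no garbage collection, and the pool, its size, and the names handle_t, pool_create, pool_get, and pool_release are all assumptions made for illustration. The caller holds a small integer handle, and every access is validated before any memory is touched.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 8            /* hypothetical capacity */
#define INVALID_HANDLE (-1)

typedef int handle_t;          /* an index into the pool, not an address */

static int pool_values[POOL_SIZE];
static int pool_in_use[POOL_SIZE];

/* Hands out a free slot, or INVALID_HANDLE if the pool is full. */
handle_t pool_create(int initial)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = 1;
            pool_values[i] = initial;
            return i;
        }
    }
    return INVALID_HANDLE;
}

/* Every access is validated, so a released or out-of-range handle is
   rejected rather than reaching arbitrary memory. Returns 1 on success. */
int pool_get(handle_t h, int *out)
{
    assert(out != NULL);
    if (h < 0 || h >= POOL_SIZE || !pool_in_use[h])
        return 0;
    *out = pool_values[h];
    return 1;
}

/* Marks the slot as free; invalid handles are ignored. */
void pool_release(handle_t h)
{
    if (h >= 0 && h < POOL_SIZE)
        pool_in_use[h] = 0;
}
```

Because a handle cannot be dereferenced directly or used in pointer arithmetic to reach other storage, a mistake with it produces a reported failure instead of a corrupted variable. The price is a check on every access, which hints at the performance trade-off raised in the summary.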

Summary

Having examined the issues relating to references, we find no reason that they should be written out of new languages. The real issue is what to do about the pointers in C. The reason for keeping C around, or other languages that can access memory directly, is that writing device drivers and software for specific hardware is easier in a high-level language than in assembly.

Some projects should use a language like Java because security is a greater concern and no direct low-level interaction with hardware is needed. The problem with Java is that it lacks the performance that C gives, although Java has opened the door for whatever the next programming language technologies will be.

The idea that goto and pointer statements are alike can be traced back to the observation that both are very powerful and have led to many different programming errors. The fact is that programmers need to take more of the responsibility for making pointer errors so prevalent. When the views of Dijkstra and Hoare are combined, there is no doubt that there are similarities between goto and pointer statements. Both do a type of jumping that can cause errors; the only real difference is in the type of errors produced. The goto statement makes code less readable because it makes tracing through a program in a linear style very difficult. Pointer statements, on the other hand, create memory problems through leaks, overwrites, and access to uninitialized memory.
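The three pointer problems named above (leaks, overwrites, and uninitialized access) each have a conventional defensive habit in C. The sketch below shows one such habit per problem; the helper names make_buffer, copy_bounded, and release_buffer are hypothetical, and the pattern reduces these errors rather than eliminating them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocates a zero-filled buffer of n ints, so no element is read
   before it has a defined value. Returns NULL on failure. */
int *make_buffer(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Copies at most dst_len ints, so a long source cannot overwrite
   memory past the end of dst. Returns the number of ints copied. */
size_t copy_bounded(int *dst, size_t dst_len, const int *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    size_t n = src_len < dst_len ? src_len : dst_len;
    memcpy(dst, src, n * sizeof(int));
    return n;
}

/* Frees the buffer and clears the caller's pointer, which guards
   against a later use-after-free or double free through that pointer. */
void release_buffer(int **buffer)
{
    assert(buffer != NULL);
    free(*buffer);
    *buffer = NULL;
}
```

None of this is enforced by the compiler, which is the point made in the text: the responsibility stays with the programmer.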

Conclusion

The similarities in the way programmers have viewed goto and pointer statements are undeniable. The real intrigue is why both are viewed as dangerous. Most things that are feared are feared because they are very powerful; so it is with pointers and goto statements. Furthermore, both constructs make the programming languages they are in very flexible, and with flexibility comes the ability to create problems when assumptions are made and logic is not carefully coded. From another perspective, pointers have become a necessary evil. That is, if you want to code a device driver or other specialty software for hardware in some language other than assembly, you need a high-level language with an equal amount of power as its foundation. There is also the point that if there were total knowledge of a subject, and compilers and tools could be written to catch all coding problems, then the perfect answer to the programming problem could be found. Until then, we must choose a language that fits our comfort zone's sliding scale between performance and security. The truth about software design and the languages we can use to implement a design can still be summed up in the words of Brooks's article, "No Silver Bullet."

"There is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement in productivity, in reliability, in simplicity (Brooks, 181)."

"The most a high-level language can do is to furnish all the constructs the programmer imagines in the abstract program. To be sure, the level of our sophistication in thinking about data structures, data types, and operations is steadily rising, but at an ever-decreasing rate (Brooks, 186)."

Brooks, Frederick P., Jr. The Mythical Man-Month. Addison-Wesley: England, 1995.

Sebesta, Robert W. Concepts Of Programming Languages. Addison-Wesley: England, 1999.

(http://www.acm.org/classics/oct95/)
Go To Statement Considered Harmful
Edsger W. Dijkstra
Reprinted from Communications of the ACM,
Vol. 11, No. 3, March 1968, pp. 147-148.
Copyright (c) 1968, Association for
Computing Machinery, Inc.

This is a digitized copy derived from an ACM
copyrighted work. It is not guaranteed to be an
accurate copy of the author's original work.

Key Words and Phrases:
go to statement, jump instruction, branch
instruction, conditional clause, alternative
clause, repetitive clause, program intelligibility,
program sequencing

Editor:
For a number of years I have been familiar with
the observation that the quality of programmers
is a decreasing function of the density of go to
statements in the programs they produce.
More recently I discovered why the use of the
go to statement has such disastrous effects, and
I became convinced that the go to statement
should be abolished from all "higher level"
programming languages (i.e. everything except,
perhaps, plain machine code). At that time I
did not attach too much importance to this
discovery; I now submit my considerations for
publication because in very recent discussions
in which the subject turned up, I have been
urged to do so.

My first remark is that, although the
programmer's activity ends when he has
constructed a correct program, the process
taking place under control of his program is the
true subject matter of his activity, for it is this
process that has to accomplish the desired
effect; it is this process that in its dynamic
behavior has to satisfy the desired
specifications. Yet, once the program has been
made, the "making" of the corresponding
process is delegated to the machine.

My second remark is that our intellectual
powers are rather geared to master static
relations and that our powers to visualize
processes evolving in time are relatively poorly
developed. For that reason we should do (as
wise programmers aware of our limitations) our
utmost to shorten the conceptual gap between
the static program and the dynamic process, to
make the correspondence between the
program (spread out in text space) and the
process (spread out in time) as trivial as
possible.

Let us now consider how we can characterize
the progress of a process. (You may think
about this question in a very concrete manner:
suppose that a process, considered as a time
succession of actions, is stopped after an
arbitrary action, what data do we have to fix in
order that we can redo the process until the
very same point?) If the program text is a pure
concatenation of, say, assignment statements
(for the purpose of this discussion regarded as
the descriptions of single actions) it is sufficient
to point in the program text to a point between
two successive action descriptions. (In the
absence of go to statements I can permit
myself the syntactic ambiguity in the last three
words of the previous sentence: if we parse
them as "successive (action descriptions)" we
mean successive in text space; if we parse as
"(successive action) descriptions" we mean
successive in time.) Let us call such a pointer to
a suitable place in the text a "textual index."
When we include conditional clauses (if B then
A), alternative clauses (if B then A1 else A2),
choice clauses as introduced by C. A. R.
Hoare (case[i] of (A1, A2,..., An)), or
conditional expressions as introduced by J.
McCarthy (B1 -> E1, B2 -> E2, ..., Bn ->
En), the fact remains that the progress of the
process remains characterized by a single
textual index.

As soon as we include in our language
procedures we must admit that a single textual
index is no longer sufficient. In the case that a
textual index points to the interior of a
procedure body the dynamic progress is only
characterized when we also give to which call
of the procedure we refer. With the inclusion of
procedures we can characterize the progress
of the process via a sequence of textual
indices, the length of this sequence being equal
to the dynamic depth of procedure calling.
Let us now consider repetition clauses (like,
while B repeat A or repeat A until B).
Logically speaking, such clauses are now
superfluous, because we can express repetition
with the aid of recursive procedures. For
reasons of realism I don't wish to exclude them:
on the one hand, repetition clauses can be
implemented quite comfortably with present
day finite equipment; on the other hand, the
reasoning pattern known as "induction" makes
us well equipped to retain our intellectual grasp
on the processes generated by repetition
clauses. With the inclusion of the repetition
clauses textual indices are no longer sufficient
to describe the dynamic progress of the
process. With each entry into a repetition
clause, however, we can associate a so-called
"dynamic index," inexorably counting the
ordinal number of the corresponding current
repetition. As repetition clauses (just as
procedure calls) may be applied nestedly, we
find that now the progress of the process can
always be uniquely characterized by a (mixed)
sequence of textual and/or dynamic indices.
The main point is that the values of these
indices are outside programmer's control; they
are generated (either by the write-up of his
program or by the dynamic evolution of the
process) whether he wishes or not. They
provide independent coordinates in which to
describe the progress of the process.

Why do we need such independent
coordinates? The reason is - and this seems to
be inherent to sequential processes - that we
can interpret the value of a variable only with
respect to the progress of the process. If we
wish to count the number, n say, of people in
an initially empty room, we can achieve this by
increasing n by one whenever we see someone
entering the room. In the in-between moment
that we have observed someone entering the
room but have not yet performed the
subsequent increase of n, its value equals the
number of people in the room minus one!
The unbridled use of the go to statement has an
immediate consequence that it becomes terribly
hard to find a meaningful set of coordinates in
which to describe the process progress.
Usually, people take into account as well the
values of some well chosen variables, but this is
out of the question because it is relative to the
progress that the meaning of these values is to
be understood! With the go to statement one
can, of course, still describe the progress
uniquely by a counter counting the number of
actions performed since program start (viz. a
kind of normalized clock). The difficulty is that
such a coordinate, although unique, is utterly
unhelpful. In such a coordinate system it
becomes an extremely complicated affair to
define all those points of progress where, say,
n equals the number of persons in the room
minus one!

The go to statement as it stands is just too
primitive; it is too much an invitation to make a
mess of one's program. One can regard and
appreciate the clauses considered as bridling its
use. I do not claim that the clauses mentioned
are exhaustive in the sense that they will satisfy
all needs, but whatever clauses are suggested
(e.g. abortion clauses) they should satisfy the
requirement that a programmer independent
coordinate system can be maintained to
describe the process in a helpful and
manageable way.

It is hard to end this with a fair
acknowledgment. Am I to judge by whom my
thinking has been influenced? It is fairly obvious
that I am not uninfluenced by Peter Landin and
Christopher Strachey. Finally I should like to
record (as I remember it quite distinctly) how
Heinz Zemanek at the pre-ALGOL meeting in
early 1959 in Copenhagen quite explicitly
expressed his doubts whether the go to
statement should be treated on equal syntactic
footing with the assignment statement. To a
modest extent I blame myself for not having
then drawn the consequences of his remark.

The remark about the undesirability of the go
to statement is far from new. I remember
having read the explicit recommendation to
restrict the use of the go to statement to alarm
exits, but I have not been able to trace it;
presumably, it has been made by C. A. R.
Hoare. In [1, Sec. 3.2.1.] Wirth and Hoare
together make a remark in the same direction
in motivating the case construction: "Like the
conditional, it mirrors the dynamic structure of
a program more clearly than go to statements
and switches, and it eliminates the need for
introducing a large number of labels in the
program."
In [2] Guiseppe Jacopini seems to have proved
the (logical) superfluousness of the go to
statement. The exercise to translate an arbitrary
flow diagram more or less mechanically into a
jump-less one, however, is not to be
recommended. Then the resulting flow diagram
cannot be expected to be more transparent
than the original one.

References:
1. Wirth, Niklaus, and Hoare C. A. R. A
contribution to the development of ALGOL.
Comm. ACM 9 (June 1966), 413-432.
2. Böhm, Corrado, and Jacopini Guiseppe.
Flow diagrams, Turing machines and languages
with only two formation rules. Comm. ACM 9
(May 1966), 366-371.
Edsger W. Dijkstra
Technological University
Eindhoven, The Netherlands

STAN-CS-73-403
HINTS ON PROGRAMMING LANGUAGE DESIGN
BY C. A. R. HOARE
SUPPORTED BY
ADVANCED RESEARCH PROJECTS AGENCY
ARPA ORDER NO. 2494
PROJECT CODE 3D30
DECEMBER 1973
COMPUTER SCIENCE DEPARTMENT
School of Humanities and Science

STANFORD UNIVERSITY

(http://elib.stanford.edu/)

..
8. Variables

One of the most powerful and most dangerous aspects of machine code
programming is that each individual instruction of the code can change the
content of any register, any location of store, and alter the condition of
any peripheral: it can even change its neighboring instructions or itself.
Worse still, the identity of the location changed is not always apparent
from the written form of the instruction; it cannot be determined until
run time, when the values of base registers, index registers, and indirect
addresses are known. This does not matter if the program is correct, but
if there is the slightest error, even only in a single bit, there is no
limit to the damage which may be done, and no limit to the difficulty of
tracing the cause of the damage. In summary, the interface between
every two consecutive instructions in a machine code program consists of
the state of the entire machine -- registers, mainstore, backing stores
and all peripheral equipment.

In a high level language, the programmer is deprived of the dangerous
power to update his own program while it is running. Even more valuable,
he has the power to split his machine into a number of separate variables,
arrays, files, etc.; and when he wishes to update any of these, he must
quote its name explicitly on the left of the assignment so that the identity
of the part of the machine subject to change is immediately apparent; and
finally, a high level language can guarantee that all variables are disjoint,
and that updating any one of them cannot possibly have any effect on any
other.

Unfortunately, many of these advantages are not maintained in the
design of procedures and parameters in ALGOL 60 and other languages.
But instead of mending these minor faults, many language designers have
preferred to extend them throughout the whole language by introducing
the concept of reference, pointer, or indirect address into the language
as an assignable item of data. This immediately gives rise in a high
level language to one of the most notorious confusions of machine code,
namely that between an address and its contents. Some languages attempt
to solve this by even more confusing automatic coercion rules. Worse
still, an indirect assignment through a pointer, just as in machine code,
can update any store location whatsoever, and the damage is no longer
confined to the variable explicitly named as the target of assignment.
For example, in ALGOL 68, the assignment
x := y;
always changes x , but the assignment
x := y+1;
if x is a reference variable may change any other variable (of appropriate
type) in the whole machine. One variable it can never change is x !
Unlike all other values (integers, strings, arrays, files, etc.) references
have no meaning independent of a particular run of a program. They cannot
be input as data, and they cannot be output as results. If either data
or references to data have to be stored on files or backing stores, the
problems are immense. And on many machines they have a surprising
overhead on performance, for example they will clog up instruction
pipe-lines, data lookahead, slave stores, and even paging systems.
References are like jumps, leading wildly from one part of a data
structure to another. Their introduction into high level languages has
been a step backward from which we may never recover.
.
