The ETA Programming Language

Designed by Mike Taylor, beginning Tuesday 24th August 1999
Copyright © Mike Taylor, 1999.

This document may be freely redistributed provided it is not modified in any way. Specifically, the authorship must remain clear. All feedback is extremely welcome, and should be emailed to the author at mike@tecc.co.uk.

This document describes version 1.0 of the ETA language.
Document SCCSID ("%Z%%P% %I%")

Table of Contents

Introduction

The design of the ETA programming language follows from the axiom that just because you have to program on a tiny machine, that doesn't mean your programs have to be unreadable.

The designers of less ambitious languages such as COBOL intended that programs written in their languages should be readable to English-speaking managers as well as trained technical staff. ETA goes further by allowing you to write your programs in such a way that they can easily be read by speakers of whatever language you choose.

ETA achieves this lofty goal despite, or perhaps because of, its tiny size. It has very few instructions, and is thus extremely easy to learn. Admittedly, this extraordinary combination of power and simplicity comes at a cost: there is no guarantee that the natural language description in the program text will bear any relation to what the program actually does.

Why the name ETA? There are several reasons. Firstly, it's named after its own instruction set (see below). Secondly it's named after the seventh letter of the Greek alphabet, both to celebrate its use of the base-7 numbering system, and because it comes a little way before iota, indicating how tiny the language is. And thirdly, ETA stands for Estimated Time of Arrival, which is appropriate since ETA programs tend to run rather slowly. Oh, and fourthly, ETA is an anagram of TEA, which is nice to drink.

Oh, and Harvey Thompson (harveyandsu@yahoo.com) suggests the following alternative interpretations of the ETA acronym:

Extremely Tortuous Algorithms
Expressions That Amaze
Encoded Tricky Allusions
Esteemed Thrick Avuncularisations
Elocutionist Throat Aberrations
Effervescent Tautological Anagrams

The Language

Overview

The ETA language has only eight instructions, and these are named by the eight most frequently occurring letters in written English: E, T, A, O, I, N, S and H. Since the same eight letters are used to represent numbers (see below), all other characters in an ETA program are ignored (with the exception of line-breaks - see below.) This flexibility allows additional non-functional characters to be arbitrarily intermingled with the instruction and operand characters.

As a further aid to readability, case is not considered significant: programmers are encouraged to use mixed case in such a way as to elucidate the structure of the program.

For example, the following program fragments are all equivalent (they are no-ops):

NENAET
nenaet
NeNAEt
Wunce upun a ET...
Unbend a Pet, ugly!

Machine Model

All the instructions work on a stack of signed 32-bit integers. Implementations must support stack-depth of at least 100. More is better. Infinite is best.

Two of the instructions use addresses. An address in an ETA program is a line-number, counted from one upwards. (This is why line-breaks are significant.) Lines are considered to be terminated by an LF (ASCII 012), a CR (ASCII 015), a CR-LF pair or an LF-CR pair, so that ETA programs run portably on systems with different end-of-record conventions.

Instruction Set Summary

The instructions are as mnemonic as possible (which in truth is not really all that mnemonic):

Instruction Action
E dividE second-top by top; push quotient, then remainder.
T if top-of-stack is true, Transfer control to stacked line.
A push one more than the current Address onto the stack.
O Output a character whose code is popped from the stack.
I Input a character and push its code on the stack.
N push a Number onto the stack.
S Subtract the top of stack from the top-but-one.
H Halibut: duplicate or move an element of the stack.

The individual instructions are described in more detail below.

Instruction Set Details

E: The dividE Instruction

Pops two numbers off the stack: first b, then a. Performs the integer division of a by b, and pushes first the quotient and then the remainder onto the stack. For example, if first 19 and then 7 were pushed onto the stack, then the dividE instruction would push 2 (the quotient) and then 5 (the remainder).

(Sorry about the horrible mnemonic.)

T: The Transfer Instruction

Pops two numbers off the stack: first addr, then cond. If cond is non-zero, then control is transferred to the first significant character on line number addr of the program. As a special case, transferring to the address zero terminates the program.

This is the only conditional instruction in ETA: it is used to build loops and to call and return from subroutines.

(It can be hard to remember what order the parameters for this instruction go on the stack: the author finds it helpful to remember the word CAT: Condition, Address, Transfer!)

A: The Address Instruction

Pushes one more than the current address onto the stack. This is useful for allowing a Transfer to subroutine followed by a subsequent Transfer back to the point immediately after. It can also be used to construct transfer addresses relative to the current address, yielding position-independent code.

O: The Output Instruction

Pops the top number off the stack, and outputs the character whose ASCII code is that number. (ASCII is specified so that it's possible to write ETA programs which will run portably on machines using different character sets. This is obviously a pain to implement on non-ASCII machines. But as one of our poets has written, ``Here's a nickel, kid. Get yourself a better computer.'')

I: The Input Instruction

Reads a single character and pushes its ASCII code onto the stack. (See the note on ASCII in the previous section.) If the end of the input file has been reached, then -1 is pushed.

N: The Number Instruction

This instruction pushes its argument on the stack. It's unique in that it takes its argument from the program text - all the other instructions that need operands pop them from the stack. The argument is made up of the sequence of significant characters following the N, up to but not including the first occurence of E. Those characters are treated as a big-endian sequence of base-seven digits with the following values:

Digit = Value   Digit = Value
H = 0   I = 4
T = 1   N = 5
A = 2   S = 6
O = 3

So, for example, the sequence Ntone pushes the value 75 (1x7^2 + 3x7^1 + 5x7^0).

Note that negative numbers can not be expressed in this way: they must be generated by a Subtract instruction.

The characters making up the numeric argument are skipped; they are not also used as instructions, but execution continues from the first significant character after the terminating E character.

(The usual ETAOINSH order of the letters is perturbed for the mapping to digits so as to yield more Es - which are common in English text - and fewer Hs.)

S: The Subtract Instruction

Pops two numbers off the stack: first b, then a. Subtracts b from a and pushes the result onto the stack.

H: The Halibut Instruction

Pops a number, n, off the stack, and uses it to modify the remaining stack as follows: the n'th number from the top is rolled to the top of the stack, with those previously above it slipping down one place. For example, when n is one, it swaps the top two elements; and when n is two, it rolls the top three elements, moving the third to the top of the stack.

If n is negative, its absolute value is used, but the specified element of the stack is copied to the top rather than moved. So when n is -1, a stack A B C is transformed to A B C B. For the purposes of this instruction, zero is conveniently considered negative, so that n=0 performs a DUP rather than a no-op. (There are already 246 ways to do a no-op in ETA.)

I couldn't find a single synonym for ``roll'' that had an ``h'' in it anywhere, hence the horribly un-mnemonic mnemonic. Sorry.

Comments

There is no commenting convention as such in ETA: significant character are always significant, no matter what context. The intention is that the whole of a well-written ETA program is a comment.

However, comments may be formed by using words made up entirely of letters other than the eight significant ones. This is something of an art-form in itself. Good words to use in ETA comments include crumbly, fulcrum, luxury, cruddy, buggy, fuzzy, guru, etc.

A related art is to allow significant characters to occur, but only in sequences which perform no-ops. For example, NeS subtracts zero from the top of the stack (so be careful not to use it when the stack is empty); and NeNanythingeT (where anything is a sequence of characters other than e) performs a conditional transfer, but the condition is false, so nothing happens.

(The author can't use his own first name - Mike - in an ETA program, since its significant sequence is ie, which reads a character and divides the top of stack by its value - so that if a NUL character is read, an illegal division by zero will ensue.)

Of course, significant characters may but used in program text with impunity provided that the afected portion of the program text is never executed. For example, anything on a line following the sequence NteAT (transfer control to the next line) will never be executed. This could be considered as a commenting convention; but use of this mechanism is considered poor form.

Useful Idioms

Some idioms emerge repeatedly. The first batch of these are so common and useful that they can almost be thought of as additional primitives (DUP, SWAP, ADD, JUMP, EXIT, SKIP, REM, DROP.)

Ne H Duplicate the top element of the stack.
Nte H Swap the top two elements of the stack.
Ne Nnumber S S Add number to the top of the stack.
A Naddress T Unconditional jump to address.
A Ne T Successful completion (jump to address zero).
A Ne Nte S S T
A Ne Nnumber S S T
If non-zero, skip the next line.
... or skip number lines.
A A T Disregard the rest of the current line.
Ne Nte H T
Ne T
Discard the top element of the stack.
Special case: Discard top of stack, if known to be 0.

Function call and return can be done in a variety of ways, but the following conventions seem to work quite well:

The thrust of these conventions, especially the first, is to make it easy for the caller. (Where necessary, this done at the expense of the convenience of the callee, since a typical function will be called many times but defined only once.)

A Nte Naddress T Call the function at the specified address, assuming that any necessary arguments have already been pushed onto the stack.
  No function-call prologue needed when there are no arguments.
Nte H Function-call prologue (1 argument).
Nae H Nae H Function-call prologue (2 arguments).
Noe H Noe H Noe H Function-call prologue (3 arguments).
  etc.
Nte Nte H T Function-call epilogue (no return-value).
Nte Nae H T Function-call epilogue (1 return-value).
Nte Noe H T Function-call epilogue (2 return-values).
  etc.

Functions that wish to use many arguments from the stack, or leave many values on the stack (e.g. functions dealing with NUL-terminated strings) will need to make their own arrangements.

Future Directions

Unappealing though it seems, we may be forced to change the assignment of instructions to letters, so as to allow the most common English letters to occur most commonly in program text. With the current encoding, the letter N (the Number instruction) tends to be most common, followed by H (the Halibut instruction), although both of these letters are towards the lower end of the frequency-in-prose spectrum; while conversely, the letter E (the dividE instruction) tends to occur least frequently.

This problem is at least partially addressed by the assignment of base-seven digits and the number terminator to letters, so re-assigning the instructions themselves may prove unnecessary.

Challenges

ETA programmers might like to tackle the following challenges:

Best of Show: I guess the ultimate tour de force in ETA programming would probably be a suite of functionally equivalent programs structured as limericks in different languages, in which the meaning of the limericks describes the function performed by the code.

Since numeric constants may be broken across lines, extra style points can be earned by using the same code sequence both as a continuation of a numeric constant and (when Transferred to) code.

Successful attempts at any of the challenges may be emailed to the author (address above) for attributed inclusion in future releases of the ETA distribution.

Examples

Overview

Due to the constraints of time, and the desire to construct real, working ETA programs as quickly as possible, the code below is not particularly good English. The programs would most charitably be described as poetry rather than prose.

More elegant renditions of these programs would be very welcome; please email them to the author (address above).

Hello, World!

This program prints everyone's favourite message. The interleaving of the Number and Output instructions is obviously pretty arbitrary. I've chosen to avoid the fenceposts: the selected interleaving is neither n × (Number; Output) nor n × Number; n × Output.

An alternative approach would be push a marker character on to the stack - NUL would be an obvious choice - followed by all the characters of the message, then to loop, Outputting each one (or to call a well-known function to do this):

Copying Input to Output

This program, a re-implementation of the popular CP/M utility program pip, demonstrates the use of the Transfer instruction for conditionals (line 4), looping (line 8) and termination (line 6).

An alternative, and more compact, rendering of the same program:

(Not all the features of pip are implemented!)

Factorial (Recursive)

The following program defines and recursively calls a factorial function which calculates the product of an integer with all the smaller positive integers: for example, 5 factorial is 120 (5×4×3×2×1):

Since this is a relatively large program, some annotation is in order:

line 1
Lantern 1: pamphlet
Call the subroutine at line 14 to read a number, in the form of a sequence of decimal digits, from the standard input. That number is placed on the stack.
line 2
Lantern 2: unkempt
Call the factorial function itself at line 5, converting the number on top of the stack into its own factorial.
line 3
Lantern 3: impiety
Call the subroutine at line 32 to output the result as a sequence of decimal digits
line 4
junctor: Vermont
Output a trailing newline character, and jump to line zero, terminating execution. This is the end of the main program: the remainder of the code consists of the subroutines called in these first four lines, and the subroutines that they in turn call.
line 5
NUTMEG 1: myrrh
The factorial routine itself: the prologue at line 5 rolls the argument n above the return address; line 11 makes the recursive call with argument n-1 if the argument is not 1; line 12 calls the routine at line 42 to multiply n by fact(n-1); and line 13 is the epilogue, rollin the return address back above the result, and transferring control to it.
line 14
=== FLUID ===
The subroutine to read a decimal constant from the standard input: lines 14-17 skip any leading sequence of space characters; lines 18 initialises the accumulator; lines 19-24 deal with an incoming digit; lines 25-29 break on a space or newline or at end of file, and otherwise loop; lines 30 and 31 discard the junk character on top of the stack and return the accumulated value to the caller.
line 32
5. gunmen = aleph;
The subroutine to write a decimal constant to the standard output: line 32 is the function-call prologue; lines 33-39 calculate and stack each digit in turn, least-significant first; line 40 calls the output-a-string subroutine at line 58 to unstack the digits to standard output most-significant first; line 41 is the function-return epilogue.
line 42
naked hymnal => belch
The subroutine to multiply two numbers together, which works by the primitive method of adding the value of the top-but-one value on the stack to an accumulator, initialised to zero, the number of times indicated by the top value on the stack. Prologue on lines 42-45, main loop on lines 46-55, and epilogue on line 56 and 57.
line 58
NUTMEG 18: myrrh
The subroutine to write out stacked characters, down to but not including a terminating NUL character. The function-return epilogue happens to be in the middle of the code, on line 61 (why not?)

What could possibly be clearer? Note how ETA facilitates top-down program design: this program begins by making high-level calls describing its algorithm in abstract terms, and then proceeds to elaborate on how those high-level calls are to be executed.

99 Bottles of Beer on the Wall

The following program prints the words to the classic song 99 Bottles of Beer on the Wall:

The code needs no explanation.

Appendix: The Reference Interpreter

There exists a reference interpreter for the ETA language, called simply eta, against which all the example programs in this document, and more, have been tested. It is implemented in Perl for maximum portability.

The reference interpreter is invoked as follows:

eta [ -d number ] [ -O ] [ eta-file [ eta-file2 ... ] ]

It executes the ETA program made by concatenating the eta-files named on the command-line, reading the program from standard input if no files are named. The ETA program runs with its standard input and output connected as the interpreters (which means that programs read from standard input can't do themselves do input.)

The options have the following effects:

-d
Print debugging output, at the specified level, to standard error.
-O
Interpret the program as OIL code (q.v.) instead of ETA code.

The debugging output is controlled by the value supplied to the -d option: it is composed of separate bitwise components, described in the following table, which may be ORred together:

Component Output
1 Summary of compiled program's size.
2 ``Done.'' message at end of run.
4 Execution trace: before each instruction is executed, a line is emitted of the form char/linestack  inst, where char is the current program counter, measured in significant characters; line is the current line; stack is a space-separated list of the values on the stack, with the most-recently-pushed to the right; and inst is the instruction about to be executed.

Other ETA implementations exist, including an interpreter written in C, which is about thirty times as fast as the reference interpreter. However, whenever their behaviours differ in an area concerning which this document is ambiguous, the behaviour of the reference implementation is correct by definition. Any such ambiguities should be emailed to the author (address above).