David Holdsworth
CAMiLEON Project
University of Leeds
LS2 9JT UK
This is a second draft of Emulation: C-ing ahead.
It is a companion paper to Emulation, Preservation and Abstraction.
The author's practical experience is in the preservation of an operating system for an obsolete mainframe system of the 1970s. The techniques advocated in this short document are a direct consequence of this work, which has been done by writing emulation code in both C and Java.
Although it was emulation that led to the ideas expressed here, there is good reason to use the same language for writing migration tools.
Most of this paper is written for a computer science audience, for it is that community that is best able to judge whether the techniques recommended here do indeed have the potential to stand the test of time.
Jeff Rothenberg has conducted an experiment in emulation as a preservation strategy. In this experiment he uses an Emulation Virtual Machine.
The IBM Almaden Research Center is involved in a project (see Lorie 2001) in which the intention is to design a Universal Virtual Machine (UVM), which is then used for actual emulator implementation.
My own work (with a colleague, Delwyn Holroyd) has implemented emulation of the ICL1900 system to the extent that we can run the George3 operating system, including its time-sharing feature. This also gives us access to software systems written to run under George3, including the world's first Algol68 compiler. This implementation has been designed to have a long life. However, we have chosen to use a programming language as the stable implementation platform, rather than the virtual machine approach of Rothenberg and Lorie.
In the search for continuity in an environment of rapidly changing technology where market forces push towards planned obsolescence, a few aspects of computing have shown long-term stability, largely on account of their widespread use throughout the IT industry. There then comes a point where the market forces actually want stability in order to protect investment.
The 3½" diskette is a case in point. Although it long ago ceased to be valuable as a storage medium, its interchangeability led to wide use in data transfer. It is only recently that a 3½" disk drive has ceased to be a standard part of every desktop system.
Some programming languages achieve even longer currency largely on account of the amount of software investment dependent upon them. Pre-eminent in this category is C.
Java's view of memory is much more abstract, but still allows arrays of integers, which make a convenient representation of emulated main storage. Java has a multi-tasking model as an integral part of the language, whereas the thread facilities in C are a more recent feature of the language, and not necessarily supported on all platforms.
In our emulation of the ICL system, we have used C for emulating the main 1900 processor, and used Java for emulation of the communications processor (7903) for which the multi-tasking aspects are valuable. Although still imperfect in some respects, the system works well enough to evoke immediate recognition by those who know the original system, and to vindicate the techniques used in its construction. The emulation has run successfully on Win32, Irix, Solaris and Linux. We have not tried any other platforms.
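As an illustration of the array-of-integers representation of main storage, the following is a minimal sketch in C. It is not taken from our emulator: the store size and all the names (STORE_WORDS, fetch_word, store_word) are assumptions made for the example. It relies only on the 24-bit word length of the 1900 and on the guarantee that a C long holds at least 32 bits. It also avoids macros and address arithmetic, in keeping with the features noted as discarded by newer languages.

```c
/* Sketch only: the store size and all names are assumptions for illustration. */
enum { STORE_WORDS = 4096 };                /* hypothetical store size in words */

static const long WORD_MASK = 0xFFFFFFL;    /* low 24 bits: one 1900 word */

/* A long is guaranteed to hold at least 32 bits, so a 24-bit word always fits. */
static long store[STORE_WORDS];

/* Return 1 if addr is a valid word address in the emulated store, else 0. */
static int valid_address(long addr)
{
    return addr >= 0 && addr < STORE_WORDS;
}

/* Read one word; in this sketch an out-of-range address reads as zero. */
static long fetch_word(long addr)
{
    if (!valid_address(addr)) {
        return 0;
    }
    return store[addr];
}

/* Write one word, keeping only the low 24 bits; out-of-range writes are ignored. */
static void store_word(long addr, long value)
{
    if (valid_address(addr)) {
        store[addr] = value & WORD_MASK;
    }
}
```

A real emulator would presumably raise an emulated store-violation condition rather than silently ignoring a bad address; the silent behaviour here only keeps the sketch short.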
However, we wish to address the longer term. It seems unlikely that some alternative programming paradigm (e.g. functional programming) will completely eclipse the traditional style. When we look at C we can observe that many of its features are to be found in other languages. Here I am concerned with features at the semantic level. As an example, the assignment statement exists in C, Algol60, Algol68, Pascal, Ada83/95 and Java, to name but a few. There is a syntactic difference in that C and Java have x = y, whereas the others have x := y. On the other hand, there are features of C that have been deliberately discarded in newer languages, e.g. macros, address arithmetic, variadic parameter lists.
I propose that we recommend a subset of C in which to write emulators for long-term preservation. My personal experience is that the amount of work involved is by no means excessive. Our emulation of the ICL1900 was achieved as a spare-time activity over a period of about 18 months, while both of us were in full-time employment.
Tentative proposals for selection of the subset are in Appendix A.
The expectation is that over time it will become necessary either to modify the subset if it turns out to contain features that are removed from the language (indicating a bad choice of subset), or to move the policy to use a subset of a different language.
A further opportunity is opened up by this approach. We must consider the time when C becomes computational Latin, and is replaced by another lingua franca (let us call it E). The C – – yacc parser could be the vehicle for implementation of software for the automatic translation of C – – emulators into E – –.
Of course, it is always possible that yacc may not last for ever either. However, yacc is a C program, and could possibly be translated into C – – once it was no longer seen as part of the standard kit of parts. Making it generate E – – may be more problematic. On the other hand, the LALR algorithm implemented by yacc is a cornerstone of compiler implementation, and knowledge of it is unlikely to be lost.
It is inevitable that the restrictions of C – – will make some things impossible. For a start, the desire to exclude variadic parameter lists would restrict the use of printf. We thus propose that a C – – program may require linkage with a small (and we stress small) section of code written in C. It is assumed that any future migration of the emulator away from C would involve hand coding of these small C sections. It may prove possible to make this C section common to more than one emulator.
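The division of labour might look something like the sketch below. The function names (put_text, put_decimal, put_octal, put_newline, report_register) are hypothetical and do not come from our emulator. The first group of functions stands for the small section written in full C, and is the only place where the variadic printf is called. The final function stands for emulator code in the restricted subset, which sees only fixed-length parameter lists.

```c
#include <stdio.h>

/* ---- Small section in full C: the only code that calls variadic printf. ---- */

/* Write a string to standard output. */
void put_text(const char *text)
{
    fputs(text, stdout);
}

/* Write a signed decimal number; returns the number of characters written. */
int put_decimal(long value)
{
    return printf("%ld", value);
}

/* Write a non-negative value in octal; returns the number of characters written. */
int put_octal(long value)
{
    return printf("%lo", (unsigned long) value);
}

/* End the current output line. */
void put_newline(void)
{
    putchar('\n');
}

/* ---- Code in the restricted subset: fixed parameter lists only. ---- */

/* Print a line of the form "name = value", with the value in octal. */
void report_register(const char *name, long value)
{
    put_text(name);
    put_text(" = ");
    put_octal(value);
    put_newline();
}
```

Under this arrangement, migrating the emulator to another language would mean hand coding only the first four functions; everything written like report_register would be a candidate for automatic translation.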
Features for omission from C – –:
The George3 operating system which runs under our emulator was written in assembler by a team of programmers. As a result it seems to use every quirk of the machine's order code at some point. A final break-through into reliable operation came when we finally implemented a property of the overflow register that was not hinted at in the summary chart, and was detailed once in a thick four-volume manual. It seems likely that such a property might escape the specification process.
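To give a flavour of the level of detail involved, the following is a minimal sketch of 24-bit addition with an overflow indicator. It is emphatically not the property referred to above, which is not described in this paper. It assumes two's-complement arithmetic on 24-bit words and an overflow indicator that stays set once raised; the names (add_words, overflow_flag, to_signed) are hypothetical.

```c
/* Sketch only: generic 24-bit two's-complement addition, names hypothetical. */
static const long WORD_MASK  = 0xFFFFFFL;   /* low 24 bits of a word */
static const long SIGN_BIT   = 0x800000L;   /* sign bit of a 24-bit word */
static const long WORD_RANGE = 0x1000000L;  /* 2 to the power 24 */

/* Assumed to stay set until explicitly cleared by the emulated program. */
static int overflow_flag = 0;

/* Interpret the low 24 bits of word as a signed two's-complement value. */
static long to_signed(long word)
{
    word = word & WORD_MASK;
    if (word & SIGN_BIT) {
        return word - WORD_RANGE;
    }
    return word;
}

/* Add two 24-bit words, set overflow_flag if the true sum does not fit,
   and return the low 24 bits of the result. */
static long add_words(long a, long b)
{
    long sum = to_signed(a) + to_signed(b);

    if (sum > 0x7FFFFFL || sum < -0x800000L) {
        overflow_flag = 1;
    }
    if (sum < 0) {
        sum = sum + WORD_RANGE;   /* map a negative sum back to 0..2^24-1 */
    }
    return sum & WORD_MASK;
}
```

Even in a fragment this small there are decisions that a summary chart would not settle, such as when the indicator is cleared and which instructions leave it untouched.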
The source text of George3 was an invaluable reference from time to time. Some of the later features of the system's interfaces were not in the mainstream manuals, although they may have featured in software notices. The thought of reading through many hundreds of these was sufficient disincentive to make inspection of the source code a more fruitful way to investigate mysteries. One particular feature of the interface to the communications processor was only revealed by a comment in the source code, after which dim recollection of 25-year-old knowledge was sufficient.
During the early stages of the emulation work, there still existed a single live installation of George3. We took the precaution of getting this system to produce its diagnostic memory dump, so that we had an example of a real system in operation. Reference to this did occasionally help us to clarify aspects of interfaces whose documentation assumed knowledge that we no longer possessed.
We have deliberately steered clear of system-dependent features in our use of C. Each of the two authors routinely uses a different compiler (Visual C++ and Cygnus gcc), and from time to time checks that the code also works on other systems. We have factored out the parts of the code that are necessarily platform specific.
David Holdsworth
August 2001