David Holdsworth and Paul Wheatley
University of Leeds
LS2 9JT UK
Abstract: This paper argues that emulation is a valid method of digital preservation, both in terms of longevity and affordability. This argument is bolstered by presenting guidelines for use of emulation in this rôle, and by providing an illustrative and yet non-trivial example.
The above diagram purports to show first the transformation (arrow A) of an original digital object into a preserved digital object, which can then be run under emulation to give adequate access to the significant properties of the original. This step ensures continued access as the original platform becomes obsolete.
Further passage of time leads to the transformation shown by arrow B, in which the obsoleting of platform 1 is handled by the updating of the emulator to run on a different platform. The large equals sign indicates that the preserved digital object consists of the same byte-stream in each case.
The crux of our argument is that the transformations indicated by the two arrows can be implemented economically. Key to this is the design of the abstract interfaces indicated by the numbered circles on the junctions between digital components. As evidence of the truth of our assertion we show the preservation of material from decades ago.
The interface labelled 1 is the API necessary for the successful operation of the original digital object. Emulator 1 recreates this interface in the abstract world of emulation. This interface remains the same forever. The emulator itself is a digital object and relies on interface 2 for its successful operation. A key strategic goal is future-proofing by choice of interface 2 involving only features that are likely to survive the test of time with little modification. Thus interface 3 will be very similar to interface 2, and the emulator for host 2 (emulator 1.01) will be readily derived from emulator 1.
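The layering described above can be illustrated with a minimal sketch in C. It is not the code of any emulator discussed in this paper: the two-operation guest instruction set, and the names `HostInterface`, `emulate`, `BufferHost` and `buffer_put_char`, are all hypothetical and chosen only for illustration. The emulator core interprets the preserved byte-stream (reproducing a stand-in for interface 1) and reaches its host only through a small table of functions (a stand-in for interface 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface 2: the only services the emulator asks of its host. */
typedef struct {
    void (*put_char)(void *ctx, char c); /* deliver one output character */
    void *ctx;                           /* host-private state */
} HostInterface;

/* Hypothetical guest operations, standing in for interface 1. */
enum { OP_HALT = 0, OP_PRINT = 1 };

/* Emulator core: interprets the preserved byte-stream and touches the
   host only through HostInterface. Returns the number of guest
   operations executed. */
size_t emulate(const uint8_t *image, size_t len, const HostInterface *host)
{
    size_t pc = 0, executed = 0;

    assert(image != NULL && host != NULL && host->put_char != NULL);
    while (pc < len) {
        uint8_t op = image[pc++];
        if (op == OP_HALT) {
            executed++;
            break;
        }
        if (op == OP_PRINT && pc < len) {
            /* The operand byte is the character to print. */
            host->put_char(host->ctx, (char)image[pc++]);
            executed++;
        } else {
            break; /* unknown or truncated operation: stop */
        }
    }
    return executed;
}

/* One possible host: collects guest output in a memory buffer. */
typedef struct {
    char text[64];
    size_t used;
} BufferHost;

void buffer_put_char(void *ctx, char c)
{
    BufferHost *b = (BufferHost *)ctx;
    if (b->used + 1 < sizeof b->text) {
        b->text[b->used++] = c;
        b->text[b->used] = '\0';
    }
}
```

In this sketch, moving to a new host platform would mean supplying a different `HostInterface` implementation while `emulate` and the preserved byte-stream stay unchanged, which is the property that arrow B relies upon.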
There may well be many digital objects that operate via interface 1. Where that is the case, the effort in generation of emulator 1 and its subsequent modification to run on later platforms will be repaid many times.
Bearman has cited emulation as a dangerous strategy that fails to actually preserve digital objects and is not a realistic approach due to the enormous costs of emulator development. Without emulation it is unclear how interactive digital objects could be preserved in a useful way. No practical work has been presented showing how digital objects more complex than simple documents would be preserved using migration. For those involved in preservation, cost is a major issue. The apparent initial outlay of resources for emulator development can easily be enough to discourage many from considering emulation. However, we demonstrate that emulation is already a practical and realistic method of preserving access to digital materials. If unorganised enthusiasts can revive interest in classic arcade games via emulation, surely it is not beyond an organised and funded preservation world to make rewarding and practical use of emulation?
The IBM Almaden Research Center is involved in a project (see Lorie 2000) in which the intention is to design a Universal Virtual Machine (UVM), which is then used for actual emulator implementation. It seems courageous to christen something "universal" even before it has been designed, but perhaps if you are IBM you can make such nomenclature stick. The view taken by IBM is that a high-level language is too transient, and will not stand the test of time. History teaches us however that some languages do achieve such pre-eminence (and have so much software investment dependent upon them) that they outlast virtual architectures.
We are aware of two products in the marketplace that emulate the world of Wintel on non-Intel platforms: SoftWindows and Virtual PC. The existence of these products underlines the feasibility of emulation, even for modern systems, not merely for yesterday's historical curiosities. These are commercial products, and at present we cannot comment on their likely portability to platforms of the future. For open source emulation of Wintel we can look to WINE. Although it requires an Intel platform, WINE works on most popular Intel Unixes and is not restricted to Linux.
Emulation is a part of the overall preservation process. The OAIS model makes the useful separation between preservation of the digital material, and the ability to render access to its intellectual content. This point of view has been further reinforced by work in the CEDARS project by recognising explicitly that the indefinite preservation of a byte-stream is technically straightforward. There are organisational issues in recording the existence of the preserved object and in providing appropriate facilities for resource discovery. So long as the byte-stream is captured, the provision of solutions to these latter issues can take place over time.
Clearly, a preserved byte-stream is only valuable if meaningful access can be made to its intellectual content. For complex interactive digital objects, emulation and upwards-compatibility of systems are the only current techniques for retaining the ability to run such an object. Where upwards-compatibility exists (e.g. the Wintel platform, IBM 390), there is no need for emulation. There is, however, a need to monitor the threat from creeping obsolescence (see Gödel ends in CEDARS).
Even when preserving material for which the platform is currently available, it is wise to take account of future prospects for emulation. Our experience indicates that preserving a specification of what is to be emulated, with the intention of future implementation, is unlikely to be a successful tactic. There is no way to test the necessary completeness of such a specification, and our experience is that even the most arcane feature of a system can turn out to be vital to successful emulation. We argue that the emulation exercise should be carried out sooner rather than later (see the George3 story below).
The interface labelled 1 in figure 1 needs to be identified within the original system. For a modern day Windows application running on a PC platform we would have to choose between candidates such as:
The task then is to produce an emulator running on some other platform (e.g. Linux in the WINE case). The interface labelled 2 is then the Linux API (or a subset) on an Intel CPU. The selection/design of this second interface aims to second guess the future. The intention is that in the future a new host platform (host platform 2) can readily be identified that has an interface (labelled 3) that is reasonably close to interface 2. Sufficiently close, that the transformation needed to make an emulator for host platform 2 out of emulator 1 is much simpler than re-implementing an emulator in a new environment.
Within the CAMiLEON project we are working with material from the 1970s and 1980s as a way of showing how we can map from one architecture (namely an old one) to one that is radically different. The goal is to deliver a result in which a preserved digital object of some complexity can be run in emulation with sufficient verisimilitude to reproduce the significant properties of the original experience.
A similar selection of an appropriate abstraction is vital to successful emulation. We would argue strongly that this abstraction is unlikely to be at the level of actual hardware. This may be quite easy at the CPU level, but presents all sorts of difficulties with regard to peripherals. The abstraction discussed here is the specification of interface 1 in figure 1.
There is also the issue of selection of an appropriate platform upon which to run the emulator. Our preference is for a software platform, such as the C programming language and the socket library. The WINE project is already emerging as an emulation of the Wintel platform running on the UNIX API. The appropriate platform discussed here is the specification of interface 2 in figure 1, with a view to its trouble-free evolution into interface 3.
We believe the C programming language "platform" represents one of the best existing choices for the longevity of an emulator implementation. The large volume of software currently in use that is written in C will ensure the language's survival many years into the future. However, for the long term, we need to do more to ensure the longevity of our emulations.
The source code of an emulator represents the key element to be preserved. To give it greater longevity and make it easier to translate into future programming languages, when this becomes necessary, we propose several key strategies:
This process of evolution of the emulator is actually a form of migration, as described in an analysis of migration conducted by PRW. This is not to say that migration is actually always better than emulation, but that some systems lend themselves to migration, and in the case of C-- this is very deliberate. In fact, migration is most effective when the original object was produced with portability in mind. This is particularly the case with our approach to writing emulators.
Emulation should not be over-sold as the answer to all digital preservation issues. It is just part of the armory necessary for defending our digital heritage against the ravages of time in a world where innovation (and hence change) is highly prized. Vital to any preservation is the identification of the significant properties that form the preservation goal, and thus the identification of the abstractions that are necessary for the digital object to manifest these significant properties. In some cases these appropriate abstractions are the system calls of an API, and the best way to preserve these abstractions long term is to emulate them. In so doing we preserve meaningful access to the original digital object(s).
With the demise of the architecture upon which the system ran, we either provide an emulation or lose the ability to provide the historians of computing with access to a significant era.
In seeking to provide access to this material by emulation we need first to select that abstract interface at which our emulation will operate.
A running George3 system was composed of an executive program running on the raw hardware, over (under) which ran George3 itself. This then provided the platform upon which ran the applications. We thus have three options for the emulation:
Our experience suggests that this would not have succeeded if it had been undertaken by the Rothenberg route of preservation of the specification. This may be particularly true of 1970s systems for which the documentation was produced by traditional paper means and does not exist in electronic form. We certainly needed very detailed information on the way that the CPU operated, including one feature that was not hinted at in the order code summary chart. We even found a bug in George3. We certainly benefited from the fact that the two main implementors had each worked with George3 in the past. It is quite clear to us that the omission of a single fact from a preserved specification would be sufficient to generate extreme difficulties.
As well as the binary image of George3, we also have the source text, to which we have made quite frequent reference. It leads us to the view that emulating a platform for which you do not have any examples of source text may be particularly arduous.
There is also the issue of choice of platform upon which the emulator is to run. We have chosen the C programming language. There is optional use of the socket facility for some features of the system. We believe that the ability to run C programs will last for a long time. We have run successfully on Win32, Solaris, Linux.
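One reason that plain C can behave identically across such hosts is that the emulated machine's data can be handled with explicit arithmetic on bytes and never through the host's native word layout. The following is a minimal sketch of that idea, not taken from our emulator: the 24-bit word size, the most-significant-byte-first storage order, and the function names `load_word24`, `store_word24` and `add_word24` are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: each emulated 24-bit word is stored in the preserved
   byte-stream as three bytes, most significant byte first. */
uint32_t load_word24(const uint8_t *bytes)
{
    /* Shifts on values give the same result on any host byte order. */
    return ((uint32_t)bytes[0] << 16) |
           ((uint32_t)bytes[1] << 8)  |
            (uint32_t)bytes[2];
}

/* Write a 24-bit word back in the same assumed layout. */
void store_word24(uint8_t *bytes, uint32_t word)
{
    assert(word <= 0xFFFFFFu); /* caller must supply a 24-bit value */
    bytes[0] = (uint8_t)(word >> 16);
    bytes[1] = (uint8_t)((word >> 8) & 0xFFu);
    bytes[2] = (uint8_t)(word & 0xFFu);
}

/* Addition that wraps at 24 bits, regardless of the host word size. */
uint32_t add_word24(uint32_t a, uint32_t b)
{
    return (a + b) & 0xFFFFFFu;
}
```

Because nothing here depends on the host's byte order or native integer width beyond what `<stdint.h>` guarantees, code written in this style should need no change when moved between the platforms listed above.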
In short, we vote for achieving a significant part of the emulation at the point where the original platform becomes obsolete, at least for systems for which the documentation is not available in electronic form.