Emulation, Preservation and Abstraction

David Holdsworth and Paul Wheatley
CAMiLEON Project
University of Leeds
LS2 9JT UK

Abstract: This paper argues that emulation is a valid method of digital preservation, both in terms of longevity and affordability. This argument is bolstered by presenting guidelines for use of emulation in this rôle, and by providing an illustrative and yet non-trivial example.

Introduction

Emulation has become one of the hot topics of discussion in digital preservation, with many arguments both for and against its use. Until recently the lack of any real practical work has left these arguments on a purely theoretical (or more correctly hypothetical) level, and the CAMiLEON Project is attempting to redress the balance. This is the project's first low-level look at the use of emulation as a real and practical preservation strategy, which we will be developing and testing over the remaining two years of the project.

Emulation in a Nutshell



Figure 1

The above diagram purports to show first the transformation (arrow A) of an original digital object into a preserved digital object, which can then be run under emulation to give adequate access to the significant properties of the original. This step ensures continued access as the original platform becomes obsolete.

Further passage of time leads to the transformation shown by arrow B, in which the obsoleting of platform 1 is handled by the updating of the emulator to run on a different platform. The large equals sign indicates that the preserved digital object consists of the same byte-stream in each case.

The crux of our argument is that the transformations indicated by the two arrows can be implemented economically. Key to this is the design of the abstract interfaces indicated by the numbered circles on the junctions between digital components. As evidence of the truth of our assertion we show the preservation of material from decades ago.

The interface labelled 1 is the API necessary for the successful operation of the original digital object. Emulator 1 recreates this interface in the abstract world of emulation. This interface remains the same forever. The emulator itself is a digital object and relies on interface 2 for its successful operation. A key strategic goal is future-proofing by choice of interface 2 involving only features that are likely to survive the test of time with little modification. Thus interface 3 will be very similar to interface 2, and the emulator for host 2 (emulator 1.01) will be readily derived from emulator 1.
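The separation between the interfaces of figure 1 can be illustrated with a minimal C sketch. It is only an illustration under stated assumptions, not code from the CAMiLEON emulator: every name in it (HostServices, Emulator, emu_print, MemorySink, memory_put_text) is hypothetical. The emulator offers a service of interface 1 to the preserved object, and reaches the host only through a small table of functions standing in for interface 2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Interface 2 (hypothetical): the only service this emulator asks of its host. */
typedef struct HostServices {
    /* Deliver n bytes of text to the user; returns the number accepted. */
    size_t (*put_text)(void *ctx, const char *buf, size_t n);
    void *ctx; /* host-specific state, opaque to the emulator */
} HostServices;

typedef struct Emulator {
    const HostServices *host;
} Emulator;

/* Interface 1 (hypothetical): a "print message" service as seen by the
   preserved object. It never touches the host directly. */
size_t emu_print(Emulator *emu, const char *msg)
{
    assert(emu != NULL && emu->host != NULL && msg != NULL);
    return emu->host->put_text(emu->host->ctx, msg, strlen(msg));
}

/* One possible host binding: capture the output in memory. */
typedef struct MemorySink {
    char data[64];
    size_t used;
} MemorySink;

size_t memory_put_text(void *ctx, const char *buf, size_t n)
{
    MemorySink *sink = (MemorySink *)ctx;
    size_t room = sizeof sink->data - 1 - sink->used;
    if (n > room) {
        n = room; /* truncate rather than overflow the buffer */
    }
    memcpy(sink->data + sink->used, buf, n);
    sink->used += n;
    sink->data[sink->used] = '\0';
    return n;
}
```

In this sketch, moving to host platform 2 would mean supplying a different HostServices table, while emu_print and everything above it stay unchanged. That is the sense in which emulator 1.01 is derived from emulator 1 rather than rewritten.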

There may well be many digital objects that operate via interface 1. Where that is the case, the effort in generation of emulator 1 and its subsequent modification to run on later platforms will be repaid many times.

Emulation So Far

Rothenberg has proposed that the work involved in the production of an emulator can be considerable, so it can be postponed until resources are available by instead producing an emulator specification at the time of platform obsolescence. This seems a very risky approach to preserving information forever, when there is no guarantee that a specification produced with no verification of its completeness really can be used to produce an emulator sometime in the future.

Bearman has cited emulation as a dangerous strategy that fails to actually preserve digital objects and is not a realistic approach due to the enormous costs of emulator development. Without emulation it is unclear how interactive digital objects could be preserved in a useful way. No practical work has been presented as to how digital objects of more complexity than simple documents would be preserved using migration. For those involved in preservation, cost is a major issue. The apparent initial outlay of resources for emulator development can easily be enough to discourage many from considering emulation. However, we demonstrate that emulation is already a practical and realistic method of preserving access to digital materials. If unorganised enthusiasts can revive interest in classic arcade games via emulation, surely it is not beyond an organised and funded preservation world to make rewarding and practical use of emulation?

The IBM Almaden Research Center is involved in a project (see Lorie 2000) in which the intention is to design a Universal Virtual Machine (UVM), which is then used for actual emulator implementation. It seems courageous to christen something "universal" even before it has been designed, but perhaps if you are IBM you can make such nomenclature stick. The view taken by IBM is that a high-level language is too transient, and will not stand the test of time. History teaches us however that some languages do achieve such pre-eminence (and have so much software investment dependent upon them) that they outlast virtual architectures.

We are aware of two products in the marketplace that emulate the world of Wintel on non-Intel platforms: Softwindows (review) and Virtual PC (review). The existence of these products underlines the feasibility of emulation, even for modern systems, not merely for yesterday's historical curiosities. These are commercial products, and at present we cannot comment on their likely portability to platforms of the future. For open source emulation of Wintel we can look to WINE. Although it requires an Intel platform, WINE works on most popular Intel Unixes and is not restricted to Linux.

Emulation as a practical strategy

The development of the infrastructure required to maintain emulation tools and support the community as a whole is a crucial issue, but it is primarily a matter of organisation rather than an unconquerable technical problem. Putting those issues aside, we aim to demonstrate that, in technical terms, emulation really is a realistic and practical approach to the preservation of some digital materials.

Emulation is a part of the overall preservation process. The OAIS model makes the useful separation between preservation of the digital material, and the ability to render access to its intellectual content. This point of view has been further reinforced by work in the CEDARS project by recognising explicitly that the indefinite preservation of a byte-stream is technically straightforward. There are organisational issues in recording the existence of the preserved object and in providing appropriate facilities for resource discovery. So long as the byte-stream is captured, the provision of solutions to these latter issues can take place over time.

Clearly, a preserved byte-stream is only valuable if meaningful access can be made to its intellectual content. For complex interactive digital objects, emulation and upwards-compatibility of systems are the only current techniques for retaining the ability to run such an object. Where upwards-compatibility exists (e.g. the Wintel platform, IBM 390), there is no need for emulation. There is a need for monitoring of the threat from creeping obsolescence (see Gödel ends in CEDARS).

Even when preserving material for which the platform is currently available, it is wise to take account of future prospects for emulation. Our experience indicates that preserving a specification of what is to be emulated, with the intention of implementing it in the future, is unlikely to be a successful tactic. There is no way to test the necessary completeness of such a specification, and our experience is that even the most arcane feature of a system can turn out to be vital to successful emulation. We argue that the emulation exercise should be carried out sooner rather than later (see George3 story below).

The interface labelled 1 in figure 1 needs to be identified within the original system. For a modern day Windows application running on a PC platform we would have to choose between candidates such as:

For such a platform, the third option would probably be the most appropriate. The WINE project gives some hope that this would be the case.

The task then is to produce an emulator running on some other platform (e.g. Linux in the WINE case). The interface labelled 2 is then the Linux API (or a subset) on an Intel CPU. The selection/design of this second interface aims to second guess the future. The intention is that in the future a new host platform (host platform 2) can readily be identified that has an interface (labelled 3) that is reasonably close to interface 2. Sufficiently close, that the transformation needed to make an emulator for host platform 2 out of emulator 1 is much simpler than re-implementing an emulator in a new environment.

Within the CAMiLEON project we are working with material from the 1970s and 1980s as a way of showing how we can map from one architecture (namely an old one) to one that is radically different. The goal is to deliver a result in which a preserved digital object of some complexity can be run in emulation with sufficient verisimilitude to reproduce the significant properties of the original experience.

Emulation Principles

The CEDARS project work on representation information reveals the importance of identifying the significant properties of a digital object, and of selecting an appropriate abstract representation that preserves these significant properties. Preservation then proceeds by mapping that abstract representation (e.g. a file tree) into a byte-stream, which is then preserved indefinitely, along with preserving the ability to reverse the mapping.

A similar selection of an appropriate abstraction is vital to successful emulation. We would argue strongly that this abstraction is unlikely to be at the level of actual hardware. Emulating the hardware may be quite easy at the CPU level, but presents all sorts of difficulties with regard to peripherals. The abstraction discussed here is the specification of interface 1 in figure 1.
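A toy example may make the distinction concrete. The sketch below assumes an invented four-instruction machine (it is not the ICL 1900 order code, and all names are hypothetical). The CPU part is a short fetch-and-execute loop, while output is handled as an abstract service rather than by modelling any peripheral hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Invented order code for illustration only. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_OUT = 3 };

typedef struct ToyMachine {
    int acc;          /* single accumulator */
    size_t pc;        /* index of the next instruction */
    int out[8];       /* values delivered by the abstract output service */
    size_t out_count;
} ToyMachine;

/* Abstract output service: records the value instead of driving a device.
   Values beyond the eighth are silently dropped in this sketch. */
static void service_output(ToyMachine *m, int value)
{
    if (m->out_count < 8) {
        m->out[m->out_count++] = value;
    }
}

/* Run until HALT, an unknown opcode, a missing operand, or the end of the
   program. Returns the number of instructions fetched. */
size_t toy_run(ToyMachine *m, const int *program, size_t length)
{
    size_t executed = 0;
    assert(m != NULL && program != NULL);
    while (m->pc < length) {
        int op = program[m->pc++];
        executed++;
        if (op == OP_OUT) {
            service_output(m, m->acc);
        } else if ((op == OP_LOAD || op == OP_ADD) && m->pc < length) {
            int operand = program[m->pc++];
            m->acc = (op == OP_LOAD) ? operand : m->acc + operand;
        } else {
            break; /* OP_HALT, unknown opcode, or missing operand */
        }
    }
    return executed;
}
```

The instruction loop is a few lines, whereas a faithful model of a real output device (timing, status registers, interrupts) would be far larger. Placing the emulation boundary at the service call avoids that work wherever the significant properties of the object do not depend on it.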

There is also the issue of selection of an appropriate platform upon which to run the emulator. Our preference is for a software platform, such as the C programming language and the socket library. The WINE project is already emerging as an emulation of the Wintel platform running on the UNIX API. The appropriate platform discussed here is the specification of interface 2 in figure 1, with a view to its trouble-free evolution into interface 3.

Level of emulation

We do not yet have hard and fast rules for selection of the abstract emulation interface (interface 1 in figure 1). We can identify a number of factors that should influence the choice.

Longevity

The selection of the initial abstract emulation platform (interface 2 in figure 1) is the key determiner of the ultimate longevity of the emulator.

We believe the C programming language "platform" represents one of the best existing choices for the longevity of an emulator implementation. The large volume of software currently in use that is written in C will ensure the language's survival many years into the future. However, for the long term, we need to do more to ensure the longevity of our emulations.

The source code of an emulator represents the key element to be preserved. To give it greater longevity and make it easier to translate into future programming languages, when this becomes necessary, we propose several key strategies:

  1. Emulator code should be produced using standard Software Engineering techniques. These include the use of a good code structure, informative and plentiful commenting and good documentation.
  2. Emulators should be written in a subset of C, which is chosen with a view to using only those aspects of the language semantics that are likely to be found in future programming languages.
  3. Most emulators are likely to incorporate at least some non-standard code (for example for the rendering of the emulated machine's raster display). All non-standard code should be modularised and well documented.

Evolution over time would involve translation from the chosen subset to a future language, a straightforward process which would be at least partially automatic. Note that code written in this subset (referred to here as C--) will compile under any standard ANSI C or C++ compiler. This theme is developed further in a companion paper Emulation: C-ing Ahead.
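As a hedged illustration of what restricting oneself to widely shared language semantics might look like, the sketch below models arithmetic on an emulated machine word by explicit masking, instead of relying on the width or overflow behaviour of the host's integer types. The 24-bit word size and all names are assumptions chosen for the example; the actual rules of the subset are those given in the companion paper, not these.

```c
#include <assert.h>

/* Assumed for illustration: the emulated machine has a 24-bit word. */
#define WORD_MASK 0xFFFFFFUL
#define SIGN_BIT  0x800000UL

/* unsigned long holds at least 32 bits in standard C, so 24 bits always fit. */
typedef unsigned long Word;

/* Add two emulated words, wrapping at 24 bits by explicit masking. */
Word word_add(Word a, Word b)
{
    assert(a <= WORD_MASK && b <= WORD_MASK);
    return (a + b) & WORD_MASK;
}

/* Interpret an emulated word as a two's complement signed value. */
long word_to_signed(Word w)
{
    assert(w <= WORD_MASK);
    if (w & SIGN_BIT) {
        return (long)w - 0x1000000L;
    }
    return (long)w;
}
```

Because the wrap-around is written out rather than inherited from the host, the same logic translates directly into any language that offers integers of at least 32 bits, which is the kind of property that makes later translation largely mechanical.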

Conclusion

If you really want to preserve things for which emulation is the most meaningful method, then it is important to get in early and at least achieve a proof of concept before relevant information (including human memory) is lost. This tactic places extra stress on the need for longevity of the implementation of the emulator. We believe that our approach of defining the emulation platform in terms of a programming language offers the best assurance of that longevity. Identification of the right abstract interface within the preserved system for use as the emulation platform (interface 1 in figure 1) also plays a large part in ensuring that the emulator can evolve over time.

This process of evolution of the emulator is actually a form of migration, as described in an analysis of migration conducted by PRW. This is not to say that migration is actually always better than emulation, but that some systems lend themselves to migration, and in the case of C-- this is very deliberate. In fact, migration is most effective when the original object was produced with portability in mind. This is particularly the case with our approach to writing emulators.

Emulation should not be over-sold as the answer to all digital preservation issues. It is just part of the armoury necessary for defending our digital heritage against the ravages of time in a world where innovation (and hence change) is highly prized. Vital to any preservation is the identification of the significant properties that form the preservation goal, and thus the identification of the abstractions that are necessary for the digital object to manifest these significant properties. In some cases these abstractions are the system calls of an API, and the best way to preserve them long term is to emulate them. In so doing we preserve meaningful access to the original digital object(s).

Appendix : The George3 Story — a case study

The George3 operating system ran the ICL 1900 range of computers. Its heyday was in the 1970s and early 80s. Under this operating system ran several ground-breaking applications, e.g. PERT, Pascal and the world's first Algol68 compiler. The system itself had several features that might also be described as ground-breaking, including a hierarchical file store and time-sharing facilities. It also had less worthy aspects of its own era such as confusion between files and peripheral devices.

With the demise of the architecture upon which the system ran, we either provide an emulation or lose the ability to provide the historians of computing with access to a significant era.

In seeking to provide access to this material by emulation we need first to select that abstract interface at which our emulation will operate.

A running George3 system was composed of an executive program running on the raw hardware, over (under) which ran George3 itself. This then provided the platform upon which ran the applications. We thus have three options for the emulation:

  1. the raw hardware,
  2. the API provided by executive to George3, or
  3. the API provided by George3 to the applications programs.

We have chosen option 2 on the grounds that:

The emulation is now developed to a level well beyond the proof-of-concept, and already offers quite nostalgic perspectives on a lost era of computation. It is quite clear that a very full emulation is within our grasp. This has been achieved with volunteer effort on the part of two individuals, with helpful co-operation from several others.

Our experience suggests that this would not have succeeded if it had been undertaken by the Rothenberg route of preservation of the specification. This may be particularly true of 1970s systems for which the documentation was produced by traditional paper means and does not exist in electronic form. We certainly needed very detailed information on the way that the CPU operated, including one feature that was not hinted at in the order code summary chart. We even found a bug in George3. We certainly benefited from the fact that the two main implementors had each worked with George3 in the past. It is quite clear to us that the omission of a single fact from a preserved specification would be sufficient to generate extreme difficulties.

As well as the binary image of George3, we also have the source text, to which we have made quite frequent reference. It leads us to the view that emulating a platform for which you do not have any examples of source text may be particularly arduous.

There is also the issue of choice of platform upon which the emulator is to run. We have chosen the C programming language, with optional use of the socket facility for some features of the system. We believe that the ability to run C programs will last for a long time. We have run successfully on Win32, Solaris and Linux.

In short, we vote for achieving a significant part of the emulation at the point where the original platform becomes obsolete — at least for systems for which the documentation is not available in electronic form.