
Towards the sustained use of software for long-term access to digital heritage

Nataša Milić-Frayling, CEO

In this paper we reflect on the transformational impact of adopting digital media for encoding and storing information and the importance of software for processing digital data and transferring knowledge. Unfortunately, the rapid rate of innovation causes rapid software obsolescence, and affects our ability to use digital content. This is particularly challenging for highly interactive and dynamic digital artifacts that use software computation to derive and convey insights from data and study phenomena. In fact, the faster we innovate, the faster software is replaced with new products and becomes unsupported and obsolete. That makes it difficult or impossible to reproduce past data analyses, play old games, and use interactive content such as digital art. Fortunately, advances in computing also provide us with the means to counteract the effects of software obsolescence. At Intact Digital we created a Software Library platform that uses virtualized computing environments to provide stable and protected installations of legacy software and enable the secure and easy use of digital content from decades ago. This is one effective approach to enabling digital continuity that is essential for transferring knowledge to future generations and building on our digital heritage.

Digital media

With the onset of the digital revolution in the mid-20th century, we have experienced the fastest growth in the production of content and information and unprecedented speeds in transferring and exchanging data and knowledge through computing technologies. Now, 70 years later, we cannot imagine the world without Internet services and mobile devices. They shape every aspect of our everyday life.

Scientific and engineering fields have been equally transformed. Digital technologies embedded in instruments and tools are enabling data analyses and knowledge discovery beyond what was previously imaginable. With the increased capacity of portable storage devices and a shift to cloud computing and cloud storage, we are now commonly dealing with terabytes of personal content and petabytes of scientific and commercial data. However, digital media and digital computing depend on highly sophisticated technologies that require continuous updates to stay functional and usable. Thus, it is of utmost importance to consider the technological, economic and educational factors that affect the continuity of digital media use and to take action to ensure that our digital heritage lives on and reaches future generations.

Digital continuity and the importance of software

In order to ensure the long-term use of digital content, we need to understand the essential aspects of digital media and the ways it ages and deteriorates.

All digital content is created, collected, stored and consumed using compatible software. Thus, digital media is fundamentally computational, since the software features shape the encoded data and information. In order to reuse the products of our work, we keep digital documents, images, videos, and databases as files. Each such file is written and read by a particular piece of software or a range of compatible software that can process the files. Without compatible software, digitally encoded content cannot be interpreted, presented or experienced. However, those who focus on data storage alone overlook the importance of software.

Digital files without software are like musical scores without instruments or musicians. We will never be able to play and experience stored digital content without working software and without the skills to use it.

While some software is created for specific hardware devices, most can be installed on a variety of hardware or in virtual machines. As long as the hardware runs an operating system that is compatible with the software, one can install and use it. While hardware obsolescence is also an important problem, for the sake of this discussion we will focus on the issues that arise from the obsolescence of the operating systems and the software itself.

Software can be an application, like a word processor used for creating new documents, or a game that we enjoy playing through a carefully designed interactive experience. Each piece of software depends on many other technical components, from software that enables the use of mouse, keyboard and screen, to the security patches that make the operating system and the whole computer safe. Thus, for any software application to remain functional and usable, it must be constantly monitored and updated if other supporting components change. The most frequent updates are due to security threats, and once it becomes unfeasible for the software producer to keep customers safe, they have to pull the product from the market.

For example, in December 2020 Adobe discontinued support for Adobe Flash and, from 12 January 2021, blocked Flash content from running in Flash Player [4]. It instructed all users to uninstall Flash Player in order to protect themselves from security risks. Unfortunately, this had a detrimental effect on web publishers and artists who, for the past two decades, had been producing digital art using Flash and enabling online audiences to use dynamic content and Flash animation through a browser. Software is thus key to the use of digital content, and the lack of functional software has a direct impact on what digital content we can continue to use and what knowledge we can transfer to future generations.

Software history and rate of obsolescence

There are many different types of software. A good place to see the variety is the Internet Archive's Software Collection [5]. The collection includes over 862,000 software packages, from operating systems, media production software and statistical packages to games and specialized software using 3D visualizations, maps and animations. The Computer History Museum in Ljubljana [6] and similar organizations around the world [7] reconstruct, preserve and display old software programs and let us experience the exciting journey through the development of computing technologies. However, outside such museums, much of this software is no longer in use.

Software obsolescence is a universal phenomenon that affects all software. It is a natural consequence of innovation: as we create new software versions, old ones become unsupported, unsafe, and unusable.

In essence, the faster we innovate, the faster software becomes obsolete and the more of our digital assets are in danger of becoming inaccessible and unusable without compatible software. The impact of software obsolescence is particularly detrimental for digital content that has long-term value, such as knowledge resources and digital cultural heritage. For example, scientific experiments cannot be reproduced reliably without the original software. Digital art cannot appreciate in value if it cannot be shown and interacted with. It is thus critical to enable the long-term use of software. Fortunately, there are ways to enable the use of past data, reconstruct past studies and present digital art from decades ago.

Software library for legacy software

In 2016, Intact Digital [8] began a concerted effort to create technically effective and economically sustainable services for long-term hosting and maintenance of the software needed to enable digital continuity.

Since software comprises source code and installable executables, one can adopt different strategies to ensure that software continues to run. Source code is normally available for open-source software. In principle, if the developer community retains the knowledge and skills needed to continue software development and can ensure that old data can be used with new versions of the software, there is no danger from software obsolescence.

However, a large section of the software industry and our digital economy is based on proprietary software that is highly customized and for which the software source code is not publicly available. The new versions of the software may not be backward compatible, and modifying and re-developing software would be costly or unfeasible due to the lack of documentation and know-how. More importantly, if the software producer goes out of business, the software becomes completely unavailable and users are left without upgrades and, eventually, without the ability to use their data.

The more successful the software product, the more damage is caused by its obsolescence.

In some instances, there is modern software that can serve as a substitute and make use of data files from obsolete software products. Otherwise, our best approach is to create a computing environment in which old software can still run without security threats. That can be achieved by installing software in virtual machines.

In order to establish a principled way of dealing with legacy software dependence, Intact Digital devised an Executable Archive framework [9] that complements the traditional archives with a Software Library platform and services for ensuring long-term hosting and maintenance of legacy software.

Long-term software care is achieved by adopting a systematic approach to managing the software files and documentation needed to create software installations. That includes quality assurance practices that are applied during the software installation and ongoing maintenance of virtual computing environments.

Nowadays, software virtualization is broadly used in data centers and public clouds for flexible management of computing resources that are needed by organizations and individuals. Many end users also use virtual machines on their home computers. For example, Apple Mac users can also use a virtualized Windows environment on their machine.

One can think of a virtual machine as a desktop computer but without its own hardware. Inside the virtual machine one can install old operating systems and old applications. Such a virtual machine can be hosted on a modern computer and all the instructions from the installed software applications are translated into commands on the host machine. The users can thus use the software in the same way as in the past, as it appears as an application in a virtual desktop that is familiar to them.
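To make the idea of a hosted legacy guest concrete, the following is a minimal sketch of how such a virtual machine might be started with no network access. It assumes the open-source QEMU emulator purely for illustration; the article does not state which virtualization technology the Software Library uses, and the function name `isolated_vm_command` and the disk image name `legacy-guest.qcow2` are hypothetical.

```python
import shlex


def isolated_vm_command(disk_image: str, memory_mb: int = 512) -> list[str]:
    """Build a QEMU command line for a legacy guest with no network access.

    Assumption: QEMU is used only as an example hypervisor; the image name
    and memory size are illustrative values, not taken from the article.
    """
    return [
        "qemu-system-i386",                           # emulates a 32-bit x86 PC
        "-m", str(memory_mb),                         # guest RAM in MiB
        "-drive", f"file={disk_image},format=qcow2",  # virtual hard disk
        "-snapshot",                                  # keep the disk image unmodified
        "-nic", "none",                               # attach no network device
    ]


command = isolated_vm_command("legacy-guest.qcow2")
print(shlex.join(command))
```

Building the command as a list keeps each option explicit, so the isolation setting (`-nic none`) can be checked before the guest is ever started.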

Figure 1. The Executable Archive framework complements the traditional electronic archive practices with procedures and technical components that ensure the long-term use of software. The Software Library platform enables software hosting, software storage and the remote use of software to process archived data.

Reconstruction of data analyses

In highly regulated sectors, such as pharmaceuticals and life-sciences, digital data must be retained for decades. Complex scientific protocols typically involve the use of sophisticated instruments to collect data and highly specialized software to interpret and analyze it. Files containing raw instrument data are stored in electronic archives and regularly assessed for data integrity, using, for example, checksum methods. The corresponding software must remain functional for reconstruction of past studies and for reproducibility of data analyses. This is a challenge, because both the related instruments and the software are decommissioned after a while and removed from operational use.
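As an illustration of the checksum methods mentioned above, the sketch below records a SHA-256 digest for every file in an archive folder and later reports files that are missing, unexpected or modified. It is a minimal example under stated assumptions: the choice of SHA-256, the function names and the manifest layout are illustrative and are not taken from any particular archive system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root with a stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "modified": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

A manifest would typically be built when data enters the archive and stored separately from the data; running `verify_manifest` at regular intervals then flags any file whose content has changed.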

The INTACT Software Library is therefore used to create virtual installations of the software needed to reconstruct past studies. The key requirement is to ensure that the software installation in the virtualized environments leads to the same results as the software originally run in the labs where the studies were conducted. Furthermore, access to the virtualized software needs to be controlled to adhere to the licensing agreement with the software vendor and to provide a detailed audit trail of software use that is required by internal policies and regulations.
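The requirement that a virtualized installation reproduces the original results can be checked by comparing the two sets of outputs. The sketch below assumes, hypothetically, that both runs can export results as CSV text with `sample_id` and `value` columns; real instrument software will have its own export formats, and the acceptable tolerance is a matter for the study's validation protocol.

```python
import csv
import io
import math


def read_results(text: str) -> dict[str, float]:
    """Parse 'sample_id,value' CSV text (a hypothetical export format)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample_id"]: float(row["value"]) for row in reader}


def compare_results(
    original: dict[str, float],
    reproduced: dict[str, float],
    rel_tol: float = 0.0,
    abs_tol: float = 0.0,
) -> list[str]:
    """Return a list of discrepancies; an empty list means the runs agree.

    With the default tolerances of zero, values must match exactly.
    """
    problems = []
    for sample_id in sorted(original.keys() | reproduced.keys()):
        if sample_id not in reproduced:
            problems.append(f"{sample_id}: missing from reproduced run")
        elif sample_id not in original:
            problems.append(f"{sample_id}: not present in original run")
        elif not math.isclose(
            original[sample_id],
            reproduced[sample_id],
            rel_tol=rel_tol,
            abs_tol=abs_tol,
        ):
            problems.append(
                f"{sample_id}: {original[sample_id]} != {reproduced[sample_id]}"
            )
    return problems
```

Reporting every discrepancy, rather than stopping at the first, gives a reviewer a complete picture of where the virtualized installation differs from the original run.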

From the user’s perspective, the virtual machines in the Software Library are conveniently accessed through Virtual Desktops from any modern browser (Figure 2). This is achieved by the careful isolation of the virtual machines that host old operating systems and, therefore, should not be exposed to the Internet. A copy of the data from the electronic archive is safely transferred into the Software Library using a Transfer Desktop.

Figure 2. The Software Library can be used from any modern browser. Through the specially configured Transfer Desktop, the user can transfer a copy of the data into the Software Library and then process it with a specific piece of software. Each virtual machine with the software appears in the Browser Tab as a separate desktop.

Once in the Software Library environment, the data is used within the virtual machine that hosts the software needed to process it. Each virtual machine appears as a separate tab in the browser. Figure 3 shows an example of a virtual machine hosting the Windows XP operating system and the Analyst 1.4.2 software produced by Sciex. The user can use the software and the data in the same way as originally done in the lab.

Figure 3. Analyst v.1.4.2. software by Sciex is installed in a virtual machine running Windows XP and used through a virtual desktop that can be accessed through any compatible browser. The virtualized Windows XP desktop appears in a separate browser tab and cannot be accessed by services from the Internet.

Reconstruction of digital art

Over the past few decades, artists have used Internet technologies to create Internet art and reach broad online audiences. An important enabler of such artists’ creativity was Adobe Flash, formerly commercialized as Macromedia Flash and FutureSplash. Flash enabled flexible use of text, vector graphics, video and audio for the production of animations, games, rich web and desktop applications, and browser-embedded video players. End users could conveniently view Flash content via the Flash Player within web browsers.

Furthermore, with the standardization and adoption of the Virtual Reality Modeling Language (VRML) [10], authors could also specify platform-independent 3D “worlds”, including objects with rich structures, textures and interaction models. The Cortona3D [11] viewer for VRML also enabled non-standard support for combining VRML with Flash textures, providing additional creative opportunities.

Unfortunately, with the recent obsolescence of Adobe Flash and Flash Player, all digital art that uses Flash is affected and requires a concerted reconstruction effort to remain accessible and usable.

For example, Intact Digital has worked with the contemporary artist Michael Takeo Magruder [12] on the reconstruction of World[s] (v1.0, 2006; v1.1, 2009) [13], which relies on VRML and Flash plug-ins to enable the textured 3D rendering of audio-visual art elements. The artist maintains a website (http://www.takeo.org) with detailed descriptions of art pieces, including documentation, videos, and still images, and manages a repository of digital media files and selected versions of software that were used to create and publish the artworks.

Through a collaborative effort, the artist and the Intact Digital technical staff have created an installation of the art within the Software Library environment that can be used by audiences online (Figures 4 & 5).

Figure 4. The Software Library hosts an installation of the World[s] artwork by Michael Takeo Magruder that includes Flash and Cortona3D plug-ins for the browsers, in a secure Software Library environment.

Figure 5. Cortona3D v.7.0 with Internet Explorer 11.1790.17763.0 and Macromedia Flash ActiveX plug-in 8r42 are installed within an isolated VM with a GPU, to provide reliable and secure access to World[s] from any modern browser.

Long-term care of software

While the technical aspects of software obsolescence present obvious challenges, enabling digital continuity requires a holistic approach considering a broader range of concerns. For long-term software maintenance within the Software Library, we have developed quality assurance practices that cover technological, legal, operational and human factors (Figure 6).

Figure 6. Long-term software care requires consideration of multiple factors and quality assurance procedures for identifying and mitigating potential risks.

Among the technology factors we cover the security and integrity of the long-term storage of software files and documentation, computing environments needed to install and run the software, and methods for secure access and use of installed software.

The legal aspects, for example, involve the licensing of all the technical components involved in the installation and use of software. Operational activities involve continual maintenance of the computing environment, and the protection and possible re-installation of software in order to prolong its use. Human factors are often overlooked, yet they are absolutely essential. It is thus important to provide training for using legacy software installations, to retain skills and ensure that younger generations can operate historical software and historical data reliably and efficiently. The secure use of software installations is particularly important, since old software is fundamentally non-secure if exposed to the contemporary ecosystem. Without upgrades it may not run on the latest operating systems and may be vulnerable to new types of cyberattacks.

Overall, with the systematic and principled approach offered by the Executable Archive framework and the convenient use of software hosted in the Software Library, we can effectively address a wide range of digital obsolescence scenarios. Building on these foundations, we can make considerable progress towards digital continuity and keep on innovating, knowing that our digital heritage is safe and accessible for as long as we need it.