Authenticity in Perspective • CLIR

by Abby Smith

This is neither a commentary on the preceding papers nor a summary of the discussions held on January 24. Rather, I will try to give a sense of various views expressed by the presenters, identify the issues raised in light of the subsequent discussions, and highlight the implications of the day’s proceedings.

Some Ground Rules

In his seminal work, Principia Ethica, the moral philosopher and epistemologist G. E. Moore remarked that, in most complex matters, difficulties and disagreements “are mainly due to a very simple cause: namely to the attempt to answer questions, without first discovering precisely what question it is which you desire to answer” ([1902] 1988). Conference participants clearly agreed about the question: What is an authentic digital object, and what are the core elements that, if missing, would render that object something other than what it purports to be? The difficulties arose from participants’ legitimate, and perhaps predictable, disagreements about which elements are intrinsic to a digital object and which elements are contingent on context, technologies, encoding schemes and display methods, or other externalities.

As anticipated, communities differ in their understanding of what constitutes intrinsic features of a digital object; these differences mirror their understanding of authenticity of analog objects. After all, the uses of digital and analog information by historians, archivists, publishers, or scientists vary greatly. Most of the workshop participants grounded their thinking about digital objects and their identity in the fitness of these objects for some specified function or purpose, such as a record that bears evidence; a historical source that bears witness to an event, a time, or a life; or data that could produce a replicable experiment. In other words, what was deemed intrinsic to an object was determined by the purpose for which it was created (or, in the case of archival records, the most narrowly defined of digital objects under discussion, the purpose of bearing evidence about an object’s creation and intended use). Regrettably, as Moore pointed out, evidence cannot be adduced for things intrinsic. “From no truth, except themselves alone, can it be inferred that [intrinsic things] are either true or false.”

The Key Issues

Perhaps for that reason if no other, neither the presenters nor the workshop participants addressed systematically and directly the question of what an authentic digital object is and what the core attributes are that, if missing, would render the object something other than what it purports to be. However, threaded throughout the discussion were various responses to the other questions raised in the charge.

· If all information (textual, numeric, audio, and visual) exists as a bit stream, what does that imply for the concept of format and its role as an attribute essential to the object?

Clifford Lynch proposed a hierarchy of complexity of representation: bit stream, data, documents, interactive objects (i.e., engaging sensory perceptions), and experiential works (e.g., virtual reality). That schema resonated with many of the participants. David Levy pointed out that we might never resolve the paradox of bits being “the stuff” with the fact that bits are inaccessible to our senses and perceptual abilities. This is not how we have dealt with recorded information before. Given that we are just beginning to explore the relationship between humans and computing machines, it is hard to think ahead about how we, as physical creatures, will relate to virtual bits.

In the analog realm, many features of recorded information are an aspect of the object itself and so will not translate into the digital environment. We are generally unaware of how often we use our judgment about the physical integrity of recorded information to stand in for a judgment about the integrity of the text. We make instantaneous inferences about a text that we receive with portions blacked out. Evidence provided by the physical object, however, has no counterpart in the digital world; deletions in an electronic text would not be visible to the eye and, consequently, would not raise our suspicion. A digital object has no independent physical manifestation that can accrete information about its fate in this world (such as bookplates, marginalia, coffee mug stains, and so forth). For similarly effective external evidence for a digital object, we must create such things as metadata, which in turn create their own preservation and readability concerns. Once metadata are separated from their object, it is hard to reattach them. It is a fact of life on the Internet that in e-mail correspondence, content will be cut and pasted into some extraneous document and then widely disseminated without its originating contextual metadata. Cutting and pasting in the analog world, by contrast, leave physical traces that can alert the recipient or reader to the document’s provenance.

· Does the concept of an original have meaning in the digital environment?

David Levy defined the copying of digital information as a manufacturing process. In effect, a digital file is like a printing plate. Bits may be the source of a document, but they are not and can never be the original. Moreover, there are no unique copies in the digital realm unless they result from a mistake in the manufacturing process, that is, the process of copying the file onto the screen. Jeff Rothenberg argued strongly for the opposite point of view, saying that a digital-original is any representation of a digital informational entity that has the maximum possible likelihood of retaining all meaningful and relevant aspects of the entity. This echoed the archival point of view, which suggests that the digital-original is just the same (i.e., works in just the same way) as the analog original. The value of an original is that it is as complete as possible and it is reliable because of the control exercised in its creation. In an archives, the digital-original is simply the first record received.

But this begs the question of what we really mean by “original.” In the case of a digital file, we are referring not to an object per se, but to a fixed set of properties that contain information about the digital object and that constitute the digital object itself. Again, this would not make the original unique. One of the difficulties in talking about the issue of “original” is that there is no object fixity in the digital world, as there is in the analog world. As Clifford Lynch helpfully explained, in the analog world, I give you the object and now you have it and I do not. In the digital world, I share with you a file that has the same properties as the file I have: the original, as it were. Now I have it, and you have it, too. But what, precisely, is the “it,” the file? It could be characterized as a “fixed set of properties.”

All workshop attendees agreed that digital technology obviates the idea of a unique item, because the very act of viewing, say, a digital photograph means creating a copy (on screen). This fact has obvious implications for copyright.

· What role does provenance play in establishing the authenticity of a digital object?

The role of provenance is as important in the digital world as in the analog world, if not more so. For the archivists, the role of provenance is well defined. Archives can provide evidence of authenticity by documenting the chain of transmission and custody, and they have internal controls that reduce to an acceptable level the risk of tampering. Within the controlled environment of an archives, the provenance of records is theoretically secure. (Whatever happened to the item before it came to the archives, and whatever happens to it when it leaves, may be another matter altogether.) Archives, of course, deal with limited types of items. They are records: things created in the course of doing business. The truth value of a record is not what makes it authentic. A record might contain false information but still be authentic as a record.

In the larger context of libraries and beyond, the role of provenance is far more complicated. Archives can serve as a trusted third party only in a relatively controlled environment. Whenever information crosses administrative and technological boundaries, as it does in the more permeable world of publishers and libraries, the role of trusted third parties, while critical for authenticity, is harder to develop and maintain. The partnership between libraries and publishers, a crucial link in the ultimate relationship between author and reader, has evolved slowly, at times painfully, over centuries, and will continue to evolve. Nonetheless, the digital environment will still need trusted third parties to store material, and the libraries and publishers will need to agree on protocols for digital publishing and preservation that work as effectively as have those of the past.

Interestingly, the scholar participants suggested that technological solutions to the problem will probably emerge that would obviate the need for trusted third parties. Such solutions may include, for example, embedding texts, documents, images, and the like with various warrants (e.g., time stamps, encryption, digital signatures, and watermarks). The technologists replied with skepticism, saying that there is no technological solution that does not itself involve the transfer of trust to a third party. Encryption (for example, public key infrastructure, or PKI) and digital signatures are simply means of transferring risk to a trusted third party. Those technological solutions are as weak or as strong as the trusted third party. To devise technical solutions to what is, in their view, essentially a social challenge is to engender an “arms race” among hackers and their police.
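The technologists' point can be made concrete with a small sketch. It is an illustration only and not something presented at the workshop: it uses a keyed hash (HMAC) from the Python standard library as a stand-in for a digital signature, and the key, the function names, and the sample record are all hypothetical. The check succeeds or fails purely on the strength of whoever holds the key, which is the sense in which the warrant transfers trust rather than removes the need for it.

```python
import hashlib
import hmac


def issue_warrant(key: bytes, document: bytes) -> str:
    """Return a keyed digest that vouches for the document's exact bytes."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def verify_warrant(key: bytes, document: bytes, warrant: str) -> bool:
    """Check the warrant by recomputing it with the same key."""
    expected = issue_warrant(key, document)
    return hmac.compare_digest(expected, warrant)


# Hypothetical key held by the trusted third party (assumption for this sketch).
third_party_key = b"hypothetical-secret-held-by-the-third-party"

record = b"A record created in the course of doing business."
altered = b"A record created in the course of doing mischief."

warrant = issue_warrant(third_party_key, record)

# The warrant detects tampering by someone who does not hold the key...
record_ok = verify_warrant(third_party_key, record, warrant)
altered_ok = verify_warrant(third_party_key, altered, warrant)

# ...but anyone who holds the key can issue a valid warrant for altered text.
forged_warrant = issue_warrant(third_party_key, altered)
forgery_ok = verify_warrant(third_party_key, altered, forged_warrant)
```

Here the first check passes, the second fails, and the third passes again, so the warrant is exactly as reliable as the custodian of the key.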

· What implications for authenticity, if any, are there in the fact that digital objects are contingent on software, hardware, network, and other dependencies?

Dependencies mean either nothing or everything. What if you have a digital object that you cannot read because you do not have the right software? Jeff Rothenberg argued that you cannot know something is authentic unless you can read it. However, to the archivists, this constituted confusion between meaning and authenticity. In their opinion, you do not need to view the contents of something to say that it is an authentic record. Take, for example, the case of the Rosetta Stone. For centuries, the meaning of the stone was beyond reach, because no one could decipher the codes in which it was written. It was, nonetheless, an authentic record of the time in which it was created. You could not say that it was inauthentic in the eighteenth century and that it became authentic only when, and because, Champollion decoded it.

But publishers, historians, and computer scientists were quick to point out that fixity of a text, much as we take it for granted, is a relatively recent phenomenon. It was the printing press that helped to create the notion of a fixed text. There was little or no fixity of text before printing, and none exists in unpublished materials such as hand-developed photographs or manuscripts. When a publisher goes to press with an error, he or she feels an obligation to publish an errata sheet. In manuscripts, however, there are no errata sheets; likewise, there need be no such sheets in the digital world. The publisher of a digital resource can simply go in and correct the text. Whether such a publisher chooses to note that change or not is related to his or her sense of obligation to the publication record, not to the truth of the text.

The variability of digital formats is great and will continue to be so. It should not necessarily pose problems to matters of authenticity, depending on how one defines the “fixed set of properties” that constitute the file. After all, difference in display monitors can significantly alter the way things appear to us, even though they display the exact same bit stream. Is the bit stream displayed on my monitor the same document as the one you have, if the bit stream is identical but the appearance it generates in your monitor is different?
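One way to see the distinction between an identical bit stream and a differing appearance is the following sketch. It is offered as an illustration, not as a method proposed by the participants: it assumes SHA-256 digests as the test of bit-stream identity and uses two character encodings as a stand-in for two different display environments. All names in it are hypothetical.

```python
import hashlib


def bitstream_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def same_bitstream(a: bytes, b: bytes) -> bool:
    """Two files are treated as the same bit stream if their digests match."""
    return bitstream_digest(a) == bitstream_digest(b)


# The file I hold, and the copy I share with you.
my_file = "café".encode("utf-8")
your_file = bytes(my_file)

# Two "monitors": the same bytes interpreted under different assumptions.
my_view = my_file.decode("utf-8")
your_view = your_file.decode("latin-1")

identical_bits = same_bitstream(my_file, your_file)
identical_appearance = my_view == your_view
```

In this sketch `identical_bits` is true while `identical_appearance` is false. Whether the two of us hold the same document then depends on whether rendering is counted among the “fixed set of properties.”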

Proposed Answers

David Levy proposed that, for purposes of proving something is authentic, we could use the following three methods, all implying a trusted third party for implementation, that answer the question of authenticity by stipulating in reference to what something is authentic.

· Use of reference object (Does the object match this object?)
· Metadata (Does the object match this description of an object?)
· Digital recipe (Could we recreate or reassemble an authentic object using this set of instructions?)
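The three methods can be sketched in a few lines, under strong simplifying assumptions that are not Levy's own: “matching” is reduced to comparing SHA-256 digests, a description is reduced to a recorded length and digest, and a recipe is reduced to an ordered list of parts to be joined. The class and function names are hypothetical. In each case the reference object, the description, or the recipe must itself be held by someone we trust.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence


def digest(data: bytes) -> str:
    """SHA-256 digest used here as the working notion of 'the same object'."""
    return hashlib.sha256(data).hexdigest()


def matches_reference(candidate: bytes, reference: bytes) -> bool:
    """Reference object: does the object match this object?"""
    return digest(candidate) == digest(reference)


@dataclass(frozen=True)
class Description:
    """A minimal, hypothetical metadata record kept apart from the object."""
    length: int
    sha256: str


def describe(data: bytes) -> Description:
    """Record the properties that the description will later be checked against."""
    return Description(length=len(data), sha256=digest(data))


def matches_metadata(candidate: bytes, description: Description) -> bool:
    """Metadata: does the object match this description of an object?"""
    return (len(candidate) == description.length
            and digest(candidate) == description.sha256)


def rebuild_from_recipe(parts: Sequence[bytes]) -> bytes:
    """Digital recipe: reassemble an object by following a set of instructions."""
    return b"".join(parts)


original = b"Minutes of the workshop held on January 24."
description = describe(original)
recipe = [b"Minutes of the workshop ", b"held on January 24."]

rebuilt = rebuild_from_recipe(recipe)
checks = (
    matches_reference(rebuilt, original),
    matches_metadata(rebuilt, description),
)
```

Both checks succeed for the rebuilt object, and both would fail if a single byte of the recipe's parts were changed.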

Implications for Preservation

Authenticity, although seldom talked about, is deeply implicated in even the routine decisions we make about preservation. Fortunately, issues of authenticity have seldom been problematical in the print regime, at least for the past century or so. Even the new recording media for audio and video present fewer authenticity issues for professionals than they used to.

In the analog regime, one could not reasonably say that issues of preservation are deeply implicated in authenticity. Any investigation into authenticity per se might, therefore, include preservation, but should not be subsumed by it. It is not clear that this is the case with digital objects. While future discussions of authenticity should be careful to investigate all aspects of the authenticity issue without prejudice, there are nonetheless certain nagging facts about how preservation operates in the digital realm that warrant consideration.

We have known for some time that all of our operating assumptions about selection for preservation are turned on their head by digital technology. Preservation has operated by making choices about what objects from the past should endure into the future. For collecting institutions, be they libraries, archives, museums, or historical societies, the commitment to preserve is made at the time of acquisition. However, preservation actions (e.g., rehousing into acid-free folders or stabilizing a fragile book) are sometimes taken years after accessioning and, to our shame, years after the physical condition warranted intervention. But there are also materials (those collected “just in time” rather than “just in case”) that may not carry an implied commitment to preserve when they are acquired because someone else has preserved them. In those cases, the collecting institution can make a decision about how valuable the item is likely to be in the future, when an immediate demand for the item no longer exists.

No matter how the preservation selection and action occur, in the analog realm we choose what to preserve well after items have been created, authenticated, and valorized through publishing or, in the case of archives, during appraisal. The item has gone through several processes in which it is selected, from the publisher to the acquisitions specialist to the curator and preservation specialist.

In the digital world, however, the act of selecting for preservation has become a process of constant reselection. We have to intervene continually to keep digital files alive. We cannot put a digital file on a shelf and decide later about preservation intervention. Storage means active intervention. One must refresh data regularly to keep it alive. It is as if suddenly every item in a library (every single book, manuscript leaf, and page of newsprint) demanded preservation action every 18 or 24 months. We do not lose books just because we do not use them, but it is possible to lose digital data just because no one wants access to it within a year or two of its creation. Indeed, many are saying that the preservation of digital data should begin at time of creation. The creator should make all decisions about file format, software and hardware, and even complexity of documentation, in light of the intended longevity of the object. This need to think prospectively about persistence introduces a strong element of intentionality among all actors in the drama of information creation, dissemination, and consumption that has implications for the meaning of authenticity. There is an accidental nature to the evidence borne by physical artifacts that serves to strengthen an item’s claim to authenticity. In one sense of the word, commonly used in scientific laboratories, “artifact” connotes the unintended byproduct of a process, a byproduct usually irrelevant to the outcome. Similarly, among the various physical media on which information is recorded there are byproducts of the recording or printing or manufacturing process that give vital clues to the authenticity of the object precisely because those byproducts were not intended. This is what we lose in the digital environment.

Authenticity in the Perspective of the Future

Fortunately, there are limited circumstances in which authenticity of information is critically important: biomedical data, legal documents, national security intelligence, proprietary trade and commercial information, and public records. In those cases, information is usually created and managed in controlled environments to reduce the risk of intentional or inadvertent corruption to an acceptable level. But there is much information in digital form that we rely on to be what it appears to be; for example, the historical documents we find on library Web sites, the e-mail messages we receive from colleagues in the course of doing business, photographs that we take on digital cameras, or the online news sources we check for stock quotes. We do not want to live in a world of constant suspicion that what we see is not what actually is. What we, as creators and users of information, need to do is to become digitally literate and to understand better how our machines fulfill the commands we send to them. Specific communities, such as scholars, scientists, and journalists, must decide what information they need to place high trust in and develop protocols for ensuring the integrity of that information. This means creating appropriate documentation, following standard procedures that leave a transparent trail, and respecting those documents above others. The truth value of most information will always be a matter about which the user must make judgments. These judgments are not guaranteed in the print regime, nor will they be in the digital.

Looking ahead, we can reasonably expect that some digital objects will warrant greater skepticism than their analog counterparts. It took centuries for users of print materials to develop the web of trust that now undergirds our current system of publication, dissemination, and preservation. Publishers, libraries, and readers each have their own responsibilities to keep the filaments of that web strong. Making the transition to a trusted digital environment will require much conscious reexamination of what we take for granted in the print and audiovisual media on which we rely. We can begin by learning more about this new medium of digital information and by clarifying the terms we borrow from the physical world of analog materials to describe the new phenomena of virtual objects.

REFERENCE

Moore, G. E. [1902] 1988. Principia Ethica. Reprint. Amherst, N.Y.: Prometheus Books.