Authentication of Digital Objects:
Lessons from a Historian's Research
by Charles T. Cullen
The issues stemming from authenticating digital objects are quite
similar, and in some cases identical, to those relating to holographs
or printed books. Everyone dealing with important material in any
form should approach it with a bit of skepticism, but scholars especially
need to question what it is they are using. In other words, they
need to authenticate all documentation they use in the processes
of learning and of creating new scholarship. An authentic object
is one whose integrity is intact: one that is, and can be proven
or accepted to be, what its owners say it is. It matters little whether
the object is handwritten, printed, or in digital form.
Over time, we have established various measures of authenticity
for analog forms that we trust almost without question. Our trust
is, however, much greater for printed books than for handwritten
objects. In fact, handwritten objects raise many of the same questions
of authenticity as digital objects do. The difference is that in
the case of the former, the answers may be more easily found. Take
Thomas Jefferson's manuscript "Report on the Navigation of the
Mississippi," for example. Could he have written it? Is it his
handwriting? Is the paper watermarked, and from the appropriate time
period? Is the ink contemporary? Do other copies of the manuscript
exist? Has its recipient or any other contemporary endorsed it? Is
there other internal evidence? Who has described it for us? Has it
been identified by a trusted third party?
Is a book authentic? Who published it, and who wrote it? Can they
be trusted (are they worthy of one's research time)? Is the rare
book what it purports to be? Is the manuscript correspondence actually
by the person to whom it is attributed, and is its date accurate?
These questions are now being asked more openly of objects that originate
in digital form because we have not yet adopted practices or standards
for providing ready answers to them. When objects are presented digitally,
deciding what is required to authenticate them may be informed by
past practices with non-digital objects.
Two experiences with paper objects inform my views of this subject.
The first is a multi-page autograph document that lies in the John
Marshall Papers at the Virginia State Library. It is labeled in the
hand that wrote the entire piece, "John Marshall's Notes on
Evidence in Commonwealth v. Randolph, 1796." Although the title
itself might raise some question about who penned it (how often does
an author, even an eighteenth-century author, use his own
name in the title of one of his documents?), this document has been
used for decades as the source of historical articles and at least
one full-length book on the investigation of Richard Randolph for
murder. Randolph, a member of the famed Randolph family of Virginia,
was related to Marshall and to Thomas Jefferson and to many other
members of Virginia's "first families."
Examination of the writing by those familiar with John Marshall's
hand, however, quickly reveals that he did not pen this document.
Knowing who did write it is important, but it does not make the document
any more authentic as a Marshall document. The possibility that someone
in possession of Marshall's holograph could have copied the document
raises new questions, not least about how much weight we must give
to the original of a document, regardless
of its form. Internal evidence, obtained by a close reading of this
document, reveals that it might be a partial transcript of a hearing
in Cumberland County Court where witnesses are questioned by attorneys,
and it has been used by historians as a partial record of Randolph's "trial." But
Marshall's name never appears as one of the questioners, and a knowledge
of Virginia law at the time would reveal that whatever was taking
place could not be an actual trial, because white men could not be
tried for felonies at the county court level during that period.
In short, efforts to authenticate this document raise more questions
than they answer. At the least, such efforts reveal that the document
may not be what many had long thought it to be, and that it may not
be even what its title says it is.
This example is somewhat esoteric, to be sure, because it is unlikely
that one would use a similar digital document without asking the
questions that eventually were asked of the document attributed to
Marshall. But the questions asked of it suggest attributes that must
be held by a holograph as well as by a digital object that is to
be regarded as authentic. Is it the author's work or a copy? Is it
what its title purports it to be? What tests can be applied to answer
these questions convincingly?
A second example is more to the point. In the collection of Thomas
Jefferson's papers at the Library of Congress is a document that
appears to be a list of letters written and received between 1791
and 1793, a period of time during which Jefferson was Secretary of
State. An examination of the handwriting reveals that it is most
likely Jefferson's, but the list is unlike his other journals of
letters sent and received. A close look at the original document
suggests that it was written in only one or two sittings (the ink
changes only once or twice), rather than over three years. The most
significant evidence relating to this document's authenticity lies
in the paper itself. Holding it before a light source reveals a watermark
that indicates the paper was manufactured in 1804. The document,
therefore, could not be an authentic 1791–1793 document.
Almost all these tests can be applied to digital objects, and they
need to be. But because digital objects bear less evidence of authorship,
provenance, originality, and other commonly accepted attributes than
do analog objects, the former are subject to additional suspicion.
Tests must be devised and administered to authenticate them.
In many cases, problems of authentication arising from objects that
originate as digits are obvious. In trying to find solutions to those
problems, however, we must carefully test all suggestions to ensure
that they do not themselves open new issues that may be inherent
in this medium. The problems of preserving digital objects have received
more attention than have questions of authentication (people, I suppose,
are less worried about authenticity than about preservation). But
why preserve what is not authentic? Might the preservation of a digital
object imply an endorsement of authenticity, even if nothing else
is done to it? More than one archivist has stated that the only sure
means of preserving a digital object is to save a printed copy. Concerns
with format codes, migration from version to version, and dependence
on hardware would all be solved by printing a copy (or many
copies) and putting it (them) in a safe place. Do that to a digital
object before confronting the questions of authenticity, and all
that is valuable may be lost. Converting a digital object from one
program to another, or migrating it from version to version, could
present problems of authenticity that may or may not be solved by
careful attention to provenance.
A digital object must be authenticated at the time of its creation
by a means that will convey a high degree of confidence to all users,
including the originator in later use. Clifford Lynch wrote
an interesting and convincing article on the integrity of digital
information, published in the December 1994 issue of the Journal
of the American Society for Information Science. He seems to assume,
from traditional experience perhaps, that readers will be responsible
for authenticating the copies they use on the basis of cataloging data
to which they must be alert. Retrieving electronic files by title,
for example, might lead one to a revised work, different from the
original. The reader must exercise caution, Lynch writes, and be
ready to detect signs of alteration. "The expectation should
be that violations of integrity cannot be trivially accomplished," he
says. Accepting this in the world of printed objects is relatively
easy. It is much more difficult in the realm of electronic digital
information.
Andy Hopper of Cambridge University suggests an authentication strategy
that is worthy of consideration, if not adoption. In his system,
the concept of a trusted third party is borrowed from the print world.
According to this concept, trusted librarians help authenticate their
print holdings through recognized acquisition processes, accepted
cataloging procedures, and careful stewardship of their collections,
especially those in manuscript form. If a special collection librarian
tells us, either directly or by means of a catalog card, that the
book in hand is one of two extant copies of Ariosto's Orlando
Furioso printed on vellum in Venice in 1542, and that it was
prepared for the dauphin of France, the library's and the librarian's
reputations go a long way toward instilling some degree of confidence
that the document is indeed authentic. Moreover, all of this information
may be checked. If another librarian delivers to a reader a box of
letters cataloged as Ernest Hemingway's, authentication is assumed
until internal or physical evidence suggests someone has made a mistake.
Knowing that the materials (hard-copy objects) have gone
through a process of description and identification, if not authentication,
conveys a sense of trust that they are authentic, at least until
proved otherwise. Some of the problems of description that help authenticate
printed special collection objects have similar, if not identical,
examples in the digital world. Take one final example as evidence:
in the Newberry Library's special collections is a printed copy of
the classic book on rhetoric in Renaissance England, Arte of Rhetorique by
Thomas Wilson (1525–1581). This particular copy is identified
as having belonged to Elizabeth I as part of her royal library, and
it is authenticated as such by its original binding, which bears
the mark of the royal arms. Book historians know that until the time
of James I, the royal arms were put only on the books within the
monarch's own library. (After 1603, King James allowed them to be
placed on books bound for other members of court.) Elizabeth's coat
of arms on the binding of this copy of Wilson's Rhetorique therefore
marks it as authentic, as long as external evidence does not dispute
it. (If it could be shown that the binding was not sixteenth century,
for example, or that it resembled the work of a sixteenth-century
forger, its authenticity might be questioned.)
A comparable accepted system for presuming authenticity needs
to exist in the digital world, but it is more difficult to achieve
because digital material is more easily changed, accidentally or deliberately.
Andy Hopper and others suggest that some means of marking digital
objects could help solve many of the problems of authentication.
Hopper argues that libraries might serve as authenticators by marking
digital objects by some means that would remove doubt as to their
characteristics at time of origin. A method must be developed whereby
a trusted third party, ideally a trusted librarian, would put a marker
on a digital document: a marker that could not be predicted or
devised (guessed) and that would record the document's time and date.
The marker might be a number based on solar rays at various times
during the day, a number large enough to prevent guessing (Hopper
suggests 100 digits). A professor writing a paper could send the
document to the librarian to be marked, and it could then be returned
to and held by the author. In the future, the object could be authenticated
by its marker, regardless of who held it. Any change in the document
would remove the marker. This procedure would be used by librarians
who receive digital objects from donors. The marker would ensure
that digital objects are as authentic as analog objects at time of
cataloging.
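The marking scheme Hopper describes can be approximated with standard cryptographic tools. The following is a minimal Python sketch under stated assumptions, not Hopper's own method: it assumes the librarian holds a secret key, and it substitutes a keyed hash (HMAC-SHA-256) plus a 100-digit random number from the operating system's secure generator for the solar-ray number he proposes. The names `Marker`, `mark_document`, and `verify_marker` are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker issued by a trusted librarian."""
    timestamp: str  # ISO 8601 time of marking, in UTC
    nonce: str      # unguessable 100-digit number (Hopper's suggested length)
    tag: str        # keyed hash binding document, time, and nonce together


def _tag(key: bytes, document: bytes, timestamp: str, nonce: str) -> str:
    # Bind the document's SHA-256 digest to the time and nonce under the
    # librarian's secret key; changing any input changes the tag.
    digest = hashlib.sha256(document).hexdigest()
    message = f"{digest}|{timestamp}|{nonce}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def mark_document(key: bytes, document: bytes) -> Marker:
    """Librarian side: issue a marker for the document exactly as received."""
    timestamp = datetime.now(timezone.utc).isoformat()
    # Assumed stand-in for Hopper's solar-ray number: 100 random decimal
    # digits from the operating system's cryptographically secure generator.
    nonce = str(secrets.randbelow(10**100)).zfill(100)
    return Marker(timestamp, nonce, _tag(key, document, timestamp, nonce))


def verify_marker(key: bytes, document: bytes, marker: Marker) -> bool:
    """Return True only if the document is byte-for-byte what was marked."""
    expected = _tag(key, document, marker.timestamp, marker.nonce)
    return hmac.compare_digest(expected, marker.tag)


if __name__ == "__main__":
    library_key = secrets.token_bytes(32)  # held only by the library
    paper = b"Draft of a professor's paper"
    marker = mark_document(library_key, paper)
    print(verify_marker(library_key, paper, marker))         # True
    print(verify_marker(library_key, paper + b" ", marker))  # False: any change voids the marker
```

Because only the key holder can recompute an HMAC, verification in this sketch goes back through the library, which keeps it in the role of trusted third party; a public-key signature scheme would instead let any reader check the marker without the library's help.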
Despite its science-fiction flavor, such a method seems to meet
accepted tests of authenticity. A trusted third party can claim nothing
more about an object, analog or digital, than what can be cataloged,
and that information derives largely from physical evidence. Identifying
an object in a catalog record or a collection description puts a
marker on it that most of us use as the first step in the process
of authentication. Is the document what it purports to be or what
its owner claims it to be? Scholars often require means to test the
cataloger, and the physical attributes of analog objects offer more
opportunities to do such testing than do those of digital objects.
Handwriting, publishing history, bindings, watermarks, inks, and
various forms of internal evidence provide answers to questions of
authenticity in analog objects that are lacking in digital objects.
Digital objects have attributes that can be used to help with authentication,
but none is sufficiently trustworthy or stable to be acceptable unless
a workable system of certain marking can be devised.
Certifying that a digital object is the product of its author is
difficult when the object originates in electronic form. Without
a deliberate and distinctive marking made by the author that could
not be guessed by another or altered by anyone, it seems impossible
to authenticate an electronic document beyond doubt. Authors of files
or images must take steps to establish authorship of their work;
if not, our only option is to accept the assertions of others. Electronic
files left behind by someone who has not taken action to establish
authorship are subject to suspicion if authorship is asserted by
anyone else at the time of "cataloging." This leaves us
where we have been all along: at the mercy of catalogers. But,
in the case of a digital object, we are actually worse off than we
would be if we were dealing with an analog object. This is because
we lack the physical evidence provided by analog objects, evidence
that offers the means to test the cataloger. This ability to test
both reassures the user and helps keep the cataloger honest. I find
no counterpart in the digital object realm.
The concern over authenticating digital copies of analog objects
is almost as important as that relating to objects that originate
in digital form. Scholars are keenly interested in having access
to documentary evidence in digital form, and librarians have begun
to consider digitization a desirable means of preservation, in spite
of the recognized problems inherent in it. Those who hope to use
this material, once it has been digitized, must be able to rely on
its authenticity, just as they have become accustomed to do in all
the forms currently available. Documentary editors, as well as librarians,
have new responsibilities as they publish and provide access to their
materials in digital form with all the value they have added intact.
The work of documentary editors offers some insight into the questions
raised over authenticity of digital objects, especially those that
derive from analog or holographic objects.
The first task of a documentary editor who is working on an edition
of a subject's papers is to locate all the objects that have ever
existed as part of the corpus, incoming as well as outgoing. This
sometimes requires reliance on copies of papers that evidence suggests
once existed in original form but which have not been found. Once
the collection is organized, each item must be dealt with separately.
That is the first stage for authentication tests, starting with the
question of whether the item is what it appears to be. All the available
physical attributes assist in answering these questions, but sometimes
only internal evidence leads to a final answer (as in the Marshall
and Jefferson examples described earlier). The editor is obliged
to share these findings with readers and to describe the item in
such a way that few, if any, questions remain about the document
as object. Not unimportant in this description is all available information
about other copies of an item, be they photocopies, carbons, letterpress,
polygraph, drafts, or additional holographs. Knowing as much as the
editor about all copies is the only sure way for other readers to
test the "cataloger's" description, and only by having
this information available can a reader have full confidence that
all questions of authenticity have been asked. In preparing digital
files of historic documents, editors begin their publication by attaching
a full document description to a transcript. This is the scholar's
seal of authenticity, as it were, or at least as much of a seal as
a scholarly editor can provide.
Preparing a digital transcript of a historic object introduces new
problems to the issue of authentication. How do we know the transcription
is accurate and that it is exactly what the editor prepared originally?
The method of providing access to journal articles adopted by JSTOR
may offer the best answer for authenticating modern digital transcripts
of manuscripts or printed material that originated in analog form.
JSTOR provides the user with a digital transcription of the text, which
is fully searchable and otherwise subject to all the vagaries of
digital files. It also provides an image of the original text. If
both copies could carry some form of marking that could not be manipulated,
the problem of authentication would be solved. This system should
work quite well for documentary editors and the readers of their
digital publications. Providing an image of the document that is
transcribed would be an important improvement over present forms
of presentation, because it would permit easy verification of transcriptions.
Inaccurate transcriptions are the downfall of documentary editors
(as they should be), and mistakes often go undetected. The reader,
who may have a high level of confidence in the scholarly work of
the editor, is left to assume that the transcription is accurate
and authentic. Having a means of testing this assumption would be
a great improvement.
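Pairing a transcript with an image of its source can be sketched in a few lines. This Python example is a minimal illustration under stated assumptions, not JSTOR's system: it assumes the editor publishes a manifest of SHA-256 digests for the transcript and the page image, and that readers obtain that manifest from a trusted source. The names `make_manifest` and `check_pair` are hypothetical. A match shows only that neither file has changed since the editor recorded it; judging whether the transcription is accurate still means reading it against the image.

```python
import hashlib
import json


def _sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_manifest(document_id: str, transcript: str, image: bytes) -> str:
    """Editor side: record digests of the transcript and the page image together."""
    record = {
        "document_id": document_id,
        "transcript_sha256": _sha256_hex(transcript.encode("utf-8")),
        "image_sha256": _sha256_hex(image),
    }
    return json.dumps(record, sort_keys=True)


def check_pair(manifest: str, transcript: str, image: bytes) -> dict:
    """Reader side: report which of the two files still match the editor's record."""
    record = json.loads(manifest)
    return {
        "transcript_ok": _sha256_hex(transcript.encode("utf-8")) == record["transcript_sha256"],
        "image_ok": _sha256_hex(image) == record["image_sha256"],
    }


if __name__ == "__main__":
    manifest = make_manifest("hypothetical-doc-001", "Sir, I have the honor...", b"<page image bytes>")
    print(check_pair(manifest, "Sir, I have the honour...", b"<page image bytes>"))
    # {'transcript_ok': False, 'image_ok': True}
```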
Related problems that arise from considerations of authenticity
seem to offer little to assist us in answering the primary question.
Creating a digital file, and even marking it in such a way that will
ensure authenticating it as my own, will mean little if the file
itself cannot be read at any point in the future. If the file cannot
be read, it cannot be authenticated as mine. (It would be even more
maddening if the file could be authenticated but not read.) The same
can be said for provenance. If a file can be marked in such a way
that its authenticity is assured, issues of its subsequent provenance
might not matter in questioning its authenticity. But if a file cannot
be read, its provenance will mean little, even if it can be tracked
over a long period. Without a marker of authenticity, provenance
of a digital object would be of limited use in establishing authenticity.
It would help test the cataloger, but the current technology would
render uncertain any assertions of authenticity. The instability
of software alone would introduce questions that would challenge
any claims of authenticity suggested by a trusted provenance.
Paul Conway (1999) says the existence of digital objects moves challenges
of preservation from guaranteeing the physical integrity of objects
to assuring their intellectual integrity, including their authenticity.
He adds that librarians can control this by "authenticating
access procedures and documenting successive modifications" to
digital files. Authenticating access procedures may affect provenance
more than the integrity of the digital object itself, but it would
be difficult to guarantee authentication with only this control.
It seems that, in this argument, the alteration of an original record
is acceptable as long as it is documented. Acceptance of changes
with documentation is unreasonable over time and places unnecessary
burdens on users. In this case, as in others, preservation without
authentication results in a loss of intellectual integrity.
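Conway's idea of documenting successive modifications can be illustrated with a hash-chained change log. This Python sketch is a hypothetical illustration, not a procedure Conway specifies: each entry records the digest of a file's new content, a note describing the change, and the hash of the previous entry, so that silently rewriting an earlier entry breaks the chain. The class and function names are invented for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _entry_hash(prev: str, content_sha256: str, note: str) -> str:
    body = {"prev": prev, "content_sha256": content_sha256, "note": note}
    return _sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))


@dataclass
class ModificationLog:
    """Hypothetical append-only record of changes to one digital file."""
    entries: list = field(default_factory=list)

    def record(self, content: bytes, note: str) -> None:
        # Each entry commits to the new content and to the entry before it,
        # so altering an earlier entry breaks every later link.
        prev = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        digest = _sha256_hex(content)
        self.entries.append({
            "prev": prev,
            "content_sha256": digest,
            "note": note,
            "entry_hash": _entry_hash(prev, digest, note),
        })

    def is_intact(self) -> bool:
        """Check that each entry is unaltered and still links to the one before it."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if entry["entry_hash"] != _entry_hash(entry["prev"], entry["content_sha256"], entry["note"]):
                return False
            prev = entry["entry_hash"]
        return True

    def matches_latest(self, content: bytes) -> bool:
        """True if the content is the most recently documented version."""
        return bool(self.entries) and self.entries[-1]["content_sha256"] == _sha256_hex(content)
```

Such a log can show that every change was recorded, but it stores only digests: the original content survives only if the original bytes are kept as well, which is the gap between documented modification and authentication that the argument above identifies.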
We are not close to having a means of marking digital documents
that cannot be challenged, a means that would establish authenticity.
Absent such a technique, we are left to consider what other attributes,
if any, might approach the establishment of authenticity. Few of them inspire
confidence that comes close to what we have
for analog materials, but considering the problem raises some
related issues. How
confident can one be when an object whose authentication is crucial
depends on electricity for its existence? Surely there are higher
degrees of confidence in some cases than in others, but something
more than provenance or traditional testing methods established for
analog objects is needed. I believe it is easier to describe the
characteristics of an authentic digital object than to support the
authentication beyond a reasonable doubt. My definition is conditional;
it depends on an object's capability of being proved to be authentic.
Establishing a method of authentication of digital objects that would
be unconditional may be possible. At the least, we must agree on
some means of testing the authentication of digital objects. The
consequences of not doing so are dire.
REFERENCE
Conway, Paul. 1999. The Relevance of Preservation in a Digital World.
Technical Leaflet, Section 5, Leaflet 5, p. 8. Andover, Mass.: Northeast
Document Conservation Center. Available from http://www.nedcc.org/plam3/tleaf55.htm.