Scientific literature, the published record of the history of sciences, is one of humanity’s greatest creations. I’m somewhat biased, but I think it is something we can all stand behind. The collection of ideas, methods, data, and discoveries about our bodies and those of all the other animals in the world around us, especially as it pertains to human diseases, is an unbelievably rich and important creation of society over the last centuries.
The literature itself, one product of this endeavor, reflects the tremendous investment that society has made in raw dollar terms. Each year, probably $100 billion is devoted in one way or another to support scientific research, with the bulk of that going to medical research. It represents the life’s labor of many of our brightest and most dedicated citizens, who have devoted their careers to trying to find ways of making our lives better, in both the material and intellectual sense, for people in this country and for the entire world.
The transformation that has occurred in the last 10 years, from a world in which we communicated primarily in print to one in which we communicate primarily digitally, has profound implications not only for how we access information, but also for how we use it. The potential to discover new ways of using the accumulated scientific knowledge is essentially infinite and has barely begun to be tapped.
A Wealth of Information Remains Out of Bounds for Most
The premise of this talk, which also motivates much of my own work, is that this potential we all dream about will remain largely unrealized as long as the scientific community persists in distributing information, and supporting that distribution, using practices that were developed for the print age and then just grafted wholesale onto the electronic age. I believe, and I think a growing number of scientists would agree, that it is both morally and practically absurd that we continue to grant the ownership of the scholarly product and the scientific product of the world to scientific publishers. I’ll explain why I think this is true and tell you what some of us are doing to try to change that.
It is a travesty that this has occurred in science, because it is preventing people from doing interesting and new things in and with the literature. That is why I got involved. I studied genomes and am primarily a computational biologist. My main experimental tool is the computer, and I spend most of my days trying to recognize linkages between pieces of information that came either from experimental data or from the scientific literature. Six or seven years ago, large chunks of the scientific literature first started to become available in electronic form, and some of the most prominent journals started to be published electronically. I was a graduate student at the time, so I was somewhat naïve, but it seemed natural and obvious to me that we should be able to do something really interesting and useful with the text contained in all those papers, treating it not just as words on a piece of paper but rather as data.
I started to think about databases that would link the human genome sequence to the literature on the function of all of these genes and would allow people to navigate freely from the sequence to the literature. I began to imagine building these things, and I suddenly came upon a problem: it was impossible for me, as a research scientist, to actually do that. It was neither practically nor legally possible, and that seemed completely ridiculous to me. The scientific literature was produced by scientists, for people like me to use. The primary motivation for people who publish their work is that others will read it and use it. That is why I’m a scientist; that is why scientists are scientists. The fact that I could not do that just struck me as absurd.
Actually, this should be a scandal, both within the scientific community, where it is starting to get attention, and among the general public, who paid for this work in order to make it useful to them. It is silly that I cannot do this kind of research, and also ridiculous that people who have an interest in accessing this information but who do not have the good fortune of working for a major research university, or having access to a major research university library, cannot do this.
This is not just a theoretical idea. The chief executive officer of Elsevier is pretty happy to go around dismissing the idea that there is anybody in the world besides research scientists at Harvard and Stanford and Berkeley who actually wants access to the scientific literature. But there are lots of other people, in this country and abroad, who have a real interest in accessing the scientific literature but cannot. Scientists at research universities in Zimbabwe do not have enough money to subscribe to even one or two journals, let alone all the scientific literature they are interested in. There are high school students, and there are students and professors at small universities. My mother happens to be on the faculty at a small Catholic college in Washington and cannot access any of the literature she is interested in using in her classes. So I have to surreptitiously send it to her on the side. (Sorry, I’m probably breaking a law by doing that.)
Even more important, this is an age in which we as citizens are being asked to take a much more active role in our own health care. When my doctor says to me, “I don’t know, go study it yourself,” I’m lucky. I can go to a medical library and read all this information. But say I’m a patient in a rural hospital who has been diagnosed with a relatively rare form of cancer that the federal government has been paying researchers to study and find ways of treating. Today, if I’m that patient, I cannot readily access the information that describes research done for my benefit and that would help me immediately and practically play a greater role in understanding my own health. There are myriad examples of people throughout the world who do not have the opportunity to access the literature that is available today, solely because of the way in which we have decided to structure the distribution of scientific information.
Those Who Do the Work Should Own the Literature
I have great faith in the ability of the scientific community, the library community, and the business community to discover new and interesting ways to use this literature. For my own purposes, I’m thinking about the creation of massive databases with literature on genome sequences. We build them in our lab from genome sequences and all sorts of other pieces of knowledge that we collect and disseminate. But as scientists, we are failing to include in that body of accessible information the most important element: the accumulated ideas, results, and conclusions of scientific research that are contained in the scientific literature.
Currently, scientific journals own the scientific literature. There is no other way to describe it. Journals get the copyright, which they wield as effectively as if they were the owners. I won’t belabor the question of whether or not this is reasonable. We spend way too much of our time worrying about whether journals should own the literature. To me, it is obvious: they clearly should not.
Journals play an important role, a critical role, in bringing a scientific work to maturity. One writes a paper, submits it to a journal, and, after it has been peer reviewed, edited, and formatted, a different thing comes out the other end. But by no imaginable measure is the journal’s contribution to this process comparable to the effort put into it by the scientists. Most work that comes out of my lab represents about two years of labor for a post doc, and maybe weeks or months of my time devoted to conducting the research, studying the results, and writing the paper. Every published scientific paper probably represents a quarter to half a million dollars of public investment.
If you compare that sustained effort of scientists around the world, plus the investment of public institutions, with the small role that journals have played in bringing scientific work to maturation, I think it is quite clear that the weight of the contribution belongs to the scientists and to the public, and not to the journals. You cannot come up with a system in which it makes sense, moral sense, that the journals should own the literature. The only questions I think worth asking are: Why is it that journals own the literature? Is there some practical alternative to the current system?
Legacy of Print Stifles Access and Cooperation
The answer to this is, of course, contained in the history of scientific publishing. I don’t need to explain the business model of scientific publishing, or any publishing, that exists today. The journal largely takes on the burden of producing material and charges people who want to access the published information through subscriptions or through whatever licensing deal they have managed to convince libraries to agree to. This system evolved when we communicated on paper and the only effective way for scientists to communicate with their colleagues was to write a paper that was printed and shipped to libraries all over the world. In that world, most of the cost involved was in distributing printed copies of that manuscript. Since those costs scaled to the number of copies, it made some obvious economic sense for journals to charge on a per-copy basis if one bought a subscription.
This system was completely unfair in many ways. All these people who do not have access to the literature today did not have access to it in the print journal world, no matter how sensible and efficient that system was. But those restrictions were, at the time, logical and inevitable. There really was no better way to handle things. One can make a very strong and compelling case that the scientific journals have done a remarkably good and efficient job over the last century of disseminating scientific knowledge and that without the major boom in scientific journals after World War II a lot of the progress in science, especially in biological research, would not have occurred. We needed mechanisms to communicate our work to colleagues and researchers all over the world. I’m not here to criticize the journals for the job they have done; they truly have been quite useful to scientific society. And while there are many ways in which that system has been perverted (and some have started to charge more money than they should), that is not really our problem here today.
However, as soon as we started to communicate with one another electronically, all the premises of this business model completely evaporated. There are very few scientists today who get their literature primarily from the printed page. Most of us now download PDFs from a Web site and print out the document. We are reading printed copies, but the distribution itself is electronic, and when you have electronic distribution, you have completely different economics.
The costs involved in electronic scholarly publication are almost always in the preparation of the original, edited electronic document. These outlays are not trivial, but they remain, as before, the costs of managing peer review and hiring editors to oversee the process. But these are now the only costs involved. The cost of producing and distributing each additional copy is not zero, but it is very, very small, and there is almost no marginal cost every time someone wants to access or use a given copy of the literature.
Thus, the business model that developed in the print world, that of charging for each copy, has become economically irrational. It completely thwarts the best interests and goals of almost every stakeholder involved in the process other than the publisher. A lot of people try to make an analogy between scientific literature and Napster, or other methods of reproducing movies and music, where there is tension between the economic interests of the producer and those of the consumer. The producer clearly wants to get as much money and exposure as possible, while the consumer wants to get as much of the stuff he or she is interested in as cheaply as possible.
But in scientific publishing, the producers of the information and the consumers are the same people. I don’t make any money from selling my work. All I care about is that people read my work and that they cite it. My interests and the interests of the institutions that funded my research, the interest of the public, and the interest of almost everybody except the scientific publishers, are best served in a world in which the scientific literature is completely open and freely available. We have allowed publishers to graft an economic model that evolved for print publication onto electronic publication, and this has happened with the complete complicity of scientists, the scientific community, and libraries. The fact that we have let this happen when it really did not have to is now the single biggest barrier to creating and developing new and innovative ways of using the scientific literature.
Toward a More Equitable and Rational Model
It is time for the scientific community, the public, and the educational institutions that support us to rethink this relationship we have with scientific publishers, and to try to make sure that we develop an effective process for communicating with one another that does not unnecessarily compromise our interests solely for the sake of serving the financial goals of publishers.
The basic premise of an economically and functionally sustainable system is that costs really do exist in scientific publishing. Now that I myself am turning into a publisher, I recognize that these costs are legitimate and tangible. It takes money to manage peer review; it takes money to hire professional editors who can recognize quality research and help authors produce papers that are interesting and readable; it takes money to turn a manuscript into something that looks pretty on the page and is consistent; and it takes money to turn Word documents into XML that people can search and store in databases. These costs are not trivial; they are hundreds or thousands of dollars per article.
But rather than trying to recover those costs by subscription, which necessarily requires that access to the work be restricted, it seems that these costs should now be viewed as indispensable costs of actually doing the research. When I publish a paper, it is not an isolated event; it is the final step in a long and expensive process. It is the most public part of the research process, but still it is only a part. If scientists, and the academic and funding institutions that support our research, decided to view the costs I just listed as part of the research process, it would be possible to provide permanent, completely free, and open access to the finished product, not only to scientists, but also to anybody who wanted to read or use this literature.
We are reaching the stage where you can say there is a movement within the scientific community and the broader academic community to ensure that the open access way of making the product of scholarly communication available is the way of the future. There is some confusion about what is meant by this. Some argue that they already are providing free and open access to the literature, but in my opinion, this really is not so. By open access, I mean that the producers, the publishers of the literature and the information, do not put any restrictions, either practical or legal, on how this information can be used.
It has to be freely available. You have to be able to download it, and you have to be able to do anything with it, not just read and print it, but redistribute it, put it into a compendium or database, link sections of it to other pieces of information, do anything that is otherwise legal. Other than fraud, there is really nothing that one should not be able to do with the scientific and scholarly literature. The only thing that authors of this material ask, the only thing we really care about when you use our work, is that when you do, you say it was our work. The commodity we deal in is the commodity of citation and attribution, and it is the only restriction that legitimately can and should be placed on how scholarly literature is used.
There is no doubt that if we adopted a system in which the costs of producing the literature were paid up front by the institutions funding the research process (the same institutions that fund, albeit more indirectly, the subscription costs for libraries), all of the important aspects of the current scholarly publication system would be maintained. But by removing a lot of economic inefficiencies, the system would actually be quite a bit cheaper. It would certainly be fairer and would serve best the interests of scientists in their roles as both authors and consumers, as well as the interests of the public and of all the institutions that supported our research.
Open Access Movement Finds Support, and Lingering Resistance
I think it is almost impossible to argue that this would not be a good thing, but it still has not happened. Why not? It might help here to talk a little bit about the history of this idea.
Seven years ago, when the idea first came to me and others with whom I ultimately worked, we thought that the logic of the new system was so patently beneficial for the scientific community that all we had to do was give people a way to communicate with each other. Physicists, as many of you know, had already been circulating their research through a preprint server, arXiv.org, a unified global raw database established in 1991 at Los Alamos and now based at Cornell. The physicists were happily communicating with each other, with no restrictions on how the information was to be used.
I figured that what works for physics should work for biomedical research. Fortunately, Harold Varmus was at the time director of the National Institutes of Health, and he was quite active in promoting the creation of a free full-text archive for biomedical literature, called PubMed Central. When PubMed Central came online in 1999, I expected most scientific journals, especially those published by scientific societies or those nominally part of the scholarly community, to see the obvious benefits of this system and to more or less immediately make their content available in PubMed Central. At that time I was a post doc, no longer a graduate student but clearly still pretty naïve, because it did not happen: this system was created and almost nobody put their content into it. PubMed Central, even though its great potential and usefulness should have been evident to all scientists, did not garner support from within either the scientific community or the publishing community.
So we tried something different. We tried to make it clear to publishers that the scientific community really wanted this, that this was something important to scientists, and that if journals would take the simple steps necessary to make their content available in this free and open manner, scientists would reward these journals with their support. We began to circulate an open letter and formed the organization Public Library of Science. Scientists signing the open letter pledged to publish their work in, review and edit for, and personally subscribe to only those journals that took what we thought was a reasonable compromise position: they would make their content freely available on PubMed Central or other suitable archives after six months. It may not be a perfect system, but we gave the journals six months to recover their costs through subscription charges, at the end of which they had to make their material freely available. That is, they got a lease rather than permanent ownership of the literature.
The open letter received a tremendous amount of support. It has now been signed by almost 35,000 scientists across the world. But the response of the publishing community to an effort by scientists to make the scientific literature more useful was largely silence and, in many other cases, overt hostility. Although a few did, most journals did not respond to the open letter in any consistent way, and so we have now moved on to another step.
“I Have Been Moved to Commit These Things to the Press”
Publishers are there, but we do not have to work with the established publishers. They are not the only way that scientists can communicate with each other. At the Public Library of Science, we started trying to do it ourselves. If the scientific publishers were not going to do what the scientific community wanted, we figured we would have to do it ourselves.
But clearly, we could not just do this ourselves; we needed some support. We spent a year and a half trying to garner support and find financial backing for this endeavor. Finally, in December 2002, we received a grant of about $9 million from the Gordon and Betty Moore Foundation in San Francisco to launch a scientific publisher devoted to providing immediate and free open access to the scientific literature, for any scientific work that a scientist wants to make available in this way. Harold Varmus, now head of Memorial Sloan-Kettering Cancer Center, is the head of the Public Library of Science.
Over the last six months, we have begun the process of launching a scientific publisher devoted to the principles I just outlined. To give you an idea of what we are doing, I want to quickly take a step back and go through the process we have been trying to understand. That is, why has this movement not been successful?
To do so, I want to quote from the introduction to the famous work, On the Motion of the Heart and Blood in Animals, by William Harvey, who worked out the circulatory system in the human body. I am sure many of you have read the old introductions in some sixteenth- and seventeenth-century books. They had a wonderful practice in which the author essentially had to apologize for writing the work. Harvey, trying not to take too much credit, has a great paragraph in which he explains why he decided to publish this work. I think it encapsulates all the reasons why scientists publish today:
These views as usual, pleased some more, others less; some chid and calumniated me, and laid it to me as a crime that I had dared to depart from the precepts and opinions of all anatomists; others desired further explanations of the novelties, which they said were both worthy of consideration, and might perchance be found of signal use. At length, yielding to the requests of my friends, that all might be made participators in my labors, and partly moved by the envy of others, who, receiving my views with uncandid minds and understanding them indifferently, have essayed to traduce me publicly, I have been moved to commit these things to the press, in order that all may be enabled to form an opinion both of me and my labours.
Deconstructing this a little bit, Harvey wants to give further explanations of his work; he had been talking in public about some of this work, but he really needed to tell the whole story. This is one of the most important things we do in the published literature: give our complete stories. Harvey thought that other people would find use in this information. One of his prime motivations to publish the work was that others would use it, that others could be participants in his labors. I think it is important that through publishing, he wanted not only to get the information out but also to give people an opportunity to judge his work.
I don’t know whether Harvey was up for tenure or not, but it is certainly a big concern. I would say that it is probably the biggest challenge that the Public Library of Science faces, as we evolve from an advocacy group into a publisher. How do you accommodate the need for scientists not only to communicate their work but also to get the proper credit and acclaim for their best work?
Public Library of Science Launches an Alternative
We have now spent quite a lot of time thinking about why scientists have not embraced open access publishing, why biologists, for example, feel uncomfortable about putting their published papers directly into an archive. Why is it important for them to actually publish in a scientific journal? I think the answer is fairly clear. If a paper is self-posted, it has not gone through a process in which somebody, whether a couple of peer reviewers, an editor, or a publisher, has given a stamp of approval and stated that this work is not only worthy of being published but is of a certain level of quality.
Most of you are aware that there is a great hierarchy of scientific journals; for biologists, if you publish in Science, Nature, or Cell, it means you are at the top of your game. The scientific community does not just use these journals as filters to the literature; they are not only venues in which I know to look for the most interesting and best science. We have also essentially given Science, Nature, and Cell the gatekeeper role in deciding who gets hired at the leading universities, who gets tenure, who gets grants. If you have a series of publications in one of these journals, you have a real leg up in getting an excellent position and getting tenure. And if you do not have any publications in these journals, even if you have published work in another journal, no matter how good it is, people largely will not pay attention to it, and you will not get proper credit for having done excellent new research.
We decided that the most important thing the Public Library of Science could do as a publisher was to serve as an option that competed directly with Science, Nature, and Cell in providing the best scientific research. As of May 1, we are formally in existence, and Public Library of Science Biology will now try to tackle these journals head on. Our goal is to provide an open access journal, not just for downloading, but for any use. Every work we publish will be made freely available immediately and will be effectively in the public domain. The only difference between Public Library of Science Biology and Science, Nature, and Cell (other than that we will be a little bit better) lies in how we fund this endeavor.
We are asking authors to cover our costs up front, through charges of roughly $1,500 to $2,000 for each published work. I should add that our estimated costs are less than most authors already pay in page charges for many journals. Our production system is in place, and we’ve hired editors from elite journals; in fact, we stole the editor-in-chief of Cell and she now works for us. Others have been knocking down our doors to come work for us, because I think everybody involved in scientific publishing who does not have a direct material stake in the outcome recognizes that this is the future.
So, PLS Biology exists. We have already received submissions, researchers want to send us their best work, and we have an editorial board that is better than that of other journals because scientists are strongly behind us. An alternative has been created, a journal that can and should, and will, be competitive with the best in the field. Others will no longer have any excuse for not adopting open access. It is now a test to see whether or not the scientific community and the institutions that support us are really behind this.
Libraries Are the Gateway to an Open Access Future
I have not yet said anything about libraries; to some extent, what we’ve been doing has been happening outside of the library system. Scientists spend less and less time in libraries, as I’m sure you know, because we are spending more and more time online. And most libraries have yet to become a central resource for scientists or a focal point for their electronic access to scientific literature and to scientific knowledge.
My own view of this has been that libraries have recently been thinking much too much about the cost of subscriptions and how to drive that down. I understand that libraries need to subscribe to the most recent literature and that the rising cost of subscriptions is causing a serious problem. But I have a proposition, and it is not that libraries should say to the established scientific publishers, “Try to get these costs down as low as possible, we are not going to subscribe to your high priced journals, we are now going to support journals that have lower subscription costs.”
What I would like to hear libraries say is: “Basta!” Give the PLS Biology journal some time, and then tell publishers, “No more subscriptions to scientific journals as of 2005. Libraries believe that the future of information, the future of scientific literature, is open access.” It is the obvious choice for scientists, and if libraries were freed of the responsibility and burden of negotiating with publishers over subscription costs and licensing deals, it would be possible for libraries to actually do what I think they can do best.
And as I’ve listened to others here, it sounds like what everybody wants to do is become the primary gateway, the electronic gateway, for scientists and scholars to access information and knowledge. Instead of being a place where one goes purely to access information, a library is the place to access information effectively, efficiently, and interestingly. If the scientific community, the library community, and the academic research community all banded together and simply said, “This is what is going to happen, publishers. We are no longer going to play your game. Do it the open access way or you are no longer involved,” then everything would change overnight. Every publisher from Proceedings of the National Academy of Sciences to Elsevier would have no choice but to adopt this new, and I think much more efficient, publishing model.