May 12, 2003
The Evelyn Wood of Digitized Book Scanners
PALO ALTO, Calif., May 10 - Putting the world's most advanced scholarly and scientific knowledge on the Internet has been a long-held ambition for Michael Keller, head librarian at Stanford University. But achieving this goal means digitizing the texts of millions of books, journals and magazines, a slow process that involves turning each page, flattening it and scanning the words into a computer database.
Mr. Keller, however, now has a new tool for his crusade. On a recent afternoon, he unlocked an unmarked door in the basement of the Stanford library to demonstrate the newest agent in the march toward digitization. Inside the room, a Swiss-designed robot about the size of a sport utility vehicle was rapidly turning the pages of an old book and scanning the text. The machine can turn the pages of small and large books as well as bound newspaper volumes, and it can scan at more than 1,000 pages an hour.
Occasionally the robot will stumble, turning more than a single page. When that happens, the machine will pause briefly and send out a puff of compressed air to separate the sticking pages.
For Mr. Keller, the robot, made by 4DigitalBooks, one of two companies now introducing the first automated digitization systems, is a boon.
"Think about the power of bringing our library to little schools in the middle of Africa," Mr. Keller said. "Would it make a difference for those who now have their minds closed to the idea of democracy?"
The first book-scanning robots were introduced this spring by 4DigitalBooks of St. Aubin, Switzerland, and Kirtas Technologies of Victor, N.Y. The machines have already begun to generate interest from libraries and private and nonprofit groups now working to digitize books.
Until now, the job has been done mostly by students or armies of low-cost workers in countries like India and the Philippines. But manual digitization presents significant logistical problems. Book collections may have to be moved long distances to digitization centers.
And in some cases the process of scanning has damaged old books and journals, making it necessary to rebind them afterward.
The digitizing machines, by contrast, can be located close to book collections and offer speed and quality control unattainable by manual systems.
Even so, manual processing is still less expensive in many cases than acquiring a robot. The 4DigitalBooks robot, whose price neither the company nor Stanford officials would disclose, becomes cost effective on projects larger than 5.5 million pages, said Ivo Iossiger, the company's chief technology officer and a co-founder. It seems likely that the vast majority of digitization over the next several years will be done by hand.
Mr. Keller admits that his dream of having the entire Stanford library in a digital database is unlikely to be realized in the foreseeable future, because such an undertaking, involving eight million volumes, could cost upward of $250 million.
In the meantime, Stanford librarians have begun digitizing books and documents that face no thorny copyright barriers and that have important historical and political significance.
The newly installed robot is currently finishing two pilot projects, scanning books published by Stanford's Center for the Study of Language and Information and works for the Medieval and Modern Thought Text Digitization Project. It will soon begin work on the 2,500 titles published by the Stanford University Press.
Not long ago, Stanford helped finance the manual digitization of the presidential papers of Eduardo Frei, the former president of Chile, who was concerned that records of his administration could be lost in a coup.
And since 1999, the Stanford library system has sent a team of specialists and students to Geneva, where the university is engaged in a multiyear project to digitize selected documents produced by the General Agreement on Tariffs and Trade and its successor organization, the World Trade Organization. The project, which will take five years, will ultimately scan about 2.2 million pages.
Other ambitious undertakings like Carnegie Mellon University's Million Book Project will also continue to rely on manual digitization for several more years. Another project, led by the Internet Archive in San Francisco, recently shipped 80 tons of old books acquired from the Kansas City Library to Hyderabad, India, where they will be scanned, according to Michael Lesk, a former National Science Foundation official and digital library expert who works with the archive.
Mr. Lesk said that a book can currently be scanned and digitized in India or the Philippines for $1 to $4, though he acknowledged that quality control adds significant costs.
For Mr. Keller, the most vexing challenges are neither labor costs nor technology. Librarians, he said, must find a way to address copyright restrictions that appear to be tightening as a result of new federal laws like the Digital Millennium Copyright Act of 1998.
Stanford is struggling to comply with copyright restrictions while making works that have recently lost their copyright protection available digitally. Mr. Keller said the library increased the circulation of its collection by 50 percent when it computerized its card catalog. Digitizing out-of-print books could likewise make them available to a much wider audience, he said. The payoff for building such a digital collection, he added, is vastly improved availability of a huge store of knowledge and information for teaching, learning and research.