An American academic has started a massive project to upload 12 million free-to-use photos onto Flickr. These images come from the scanned pages of library books published from 1500 until just before 1922, when copyright restrictions come into play.
I still cannot wrap my mind around that gigantic figure. This is probably the largest undertaking of its kind to date. Mr Kalev Leetaru, 32, a Yahoo! Fellow in Residence at Georgetown University in the United States, has so far uploaded 2.6 million pictures to Flickr, culled from 600 million pages from about 1,000 libraries worldwide, and all of them are in the public domain. The pages were scanned by the Internet Archive, a non-profit digital library in San Francisco.
A man on a mission, Mr Leetaru has worked on the project tirelessly by himself, on nights and weekends and without funding. The ambitious academic expects to finish uploading all the images within the next six months. His vision?
“I hope more libraries will contribute images so we can create a massive online gallery of the world’s history.”
Mr Leetaru has observed that libraries have been digitizing their books for years but have tended to focus on the words and largely ignore the pictures. Most scanned pages were put up as PDFs or text-searchable works, he told the BBC in an interview.
His method differs in that it inverts that process: he focuses on the images and their captions and ignores the text. He started by returning to works previously digitized by the Internet Archive and wrote his own software to process the digitized books in the collection.
The Internet Archive used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.
Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the JPEG picture format. The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book. Each JPEG and its associated text was then posted to a new Flickr page, allowing the public to hunt through the vast catalogue using the site’s search tool.
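The extraction step described above can be illustrated with a minimal Python sketch. It is not Mr Leetaru's actual code, which has not been published in this article. It assumes a hypothetical OCR layout output in which each page is a list of regions in reading order, each tagged as "text", "caption" or "image" with a pixel bounding box; the names `Block`, `ImageRecord` and `extract_image_records` are invented for illustration. Cropping the region from the scan and saving it as a JPEG, and posting to Flickr, are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Block:
    """One region of a scanned page, in reading order (hypothetical OCR output)."""
    kind: str       # "text", "caption" or "image"
    box: Box
    text: str = ""  # empty for image regions


@dataclass
class ImageRecord:
    """What would be posted alongside each extracted picture."""
    box: Box        # region to crop from the original scan
    caption: str
    before: str     # paragraph immediately preceding the image
    after: str      # paragraph immediately following the image


def extract_image_records(blocks: List[Block]) -> List[ImageRecord]:
    """Collect each image region with its caption and neighbouring paragraphs."""
    records = []
    for i, block in enumerate(blocks):
        if block.kind != "image":
            continue
        # Assumption: a caption, if present, is the block right after the image.
        caption = ""
        rest = i + 1
        if rest < len(blocks) and blocks[rest].kind == "caption":
            caption = blocks[rest].text
            rest += 1
        # Nearest ordinary paragraph on either side; empty string if none exists.
        before = next((b.text for b in reversed(blocks[:i]) if b.kind == "text"), "")
        after = next((b.text for b in blocks[rest:] if b.kind == "text"), "")
        records.append(ImageRecord(block.box, caption, before, after))
    return records


# Example page with made-up content.
page = [
    Block("text", (50, 40, 950, 300), "Early telephones were mounted on the wall."),
    Block("image", (200, 320, 800, 900)),
    Block("caption", (200, 910, 800, 950), "Fig. 3. A wall telephone."),
    Block("text", (50, 970, 950, 1400), "Desk models followed some years later."),
]
for record in extract_image_records(page):
    print(record)
```

In a fuller version, each record's bounding box would be handed to an image library to crop the original scan and write the JPEG, and the caption and surrounding paragraphs would become the searchable description on the picture's page.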
“Instead of reams of text, I feel books can be treated as galleries of images,” said Mr Leetaru. What prompted him to start the project was a spontaneous desire last December to find images of the telephone over the years. “There were many books that were digitized about the telephone, but there was no way to see a collage of all the images,” he explained. “I realized I could help others in a similar situation.”
Beyond running his software through the Internet Archive’s collection, plucking out images, and re-saving and re-cataloging them, Mr Leetaru looks forward to a tie-up with Wikipedia once his project is completed next year. He hopes that Wikipedia will use the images in his Flickr collection to illustrate and enrich the online encyclopedia’s articles.
He added that he also plans to offer his code to others. “Any library could repeat this process,” he explained. “That’s actually my hope, that libraries around the world run this same process of their digitized books to constantly expand this universe of images.”
From a historical perspective, it’s fascinating to find old images of Singapore, and looking through them is like tunneling through time. You should give it a try.