As part of a long-running project called the Internet Archive, millions of out-of-copyright books have been digitized and put online. However, although the text was run through an OCR system and made easily available, the OCR systems were programmed to ignore any areas of the pages that contained pictures.
The result was a huge searchable archive of text from 600 million pages, but no easy way to look through the pictures on those pages. Until now.
An academic in the US has written software that automatically trawled through those 600 million pages in search of pictures. The software then tagged each picture with useful metadata and uploaded it to Flickr. In total, 12 million images were found. So far, 2.6 million of them have been uploaded, and the rest are in the process of being uploaded too.
Because the scanned books were all out of copyright, all of the pictures are copyright-free too. You can browse and download them, and use them for any purpose you choose.
https://ift.tt/W5Vthd
