The General Index, created by American archivist Carl Malamud, was released on October 7 and is free to use. The index holds over 355 billion words and sentence fragments, each listed alongside the articles in which it appears. “It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers,” Malamud told the journal Nature.
The primary objective of the index is to support text mining, the use of computers to quickly scan millions of data points for references to something specific. No human can read millions of journal articles, but a computer programme querying the General Index can search them.
Researchers who had early access to the index called it a major development. Gitanjali Yadav, a computational biologist at the University of Cambridge, UK, who studies volatile organic compounds emitted by plants, said the index will help researchers find many papers that already existed but were previously hard to locate. Until now, researchers could mine only open-access papers or those to which they had subscriptions; the index extends text mining beyond those limits.
Malamud said his index contains only snippets up to five words long, so releasing it does not breach publishers’ copyright restrictions.
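To make the idea concrete, the following is a minimal Python sketch of how an index of short word sequences might map each fragment to the articles containing it. It is an illustration only, not the General Index's actual format or code: the article IDs, sample sentences and the function names `tokenize`, `ngrams`, `build_index` and `find_articles` are all hypothetical, and the only detail taken from the article is the five-word limit on fragments.

```python
from collections import defaultdict

MAX_N = 5  # the only figure taken from the article: fragments are at most five words long

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, split on whitespace, trim punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def ngrams(tokens, max_n=MAX_N):
    """Yield every run of 1 to max_n consecutive words as a single string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_index(articles):
    """Map each fragment to the set of article IDs in which it appears."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for gram in ngrams(tokenize(text)):
            index[gram].add(article_id)
    return index


def find_articles(index, phrase):
    """Return the sorted IDs of articles containing the phrase (empty list if none)."""
    key = " ".join(tokenize(phrase))
    return sorted(index.get(key, set()))


# Hypothetical sample data, invented for this sketch.
articles = {
    "article-1": "Plants emit volatile organic compounds under heat stress.",
    "article-2": "Volatile organic compounds were measured in urban air.",
    "article-3": "Heat stress reduces crop yield.",
}

index = build_index(articles)
print(find_articles(index, "volatile organic compounds"))
print(find_articles(index, "heat stress"))
# A six-word phrase is never stored, so the full sentence cannot be looked up directly.
print(find_articles(index, "plants emit volatile organic compounds under"))
```

In this sketch a query returns only the identifiers of matching articles, never their text, and phrases longer than five words are absent from the index altogether.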