Monday, March 3, 2014

OSTI's Revolution: "Findability" in Science and Technology

OSTIblog article by David Wojick on Tue, 1 Apr, 2008

When it comes to science and technology development, OSTI people are writing one of the biggest Internet success stories. Everyone talks about how the Internet is changing science, but OSTI is making it happen, and doing it on a shoestring budget.

The reason is simple: what OSTI makes happen goes to the heart of what science does, which is to share and combine thinking. Science is a colossal exercise in thought sharing, and has been for 400 years. Every achievement is incremental. Thus scientific communication is essential for scientific progress.

That the Internet greatly increases the potential for communication is well known. What often goes unrecognized is the great gap between raw Internet accessibility and actual communication. The missing element that bridges this gap is something we call "findability." If something is available via the Internet, can it be found with reasonable effort? If not, then it might as well not be there.

OSTI is leading a revolution in findability. OSTI does not create new content; rather, it creates portals and search engines that find vast quantities of hard-to-find scientific and technological content that already exists. This is extremely important to science because general-purpose search engines like Google rarely find scholarly content.

In some cases OSTI works alone, but in many cases it collaborates with other national and international organizations. Sometimes OSTI crawls the surface Web, but in many cases OSTI has led the application of federation to deep Web databases. In all cases the goal is the same: to make important scholarly content findable by those who need it.

The various portals that OSTI either owns or operates form a rough hierarchy. That is, some are more general than others and in many cases the narrower, more specialized portals are incorporated into the more general ones to some degree. This architecture reflects the interlocking nature of scientific activity.

A few of OSTI's many search tools are described below, from narrow to broad. Each is a technical tool that has to be understood to be properly used. None is simple. Also, each is relatively crude. Google spends over $4 billion a year, including $500 million on R&D. The National Library of Medicine spends around $100 million on R&D. OSTI's total budget, not just R&D, is just $9 million, so there are few bells and whistles. But there are over 200,000,000 pages of findable research results and technical material on OSTI portals, with more every day. Collectively this is by far the largest source of Web-based, scholarly science and technology available. That is an astounding feat for such a small agency.

Some special OSTI collections

Information Bridge

This is OSTI's foundation collection, the filing cabinet of all DOE research reports for the last decade. Tens of billions of dollars' worth of research is documented here, much of it power related. It has 165,000 fully searchable full-text documents, each with extensive bibliographic information. This makes it possible to do complex advanced searches using different metadata fields in the document database.
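
As a rough illustration of what a search over metadata fields involves, here is a minimal Python sketch. The record fields (title, author, year, subject), the sample records, and the fielded_search function are all hypothetical, invented for this example; they are not Information Bridge's actual schema or interface.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # Hypothetical bibliographic fields; not OSTI's actual schema.
    title: str
    author: str
    year: int
    subject: str


def fielded_search(reports, **criteria):
    """Return the reports whose fields match every criterion.

    String criteria match as case-insensitive substrings; other
    values (such as year) must match exactly.
    """
    hits = []
    for report in reports:
        for field, wanted in criteria.items():
            value = getattr(report, field)
            if isinstance(wanted, str):
                if wanted.lower() not in value.lower():
                    break
            elif value != wanted:
                break
        else:
            # No criterion failed, so this report is a hit.
            hits.append(report)
    return hits


# Made-up sample records, for illustration only.
REPORTS = [
    Report("Turbine blade fatigue study", "A. Smith", 2005, "wind power"),
    Report("Coolant flow in reactor cores", "B. Jones", 2005, "nuclear power"),
    Report("Offshore wind resource survey", "A. Smith", 2007, "wind power"),
]

results = fielded_search(REPORTS, author="smith", year=2005)
print([r.title for r in results])
```

Combining an author field with a year field narrows three records to one, which is the basic idea behind an advanced search: each extra field constraint cuts down the result set.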

A powerful and independently useful feature in the advanced search function of Information Bridge is the subject "select" button. This brings up a very large semantic structure, or word-word link system, that is designed to help users find the best technical search terms. The system combines a taxonomy of energy-related words with what is called a thesaurus. The thesaurus does not provide synonyms, but rather clusters of terms that are closely related from a scientific or engineering point of view. The system includes 30,000 words, about 200,000 word-word relations, and 45,000 taxonomic pathways from broader to narrower concepts. The system is useful in understanding the concept structure of energy science and engineering.
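
To make the idea of a combined taxonomy and thesaurus concrete, here is a toy sketch in Python. The terms, the links between them, and the function names are invented for illustration; the real system is far larger and its internal representation is not described here.

```python
# Hypothetical miniature of a combined taxonomy and thesaurus.
# The terms and links below are invented examples, not OSTI's data.

# Taxonomy: each term maps to its narrower terms.
NARROWER = {
    "energy sources": ["renewable energy", "fossil fuels"],
    "renewable energy": ["solar energy", "wind energy"],
    "solar energy": ["photovoltaic cells"],
}

# Thesaurus: terms that are closely related, but not synonyms.
RELATED = {
    "photovoltaic cells": ["semiconductor materials", "solar collectors"],
    "wind energy": ["turbines"],
}


def pathways(term, prefix=()):
    """Yield every broader-to-narrower path that starts at term."""
    path = prefix + (term,)
    children = NARROWER.get(term, [])
    if not children:
        # A term with no narrower terms ends a pathway.
        yield path
    for child in children:
        yield from pathways(child, path)


def suggest_terms(term):
    """Suggest search terms: the narrower and related terms of term."""
    return sorted(set(NARROWER.get(term, [])) | set(RELATED.get(term, [])))


for path in pathways("energy sources"):
    print(" > ".join(path))
print(suggest_terms("photovoltaic cells"))
```

In this toy version a user who starts at "energy sources" can walk down to "photovoltaic cells" and then pick up related terms such as "semiconductor materials", which is the kind of term discovery the subject "select" button is meant to support.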

E-print Network

This is a federated and crawled collection of about 5 million scholarly articles and related materials found in databases and on the web. It includes what are called preprints, which are articles that have not yet appeared in scholarly journals. It also includes the publication web pages of over 28,000 university faculty, mostly in science and research engineering departments. This makes it easy to go from a single paper to the whole body of a researcher's related work.

Science and Engineering Conference Proceedings

Conference proceedings often precede publication of research results by a year or more. This collection federates 26 large databases of such proceedings. There are hundreds of thousands of papers and presentations, many from professional societies.

OSTI wide search

Science Accelerator

The Science Accelerator searches ten major OSTI collections, including Information Bridge, E-print Network, and the Conferences portal described above. It also searches R&D project descriptions, the Energy Citations Database, DOE R&D Accomplishments, DOE-sponsored patents, and EnergyFiles, a collection of energy-related databases and websites.
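
The basic pattern behind searching several collections at once, usually called federated search, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three in-memory collections, their contents, and both function names are hypothetical stand-ins, and a real federated engine would query remote databases and rank the merged results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for separate collections. A real federated
# search would send the query to remote databases instead.
COLLECTIONS = {
    "reports": ["Wind turbine design report", "Reactor safety analysis"],
    "eprints": ["Preprint on wind turbine wake models"],
    "conferences": ["Wind turbine noise (conference paper)", "Fusion plasma talk"],
}


def search_collection(name, query):
    """Search one collection; return (collection, title) pairs."""
    words = query.lower().split()
    return [
        (name, title)
        for title in COLLECTIONS[name]
        if all(word in title.lower() for word in words)
    ]


def federated_search(query):
    """Send one query to every collection in parallel and merge the hits."""
    names = list(COLLECTIONS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_source = pool.map(lambda n: search_collection(n, query), names)
        merged = [hit for hits in per_source for hit in hits]
    # Sort by collection name, then title, for a stable merged list.
    return sorted(merged)


for source, title in federated_search("wind turbine"):
    print(f"{source}: {title}")
```

The user types one query and gets a single merged list, without needing to know which collection holds which document. That is the convenience a tool like the Science Accelerator provides across its ten collections.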

Government wide search

Federal R&D Project Summaries

This is a federated gateway to individual project summaries from six of the largest research funding agencies. In many cases the search results include recent awards, which may precede research reports or publications by several years.

The government-wide portal is a search engine for government science information and research results. Currently in its fourth generation, it provides search of more than 50 million pages of science information with just one query, and is a gateway to over 1,800 authoritative scientific Web sites and over 30 large scientific databases.

World wide search

Whereas the government-wide portal federates US Government science and engineering databases and websites, the idea behind the worldwide portal is to combine similar resources from many different countries. While still very new, it already includes major collections from 44 different countries, on every inhabited continent. The US government-wide portal is the major US contribution.

Taken together this is an impressive list of integrated science and technology portals. But believe it or not, there is a lot more coming.

David Wojick, Ph.D.
Senior consultant for Innovation

