OSTIblog article by David Wojick on Wed, 5 March, 2008
Part of OSTI's R&D aims at understanding how scientists use information. This goal was originally articulated by OSTI's Thurman Whitson, who has since retired. To that end we have begun to look at the different kinds of information provided by the different Web-based science resources. Different kinds of information imply different uses. It is not that one resource is better than another overall; it is that they are very different and support different uses.
Below are four initial results that show clearly that Google tends to return lay information, while Science.gov returns scholarly information. (Note: Google Scholar also returns scholarly information, but very little is free, so that is a different issue.)
For the purpose of this analysis, we postulate two search categories: Layman's Level and Scholar's Level.
LL = Layman's Level. Includes news, magazines, blogs, educational material, product or company information, etc. Education level is basic undergraduate or lower, often high school level.
SL = Scholar's Level. Includes journal articles, research reports, conference proceedings, etc. Education level is advanced undergraduate or higher.
Hit counts are based on the first 20 hits for each search term. Note that this is rough data, based on the result snippets; we have not examined every hit. But the differences are so dramatic that we do not expect the results to change much with refined analysis or more cases.
Hits on "biofuel": Google results LL = 20, SL = 0; Science.gov results LL = 1, SL = 19.
Hits on "nanocatalysis": Google results LL = 17, SL = 3; Science.gov results LL = 1, SL = 19.
Hits on "Higgs boson": Google results LL = 20, SL = 0; Science.gov results LL = 1, SL = 19.
Hits on "quantum dot": Google results LL = 20, SL = 0; Science.gov results LL = 0, SL = 20.
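For readers who want the four results in aggregate, here is a minimal sketch that totals them. The counts are copied from the results above; the names RESULTS and scholarly_totals are illustrative assumptions, not part of any OSTI tool.

```python
# Hit counts copied from the four results above.
# Each pair is (lay hits, scholarly hits) among the first 20 hits.
RESULTS = {
    "biofuel": {"Google": (20, 0), "Science.gov": (1, 19)},
    "nanocatalysis": {"Google": (17, 3), "Science.gov": (1, 19)},
    "Higgs boson": {"Google": (20, 0), "Science.gov": (1, 19)},
    "quantum dot": {"Google": (20, 0), "Science.gov": (0, 20)},
}


def scholarly_totals(results, engine):
    """Return (scholarly hits, total hits examined) for one search tool."""
    lay = sum(counts[engine][0] for counts in results.values())
    scholarly = sum(counts[engine][1] for counts in results.values())
    return scholarly, lay + scholarly


if __name__ == "__main__":
    for engine in ("Google", "Science.gov"):
        scholarly, total = scholarly_totals(RESULTS, engine)
        print(f"{engine}: {scholarly} of {total} hits were scholarly")
```

Across the four search terms this gives 3 scholarly hits out of 80 for Google and 77 out of 80 for Science.gov, which is the same contrast the individual results show.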
Conclusion: Science.gov and Google are very different tools, and so they support different uses. Google returns lay information, while Science.gov returns scholarly information. Moreover, all of OSTI's Science Accelerator resources follow the Science.gov pattern. Each is a far better source of scholarly information than Google.
The technology behind Science.gov (i.e., federated search) allows information product designers to discriminate among information resources, while the technology behind Google (i.e., crawling) does not lend itself to such discrimination. In the case of Science.gov, federated search enables product designers to focus on R&D results, which are typically scholarly in nature.
Possible next steps:
1. The LL and SL categories can each be refined and related to different uses. For example, some LL hits are simple news items while others are research center home pages. The research centers usually provide links to scholarly publications, while the news items usually do not. Likewise, some SL hits are just abstracts while others are full-text articles. Also, some LL and SL hits are more technical than others in the same category. A simple, faceted taxonomy to capture these differences should be easy to build.
2. This LL/SL analysis can be extended to the Science Accelerator and to individual OSTI products like Information Bridge and E-Print Network, as well as to other content search systems, such as World Wide Science or the Web of Science. How the content of these systems differs should be useful for users to know.
3. The different uses each of these different content systems might support can also be described. Sometimes scientists need lay information, sometimes scholarly, in different combinations depending on what they are looking for. Cases range from seeking broad understanding to looking for a specific document or problem.
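The faceted taxonomy suggested in the first step could start from something like the sketch below. This is only an illustration under stated assumptions: the facet names, the HitFacets record, the 1-to-3 technical depth scale and the leads_to_scholarly_content helper are all hypothetical, and the hit kinds are limited to the examples named in that step.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LAY = "LL"
    SCHOLARLY = "SL"


class Kind(Enum):
    # Only the kinds given as examples in step 1; a real taxonomy would need more.
    NEWS_ITEM = "news item"
    RESEARCH_CENTER_HOME_PAGE = "research center home page"
    ABSTRACT = "abstract"
    FULL_TEXT_ARTICLE = "full-text article"


@dataclass(frozen=True)
class HitFacets:
    """One search hit described along several independent facets."""
    level: Level
    kind: Kind
    links_to_scholarly: bool  # does the page link onward to scholarly publications?
    technical_depth: int      # hypothetical scale: 1 = least technical, 3 = most


def leads_to_scholarly_content(hit):
    """True if the hit is scholarly itself or links onward to scholarly work."""
    return hit.level is Level.SCHOLARLY or hit.links_to_scholarly


if __name__ == "__main__":
    news = HitFacets(Level.LAY, Kind.NEWS_ITEM, False, 1)
    center = HitFacets(Level.LAY, Kind.RESEARCH_CENTER_HOME_PAGE, True, 2)
    print(leads_to_scholarly_content(news), leads_to_scholarly_content(center))
```

Keeping the facets independent means a lay hit such as a research center home page can still be recorded as a route to scholarly material, which a single LL or SL label cannot express.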
Senior consultant for innovation