OSTIblog article by David Wojick on Wed, 5 March, 2008
Part of OSTI's R&D aims at understanding how scientists
use information. This goal was originally articulated by OSTI's Thurman
Whitson, who has since retired. To that end we have begun to look at the
different kinds of information provided by the different Web-based science
resources. Different kinds of information imply different uses. It is not that
one resource is better than another overall; it is that they are very different
and support different uses.
Below are four initial results that show clearly that Google
tends to return lay information, while Science.gov returns scholarly
information. (Note: Google Scholar also returns scholarly information, but very
little is free, so that is a different issue.)
For the purpose of this analysis, we postulate two categories of
search results: Layman's Level and Scholar's Level.
LL = Layman's Level. Includes news, magazines, blogs,
educational material, product or company information, etc.
Education grade level is basic undergraduate or lower, often
high school level.
SL = Scholar's Level. Includes journal articles, research
reports, conference proceedings, etc. Education level is advanced undergraduate
or higher.
Hit counts cover the first 20 hits for each query. Note that this is
rough data: hits were classified from their search-result snippets, and we have
not examined every hit. But the differences are so dramatic that the results
will not change much with refined analysis or more cases.
Hits on "biofuel": Google results LL = 20, SL = 0; Science.gov results LL = 1, SL = 19.
Hits on "nanocatalysis": Google results LL = 17, SL = 3; Science.gov results LL = 1, SL = 19.
Hits on "Higgs boson": Google results LL = 20, SL = 0; Science.gov results LL = 1, SL = 19.
Hits on "quantum dot": Google results LL = 20, SL = 0; Science.gov results LL = 0, SL = 20.
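The four results above can be tallied into a single scholarly share per search tool. The following is a minimal sketch in Python; the counts are those reported above, while the dictionary layout and the function name scholarly_share are illustrative choices, not part of the original analysis.

```python
# (LL, SL) counts reported above for the first 20 hits of each query.
results = {
    "biofuel": {"Google": (20, 0), "Science.gov": (1, 19)},
    "nanocatalysis": {"Google": (17, 3), "Science.gov": (1, 19)},
    "Higgs boson": {"Google": (20, 0), "Science.gov": (1, 19)},
    "quantum dot": {"Google": (20, 0), "Science.gov": (0, 20)},
}


def scholarly_share(results, engine):
    """Fraction of all classified hits for `engine` that are Scholar's Level."""
    ll = sum(counts[engine][0] for counts in results.values())
    sl = sum(counts[engine][1] for counts in results.values())
    return sl / (ll + sl)


for engine in ("Google", "Science.gov"):
    print(f"{engine}: {scholarly_share(results, engine):.1%} scholarly")
```

Over the four queries, 3 of Google's 80 hits and 77 of Science.gov's 80 hits fall in the SL category.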
Conclusion: Science.gov and Google are very different tools,
and so support different uses. Google returns lay information, while Science.gov
returns scholarly information. Moreover, all of OSTI's Science Accelerator
resources follow the Science.gov pattern. Each is a far better source of
scholarly information than Google.
The technology behind Science.gov (i.e., federated search)
allows information product designers to discriminate among information
resources, while the technology behind Google (i.e., crawling) does not lend
itself to such discrimination. In the case of Science.gov, federated search
enables product designers to focus on R&D results, which are typically
scholarly in nature.
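The idea of federated search can be illustrated with a small sketch: the designer picks which sources to include, and each query is sent to those sources and the hits are merged. This is a generic illustration only, not a description of how Science.gov is implemented; the source names, the in-memory titles, and the functions search_source and federated_search are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for curated sources. A real system would
# call each source's own search interface instead.
SOURCES = {
    "reports_db": ["biofuel yield report", "quantum dot synthesis report"],
    "eprints_db": ["biofuel combustion preprint", "Higgs boson search preprint"],
}


def search_source(name, query):
    """Return (source, title) pairs from one source whose title contains the query."""
    return [(name, title) for title in SOURCES[name] if query.lower() in title.lower()]


def federated_search(query, source_names):
    """Send the query to each selected source in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda name: search_source(name, query), source_names))
    # Flatten the per-source hit lists, keeping the order of source_names.
    return [hit for hits in per_source for hit in hits]


print(federated_search("biofuel", ["reports_db", "eprints_db"]))
```

The point of the sketch is the source_names argument: the choice of which collections to search is made by the designer, which is what lets a federated product concentrate on scholarly material.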
Possible next steps:
1. The LL and SL categories can each be refined and related
to different uses. For example, some LL hits are simple news items while others
are research center home pages. The research centers usually provide linkage to
scholarly publications, while the news items usually do not. Likewise, some SL
hits are just abstracts while others are full text articles. Also some LL and
SL hits are more technical than others in the same category. A simple, faceted
taxonomy to capture these differences should be easy to build.
2. This LL/SL analysis can be extended to the Science
Accelerator, to individual OSTI products such as Information Bridge and
E-Print Network, and to other content search systems such as World Wide Science
or the Web of Science. How the content of these systems differs should be
useful for users to know.
3. The different uses that each of these content
systems might support can also be described. Sometimes scientists need lay
information, sometimes scholarly, in different combinations depending on what
they are looking for. Cases range from seeking broad understanding to looking
for a specific document or problem.
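As a rough illustration of the faceted taxonomy suggested in step 1, each hit could carry several independent facets rather than a single LL/SL label. The sketch below is in Python; the facet names, the Hit class, the facet_counts helper, and the sample hits are all hypothetical and are not data from this analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    level: str                  # "LL" or "SL"
    kind: str                   # e.g. "news", "research center page", "abstract", "full text"
    links_to_scholarship: bool  # does the hit lead on to scholarly publications?


def facet_counts(hits, facet):
    """Count hits by the value of one facet, e.g. 'level' or 'kind'."""
    return Counter(getattr(hit, facet) for hit in hits)


# Made-up sample hits, for illustration only.
hits = [
    Hit("LL", "news", False),
    Hit("LL", "research center page", True),
    Hit("SL", "abstract", True),
    Hit("SL", "full text", True),
]

print(facet_counts(hits, "level"))
print(facet_counts(hits, "kind"))
```

Keeping the facets separate lets the same set of hits be summarized by level, by kind, or by whether they lead to scholarly publications.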
David Wojick
Senior consultant for innovation
OSTI