OSTIblog article by David Wojick on Mon, 21 April, 2008
If I had to describe the OSTI revolution in ten words or
fewer, it would be "OSTI is making the Web work for science."
It is a colossal irony that the Web does not work for
science. The World Wide Web was developed by high-energy physicists at CERN,
for the purpose of sharing scientific papers. HTML is basically very simple,
with features that were specifically designed to display scientific writings.
But the Web quickly transitioned into popular culture,
becoming a revolutionary new medium of global communication. Simple HTML has
been tricked into producing complex, magazine-like displays and much more.
This trickery is a technological marvel in its own right.
Web pages now number in the billions and the pace is quickening, not slowing,
as video, blog and personal mega-sites take off. Social, consumer and popular
content dominates these huge numbers.
In the process, science got left behind. As the Web exploded
in size, it quickly came to pass that the Web works only where search works, and
ordinary Web search does not work for science. The ordinary search engines do
not find science, for several reasons.
First and foremost, most science is in the deep Web. This
is explained in detail in other OSTIblog articles. Second, scientific content
on the surface Web is swamped by non-scientific content using the same
language, especially news and consumer or company information. It is
interesting to take a scientific article on a scientist's Web site and see what
it takes to get Google to return it as a top hit. If you know the exact title,
it can be done; otherwise it is unlikely.
A number of attempts have been made to solve this problem
and make the Web work for science. Unfortunately most of these have focused on
subscription journal articles, which are not in the public domain. There are
several projects which provide abstracts in large numbers, but no actual
content. These may be very useful for certain purposes, but do not solve the
basic problem. Others search large numbers of articles but most of the results
are only available on a pay-per-paper basis. This is a good way to buy
articles, but browsing is prohibitively expensive in most cases.
OSTI's revolutionary approach has been to focus on full-text
scientific content in the public domain. Search results include specialized
surface Web content, but especially deep Web content. This too is explained in
detail in other OSTIblog articles. An estimated 200 million pages of scientific
content are now available through specialized search.
Has OSTI made the Web work for science? Yes and no. Yes,
because the OSTI-owned or -operated portals now provide huge amounts of content
for science. (Also for technology, which has similar problems.) No, because
this is still just a small fraction of the total that is Web accessible. So the
OSTI revolution is just beginning, because the Web will really only work for
science when everything that is Web accessible is also findable.
David Wojick
Senior consultant for innovation
OSTI