Monday, March 3, 2014

Making the Web work for Science

OSTIblog article by David Wojick on Mon, 21 April, 2008

If I had to describe the OSTI revolution in ten words or fewer, it would be "OSTI is making the Web work for science."

It is a colossal irony that the Web does not work for science. The World Wide Web was developed at CERN, the high-energy physics laboratory, as a way to share scientific papers. HTML is fundamentally simple, and its features were designed specifically to display scientific writing.

But the Web quickly transitioned into popular culture, becoming a revolutionary new medium of global communication. Simple HTML has been tricked into producing complex, magazine-like displays and much more.

This trickery is a technological marvel in its own right. Web pages now number in the billions, and the pace of growth is quickening, not slowing, as video, blog and personal mega-sites take off. Social, consumer and popular content dominates these huge numbers.

In the process, science got left behind. As the Web exploded in size, it quickly became clear that the Web works only where search works, and ordinary Web search does not work for science. Ordinary search engines fail to find science for several reasons.

First and foremost, most science is in the deep Web. This is explained in detail in other OSTIblog articles. Second, scientific content on the surface Web is swamped by non-scientific content that uses the same language, especially news and consumer or company information. It is instructive to take a scientific article on a scientist's Web site and see what it takes to get Google to return it as a top hit. If you know the exact title, it can be done; otherwise, it is not likely.

A number of attempts have been made to solve this problem and make the Web work for science. Unfortunately, most of these have focused on subscription journal articles, which are not in the public domain. Several projects provide abstracts in large numbers, but no actual content. These may be very useful for certain purposes, but they do not solve the basic problem. Others search large numbers of articles, but most of the results are available only on a pay-per-paper basis. This is a good way to buy articles, but it makes browsing prohibitively expensive in most cases.

OSTI's revolutionary approach has been to focus on full-text scientific content in the public domain. Its search results include specialized surface Web content, but above all deep Web content. This too is explained in detail in other OSTIblog articles. An estimated 200 million pages of scientific content are now available through specialized search.

Has OSTI made the Web work for science? Yes and no. Yes, because the portals OSTI owns or operates now provide huge amounts of scientific content. (The same is true for technology, which has similar problems.) No, because this is still only a small fraction of the total that is Web accessible. So the OSTI revolution is just beginning: the Web will truly work for science only when everything that is Web accessible is also findable.

David Wojick
Senior consultant for innovation

