Sunday, March 9, 2014

Engineer Tackles Regulatory Confusion

ENR (Engineering News Record) cover story
April 3, 1980

Inside title:
Logician shears woolly regulations
Blueprints untangle complex rules

Several times a week, David E. Wojick drives from his Revolutionary War-era estate in Orange, Va., to the nation's capital to work on a revolution of his own in a field he dubs "regulation engineering." Armed with a technique for simplifying complex issues, Wojick says he can make regulations systematic, coherent and efficient. According to clients, Wojick's four-year-old consulting firm has scored victories with dozens of major regulations, both in critiquing them for industry and in rewriting them for government agencies.

Regulation writing should be a design science based on principles of efficiency, not a political process, contends Wojick, a professional engineer with a doctorate in logic and the philosophy of science. "A regulation is every bit as complex as a major structure. It requires the same care in construction. No one would let a committee of lawyers design an office building or a nuclear power plant, but the regulatory programs for all things are designed by committees of lawyers," Wojick says. "As a result, regulations read like insurance policies, and regulatory programs proceed like lawsuits."

Wojick says the 90,000 pages of government regulations now in force are among the most complex structures ever fabricated. "Today a 100-page regulation is small, 500 pages is not unusual, and the 10,000 pages of federal income tax regulations are a wonder of the world," he says. Because regulation writing is dominated by lawyers, regulations today are powerful and respond to popular concerns, but, Wojick claims, they are generally costly and incoherent.

Counting kinds of confusion.

Wojick's firm, Adams & Wojick Associates, has developed a matrix identifying 126 kinds of confusion in regulations. It first classifies six aspects common to any law or regulation — concepts, rules, procedures, text, structure and logic. Then it lists 21 kinds of faults, such as being ambiguous, overly complex, or ineffective. The matrix yields 126 combinations. Typical examples Wojick cites are an Environmental Protection Agency regulation with more than 3,000 exceptions and nuclear power plant quality assurance regulations that have ambiguous rules and vague procedures.
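The arithmetic behind the matrix can be sketched in a few lines of Python. The six aspects are the ones named above. The article names only three of the 21 kinds of fault, so the other 18 are stand-in labels invented here to make the count visible; they are not the firm's actual categories.

```python
from itertools import product

# The six aspects named in the article.
ASPECTS = ["concepts", "rules", "procedures", "text", "structure", "logic"]

# Only three of the 21 fault kinds are named in the article; the remaining
# 18 are hypothetical placeholders, not the firm's actual categories.
NAMED_FAULTS = ["ambiguous", "overly complex", "ineffective"]
FAULTS = NAMED_FAULTS + [f"fault_{i}" for i in range(4, 22)]

# Each cell of the matrix pairs one aspect with one kind of fault.
CONFUSION_MATRIX = list(product(ASPECTS, FAULTS))

print(len(FAULTS))            # 21
print(len(CONFUSION_MATRIX))  # 126
```

A cell such as ("rules", "ambiguous") corresponds to the "ambiguous rules" Wojick cites in the nuclear quality assurance example.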

"We've been successful," Wojick says, "because everybody sees the problem, but nobody's been able to put a finger on it." When Wojick goes to an agency and says a regulation is confusing for three or five specific reasons, he says, the common reaction he gets from officials is, "You're right."

The foundation of Wojick's ability to pinpoint these problems is a technique that applies the idea of a blueprint — a visual picture of a structure — to the structure of an idea. He discovered that through the blueprints, any discussion or piece of text can be broken down and all the individual ideas can be laid out so the relationships become visible.

The development of the technique springs from the unusual combination of engineering and logic in Wojick's background. Shortly after he graduated from Carnegie Institute of Technology in 1964 with a B.S. in civil engineering, Wojick went to work designing dams for the Pittsburgh district of the Corps of Engineers. His exposure there to environmental controversies, watching people "getting lost in complex issues," stirred a longstanding interest in reasoning. He began studying logic and philosophy at the University of Pittsburgh.

Wojick left the Corps in 1970 to work on his dissertation and began teaching at Carnegie Mellon University, where he helped found a Department of Engineering and Public Policy. At Carnegie, he was influenced by the research on human problem solving done by his colleague, Herbert A. Simon, who in 1978 won the Nobel Prize in economics.

Mapping the structure of ideas.

Wojick realized that all issues have a basic underlying structure, one that can be mapped out like an engineering drawing. Using existing theories of conceptual analysis, Wojick "atomized" issues (and later texts) into basic elements. His discovery, he explains, "was that the ideas are held together by unspoken questions. What point is this sentence making? What point is it responding to?"

He was surprised to find that the thousands of pieces of a complex issue fit together in a simple scheme with a logical pattern. Because the kind of hierarchical structure developed is called a "tree" in mathematics, Wojick calls the structures "issue trees". His first practical application, in 1975, was an analysis of the interaction between environmental, energy and economic issues for the Pennsylvania governor's science advisory committee.
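A minimal sketch of the tree idea, in Python, may help. It assumes only what the article says: each atomized point hangs beneath the point it responds to. The class name, its methods and the sample points are all invented for illustration and are not Wojick's own notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IssueNode:
    """One atomized point, plus the points that respond to it."""
    point: str
    responses: List["IssueNode"] = field(default_factory=list)

    def respond(self, point: str) -> "IssueNode":
        """Attach a new point that answers or challenges this one."""
        child = IssueNode(point)
        self.responses.append(child)
        return child

    def size(self) -> int:
        """Count this point and everything beneath it."""
        return 1 + sum(child.size() for child in self.responses)

    def outline(self, depth: int = 0) -> List[str]:
        """Return the tree as indented lines, one per point."""
        lines = ["  " * depth + self.point]
        for child in self.responses:
            lines.extend(child.outline(depth + 1))
        return lines


# Invented example points, for illustration only.
root = IssueNode("Should the dam be built?")
cost = root.respond("It will cost too much.")
cost.respond("The cost estimate omits flood-damage savings.")
root.respond("It will harm the river's fish.")

print("\n".join(root.outline()))
print(root.size())  # 4
```

The indentation in the printed outline plays the role of the unspoken question "What point is it responding to?": each line answers the nearest less-indented line above it.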

A year later, Wojick left the university to devote full time to issue analysis, going into business with his wife and partner, Diane W. Adams. As chief executive officer, Adams manages the finances of the firm and now oversees billings of more than $500,000 a year. She navigated the group's recent move to a 176-acre estate in Virginia, reputed to be the birthplace of President Zachary Taylor. Adams says they chose the property, which includes a house built in 1790, offices and a horse farm, for its hour-and-a-half proximity to Washington, D.C.

Devil in the details.

A third key member of the seven-person firm is John E. DeFazio, a chemical engineer who now handles a lot of the analytical work while Wojick hits Washington looking for complicated issues that involve a lot of money. Not all prospective clients can afford the firm's services, because "it takes several person-months to tree out a major regulation," Wojick says. "On the other hand, that's why the process is so powerful. It has the same power that detailed drawings give in constructing a building. We can make hundreds, sometimes thousands, of improvements."

The firm's early jobs included writing compliance manuals on regulations for industry. It wrote a quality assurance manual for Levinson Steel Co., Pittsburgh, for example, setting up the structural steel fabricator's working program for compliance with Nuclear Regulatory Commission standards on nuclear power plant fabrication. Alvin Stein, Levinson's quality assurance director, calls Wojick a "wizard" because the program set up in 1976 in "untested waters" is still working successfully. And it has been flexible enough to allow the company to satisfy the differing regulatory interpretations of different designers.

Next, Wojick landed jobs critiquing regulations for industry. PPG Industries, Inc., Pittsburgh, hired the firm to do a coherence analysis of EPA's proposed premanufacturing regulations under the Toxic Substances Control Act. "The goal of the law is to prevent chemical catastrophes," Wojick explains, "but EPA takes the meat-ax approach of trying to find out everything there is to know about all the chemicals in existence, and then they're going to sort through and find the problems." "We suggest techniques for identifying lines to follow that are most likely to be fruitful," says Wojick. The analysis also found parts of the regulations so unreadable that most accepted scales of readability could not measure them.

Teaching regulators logic.

EPA acknowledged the value of the critique by hiring Wojick to teach EPA regulators how to write logically coherent regs. The firm is also negotiating a contract to rewrite EPA's dredge and fill permit regulations. Wojick has already rewritten regulations for the Water Resources Council. WRC first hired the firm to critique its proposed rules for evaluating the costs and benefits of water projects, then asked the firm for a complete rewrite. DeFazio says, "We threw away 70% of the text, and the other 30% we completely restructured -- all without losing any of the basic ideas." They pared down the roughly 350-page draft to about 80 pages. The firm also rewrote the Council's principles and standards for planning water resource projects.

Adams & Wojick is working with the Department of Commerce and the Office of Management and Budget on a study of information collection burdens. Regulators have a tendency to treat information collection as if it were free, Wojick says, but its costs mount up. "We're working on a computer search program that uses key words like 'document' and 'record' to spot these hidden burdens -- the molasses in the system."
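The article gives no detail on the program itself, so the following Python sketch shows only the general idea of a key word scan. The two key words come from the quotation; the function name, the matching rule and the sample text are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# 'document' and 'record' come from the article; a real list would be longer.
KEYWORDS = ("document", "record")

# Match each key word and its simple suffixed forms (records, documented, ...).
PATTERN = re.compile(
    r"\b(?:" + "|".join(KEYWORDS) + r")\w*", re.IGNORECASE
)


def find_burdens(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for every line that mentions a key word."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Invented sample text, not taken from any actual regulation.
SAMPLE = """The operator shall limit exposure to the stated level.
The operator shall keep records of each measurement for five years.
Each inspection must be documented and filed with the agency.
"""

for number, line in find_burdens(SAMPLE):
    print(number, line)
```

A scan like this only flags candidate passages. Deciding whether a flagged line really imposes an information collection burden would still take a reader.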

In Occupational Safety and Health Administration regulations, for example, Wojick finds that many added costs of compliance are hidden in inspectors' manuals and other appendixes. OSHA has regulations for worker exposure to more than 500 chemicals, he says, and the regulations say only that exposure levels must stay under certain numbers of parts per million. "The costly record-keeping requirements are in the attachments," he says.

The key to the firm's approach, Wojick says, is that "we're not institutional players. If you're locked into the system, you can't jump on people and make noise. I'm free to offend anybody, and I do." Wojick will tear apart regulations for industry or work for the government writing them. "We're not for one side or the other, we think they're all making mistakes. Our interest is clarity and sound design," he says.

Regulations writing is just the beginning for Wojick. In the future, he says, "We want to design laws for Congress."

Monday, March 3, 2014

Making the Web work for Science

OSTIblog article by David Wojick on Mon, 21 April, 2008

If I had to describe the OSTI revolution in ten words or less it would be "OSTI is making the Web work for science."

It is a colossal irony that the Web does not work for science. The World Wide Web was developed by high energy physicists at CERN, for the purpose of sharing scientific papers. HTML is basically very simple, with features that were specifically designed to display scientific writings.

But the Web quickly transitioned into popular culture, becoming a revolutionary new medium of global communication. Simple HTML has been tricked into producing complex, magazine-like displays and much more.

This trickery is a technological marvel in its own right. Web pages now number in the billions and the pace is quickening, not slowing, as video, blog and personal mega-sites take off. Social, consumer and popular content dominates these huge numbers.

In the process science got left behind. As the Web exploded in size it quickly came to pass that the Web only works where search works, and ordinary Web search does not work for science. The ordinary search engines do not find science, for several reasons.

First, and foremost, most science is in the deep Web. This is explained in detail in other OSTIblog articles. Second, scientific content on the surface Web is swamped by non-scientific content using the same language, especially news and consumer or company information. It is interesting to take a scientific article on a scientist's Web site and see what it takes to get Google to return it as a top hit. If you know the exact title it can be done, otherwise not likely.

A number of attempts have been made to solve this problem and make the Web work for science. Unfortunately most of these have focused on subscription journal articles, which are not in the public domain. There are several projects which provide abstracts in large numbers, but no actual content. These may be very useful for certain purposes, but do not solve the basic problem. Others search large numbers of articles but most of the results are only available on a pay-per-paper basis. This is a good way to buy articles, but browsing is prohibitively expensive in most cases.

OSTI's revolutionary approach has been to focus on full text scientific content in the public domain. Search results include specialized surface Web content, but especially deep Web content. This too is explained in detail in other OSTIblog articles. An estimated 200 million pages of scientific content are now available through specialized search.

Has OSTI made the Web work for science? Yes and no. Yes, because the portals OSTI owns or operates now provide huge amounts of content for science. (Also for technology, which has similar problems.) No, because this is still just a small fraction of the total that is Web accessible. So the OSTI revolution is just beginning, because the Web will really only work for science when everything that is Web accessible is also findable.

David Wojick
Senior consultant for innovation


OSTI's Revolution: "Findability" in Science and Technology

OSTIblog article by David Wojick on Tue, 1 Apr, 2008

When it comes to science and technology development, OSTI people are writing one of the biggest Internet success stories. Everyone talks about how the Internet is changing science but OSTI is making it happen, and doing it on a shoestring budget.

The reason is simple: what OSTI makes happen goes to the heart of what science does, which is to share and combine thinking. Science is a colossal exercise in thought sharing, and has been for 400 years. Every achievement is incremental. Thus scientific communication is essential for scientific progress.

That the Internet greatly increases the potential for communication is well known. What often goes unrecognized is the great gap between raw Internet accessibility and actual communication. The missing element that bridges this gap is something we call "findability." If something is available via the Internet, can it be found with reasonable effort? If not then it might as well not be there.

OSTI is leading a revolution in findability. OSTI does not create new content; rather it creates portals and search engines that find vast quantities of hard-to-find scientific and technological content that already exists. This is extremely important to science because the general purpose search engines like Google rarely find scholarly content.

In some cases OSTI works alone but in many cases it collaborates with other national and international organizations. Sometimes OSTI crawls the surface Web but in many cases OSTI has led the application of federation to deep Web databases. In all cases the goal is the same: to make important scholarly content findable by those who need it.

The various portals that OSTI either owns or operates form a rough hierarchy. That is, some are more general than others and in many cases the narrower, more specialized portals are incorporated into the more general ones to some degree. This architecture reflects the interlocking nature of scientific activity.

A few of OSTI's many search tools are described below, from narrow to broad. Each is a technical tool that has to be understood to be properly used. None is simple. Also, each is relatively crude. Google spends over $4 billion a year, including $500 million on R&D. The National Library of Medicine spends around $100 million on R&D. OSTI's total budget, not just R&D, is just $9 million so there are few bells and whistles. But there are over 200,000,000 pages of findable research results and technical material on OSTI portals, with more every day. Collectively this is by far the largest source of Web-based, scholarly science and technology available. An astounding feat for such a small agency.

Some special OSTI collections

Information Bridge
This is OSTI's foundation collection, the filing cabinet of all DOE research reports for the last decade. Tens of billions of dollars worth of research are documented here, much of it power related. It has 165,000 fully searchable full-text documents, each with extensive bibliographic information. This makes it possible to do complex advanced searches using different metadata fields in the document database.

A powerful and independently useful feature in the advanced search function of Information Bridge is the subject "select" button. This brings up a very large semantic structure or word-word link system that is designed to help users find the best technical search terms. The system combines a taxonomy of energy related words with what is called a thesaurus. The thesaurus does not provide synonyms, but rather clusters of terms that are closely related from a scientific or engineering point of view. The system includes 30,000 words, about 200,000 word-word relations, and 45,000 taxonomic pathways from broader to narrower concepts. The system is useful in understanding the concept structure of energy science and engineering.

E-print Network

This is a federated and crawled collection of about 5 million scholarly articles and related materials found in databases and on the web. It includes what are called preprints, which are articles that have not yet appeared in scholarly journals. It also includes the publication web pages of over 28,000 university faculty, mostly in science and research engineering departments. This makes it easy to go from a single paper to the whole body of a researcher's related work.

Science and Engineering Conference Proceedings

Conference proceedings often precede publication of research results by a year or more and this collection federates 26 large databases. There are hundreds of thousands of papers and presentations, many from professional societies.

OSTI wide search

Science Accelerator

The Science Accelerator searches ten major OSTI collections, including Information Bridge, E-print Network, and the Conferences portal, described above. It also searches R&D project descriptions, the Energy Citations Database, DOE R&D Accomplishments, DOE-sponsored patents, and EnergyFiles, a collection of energy-related databases and websites.

Government wide search

Federal R&D Project Summaries

This is a federated gateway to individual project summaries from six of the largest research funding agencies. In many cases the search results include recent awards, which may precede research reports or publications by several years.

Science.gov

Science.gov is a search engine for government science information and research results. Currently in its fourth generation, Science.gov provides search of more than 50 million pages of science information with just one query, and is a gateway to over 1,800 authoritative scientific Web sites and over 30 large scientific databases.

World wide search

Whereas Science.gov federates the US Government science and engineering databases and websites, the idea behind WorldWideScience.org is to combine similar resources from many different countries. While still very new, WorldWideScience.org already includes major collections from 44 different countries, on every inhabited continent. Science.gov is the major US contribution.

Taken together this is an impressive list of integrated science and technology portals. But believe it or not, there is a lot more coming.

David Wojick, Ph.D.
Senior consultant for Innovation

OSTI versus Google: Different content, different uses

OSTIblog article by David Wojick on Wed, 5 March, 2008

Part of OSTI's R&D aims at understanding how scientists use information. This goal was originally articulated by OSTI's Thurman Whitson, who has since retired. To that end we have begun to look at the different kinds of information provided by the different Web-based science resources. Different kinds of information imply different uses. It is not that one resource is better than another overall; it is that they are very different and support different uses.

Below are four initial results that show clearly that Google tends to return lay information, while Science.gov returns scholarly information. (Note: Google Scholar also returns scholarly information, but very little is free, so that is a different issue.)

For the purpose of this analysis, we postulate two search categories: Layman's Level and Scholar's Level.

LL = Layman's Level. Includes news, magazines, blogs, educational material, product or company information, etc.
Education grade level is basic undergraduate or lower, often high school level.

SL = Scholar's Level. Includes journal articles, research reports, conference proceedings, etc. Education level is advanced undergraduate or higher.

Hit counts are based on the first 20 hits. Note this is rough data, based on the snippets. We have not examined every hit. But the differences are so dramatic that the results will not change much with refined analysis or more cases.

Search term         Google              Science.gov
"biofuel"           LL = 20, SL = 0     LL = 1, SL = 19
"nanocatalysis"     LL = 17, SL = 3     LL = 1, SL = 19
"Higgs boson"       LL = 20, SL = 0     LL = 1, SL = 19
"quantum dot"       LL = 20, SL = 0     LL = 0, SL = 20
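The four sets of counts above can be combined into one scholarly share per search engine. The short Python tally below uses only the reported numbers; the dictionary layout and the function name are choices made here for illustration.

```python
# (LL, SL) counts from the first 20 hits, as reported in the article.
RESULTS = {
    "biofuel":       {"Google": (20, 0), "Science.gov": (1, 19)},
    "nanocatalysis": {"Google": (17, 3), "Science.gov": (1, 19)},
    "Higgs boson":   {"Google": (20, 0), "Science.gov": (1, 19)},
    "quantum dot":   {"Google": (20, 0), "Science.gov": (0, 20)},
}


def scholarly_share(engine: str) -> float:
    """Fraction of all examined hits for one engine that were scholar's level."""
    lay = sum(counts[engine][0] for counts in RESULTS.values())
    scholarly = sum(counts[engine][1] for counts in RESULTS.values())
    return scholarly / (lay + scholarly)


for engine in ("Google", "Science.gov"):
    print(f"{engine}: {scholarly_share(engine):.1%} scholarly")
```

Across the 80 hits examined per engine, 3 of Google's and 77 of Science.gov's were scholar's level. With only four queries, this is the same rough data described above, not a statistical estimate.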

Conclusion: Science.gov and Google are very different tools, so they support different uses. Google returns lay information, while Science.gov returns scholarly information. Moreover, all of OSTI's Science Accelerator resources follow the pattern. Each is a far better source of scholarly information than Google.

The technology behind Science.gov (i.e., federated search) allows information product designers to discriminate among information resources, while the technology behind Google (i.e., crawling) does not lend itself to such discrimination. In the case of Science.gov, federated search enables product designers to focus on R&D results, which are typically scholarly in nature.
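To make the contrast concrete, here is a toy Python sketch of federated search: one query is passed to each of a hand-picked set of sources and the answers are merged. It is not a description of how any OSTI product is built. The source names, the titles and the substring matching rule are all invented for illustration.

```python
from typing import Callable, Dict, List

# A source is any function that takes a query and returns matching titles.
Source = Callable[[str], List[str]]


def make_source(titles: List[str]) -> Source:
    """Build a toy searchable database from a list of titles."""
    def search(query: str) -> List[str]:
        return [t for t in titles if query.lower() in t.lower()]
    return search


def federated_search(query: str, sources: Dict[str, Source]) -> List[str]:
    """Send the query to every selected source and merge the answers."""
    merged: List[str] = []
    for name, search in sources.items():
        for title in search(query):
            merged.append(f"{title} [{name}]")
    return merged


# Hypothetical databases and titles, invented for illustration.
SOURCES = {
    "reports": make_source(["Biofuel yields from switchgrass", "Dam safety"]),
    "preprints": make_source(["Enzymes for biofuel production"]),
}

for hit in federated_search("biofuel", SOURCES):
    print(hit)
```

The designer's choice of which entries go into the sources dictionary is where the discrimination among resources happens: anything left out of the dictionary can never appear in the results.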

Possible next steps:

1. The LL and SL categories can each be refined and related to different uses. For example, some LL hits are simple news items while others are research center home pages. The research centers usually provide linkage to scholarly publications, while the news items usually do not. Likewise, some SL hits are just abstracts while others are full text articles. Also some LL and SL hits are more technical than others in the same category. A simple, faceted taxonomy to capture these differences should be easy to build.

2. This LL/SL analysis can be extended to the Science Accelerator, to individual OSTI products like Information Bridge and E-print Network, and to other content search systems, such as World Wide Science or the Web of Science. How the content of these systems differs should be useful for users to know.

3. The different uses each of these different content systems might support can also be described. Sometimes scientists need lay information, sometimes scholarly, in different combinations depending on what they are looking for. Cases range from seeking broad understanding to looking for a specific document or problem.

David Wojick
Senior consultant for innovation