E L E C T R O S P H E R E    Issue 1.05 - September 1995

Encyclopaedia Britannica Online?

By Robert Rossney

In 1768, the American colonies were bristling at the taxes imposed on them from England. Captain James Cook set sail on a scientific endeavour that would eventually lead him around the world. In Vienna, Mozart wrote his first operetta. In Edinburgh, Colin Macfarquhar and Andrew Bell went into the publishing business.

There aren't many companies in business today that opened their doors before the Boston Massacre and have been making the same thing ever since. But Macfarquhar and Bell's product was a good one. Not only has it survived all this time, but for over two centuries it's been a respected household name.

Over the last year, however, it has entered a sphere its founders couldn't possibly have imagined: the Encyclopaedia Britannica has arrived on the World Wide Web.

Though it is currently available only to colleges and universities, Britannica hopes to make Britannica Online, the Web-based edition of the big blue reference set, available to individuals by the end of this year. The company is making a high-stakes bet on the potential of the new medium to remake its age-old business and transform its product into a powerful, up-to-the-minute, universal tool. An enormous body of information (66,000 articles, 44 million words), Britannica Online is readily accessible, simple to use, and easy to navigate. Every article has pointers to related articles, and many also include links to outside Web sites; these simple-to-follow threads of inquiry lead you as far as you choose to go.

But before we wax too enthusiastic, let's ask ourselves: Do we really need an online encyclopaedia to satisfy our craving for information?

Think about it. The encyclopaedia is an artifact of the Enlightenment notion that the universe is rational and finite, and can be understood once and for all - if we just put our heads together and work at it. But this idea stands in stark contrast to a postmodern worldview in which every aspect of knowledge is impossibly complex: just summarising an area of knowledge does violence to the truth, omitting more than it reveals. Hence, isn't putting a summary of the world's knowledge on the Web a spectacularly useless thing to do?

This question makes Robert McHenry laugh out loud. McHenry is the Encyclopaedia Britannica's editor in chief. If there's anyone in the English-speaking world who should decide what falls into the realm of general knowledge and what doesn't, it's him. Spend an afternoon with McHenry and you'll think he couldn't be more qualified for the job; there must be something he doesn't know - but it's hard to imagine what.

McHenry has no problem fitting the encyclopaedia into today's information ecology. Let's suppose you need to find out about something that you know nothing about - say, who lives on the northeastern frontier of Laos. "If you want to know about Laos," says McHenry, "there is a continuum of knowledge about Laos. On one end of this continuum, you can actually go to Laos and see it yourself. The other end is the simple knowledge that there is a place called Laos, that it's a bounded region on the Earth's surface, and that, as such, it has a northeastern frontier."

The intent of the Encyclopaedia Britannica is to provide the interested person with a door into an area of knowledge. "The encyclopaedia sits in the middle of this continuum," McHenry says. "It provides a commonly accepted level of detail about the subject and bibliographic references so that if you need more detail you can find it."

The great advantage to putting the encyclopaedia online is that it vastly expands the number of subjects an encyclopaedia can cover, since the amount of digitised information it can hold has virtually no limit. Its content can evolve over time, without regard to the limitations of space and print distribution economics.

For the print version, on the other hand, once the marketing department settles on a price, it also decides on the number of pages. Because its size is fixed, new information is forcing out old all the time. The result is a juggling act. Add an article to the "Chicago to Death" volume, as the editors were compelled to do after the 1992 US presidential election, and something else has to go. When International Harvester changed its name to Navistar International Corporation in 1986, the editors had to not only update the article but move it to a different volume.

Historical information changes as well. In the past, Britannica's article on Rembrandt cited the painting The Polish Rider, which hangs in the Frick Collection in New York. But when Britannica last revised the Rembrandt entry, scholars were questioning the painting's authenticity. To avoid any inaccuracies, Britannica rewrote the article without the reference to The Polish Rider - though the new entry had to fit the same hole as its predecessor.

But the old limitations of paper had their uses. "They told you just how summary the summary needed to be," says McHenry. "Being freed from them is liberating and chaotic. We have to impose disciplines on the process and invent a new set, based on the merits of the case rather than the physical limitations of what will fit in the books."

Why not simply edit based on usage? Drop the articles that nobody reads? It may sound reasonable, but it's not that simple. "If I found out that there was an article in Britannica on one of the obscure emperors of the Byzantine Roman Empire that nobody ever looked at, would I cut it? Absolutely not," says McHenry. "You see," he grins, "then there would be a Roman emperor we didn't have."

The most technologically impressive capacity of Britannica Online is also practically invisible. Making the text electronic is no big deal, and Web masters everywhere are continually coming up with Web-based front ends for popular databases.

But what happens between the time you type in your query and the time the relevant articles come up onscreen? That's where the development hours come in - and how the people at Encyclopaedia Britannica's Advanced Technology Group in La Jolla, California, have been earning their daily bread.

The starting point for using Britannica Online is a simple text box into which you enter a word, phrase, or question. Suppose you type in a question like, "When did Cesare Borgia die?" First, the parser strips out "stop words," those too common to appear in the index, like "when" and "did." It then shepherds the rest - "cesare," "borgia," and "die" - to the search engine.
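The parsing step described above can be sketched in a few lines of Python. This is only an illustration, not Britannica's parser: the function name `parse_query` is made up, and the stop-word list holds just the two words the example mentions, where a real list would be far longer.

```python
import re

# Hypothetical stop-word list: only "when" and "did" are named in the text.
STOP_WORDS = {"when", "did"}

def parse_query(query):
    """Lowercase the query, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", query.lower())
    return [word for word in words if word not in STOP_WORDS]

terms = parse_query("When did Cesare Borgia die?")
print(terms)  # ['cesare', 'borgia', 'die']
```

Punctuation such as the question mark disappears because the pattern keeps only runs of letters.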

A simple search engine, one that could find articles containing all three words, is practically useless for handling this type of query. The Cesare Borgia article mentions that he was killed in 1507, but not that he "died." The engine has to be smart enough to know that we need this article even though it isn't exactly what we asked for.

But broadening the search to include all articles that contain any one of these three words causes another problem. This set of hits includes not only the article on Cesare Borgia but those for Lucrezia Borgia, Pope Pius III, Niccolò Machiavelli, the town of Imola, Leonardo da Vinci, the Montefeltro family, and Pope Alexander VI. And Joan Sutherland, because she sang in Die Zauberflöte and Giulio Cesare. In fact, there are dozens of articles that contain two of the three terms, as well as thousands that contain only one.
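The gap between the two approaches shows up even on a toy scale. In the sketch below, the article snippets are invented for illustration (they are not Britannica text), and `match_all` and `match_any` are hypothetical names for the strict and the broadened search.

```python
import re

# Toy snippets, invented for illustration; not actual Britannica articles.
ARTICLES = {
    "Borgia, Cesare": "Cesare Borgia was killed in 1507.",
    "Borgia, Lucrezia": "Lucrezia Borgia was the sister of Cesare Borgia.",
    "Sutherland, Joan": "She sang in Die Zauberflöte and in Giulio Cesare.",
    "Laos": "A country in Southeast Asia.",
}

def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))

def match_all(terms, articles):
    """Strict search: keep titles whose text contains every term."""
    return [title for title, body in articles.items()
            if all(term in words(body) for term in terms)]

def match_any(terms, articles):
    """Broadened search: keep titles whose text contains at least one term."""
    return [title for title, body in articles.items()
            if any(term in words(body) for term in terms)]

terms = ["cesare", "borgia", "die"]
print(match_all(terms, ARTICLES))  # []
print(match_any(terms, ARTICLES))  # Cesare, Lucrezia and Joan Sutherland
```

The strict search comes back empty because no snippet contains all three words, while the broadened one returns the wanted article alongside hits that merely share a word with the query.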

The tricky part is ranking hits by estimating how likely it is that each hit is the one we want. The search engine achieves this by using what Britannica's artificial intelligence and computational linguistics people call "experts": algorithms that score each article on a particular scale.

There's the "title word" expert, which ranks an article by how many search terms appear in its title. "Borgia, Cesare" gets two points on this scale, while "Sutherland, Joan" gets none. There's the "proximity" expert, which ranks articles by how closely together the search terms are found in the article itself. (An article that mentions "Cesare Borgia" gets a higher score than one that mentions "Die Zauberflöte" in one paragraph and "Giulio Cesare" in another.) In all, five different experts evaluate how closely the hits match the query.
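Two of these experts are easy to imagine in code. The sketch below is a guess at the idea, not Britannica's algorithm. The title-word count follows the description above. The proximity formula (one divided by the smallest gap, in words, between two different search terms) is purely an assumption, as are the function names and sample sentences.

```python
import re
from itertools import combinations

def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())

def title_word_score(terms, title):
    """Count how many distinct search terms appear in the title."""
    title_words = set(tokens(title))
    return sum(1 for term in set(terms) if term in title_words)

def proximity_score(terms, body):
    """Assumed formula: 1 / smallest gap between two different terms."""
    wanted = set(terms)
    positions = [(i, w) for i, w in enumerate(tokens(body)) if w in wanted]
    gaps = [abs(i - j) for (i, a), (j, b) in combinations(positions, 2)
            if a != b]
    return 1 / min(gaps) if gaps else 0.0

terms = ["cesare", "borgia", "die"]
print(title_word_score(terms, "Borgia, Cesare"))    # 2
print(title_word_score(terms, "Sutherland, Joan"))  # 0
print(proximity_score(terms, "Cesare Borgia was killed in 1507."))  # 1.0
```

Under this formula, adjacent terms score 1.0 and the score shrinks as the terms drift apart, so an article mentioning "Die Zauberflöte" and "Giulio Cesare" several words apart lands well below one that mentions "Cesare Borgia".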

Once the experts have marked every hit, the results are weighted and combined, and the articles sorted by their final scores. In the end, Cesare Borgia heads the list. And while Lucrezia and Machiavelli make the top 10, Joan Sutherland doesn't cut it.
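The final weighting-and-sorting step might look something like the sketch below. The weights and the per-expert scores are invented for illustration, since Britannica's real values are not given here, and only two of the five experts are represented.

```python
# Hypothetical weights; the real ones are not public in this account.
WEIGHTS = {"title_word": 2.0, "proximity": 1.0}

def combine(scores, weights=WEIGHTS):
    """Weighted sum of one article's per-expert scores."""
    return sum(weights[name] * value for name, value in scores.items())

def rank(hits, weights=WEIGHTS):
    """Sort article titles by combined score, best first."""
    return sorted(hits, key=lambda title: combine(hits[title], weights),
                  reverse=True)

# Made-up expert scores for three hits.
hits = {
    "Sutherland, Joan": {"title_word": 0, "proximity": 0.125},
    "Borgia, Cesare": {"title_word": 2, "proximity": 1.0},
    "Borgia, Lucrezia": {"title_word": 1, "proximity": 1.0},
}

print(rank(hits))
# ['Borgia, Cesare', 'Borgia, Lucrezia', 'Sutherland, Joan']
```

A weighted sum is the simplest way to merge several scales into one ranking. Changing the weights changes how much each expert's opinion counts.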

There's no denying that Britannica Online is handy to have around. If you wake up in the middle of the night wondering what bees use to build their hives (propolis), or if you're writing an article and you can't remember what company Gordon Moore co-founded (Intel), the answer is no further than your nearest Net onramp.

"The automated search capability is a big thing," says Nancy John, who manages library systems for the University of Illinois at Chicago. The university has been a beta site for Britannica Online. According to figures supplied by Britannica, students look up about 4,600 articles a month.

"Many people tend not to use the index - or the references to other articles - in the print encyclopaedia," she says. "Watching people work online, it's easy to see the difference. You just click and go."

But even the best electronic encyclopaedia in the world is no good if the price isn't right. Britannica Online has a fairly simple pricing formula: an annual fee of one dollar per full-time student gives a university the right to tap into it from the campus network. Only those users accessing the Net from the school's IP address can get in. At a huge institution like the University of Illinois, that would cost around £41,000.

"We would like to give people 24-hour-a-day library service," says Nancy John. But keeping the libraries open around the clock is out of the question. "We don't have the staff or the funding. So we're building a system that gives people access to information services even when the library is closed. You ought to be able to use the card catalogue to find out if we have something on the shelves; to search a good dictionary and have access to a newspaper. You also ought to have an encyclopaedia."

The dollar-a-student figure makes sense to Ann Mueller, too. Mueller is the technology manager for Portfolio, Stanford University's Web-based online information system. After Stanford signed on with Britannica Online last year, Mueller folded it into a suite of databases that Portfolio makes available to Stanford's 14,000 students.

"Library budgets are tight," she explains. "These services extend the library's resources without a lot of dollars added to the budget. And being able to search the encyclopaedia in an online environment is a value you can't get by having the encyclopaedia on your bookshelf," she adds. "The only thing you can't do is take it to bed with you and browse."

Joe Esposito, the president of Encyclopaedia Britannica, North America, believes that Britannica Online is the key to the company's future. "This is not just another format," he says. "In the long term, digital media will fundamentally destabilise the way we do business. Usually, people talk about the revolution in digital media in terms of putting interactivity within the product itself. But the real revolution is in the market."

And it is a revolution. Over the last five years, encyclopaedia publishing as an industry has become quietly desperate. When sales started dipping industrywide in 1989, publishers blamed the recession and gritted their teeth. But when the recession ended, sales didn't bounce back.

Indeed, on paper, Encyclopaedia Britannica looks like a company in trouble. Since 1990, when sales were £400 million, its annual revenues have steadily declined, dropping by 30 per cent last year to £25 million. Those revenues were sustained by Britannica's other product lines (notably the company's thriving English-as-a-second-language instructional materials); actual North American sales of encyclopaedias have declined from 117,000 units in 1990 to 51,000 in 1994.

Behind this decline is a big change in the market. Traditionally, the people who buy encyclopaedias are parents of school-age children. They're people willing to hand over a chunk of money for a stalwart, long-tested tool that will help their kids get an educational leg-up. They're also buying an expensive status symbol that sits on their bookshelves and reports to the world how brainy they are. But in the last five years, parents have found a new way to do both: the money they once spent on encyclopaedias is now spent on computers.

And since those computers often come bundled with the likes of Microsoft's Encarta, parents feel they're getting the best of both worlds. Never mind that Encarta - which Britannica staffers call an "alleged" encyclopaedia - won't tell you what bees use to build their hives, or why Laotian mountain people are called "Kha."

It's awfully hard for the paper edition of the Encyclopaedia Britannica (which, in its cheapest bindings, retails for £950) to compete with bargain-basement CD-ROMs (retailing for about £40). It's hard enough for the CD-ROM encyclopaedia publishers to sustain themselves. For instance, in January, Compton's NewMedia, publishers of Compton's Interactive Encyclopedia, cut its work force by 30 per cent and took a £7 million hit against quarterly earnings.

This particular example shows that perhaps Britannica is savvier than sales figures would lead you to believe. Britannica originally produced Compton's Interactive Encyclopedia, selling the product to Tribune Company (the present owner) in 1993 for £36 million.

Meanwhile, the explosion of the Internet has made delivering Britannica online look viable, particularly if you believe, as Esposito does, that Windows 95 will bring another 10 to 20 million people online.

Britannica's owners are betting the company on it. While they won't disclose any overall dollar figures, they report that they currently have about 35 universities signed on. Along with per-student fees from big universities, the reasoning goes, the advent of digital commerce will turn their service into a healthy enterprise. In April, the company, which is privately held, announced it was actively seeking outside money (so that it doesn't have to rely on its declining sales) to provide the capital for ongoing investment in online distribution and new product development.

It's a risky gamble but an appropriate one for an industry being turned upside down. As Esposito points out, online distribution changes everything about publishing. "For example," he says, "general trade book publishers now sell to retailers: Barnes & Noble, Crown, and the like. These publishers are looking at CD-ROMs with great interest, because if you sell CD-ROMs, your warehouse, distributors, and business structure haven't changed."

"Online, you become a direct marketer. You have no sales force. You have people maintaining your server - people who look strange and listen to funny music and run this little box holding everything you've ever published since day one," explains Esposito. "Everything has changed."

At bottom, what's driving this revolution is the logic of Moore's Law. It was Intel co-founder Gordon Moore who foresaw that the number of transistors on a chip would double roughly every year, a pace he later revised to every two years. Microsoft has taken its present form because Bill Gates asked himself a decade ago, "What would my business look like if the hardware were free?"

And now Esposito is asking what publishing will look like if distribution is free. "I don't claim that we've answered that question," he says. "But if this line of thinking is good enough for Bill Gates, we ought at least to look at it. The falling cost of distribution is going to have more impact on publishing than interactive digital media."

Obviously, transforming a company like Encyclopaedia Britannica into a content provider isn't cheap, though Esposito says it's impossible to put a number on it. "The costs are spread across different product lines. We have the development of production capabilities and the conversion of text to electronic media. We have the digital search engine and the distribution architecture." Putting what these components have achieved online is another incremental step, and so is developing new products.

But, he says, you have to take the long view. "We're not out there buying bonds, hoping for 8.5 per cent on our money. We're investing in a bedrock asset - a foundation for our growth into the 21st century."

It will be interesting to see what the 21st century holds for Britannica. As Bob McHenry admits, "The dirty little secret of the encyclopaedia industry is that we don't know whether or not people read what we publish." Read or not, the sturdy presence of the Britannica in the home has been a marker for the educated middle class for more than 200 years.

However useful Britannica Online may be, it's a rotten status symbol: one that's invisible. And so into the next century, it appears, the Britannica may stand or fall on the basis of whether or not people do actually read it.

Robert Rossney (rbr@well.com) writes the Online column for the San Francisco Chronicle and is co-author, with his wife Sonia Simone, of Quiet Americans.