The Birth of Web 3.0
The World Wide Web has developed into a gigantic and loosely organized collection of billions of documents connected by links and search engines, which we know as Web 2.0. Now, the computer industry is developing technology to make that data the basis of a sophisticated human intelligence network.
This vision, called Web 3.0, calls for the Web to be wrapped with a layer of meaning that will allow systems that use human reasoning to retrieve the underlying data. This use of artificial intelligence will allow machines to guide humans to the information they need, rather than dumping information on their screens in the form of search results.
Web 3.0 is an emerging set of technologies backed by large companies like Google, Yahoo, and I.B.M.; by small companies dedicated to the concept, such as Radar Networks, Metaweb Technologies, and Cycorp; and by academic researchers at universities throughout the world. Most of the projects being pursued today have commercial implications in travel (making vacation recommendations) and entertainment (predicting the next hit), but future applications could emerge in financial planning, education, and other areas. For instance, a Web 3.0 application could be an intelligent system that maps out a retirement plan, or that helps a student choose the right university.
What these applications all have in common is that they leverage the vast network of data and powerful computers connected to the Web. As Nova Spivack, founder of Radar Networks, put it: “I call it the World Wide Database; we are going from a Web of connected documents to a Web of connected data.” Web 2.0 lets users find documents and then sift through them to find the answer to a question. To plan a vacation today, we might sift through lists of flights, hotels, and car rentals, often with inconsistencies among the lists. Web 3.0 uses semantics to answer a question directly: “Can you recommend a vacation on the sea for a family of 2 adults and 1 teenager for less than $4,000?”
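The shift the article describes, from a Web of documents to a Web of connected data, can be illustrated with a toy sketch: once vacation packages exist as structured records rather than scattered pages, the question above becomes a simple filter. The fields, values, and function name below are invented for illustration; no real service is implied.

```python
# Toy structured records standing in for "connected data" on the Web.
vacations = [
    {"place": "Coastal Resort", "setting": "sea", "sleeps": 3, "price": 3500},
    {"place": "Mountain Lodge", "setting": "mountains", "sleeps": 4, "price": 2800},
    {"place": "Island Villa", "setting": "sea", "sleeps": 5, "price": 5200},
]

def recommend(setting, party_size, budget):
    """Return package names matching the requested setting, capacity, and budget."""
    return [
        v["place"] for v in vacations
        if v["setting"] == setting
        and v["sleeps"] >= party_size
        and v["price"] <= budget
    ]

# "A vacation on the sea for 2 adults and 1 teenager, under $4,000"
matches = recommend("sea", 3, 4000)
```

The point is not the code but the data model: with structured records, the answer is a query; with documents, it is a manual sifting job.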
It is difficult to predict how these systems will be built, and how soon they will be up and running. Vendors and academic researchers are investigating and debating two approaches. Some would like to create a new structure to replace the current Web, while others are developing tools that can extract meaning from the existing Web. Whichever approach is taken, these new systems will have a much greater commercial value than current search engines.
Google’s PageRank is probably the best-known Web 3.0 application, and also the most profitable. PageRank exploits human knowledge, makes decisions about what is important, and uses that information to order search results. It does this by interpreting a link from one page to another as a type of vote. But these votes are not equal: those cast by more popular pages are weighted more heavily.
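The voting idea above can be sketched in a few lines. This is a minimal power-iteration version of the published PageRank algorithm, not Google’s production system; the damping factor and iteration count are conventional illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    Each page repeatedly splits its current rank as "votes" among the
    pages it links to; votes from higher-ranked pages carry more weight.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# A tiny three-page web: "c" is linked to by both "a" and "b",
# so it collects the most votes.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
```

Note the weighting in action: page “a” links only to “b” and “c”, so its vote is split, while page “b” gives its entire vote to “c”.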
New companies dedicated to Web 3.0 are taking this idea a step further. Radar Networks is one company that manages social computing sites with semantics. These sites let users gather in a virtual space and create content by adding their thoughts on many topics, including travel and movies. Radar uses an associative (also called “semantic”) database to store information about these sites. For instance, it might store one person’s relationship to another (friend, relative, boss, teacher, etc.).
Google has financed a group of faculty and students at the University of Washington to create a social computing system using this technology. The Opine system extracts and aggregates user-posted information from product and review sites. One Opine system is designed to answer questions about hotels based on user-posted information. It understands the concepts of room temperature, bed comfort, and price. It can tell the difference between common descriptive words like “great,” “almost great,” and “mostly O.K.” When Opine is asked about a hotel, it weighs and ranks all of the comments and determines the right hotel for a particular user. “The system will know that ‘spotless’ is better than ‘clean’,” said Oren Etzioni, an artificial intelligence researcher who leads the project. “There is a growing realization that text on the Web is a tremendous resource.”
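The aggregation step can be sketched as follows: descriptive words sit on a strength scale (“spotless” above “clean”), and a hotel’s score averages the strengths found across its comments. The lexicon values and hotel data below are invented for illustration; the real Opine system learns these distinctions from Web text rather than from a hand-coded table.

```python
# Invented sentiment-strength lexicon: "spotless" outranks "clean",
# as in the example from the article.
SENTIMENT = {"spotless": 1.0, "great": 0.9, "clean": 0.6, "o.k.": 0.3, "dirty": -0.8}

def score_comments(comments):
    """Average the sentiment strength of recognized words across all comments."""
    strengths = [
        SENTIMENT[word]
        for comment in comments
        for word in comment.lower().split()
        if word in SENTIMENT
    ]
    return sum(strengths) / len(strengths) if strengths else 0.0

hotels = {
    "Seaview": ["Spotless rooms and a great bed", "Clean lobby"],
    "Budget Inn": ["Dirty carpet", "Clean enough"],
}

# Weigh and rank all comments to pick the best hotel.
ranked = sorted(hotels, key=lambda h: score_comments(hotels[h]), reverse=True)
```

Even this crude average captures the key idea: the system ranks hotels by the graded strength of opinions, not just their presence.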
“Many people haven’t realized this spooky thing about how much they are depending on A.I.,” according to W. Daniel Hillis, a veteran artificial-intelligence researcher who founded Metaweb Technologies in 2005. Metaweb is not publicly describing its service or product, but is focused on building a better infrastructure for extracting information semantically from the existing Web. Mr. Hillis believes that “it is pretty clear that human knowledge is out there and more exposed to machines than it ever was before.”
The basis for the work being done by Radar Networks and Metaweb is partly the technology developed for military and intelligence agencies in the United States. The National Security Agency (N.S.A.), the Central Intelligence Agency (C.I.A.), and the Defense Advanced Research Projects Agency (D.A.R.P.A.) initiated research into semantic technology a decade before Tim Berners-Lee called for a semantic Web in 1999. (That research began at about the same time the Web itself was created, in 1989; Mr. Berners-Lee is credited with its invention.)
Doug Lenat is a computer scientist who founded Cycorp in Austin, Texas. His work was underwritten partly by intelligence agencies, and his company now sells systems and services to the U.S. government and large corporations. Mr. Lenat claims that his artificial intelligence system, Cyc, will be able to reason, and to answer spoken or written questions. Cyc, which Mr. Lenat has been working on for over 25 years, was originally built by hand-entering millions of common-sense facts for it to learn. But last year, at a lecture given at Google, Mr. Lenat said that Cyc is now learning by mining the World Wide Web, which is a semantic Web 3.0 process. He implied that Cyc can now answer a natural-language question, and gave as an example: “Which American city would be most vulnerable to an anthrax attack during summer?”
I.B.M. researchers are now using a digital snapshot of the six billion documents that make up the World Wide Web (excluding pornography) to perform survey research and answer questions for corporate customers, according to Daniel Gruhl, a staff scientist at I.B.M.’s Almaden Research Center in San Jose. One query, performed for an insurance company, was used to determine young people’s attitudes toward death. The system, named Web Fountain, was also used internally to choose between the terms “utility computing” and “grid computing” for an I.B.M. branding effort. (“Utility computing” won.) Web Fountain has also been used for television network market research; this effort mined a popular online community site. Researchers were also able to predict which songs would be hits by mining the “buzz” on college music sites. This research had a higher rate of accuracy than conventional market research predictions.
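At its simplest, mining “buzz” means counting how often each candidate appears in a body of posts and ranking by mention frequency. The sketch below illustrates only that core idea; the posts and song titles are invented, and Web Fountain’s actual text-analytics pipeline is far more sophisticated.

```python
from collections import Counter

def rank_by_buzz(posts, songs):
    """Rank songs by case-insensitive mention counts across a set of posts."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for song in songs:
            counts[song] += text.count(song.lower())
    return [song for song, _ in counts.most_common()]

# Invented example posts standing in for a college music community site.
posts = [
    "Everyone is playing Song A at parties",
    "Song A is stuck in my head all week",
    "Song B is fine, I guess",
]
buzz_ranking = rank_by_buzz(posts, ["Song A", "Song B"])
```

A production system would also need to handle sentiment, spam, and duplicate posts, but raw mention counts are the starting signal for this kind of prediction.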
Most players in this space believe that complete artificial intelligence systems are unlikely in the near future, but that the content of the Web is growing more intelligent every day. In Flickr, Yahoo’s photo-sharing system, for instance, users tag photos, making it easier to identify images. Smart Webcams provide security, and Web-based e-mail systems can identify dates and locations. It is this type of program that is demonstrating the birth of Web 3.0 today.