According to Webster's Online Dictionary, "semantic" means "the relationships between symbols and what they represent." Tim Berners-Lee, the man who invented the World Wide Web in 1989 at CERN, the European Particle Physics Laboratory in Geneva, has used the term to christen the internet of the future.
The Semantic Web is a set of technologies he's developing right now as director of the World Wide Web Consortium. Born in London in 1955, Berners-Lee was knighted by Queen Elizabeth II in 2004. In this exclusive interview, he explains his vision of the future Semantic Web, which he says will be much more powerful than anything we have seen before.
IDG: First of all, shall I call you Sir Timothy, Professor Timothy or Mr. Berners-Lee?
Berners-Lee: You can call me Tim.
IDG: Well, Tim, my first question is the most obvious one: Can you explain in simple terms what the Semantic Web is?
Berners-Lee: I have often been asked about that. And the simple thing to point out is: in your computer you have your files, your documents that you can read, and there are data files which are used in applications, data files like calendars, bank statements, spreadsheets. These contain data which is used in documents, but the data itself is not on the web. It can't be put on the web.
So, for example, if you are looking at a page, you find a talk that you want to attend, an event that you want to go to. The event has a place and has a time and it has some people associated with it. But you have to read the web page and separately open your calendar to put the information in it. And if you want to find the page on the web again, you have to type the address again to get the page back. If you want the contact details of people, you have to cut and paste the information from a page into your address book, because your address book file and your original data files are not integrated together. And they are not integrated with the data on the web. So the Semantic Web is about data integration.
When you use an application, you should be able to put data there so that you could configure that data. I should be able to tell my computer: "I'm going to that event." And when I say that, the machine will understand the data. The Semantic Web is about putting data files on the web. It's not just a web of documents but also of data. The Semantic Web of data would connect many applications together. For the first time there is a common data format for all applications, for databases and pages.
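As a rough illustration of the "common data format" idea, the sketch below stores an event from a web page and an entry from a local calendar as subject-predicate-object triples, the shape used by RDF, the W3C data model behind the Semantic Web. The identifiers (`ex:talk42`, `attends`, and so on) and the sample values are hypothetical; they do not come from the interview or from any real vocabulary.

```python
# Hypothetical triples; every identifier and value here is made up for illustration.
web_page_triples = {
    ("ex:talk42", "title", "Future of the Web"),
    ("ex:talk42", "place", "ex:room-101"),
    ("ex:talk42", "start", "2008-03-01T10:00"),
    ("ex:talk42", "speaker", "ex:alice"),
}

calendar_triples = {
    ("ex:me", "attends", "ex:talk42"),
}


def describe(subject, triples):
    """Collect the predicate -> object pairs known about one subject."""
    return {p: o for s, p, o in triples if s == subject}


# Because both sources share one format, merging them is a plain set union.
merged = web_page_triples | calendar_triples

# "I'm going to that event": follow the link from the calendar to the page data.
my_events = [o for s, p, o in merged if s == "ex:me" and p == "attends"]
for event in my_events:
    print(event, describe(event, merged))
```

The point of the sketch is only that the calendar entry refers to the event by the same identifier the page uses, so nothing has to be retyped or pasted.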
IDG: Did you come up with the term "Semantic Web?" Is this the so-called Web 3.0? What's the difference between the Webs 2.0 and 3.0?
Berners-Lee: Yes, I did. It was in 1999, in my book "Weaving the Web." Web 2.0 is a name to describe how sites using the web work. You have user-generated content, and you have people logging on to sites and tagging things, uploading a photograph, making community sites. So Web 2.0 is about community-based web sites. That is not a term that I invented. Tim O'Reilly invented that term in 2003.
About Web 3.0, some people have used that term to mean a coming architecture. Some people use it to think about the regulation of web technology. But think about the future of web technology. A well-known problem which is typical of a Web 2.0 site is that the data which appears is not on the site, it's in the database. It's not on the web. So people can't reuse that data. You might have a professional website with information about some of your colleagues and the people you work with, and another site with information about your friends, and other sites about different communities. With Web 2.0 you can't see the whole picture; nobody can see the whole picture. So some people said, well, Web 3.0 will happen when your site provides data that you can navigate. For example, if one of several sites which use web technology finds useful data about my friends on my journal, then I can set up an icon to tell the computer: "Get that data out and look at it and add it to the data which I got from other sites and then look at them all together."
IDG: So what's the difference between a web of documents and a web of data?
Berners-Lee: There are many differences between documents and data. Take, for example, your bank data. There are two ways you could look at it. If you just look at a plain web page, then it looks like a sheet of paper. All you could really do is read it. Now if you look at it on a Web 3.0 site, you could maybe use a Java search to change the order of the data, and you could get much better access to the data.
Today, before you prepare to do something like paying your taxes, you need to use software like Quicken or Microsoft Money or your favorite financial program. When you do that, you don't load it as a web page, you load it as a data file. That's the difference between data and documents. When you look at your bank data as a document, you can just read it. When you look at it as data, you can find how much tax you owe, you can see how much your bills are, there are all kinds of things you can do with data.
We don't have the ability to do this with data on the web. If you could do that with data, the format you get for bank data would be a standard that works only with banks. It's a financial standard for bank data. There will be completely separate standards for calendars, for example. What you can't do today is, say, ask the computer: "When did I write that check? When did I have that meeting?" You can't connect items in different data files, unless you use the Semantic Web. This is much more powerful, because you can connect the people, connect the data which is about the same person, which is about the same place, which is about the same time.
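To make "connecting items in different data files" concrete, here is a minimal sketch that assumes bank and calendar records have both been expressed as subject-predicate-object triples with shared identifiers for people and dates. All record names, predicates, and values are hypothetical, and the grouping function is just one simple way to find such links, not a description of any actual Semantic Web tool.

```python
from collections import defaultdict

# Hypothetical records, as if exported from two unrelated applications.
bank_triples = [
    ("ex:check-17", "type", "Check"),
    ("ex:check-17", "date", "2008-02-14"),
    ("ex:check-17", "payee", "ex:bob"),
]
calendar_triples = [
    ("ex:meeting-3", "type", "Meeting"),
    ("ex:meeting-3", "date", "2008-02-14"),
    ("ex:meeting-3", "attendee", "ex:bob"),
]


def linked_by_value(triples):
    """Group subjects by each value they mention (a date, a person, a place)."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        index[obj].add(subject)
    # Keep only values that connect two or more items.
    return {value: subjects for value, subjects in index.items() if len(subjects) > 1}


links = linked_by_value(bank_triples + calendar_triples)
for value, subjects in sorted(links.items()):
    print(value, "connects", sorted(subjects))
```

Here the check and the meeting end up linked because they mention the same person and the same date, even though they came from separate files.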
IDG: But with this full connection between personal data, companies' data and government data, don't you think the first concern people will have is around the issue of privacy?
Berners-Lee: Yes. And I have that concern, too! An important aspect of Semantic Web technology is called provenance: where the data comes from and what it can be used for. Our research group at MIT is developing systems to show which uses of the information are allowed, so you can keep track of where it comes from, what it stands for, and make sure that it won't be used in any other way. We call this capability "information accountability."
IDG: In order to define preferences on how personal data can be used, an individual has to have some technology skills; but that's not the case for most of the world's population, is it?
Berners-Lee: First of all, when you use the Semantic Web for personal data, you're not putting it out on the web. You have a personal web for your data about your life on your computer, and you use it to navigate locally. You are not putting it on the open internet. There are a lot of tools like Quicken or Microsoft Money where the bank statements come down the Net in a secure channel and they are supposed to work locally. You're not using web technology; you're not going over the internet. You don't put your personal data files on the internet. We're talking about allowing you to combine, on your desktop, personal information to which you have rights, enterprise information to which you also have rights, and public information in a very rich view of the world.
Well, you said that for people to be able to handle data they need a lot of skill. Sometimes this is true but, for example, to use a calendar, you are creating data, right? When you create an address book, you are creating data. So these things have user interfaces which allow you to make things and never have a data problem, unless you are using an incompatible program. We are working at the moment to make this technology available to those who want to use it for enterprise documents. We do not yet have Semantic Web technology available that is easily usable by grandparents and children. That is true. That is something which we are developing at MIT. We have a team working exactly on that, making programs to allow people, normal people, to read and write and process their data.
IDG: When the Semantic Web achieves its full potential, will it start a second Internet boom?
Berners-Lee: Well, in a way it's already starting, but I don't think the web has reached its full potential yet, and it's been around for almost sixteen years now. The Semantic Web is going to take off particularly when we see people using it for data processing, when we see people using it in more and more things, adding personal data, adding files to government data. But I think it will take many years, because so much will be done on top of it.
IDG: What is Net neutrality? What's your position on it?
Berners-Lee: Net neutrality is the principle that when I pay money to connect to the internet and you pay money to connect to the internet, then we can communicate, no matter who we are. What's very exciting at the moment is that video is happening on the web. YouTube gets a lot of attention, because they are delivering video over the web.
Now suppose I'm in Massachusetts and I want to find a Brazilian movie. I go to the internet to find my favorite independent movie and filmmaker. But then the cable company in Massachusetts blocks the transmission and says, "No, we won't let you do this, because we sell movies. So, yes, we do the internet but on the other hand we will stop you from seeing internet movies. We want to be able to control which movies you buy."
We've seen cable companies trying to prevent people from using the internet for internet phone calls. I am concerned about this, and am working, with many other committed people, to keep it from happening. I think it's very important to keep an open internet for whoever you are. This is called Net neutrality. It's very important to preserve Net neutrality for the future.
IDG: In 2003, several governments proposed an international administration of the internet, modeled on bodies such as the United Nations or the European Union. Do you think that Washington will ever allow that to happen?
Berners-Lee: I think that slowly the internet will get more bureaucracy. I think it's inevitable. It's important to allow people in different countries, developing countries, to develop their use of the internet as quickly as possible. But the administration of something so big will never be controlled by a single bureaucracy. I don't know what form that bureaucracy will take, since there is a lot of politics involved. But I would say it's very important that it should be free of government control and should not censor the people who use it.
IDG: You once said the web was created to solve a frustration you had at CERN. What was that and how did it happen?
Berners-Lee: CERN is a wonderful diversity of cultures, because people come there from all over the world to do physics. In 1989, before the web existed, I wrote a memo explaining what it would be like to have the web. I described the hypertext system, the World Wide Web if you like, as a method to add and edit data. My idea was that I wanted all the information in CERN's network to be easily available. I wanted to develop the tools to allow people to collaboratively build and use information. I wanted people to be able to design software and specific experiments by working on different aspects of something together.
So the web originally was supposed to be for collaboratively designing things. The first tool was a web browser and editor as well, allowing people at CERN to use a document, edit it, change it and then send it, making links between web pages and scientific documents.
The frustration was that I wanted to be able to work very easily with people in different countries, where they were using different machines, working with different database systems and storing data in different formats.
IDG: Lots of researchers made millions on the web, but you preferred to keep developing standards. Don't you feel you missed the chance of a lifetime by not creating a proprietary web?
Berners-Lee: No, I don't, because if it had been proprietary, people would not have used it, they would not have contributed to it. It would not have taken off and we would not be talking about it right now.
IDG: Some people like Nova Spivack and Microsoft's cofounder Paul Allen work with a timeline that envisions the arrival of Web 3.0 by 2010 and a future Web 4.0 by 2020. Can you imagine what this Web 4.0 is supposed to be?
Berners-Lee: In the future we will have the Semantic Web, which will allow a whole lot of other things. One of the characteristics of a networking technology like the internet or the web or the Semantic Web is that the things done with it far surpass the imagination of the people who invented it. Take, for example, TCP/IP, the original protocols for communication between computers over the internet, created by Vinton Cerf and Robert Kahn in 1974.
When I invented the web, I thought of it as an infrastructure; I designed the web as a foundation for many things. With Web 2.0, social networks and all kinds of things happen on top of it. When the Semantic Web arrives in the next few years, people will be using it in ways we cannot know yet. So, in a way, it's foolish to try to imagine what Web 4.0 will be like when we still don't know what will be done with 3.0.
For Web 3.0 to succeed, the people who are studying it at this moment will have to come up with the ideas that make the new technology possible. They will design fantastic things, just as people with Web 2.0 are designing fantastic things right now. People working with the Semantic Web will make much more powerful things. We can't imagine what they will do. But we have to build the web to be an infrastructure. It should never be built for one particular purpose, but as a foundation for future developments.