Alexi Terry, what was the idea behind the research, this notion of PageRank?
Terry They started doing the research in an era when people had just begun to build search engines on the web. The web started off, really, as a bunch of interesting stuff that you browsed, you surfed. You went from page to page, saw what was there, and that was fun. And then people realised there was enough interesting and serious stuff that they might actually want to go somewhere where they could find something specific. So a number of people at different places created what were called search engines. The basic idea was that you create an index that lets you find where things are on the web. So if you have here, and this is just a sketch of what it might be, web pages, each of these boxes is a page: a, b, c, and d. Each one has certain words in it, television, computer, circuit, whatever it is. And each one can have links, where the links point to another page. So this page on computers and networks may point to this one for televisions and computers, and so on.

Now, what they realised, and this is before Google, with the people doing the original 'spiders', as they were called: you could give the computer the address of this page, the computer could make a list of all the words on that page and also find this page, because there was a link. Then it would go to this page, make a list of all the words on that page, and then follow the links from there. And computers had gotten fast enough and powerful enough, and the web was small enough, that you could actually build a complete index. So you'd end up with something like the index in the back of a book: the word computer appears on pages a, b, and d, the word television appears on this page, and so on. So if I went to AltaVista, say, which was one of those early search engines, and I typed in computer, it would look in the index it had made and give me a list of results that said a, b, d, and so on. And that made it possible to go find something on the web, instead of just browsing around and seeing where you got to.
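To make the picture concrete, here is a minimal Python sketch of the kind of inverted index Terry describes. The page names a to d echo his whiteboard example, but the page contents are invented for illustration:

```python
# Toy inverted index: map each word to the set of pages it appears on,
# like the index in the back of a book.
pages = {
    "a": "computer network circuit",
    "b": "television computer",
    "c": "circuit television",
    "d": "computer television circuit",
}

index = {}
for page_id, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(page_id)

# A query is just a lookup in the index.
print(sorted(index["computer"]))  # ['a', 'b', 'd']
```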
Alexi But the problem, of course, is that if somebody used the word computer a thousand times, because that was the keyword being searched, it would push the result up, and it wouldn't necessarily be the most...
Terry Exactly, so they have to decide: if there are three results, it's no problem, but if there are a hundred results or a thousand results, which ones do you show? And how do you know that a is more interesting than d, or b is more interesting than d? So the question of what was interesting, what was relevant, wasn't addressed by having just a regular index like this. And that's really where the founders of Google came in. Sergey and Larry decided that they could do a better job of finding the interestingness, the relevance, what makes a page something you want to see, beyond the fact that it happens to have the words you searched for.
Alexi And how did they go about identifying interestingness, because that's a very subjective idea, isn't it?
Terry Interestingness is of course subjective. What places like Yahoo did was have human beings go through and say, here's an interesting page, here's an interesting page. Yahoo was the most famous, but there were a lot of people in that era who would go through and check out pages. And again, that worked when the web was very small.
Alexi Exactly, that would not scale.
Terry And as the web gets bigger you can't hire people to go out and look at all the pages. So the question is, how do you get people you don't hire to, in some sense, give you judgements on which pages are interesting? And they had a very interesting metaphor for this: imagine a crowd of people all surfing the internet. You take millions of people, start them out all over the internet, and they get to a page and follow a link, and from there maybe they follow another link. Now, if you could actually watch millions of people and all the paths they take, you would see that traffic ends up concentrating in certain places. A lot of people would end up here on this page and only a few people on this page. Then when you get around to giving your search results, you would give the ones that got a lot of this virtual traffic. Now, this is not actual people going, because you don't have millions of people, you don't have data on that. But you can imagine where they would go.
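The crowd-of-surfers metaphor can be turned into a small simulation. This is only an illustrative sketch: the link graph, the number of surfers, and the 0.85 probability of following a link are assumptions made here for the example, not figures from the interview:

```python
import random

# Hypothetical link graph: page -> pages it links to (invented for illustration).
links = {
    "a": ["b", "d"],
    "b": ["d"],
    "c": ["b"],
    "d": ["a", "b"],
}

def simulate_surfers(links, n_surfers=100_000, n_steps=20, follow_prob=0.85, seed=0):
    """Drop surfers on random pages and let each one follow links for n_steps.

    With probability follow_prob a surfer clicks a random outgoing link;
    otherwise (or on a page with no links) it jumps to a random page.
    Returns the fraction of surfers that end up on each page.
    """
    rng = random.Random(seed)
    pages = list(links)
    counts = {p: 0 for p in pages}
    for _ in range(n_surfers):
        page = rng.choice(pages)
        for _ in range(n_steps):
            out = links[page]
            if out and rng.random() < follow_prob:
                page = rng.choice(out)
            else:
                page = rng.choice(pages)
        counts[page] += 1
    return {p: c / n_surfers for p, c in counts.items()}

print(simulate_surfers(links))  # traffic concentrates on heavily linked pages like b
```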
Alexi So if we take this outside of the web, this would be like places in a city that have a lot of people driving through them, for example a particular junction, or an important building, something like that. That's what the search algorithm identified?
Terry That's what decided what's the most relevant, what's the most interesting. Now, there is no simple way to actually get that data, because the only people who know where other people go on the web are the service providers, and they don't give out that information. But what they realised is that if they used links, they could get an approximation of how interesting pages were. So they built a second index, which not only kept track of what words were on each page, but of where it was linked from. So you might see here that page b has a link coming in from a and a link coming in from x.
So they actually had information that gave them the full link structure of the web: where every link goes from and to. Then they could take this and apply a mathematical algorithm, called the PageRank algorithm, which was intended to simulate, in some sense, the result of what would happen if you had an infinite number of monkeys, if you put thousands, millions of people on the web and let them just start browsing. And the result they get out of running this algorithm, which of course didn't require millions and billions of things actually going on, was a good approximation that page b, let's say, is the one that would get the most traffic of a, b, and d. So then when you search for computer, it brings b to the top of your listing.
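The algorithm itself can be sketched without simulating anyone. The following is a generic power-iteration version of the idea, not Google's actual implementation; the link graph and the 0.85 damping factor are illustrative assumptions:

```python
def pagerank(links, damping=0.85, n_iter=50):
    """Power-iteration sketch of a PageRank-style score.

    links maps each page to the pages it links to. Every page starts with
    equal rank; on each pass a page hands a share of its rank to the pages
    it links to, and a small amount of rank is spread evenly so the virtual
    surfers can jump anywhere. This converges to roughly the same
    distribution the crowd of surfers would settle into.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(n_iter):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, out in links.items():
            if out:
                share = damping * rank[page] / len(out)
                for target in out:
                    new_rank[target] += share
            else:  # a dead end spreads its rank over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Same invented link graph as before; b has the most incoming links,
# so it comes out on top when you search for "computer".
links = {"a": ["b", "d"], "b": ["d"], "c": ["b"], "d": ["a", "b"]}
print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))
```

In this toy graph, b accumulates the most rank because the most pages link to it, which is the property that pushes it to the top of the result list.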
Alexi So if a page had a lot of people going to it or referencing it, then that would increase its interestingness, it would increase its reputation?
Terry It's a little bit like in academics, where you have citations. So I write an academic paper and I say, see so-and-so's paper from such-and-such year. That indicates that that's an interesting paper. And it's the same sort of thing here: if you have lots of links pointing to you, that indicates a lot of people have decided you're interesting enough to put in a link pointing to you. So that's really the basis of the algorithm.