The road not yet taken on the Web.

Backward Links

Links on World Wide Web pages contain a destination address; when clicked, the target page is loaded. These are “forward” links; they provide passage from the current page to some other page on the Web.

There is no converse way to find pages which have links pointing to the current page. The Web has no “backward” links.

Forward links are so handy that backward links are often forgotten. Yet there are several reasons they’re useful:

To track how many people have links pointing to your site (a key demographic indicator on the Web), you want to count backward links.

Anyone who is linking to your site must think you are interesting in some way, either good or bad. Backward links allow you more-or-less direct contact with whoever is publicizing your site.

If you’ve got something to say about someone else’s site, whether it be a sales brochure, political platform, or research paper, you can link to it from your comment. It doesn’t do much good, though, as no one on that site can find that link. Backward links would allow others to hear you say your piece.

Possible Implementations

There are a few ways to do backward linking. Thanks to the many Web folks who have enlightened me on this topic.

One method is to extend HTTP servers to store information about backward links. This way, when you make a link to a page, you also inform that page’s server of your link, and it makes a note of it. This permits very rapid search over the available links. Unfortunately, it requires extending Web server software. The Foresight Institute has a project to implement backlinks this way. I have also been informed of a backward-links package written by the inimitable John Walker, which seems to do about 80% of what the Foresight Institute wants to do. Note that this solution permits fine-grained linking, letting other people at other sites link to just a part of your document.
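
Neither the Foresight project nor Walker’s package is reproduced here, but the flavor of the server-extension approach can be sketched. Below is a minimal, hypothetical Python sketch of such an endpoint: a linking site reports its link, the target site’s server records it, and anyone can later ask which pages have reported links to a given page. The URL paths, parameter names, and flat-file store are illustrative assumptions, not part of any existing package.

    # Hypothetical sketch of the "extended server" idea: record reported
    # backlinks in a flat file and serve them back on request.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    BACKLINK_LOG = "backlinks.txt"   # illustrative flat-file store

    class BacklinkHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            parsed = urlparse(self.path)
            params = parse_qs(parsed.query)
            if parsed.path == "/backlink/register":
                # A linking site reports: "page FROM now links to page TO here."
                frm = params.get("from", [""])[0]
                to = params.get("to", [""])[0]
                if frm and to:
                    with open(BACKLINK_LOG, "a") as f:
                        f.write(f"{to}\t{frm}\n")
                    self.send_response(204)
                    self.end_headers()
                    return
            elif parsed.path == "/backlink/list":
                # Anyone can ask which pages have reported links to a given page.
                to = params.get("to", [""])[0]
                try:
                    with open(BACKLINK_LOG) as f:
                        hits = [line.split("\t", 1)[1] for line in f
                                if line.startswith(to + "\t")]
                except FileNotFoundError:
                    hits = []
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write("".join(hits).encode())
                return
            self.send_response(400)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), BacklinkHandler).serve_forever()

A real implementation would also want some authentication or verification step, since anything reported to the register endpoint is taken at face value.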

Another method is to use a little-known feature of the HTTP specification: the Referer header, which CGI programs see as the HTTP_REFERER environment variable. Most Web clients send this as part of a request to get a Web page; the value is the URL of the page the user is currently looking at. This will more often than not be a page which links to you! You can then write a CGI program to log these entries, and periodically to scan them and verify whether they point to pages which actually link to you. For example, you can view the backlinks to this page. (However, the script which processes these logs isn’t working correctly for me right now, and I haven’t bothered to fix it. Why not? See my final method.)
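
As a rough illustration (not the actual script behind this page), here is how such a logging-and-verification pair might look in Python: one function runs as the CGI program and appends each Referer it sees to a log, and another runs periodically, fetching the logged pages and keeping only those that really contain a link back. The file name and MY_URL below are placeholders.

    # Sketch of the Referer-logging approach: log HTTP_REFERER, then verify later.
    import os
    import urllib.request

    LOG_FILE = "referers.log"
    MY_URL = "http://www.unrealities.com/web/backlink.htm"

    def log_referer():
        """Run as a CGI program: record the Referer header, then emit the page."""
        referer = os.environ.get("HTTP_REFERER", "")
        if referer:
            with open(LOG_FILE, "a") as f:
                f.write(referer + "\n")
        print("Content-Type: text/html\n")   # header plus the blank separator line
        print("<HTML><BODY>...the actual page body goes here...</BODY></HTML>")

    def verify_backlinks():
        """Run periodically: keep only referers whose pages actually link to MY_URL."""
        confirmed = []
        with open(LOG_FILE) as f:
            candidates = set(line.strip() for line in f if line.strip())
        for url in candidates:
            try:
                page = urllib.request.urlopen(url, timeout=10).read()
            except Exception:
                continue  # page gone or unreachable; skip it
            if MY_URL in page.decode("latin-1", "replace"):
                confirmed.append(url)
        return confirmed

    if __name__ == "__main__":
        if "GATEWAY_INTERFACE" in os.environ:
            log_referer()                      # invoked by the Web server as CGI
        else:
            for url in verify_backlinks():     # invoked by hand or from cron
                print(url)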

Note that any scheme which stores backlink info near the document also makes it likelier that users will be able to tamper with the backlinks information. (Remove all links made by your competition!)

A final method is to extend a big Web database to allow searches on the HREFs of anchors, that is, searches on the actual literal pathnames which address the anchor target. Some useful new search requests become possible with such a database, for example:

“find all pages which link to this site” (i.e. search for all pages with “http://www.unrealities.com” inside <A> delimiters)

“find all pages which link to this page” (i.e. search for “http://www.unrealities.com/web/backlink.htm” likewise)

These are “indirect” or “intensional” backlinks, generated by a database query. There is no need for any new client or content-server software. The database is retroactive, detecting links about which the document server has never been informed. And the database’s backlink information is not alterable by the document server, so you can’t censor your backlinks.
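
As a toy illustration of what a link-searchable database does, the Python sketch below extracts every HREF from a handful of crawled pages, indexes them by target, and answers both kinds of query above. The pages dictionary stands in for a real Web crawl; all names here are illustrative.

    # Toy link index: map each HREF target to the set of pages that link to it.
    from html.parser import HTMLParser
    from collections import defaultdict
    from urllib.parse import urlparse

    class HrefCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.hrefs = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.hrefs.append(value)

    def build_link_index(pages):
        """pages: {source_url: html_text}. Returns {target_url: set(source_urls)}."""
        index = defaultdict(set)
        for source, html in pages.items():
            parser = HrefCollector()
            parser.feed(html)
            for href in parser.hrefs:
                index[href].add(source)
        return index

    def backlinks_to_page(index, page_url):
        """'Find all pages which link to this page.'"""
        return index.get(page_url, set())

    def backlinks_to_site(index, site):
        """'Find all pages which link to this site' (any URL on the given host)."""
        return {src for target, sources in index.items()
                if urlparse(target).netloc == site
                for src in sources}

    if __name__ == "__main__":
        pages = {
            "http://example.edu/hotlist.html":
                '<A HREF="http://www.unrealities.com/web/backlink.htm">backlinks</A>',
        }
        idx = build_link_index(pages)
        print(backlinks_to_page(idx, "http://www.unrealities.com/web/backlink.htm"))
        print(backlinks_to_site(idx, "www.unrealities.com"))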

I know of a couple of Web databases that allow such searches. One is the Open Text Web Index. Choose Within: Hyperlink in the Compound Search form. It really works! I found one hotlist pointing to my site that I didn’t previously know about. Another, better-known, link-searchable index is good old Alta Vista. Search for “+link:www.mysite.com -url:www.mysite.com” and find PLENTY of pages pointing at you!

What then?

All these ideas have scaling problems: popular sites have thousands of links to them, many of which are likely uninteresting. Extending the implementations to support compound queries (“show me only links from pages in .edu domains” or “show me only links from pages containing the word ‘chemistry’”) would be straightforward. Perhaps anchors which are closely enclosed within <LI> tags could be filtered out, to reduce the hotlist clutter.
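
Building on the toy index sketched above, a compound query is just an extra filtering pass over the raw backlink set. The hypothetical helper below keeps only links from a given domain suffix, or from pages containing a given word; its names and the pages dictionary are carried over from the earlier sketch, not an existing interface.

    # Sketch of compound-query filtering over a set of backlink source URLs.
    from urllib.parse import urlparse

    def filter_backlinks(sources, pages, domain_suffix=None, keyword=None):
        """sources: set of linking URLs; pages: {url: html_text} for those URLs."""
        kept = set()
        for src in sources:
            if domain_suffix and not urlparse(src).netloc.endswith(domain_suffix):
                continue  # e.g. keep only links from .edu hosts
            if keyword and keyword.lower() not in pages.get(src, "").lower():
                continue  # e.g. keep only pages mentioning "chemistry"
            kept.add(src)
        return kept

    # "Show me only links from pages in .edu domains containing 'chemistry'":
    # filter_backlinks(backlinks, pages, domain_suffix=".edu", keyword="chemistry")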

Eventually, ratings systems maintained by databases could also be used to augment queries, allowing better backlink filtering; again, potentially without affecting servers. And for most not-so-popular pages, the simple URL search would result in fewer (and more focused) hits.

Summary and Plan

Backward links are useful to a wide segment of Web users, and potentially straightforward to implement on top of current Web infrastructure. Additional features are desirable, but perhaps not initially necessary.

Other people, Robin Hanson for example, are tracking backlink work.

If you (or your company) are interested in backward links, or have information about other techniques or packages for backlinking, let me know.