SEARCH ME

Brian Whitaker wants more surfers to find his Web site

The Guardian, 4 June, 1998

SEARCH ENGINES are the lifeblood of the Internet: they index Web pages and help users to find things. Without them, you're lost in cyberspace. In a moment of wild optimism, about 10 minutes after posting my new Web site on the server, I began contacting the search engines to tell them about it. Several replied - churlishly, I thought - that they would take a look in a few weeks.

Well, their time is up and I have now discovered the first law of the Internet. This states that if you give a chimpanzee a computer and an infinite amount of time, it will type out the complete works of Shakespeare long before it finds your Web site.

Since my Web site is about Yemen, let's begin with an easy test: "yemen constitution". Connoisseurs will know that Yemeni constitutions have almost as much built-in obsolescence as a Pentium chip, and my Web site has the world's finest and largest collection of them.

The search engines, however, show neither taste nor appreciation. They point relentlessly to the law department at Wuertzburg University, which has just one long-dead constitution split into multiple files.

To make it easier still, I try: "yemen gateway" - the name of my Web site. This time most searches do put it top of the list, though one - bizarrely - puts it second, behind a banker's memo about honey exports. One search even states, with great authority, that my site is only "22 per cent relevant". They'll be hearing from my solicitors.

Search engines rely mainly on meta data, such as key words, inserted at the top of HTML files, though some check ordinary text as well. Meta data is normally hidden, but you can look at it with the "view source" command on your browser.
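To see what a program finds when it reads that hidden meta data, here is a minimal Python sketch. It assumes a page that declares its key words in a `<meta name="keywords">` tag. The sample page and the function name `meta_keywords` are hypothetical and written purely for illustration. It shows only how the key words can be pulled out of the HTML, not how any particular search engine weighs them.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated terms from <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), so default to "".
        if (attr_map.get("name") or "").lower() == "keywords":
            content = attr_map.get("content") or ""
            self.keywords.extend(
                term.strip() for term in content.split(",") if term.strip()
            )


def meta_keywords(html_text):
    """Return the meta key words declared in an HTML page, in order."""
    parser = MetaKeywordParser()
    parser.feed(html_text)
    parser.close()
    return parser.keywords


# Hypothetical page header, invented for this example.
SAMPLE_PAGE = """<html><head>
<title>Yemen Gateway</title>
<meta name="keywords" content="yemen, constitution, yemen gateway">
<meta name="description" content="Articles and documents about Yemen">
</head><body>...</body></html>"""

print(meta_keywords(SAMPLE_PAGE))  # ['yemen', 'constitution', 'yemen gateway']
```

The description tag is ignored here because only the key words are being collected; a browser's "view source" command shows the same tags to a human reader.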

The meta data on my own site was done largely by guessword - trying to anticipate the search terms readers might use, and following examples in other Web pages. But without special knowledge, it's virtually impossible to work out how different search engines respond.

The 10 most-searched words on the Internet are: sex, nude, pictures, adult, women, software, erotic, erotica, gay, and naked - in that order. I'm tempted to drive the Wuertzburg site into obscurity by labelling my files "Yemen's adult constitution without erotic pictures of naked women or gay sex" but I might disappoint someone.

While pondering this, I received an e-mail from Fabio Salvadori of www.pegasoweb.com advertising Engenius, a bit of software that claims to improve searchability. My hopes were not high because, on looking at the meta data on his own Web pages, I found rather a lot of typing mistakes. Still, I asked him to choose any two files from my Web site and run them through Engenius to see what it could do. What Salvadori did is rather complicated, but you can view the tweaked files at http://www.al-bab.com/gov/gov1.htm or http://www.al-bab.com/artic/gdn3.htm, or visit http://www.al-bab.com/forum/ if you want to see his explanation. Salvadori describes this as "optimising" files, which sounds entirely innocent. But some people use more unscrupulous tricks, such as creating "ghost" pages or concealing key words in image data, which to my mind deserve a yellow card if not a sending-off.

After I let readers into the secret of my hidden Web counter, people with nothing better to do keep hitting the home page then e-mailing me with the latest score. One reader urged me to spoil their fun and make the counter truly invisible by changing its colour to match the background. Another pointed me to http://www.cranfield.ac.uk/docs/stats/. Here you'll discover that Web statistics are total nonsense anyway. Yet another - who'll obviously go far - told me how to fiddle the counter. To think that Guardian readers would even dream of it! Well, I shall do the honourable thing if I can, and remove a few of those gratuitous hits by turning the counter back.

© Copyright Guardian Newspapers Ltd 1998