Transliteration: Arabic words and the Roman alphabet

The writing of Arabic words in English texts presents a number of difficulties, even for those who are familiar with both languages.

In 1926, when T E Lawrence ("Lawrence of Arabia") sent his 130,000-word manuscript of Revolt in the Desert to be typeset, a sharp-eyed proof-reader spotted that it was "full of inconsistencies in the spelling of proper names".

Among other things, the proof-reader noted that "Jeddah" alternated with "Jidda" throughout the book, while a man whose name began as Sherif Abd el Mayin later became el Main, el Mayein, el Muein, el Mayin and le Muyein.

"Arabic names," Lawrence replied, "won't go into English, exactly, for their consonants are not the same as ours, and their vowels, like ours, vary from district to district."

Such inconsistencies may not matter much in a literary work but in many other situations they do matter. Alternative spellings can make one person appear to be several different people – a problem which, among other things, has hampered efforts to track down al-Qaeda supporters.

The name of the late Libyan leader, Muammar Gaddafi, was especially problematic, and newspapers never agreed on how to spell it. Some started it with the letter G, others with the letter Q. According to one website, there were 32 possible spellings.

There is no ideal, all-purpose solution, but there are several ways of approaching the problem ...

One approach is to take Arabic words as they are pronounced and write down approximately similar sounds in the Roman alphabet. This is what early European travellers to the Middle East usually did, and the results were often bizarre or, in some cases, almost unrecognisable.

Inexact spellings such as "Mecca" and "Koran" entered the English language a long time ago and have become so entrenched that they are now difficult to eradicate. In old books the Prophet's name is frequently spelled as "Mahomet" and this is still sometimes used today. There is no logical reason for it because Muhammad is one Arabic name that can easily be rendered in a way that is both phonetically accurate and faithful to its written form.

The Roman alphabet, of course, is used by a number of European languages, so phonetic representations of Arabic words vary according to the mother tongue of the writer. Romanised spellings adopted by Arabs themselves often reflect previous colonial influences: an Arab in a country with strong English influence might spell his surname as "Shaheen", while a cousin in a French-influenced country would spell it as "Chahine". In both cases, the original Arabic name is the same.

A further consideration is that there are also significant regional variations in pronunciation by Arabs. So a single Arabic word spoken by a Moroccan, an Egyptian and a Saudi could easily appear as three different words if written phonetically in the Roman alphabet.

The spellings of Arabic words found in the western mass media are often at least partly phonetic but rarely do justice to the original.

In some circumstances, more precise phonetic spelling is needed – in phrase books for tourists, for instance, or in pronunciation guides for broadcasters. The following examples come from a guide issued by Associated Press to help American radio stations with their pronunciation:

This is not only thoroughly unscientific but highly inaccurate. The guide happily inserts various sounds that don't exist in the original Arabic (a K in "Rahman", for example) and ignores several others that do exist. It also offers two different pronunciations of "Abdel-", for no logical reason.

Truly phonetic spelling follows the International Phonetic Alphabet which is used academically by linguists. Its disadvantage in general use is that it requires characters outside the normal alphabet and is therefore more or less incomprehensible to non-specialists.

A different approach is to start with Arabic words in their written form and transcribe (or "Romanise") them by replacing individual Arabic letters with corresponding letters from the Roman alphabet. This sounds simple but is actually very difficult. For example:

The ideal solution would be to have a standard, internationally agreed, system. Several have been proposed but unfortunately none has been universally accepted.

Probably the earliest attempt at standardisation was the Deutsche Morgenländische Gesellschaft proposal, adopted by the International Convention of Orientalist Scholars in 1936. It is the system used in the Hans Wehr Arabic dictionary. Another standard was agreed in 1971 at a conference of Arab experts in Beirut and, theoretically at least, accepted by the countries of the Arab League. It has met some resistance, particularly in those Arab countries where French predominates over English. Other transcription/Romanisation systems include:

Adopted by the US Library of Congress and the American Library Association for cataloguing books, the system has found its way into wider academic use. It covers a multitude of languages: there are 54 Romanisation tables for more than 150 languages and dialects written in non-Roman scripts. The table relating to Arabic may be viewed here and here. Alternatively, a complete set of the tables can be purchased from amazon.com.

Published by the International Standards Organisation. Copies may be purchased here.

Not widely used - which is hardly surprising since the British Standards Institution holds the copyright (it cannot be reproduced here) and copies are expensive to buy (about $39 for an eight-page document).

Overseen by the Group of Experts on Geographical Names (UNGEGN), this aims to promote "consistent use of accurate place names" on maps and similar products. Work on the project has been continuing since 1972. A progress report on Arabic romanization, dated March 2000, can be viewed in PDF format here.

The transcription/romanisation systems described above all suffer from the same disadvantages, to varying degrees:

This last point is particularly important today, though it could not be foreseen when most of the romanisation systems were devised. Currently, the most advanced approaches involve precise letter-for-letter transcription systems which allow text files originally produced in Arabic to be romanised by a simple computer program and converted back again into perfect Arabic. Beyond straightforward text files, this has important implications for the use of databases.

The Buckwalter Transliteration, developed by Tim Buckwalter, a lexicographer, is a system for "practical storage, display and email transmission of Arabic text in environments where the display of genuine Arabic characters is not possible or convenient". It avoids special characters and can be used quite simply by anyone with a knowledge of Arabic because the Roman equivalents of the Arabic letters are easy to remember. For details of the Buckwalter System see the encoding chart.
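The round trip described above can be illustrated with a short Python sketch. The symbol table is written from the commonly published Buckwalter chart and should be checked against the encoding chart before being relied on. The function names to_buckwalter and from_buckwalter are illustrative and do not belong to any official tool.

```python
# Buckwalter symbols, listed in Unicode order for three runs of Arabic
# code points. Assumption: the table follows the commonly published chart.
_RANGES = [
    (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza .. ghayn
    (0x0640, "_fqklmnhwYyFNKaui~o"),         # tatweel .. sukun
    (0x0670, "`{"),                          # dagger alif, alif wasla
]

# One Arabic character maps to exactly one ASCII symbol.
AR_TO_BW = {
    chr(start + offset): symbol
    for start, symbols in _RANGES
    for offset, symbol in enumerate(symbols)
}
# Because the mapping is one-to-one, reversing it is a plain dict inversion.
BW_TO_AR = {symbol: letter for letter, symbol in AR_TO_BW.items()}


def to_buckwalter(text: str) -> str:
    """Replace each Arabic character with its ASCII symbol; leave others alone."""
    return "".join(AR_TO_BW.get(ch, ch) for ch in text)


def from_buckwalter(text: str) -> str:
    """Replace each ASCII symbol with its Arabic character; leave others alone."""
    return "".join(BW_TO_AR.get(ch, ch) for ch in text)


kitab = "\u0643\u062a\u0627\u0628"  # kaf, ta, alif, ba ("book")
print(to_buckwalter(kitab))                    # ktAb
print(from_buckwalter("ktAb") == kitab)        # True
```

One caveat with so simple a sketch: any ordinary Latin text mixed into the input would also be turned into Arabic letters on the way back, so a real program would need some way of marking which stretches of text are transliterated.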

There is a lot of disagreement in the English-language media about how to spell Arabic words and names in the Roman alphabet. Apart from variations in the spellings adopted by individual newspapers, magazines and news agencies, many of these organisations have no clear guidelines or fail to follow them consistently.

With increasing use of electronic archives, the spelling variations can make it almost impossible to retrieve all relevant articles with a reasonable degree of certainty.
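One way an archive might soften this problem is to index each name under a crude "spelling key" as well as its printed form, so that variant spellings land in the same bucket. The Python sketch below is a hypothetical heuristic, not an established algorithm: the function name spelling_key and every rule in it are assumptions chosen to cope with the variants mentioned in this article.

```python
import re
from collections import defaultdict


def spelling_key(name: str) -> str:
    """Reduce a romanised name to a rough consonant skeleton (hypothetical rules)."""
    s = re.sub(r"[^a-z]", "", name.lower())  # drop spaces, hyphens, apostrophes
    s = s.replace("ch", "sh")                # French-style "ch" vs English "sh"
    s = re.sub(r"(.)\1+", r"\1", s)          # collapse doubled letters
    s = re.sub(r"h$", "", s)                 # optional final "h" (Jeddah/Jidda)
    s = re.sub(r"[aeiouy]", "", s)           # vowels vary most, so ignore them
    return s


def group_by_key(names):
    """Bucket names that share the same spelling key."""
    groups = defaultdict(list)
    for name in names:
        groups[spelling_key(name)].append(name)
    return dict(groups)


print(group_by_key(["Jeddah", "Jidda", "Shaheen", "Chahine"]))
# {'jd': ['Jeddah', 'Jidda'], 'shhn': ['Shaheen', 'Chahine']}
```

A key this aggressive will also merge names that are genuinely different, so it is better suited to widening a search than to deciding that two spellings refer to the same person.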

Variations in spelling can also confuse readers, as well as journalists themselves, and leave them wondering whether two (or more) apparently different names refer to the same person.

The two existing standards that seem most relevant to journalism are the ALA-LC and UN guidelines (see above). They are very similar, but in some instances both resort to special characters that are impractical for media usage and would also baffle readers.

The romanisation scheme suggested below is a simplified version of the ALA-LC and UN guidelines which eliminates the need for special characters.

Suggested Romanisation for media usage

alif

tha

jim

kha

dal

dhal

zay

sin

shin

sad

dad

tah

zah

ayn

ghayn

qaf

kaf

lam

mim

nun

waw

‘ (alt+ 0145)

* when waw or ya is used as a consonant

NOTES:

Short vowels: u, a, i (e and o are unnecessary).

Long vowels: uu; aa; ii. The principle of doubling a short “a” to make a long “aa” is well established (e.g. "salaam"). Logically, it could be applied to the other vowels. Some may prefer "oo" to "uu", but "ou" could be mispronounced as "ow". Again, "ee" may be preferred to "ii".

Diphthongs: aw, ay. Some may prefer "au" and "ai".

Doubled consonants: normally write as double; omit in the case of digraphs (gh, th, etc) for visual reasons. Doubling is not always obvious from the written Arabic; omit if uncertain.

Digraphs: to avoid ambiguity, two-letter combinations which are not digraphs but resemble them should ideally be separated by ´ (ctrl+ ', space). Example: ad´ham.

Definite article: al- (no assimilation with "sun" letters, e.g. "al-shams" not "ash-shams").

Capitalisation: capitalisation of "bin" in Arabic names would logically follow in-house capitalisation rules for "von" (German) or "du" (French). Logically, "abu" and "abd al-" require the same treatment as "bin".

ta marbouta: the options currently in use are: a, at, ah, eh, et. Readers' views on the acceptability or otherwise of these options are welcome.

jim: in a colloquial context, g can replace j where that is the normal pronunciation.

qaf: in a colloquial context, g can replace q where that is the normal pronunciation.

hamzah: may be omitted at beginning of word; elsewhere use apostrophe ’ (alt+ 0146).
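Two of the notes above, on digraphs and on the definite article, are mechanical enough to sketch in Python. This is only an illustration under stated assumptions: the set of digraphs (th, kh, dh, sh, gh) is inferred from the letter names in the list, the set of "sun" letters is a simplified guess at how they appear once romanised without special characters, and the function names join_units and unassimilate_article are hypothetical.

```python
import re

DIGRAPHS = {"th", "kh", "dh", "sh", "gh"}  # assumption: inferred from letter names
SEPARATOR = "\u00b4"                        # the acute accent suggested in the notes


def join_units(units):
    """Join romanised letters, separating pairs that merely look like digraphs.

    `units` holds one romanised Arabic letter per item, so a real digraph
    arrives as a single item ("dh") and a look-alike as two items ("d", "h").
    """
    out = []
    for unit in units:
        if out and (out[-1][-1] + unit[0]) in DIGRAPHS:
            out.append(SEPARATOR)
        out.append(unit)
    return "".join(out)


# Assumption: sun letters as they would appear in the simplified romanisation.
_ASSIMILATED = re.compile(r"^a(th|dh|sh|t|d|r|z|s|n)-(.+)$", re.IGNORECASE)


def unassimilate_article(word):
    """Rewrite an assimilated article ("ash-shams") in the "al-" form."""
    match = _ASSIMILATED.match(word)
    if match and match.group(2).lower().startswith(match.group(1).lower()):
        return word[0] + "l-" + match.group(2)
    return word


print(join_units(["a", "d", "h", "a", "m"]))   # ad´ham
print(join_units(["a", "dh", "a", "m"]))       # adham
print(unassimilate_article("ash-shams"))       # al-shams
```

The article rule only fires when the letter after the hyphen repeats the letter inside the article, which keeps it from touching unrelated hyphenated words, but it remains a rough check rather than a full treatment.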

The point of setting a standard is to apply it universally, or at least to make as few exceptions as possible. However, this is difficult to achieve with Arabic words because so many mis-transliterations have entered common usage. It is suggested that the guidelines may be waived in the following circumstances:

1. Names, where a person or organisation has clearly indicated a preferred spelling.

3. Religious terms, where a particular spelling has been adopted locally by believers.