Siwi and Kabyle: same language family, but not same language

Just back from a nice evening with the Siwi community of Qatar. A Kabyle friend came along (hello if you're reading this!), giving me a chance to see first-hand to what extent Siwi and Kabyle are mutually comprehensible. The answer is: very little indeed. Looking through basic vocabulary it's not hard to find cognates; but when it comes to even short sentences, mystified expressions on both sides were the order of the day. The Berber languages of Algeria and Morocco may shade into one another to some extent, even across sub-family boundaries - there seem to be dialects for which it is difficult to decide whether they should be called Kabyle or Chaoui, for example. But by the time you get to Siwa, it's quite clear that you're dealing with a different language, even by Arabic speakers' rather generous standards. Further confirmation, if any was needed, that Berber is a language family, not a language.

Songhay and Nilo-Saharan

Following up on the preceding post, I've been looking at Greenberg's (1966) Nilo-Saharan comparisons - specifically, the 29 involving Songhay that have reflexes in Kwarandzyey, the Songhay language least likely to have been involved in recent contact with Nilo-Saharan. Of these, 20 have comparanda in Saharan (Kanuri/Kanembu + Teda/Daza + Berti + Beria/Zaghawa), 17 in Eastern Sudanic (Nubian, Nilotic, Surmic, etc.), vs. a maximum of 13 for any other branch. (At least 7 also have plausible Mande comparisons.) Now, Saharan only consists of about 4 languages (9 by Ethnologue standards). For Eastern Sudanic, excluding Kuliak, the Ethnologue counts 103 languages, and a huge amount of internal diversity. If Songhay were equally distant from the whole of Nilo-Saharan, you would expect far more cognates with Eastern Sudanic than with Saharan; the figures suggest that the link (whatever its nature) is primarily with Saharan, and only secondarily, if at all, with the rest of the languages he classified as Nilo-Saharan.
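
To make the arithmetic behind that expectation explicit, here is a rough sketch in Python. It simply divides the number of comparanda by the number of languages in each branch, using the Ethnologue counts quoted above. Dividing by language count is my own crude stand-in for "opportunities to find a lookalike", not anything Greenberg did, and the variable names are purely illustrative.

```python
# Comparanda counts and Ethnologue language counts, as quoted in the text.
comparanda = {"Saharan": 20, "Eastern Sudanic": 17}
languages = {"Saharan": 9, "Eastern Sudanic": 103}

def per_language(branch):
    """Comparanda per language: a crude normalisation (assumption, see lead-in)."""
    return comparanda[branch] / languages[branch]

# How many times denser the Saharan matches are than the Eastern Sudanic ones.
ratio = per_language("Saharan") / per_language("Eastern Sudanic")

for branch in comparanda:
    print(f"{branch}: {per_language(branch):.2f} comparanda per language")
print(f"Saharan is about {ratio:.0f} times denser")
```

Even on the generous count of 9 Saharan languages, the matches are more than ten times denser there than in Eastern Sudanic, which is the imbalance the paragraph above is pointing at.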

The grammatical comparisons that Greenberg offers are interesting but not compelling; there are only 10 of them (only 4 with Kwarandzyey reflexes), and they often incorporate misrepresentations (as Lacroix noted, for example, -ma forms verbal nouns, not relatives/adjectives, and 1sg ay < *agay, reducing the similarity to forms like Zaghawa ai.) Some of the lexical ones, however, are rather good; similarities such as Koyraboro Senni kokoši “scale (of fish)” = Manga Kanuri kàskàsí “scale (of fish)” cry out for explanation, and, though quite rare, look sufficiently numerous that chance seems unlikely. But whether they should be explained by common ancestry or by contact remains unclear. Either scenario would be historically interesting, since at present rather a large expanse of Tuareg and Hausa-speaking land separates Songhay from even Kanuri, and Saharan originated closer to modern-day Darfur than to Lake Chad.

Arabic loanwords in "proto-Nilo-Saharan"

Ehret 2001 looks at first sight like an astonishingly detailed reconstruction of Nilo-Saharan, with nice binary splits and loads of technology-related words for archeologists and anthropologists to sink their teeth into. Why shouldn't specialists take advantage of this amazing opportunity to correlate historical developments to linguistic ones?

I just found a handy answer to that question. Bender (1997:175ff) gives the 15 cognate sets in Ehret 2001 that are represented in the most sub-families of Nilo-Saharan. 3 of the 15 look distinctly like Arabic loans.

1387 *wàs “to grow large”: Fur wassiye “wide” and Songhay wásà “to be wide” are both from Arabic wāsi`- واسع. The other items cited – Ik “stand”, Kanuri “yawn”, Kunama “increase, augment”, and Uduk “to tassel, of corn” – are scarcely obvious candidates for being related to one another in the first place.

1297 *là:l “to call out (to someone)”: Kanuri làn “to abuse, curse” and Songhay láalí “to curse” are obviously from Arabic la`an- لعن; Kunama lal- “to denigrate” might be from the same source. That only leaves Uduk “to persuade, incite to do something” and Proto-Central-Sudanic “to call out”.

718 *t̪íwm “to finish, complete”: almost certainly Songhay tímmè “to be finished”, very likely Uduk t̪ím “to finish”, Ocolo t̪um “to finish”, and maybe even Fur time “total”, are from Arabic tamm- تمّ (impf. -timm-), as Bender (ibid:177) considers probable. That leaves Proto-Central-Sudanic, Kunama, and Maba “all”, Kanuri “ideophone of dying animal” (!), and Proto-Kuliak “buttocks”. The “all” set looks rather promising – the whole etymology, not so much.

There are plenty of other Arabic loanwords in Ehret's “Proto-Nilo-Saharan” – a particularly egregious example is Kanuri zàmzàmíyɑ̀ “leather bottle-shaped water vessel for journeys” (#1223 *zɛ̀m “to become damp, moist”), and other especially clear-cut cases include #1173 < sawṭ, #1185 < šamm – but the fact that they include a significant proportion of the best cognate sets is what really strikes me. If a reconstruction attempt can't distinguish a widely distributed recent loan from a cognate set that split more than eleven thousand years ago, any information it gives about readily diffused items like technologies is completely unreliable. For another review from a similar perspective, try Blench 2000 (not sure why it appeared a year before the book's nominal publication date...)

The more I read about Nilo-Saharan, the less convinced I am that it exists (much less that Songhay belongs to it.) That means the classification of the languages of quite a lot of Africa is basically up for grabs. It would be great to have a reexamination of the area.

Why would "qaswarah" be claimed to be Ethiopic?

In the Qur'ān, 74:51, an interesting word occurs:

{ كَأَنَّهُمْ حُمُرٌ مُّسْتَنفِرَةٌ } * { فَرَّتْ مِن قَسْوَرَةٍ }
ka'annahum ħumurun mustanfirah * farrat min qaswarah
As if they were wild donkeys. Fleeing from a Qaswarah.

This tends to be rendered as "lion" in English, but the early commentators indicate that that is only one of several possible meanings of the word. al-Ṭabari (d. 310 AH) gives four (all supported by chains of transmitters whose reliability I am not competent to judge): الرماة archers, القُنَّاص hunters, جماعة الرجال a group of men, الأسد a lion. The point of interest here is that two of these explanations are supported by allusions to Ethiopic:
حدثنا هناد بن السريّ، قال: ثنا أبو الأحوص، عن سِماك، عن عكرِمة، في قوله: { فَرَّتْ مِنْ قَسْوَرَةٍ } قال: القسورة: الرماة، فقال رجل لعكرِمة: هو الأسد بلسان الحبشة، فقال عكرِمة: اسم الأسد بلسان الحبشة عنبسة.

...[`Ikrimah] said: "al-qaswarah is archers." Then a man told `Ikrimah: "It is 'lion' in the language of the Ḥabashah (Ethiopians)." `Ikrimah said: "The name of the lion in the language of the Ḥabashah is `anbasah."

حدثني محمد بن خالد بن خداش، قال ثني سلم بن قتيبة، قال: ثنا حماد بن سلمة، عن عليّ بن زيد، عن يوسف بن مهران عن ابن عباس أنه سُئل عن قوله: { فَرَّتْ مِنْ قَسْوَرَةٍ } قال: هو بالعربية: الأسد، وبالفارسية: شار، وبالنبطية: أريا، وبالحبشية: قسورة.

...[Ibn `Abbās] said: It is 'asad (lion) in Arabic, and in Persian šēr (شير), and in Nabataean 'aryā (ܐܪܝܐ), and in Ethiopic: qaswarah.
The thing is, it looks like `Ikrimah was right: in Ethiopic, "lion" is indeed `anbasā (ዐንበሳ), and no Ethiopic word qaswarah has been found. Qaswarah is most likely an originally Arabic word. But these were intelligent people, and the saying attributed to Ibn `Abbās above is obviously right about Persian and Nabataean; why would they say that qaswarah was the Ethiopic word for "lion" if it wasn't? One obvious possibility is that they were referring to another language of the Ethiopia region. This cannot be ruled out, since many languages of the area have no doubt gone extinct without documentation since then; but it looks as though the words for "lion" in Somali, Oromo, Beja, Agaw, Sidamo, Nubian, Nara, and Kunama are rather different. One might momentarily be tempted to think of Berber, cp. Nafusi war, but that's certainly not long enough.

Could the idea that qaswarah is "lion" in Ethiopic have derived from a misreading of `anbasa at some point? That certainly wouldn't be plausible in Arabic. It doesn't look all that plausible in Ethiopic either: ዐንበሳ doesn't look all that similar to ቀስወራ. But there is another alphabet that might conceivably have been involved: the musnad, the Old South Arabian letters that continued to be used in Yemen into the Islamic period. In this alphabet, ` ع is quite similar to q ق, and n to s. The other two letters are rather less similar, but I can imagine b plus the right side of s being miscopied as w, and the remainder of s being reinterpreted as r. Here's roughly how the two words (qswr on the left, `nbs on the right) would have looked (ignoring the possibility of a final feminine -t):

Suppose this is right. Why then would someone at the time have learned an Ethiopic word from a text written in the musnad, rather than by asking an Ethiopian? Histories and travelogues are both genres attested in the Middle East of the time, and might have found occasion to mention in passing the Ethiopian word for "lion", given its cultural importance (it is a common theme in Aksumite art, and in later Ethiopia was adopted as a royal title.) Some Yemeni scholar who's never been to Ethiopia reads a miscopied version of such a history, thinks: ah, this must be the same word as in the Qur'ān, and goes on to tell everyone he knows, including (if the attribution is correct) Ibn `Abbās.

But there's a difficulty here: all that's ever been discovered in the musnad is stone inscriptions and occasional letters. No books have survived at all, much less histories or travelogues. And if there were books, you would think they would be written in the cursive script used in the letters, rather than the monumental script of the inscriptions - which reduces the similarity of the two words even more (see the table on p. 13 of History of the Arabic Script for cursive forms.)

On the other hand - anyone have a better idea?

Ibn Hazm again, and Cypriot Arabic

I just found a full translation online of the fifth chapter of Ibn Hazm's 11th-century work Iħkām fī Uṣūl al-Aħkām, discussed previously - a chapter remarkable for anticipating the ideas of a language instinct and of conlanging, and for clearly stating the relationship between Arabic, Hebrew, and Syriac. Enjoy! The Origins of Language: Divine Providence or Human Codification.

Not long before Ibn Hazm's time, some Arabic-speaking Maronites fled the Levant for Cyprus. In the village of Kormakiti, they have kept their language up to the present. YouTube being what it is, you can hear some on a program called Sanna (ie لساننا - our language) - go straight to 2:40, 5:00, 7:04 to hear the language itself. (Ignore the video's ill-informed claims that this is descended from Aramaic, by the way.) If you speak Greek, there are even lessons at Hki Fi Sanna. This is far more incomprehensible to me than any mainstream Arabic dialect I've ever heard, including the Levantine Arabic from which it presumably derives - a remarkable case study in how much isolation from related varieties speeds up language differentiation.

BBC Berber report

A couple of people have forwarded me this BBC article: Trail-blazing for Morocco's Berber speakers. It's a rare instance of Anglophone media noticing North African developments - in this case, the gradual establishment of Berber as a subject in Morocco's educational system. The phenomenon is rather interesting, and their efforts to create a common Tamazight "Fusha" would be a great subject for debate. But this article, sadly, is a pretty poor effort. Some of the errors of fact:

"previously oral-only language": Berber has been written, on and off, for 2500 years or more. The biggest single source of surviving Berber manuscripts (in the Arabic script) is southern Morocco. While Arabic has been - and still is - the main language of literacy for Berber speakers, Berber has not been "oral-only" in Morocco for millennia.

"an alphabet based partly on the mystical signs and symbols of the Tuareg found inscribed on tombs and monuments" - the Tifinagh characters of the Tuareg, on which Moroccan Neo-Tifinagh is based, are not "mystical signs and symbols", they're a perfectly normal consonantal alphabet, used mainly for graffiti and short letters.

"Berbers, until recently excluded from jobs in education and government": no. Their language has been excluded from both, but Berbers have held posts in both for as long as Morocco has existed. (The first prime minister of independent Morocco, Mbarek Bekkai, is one of many examples.) Negative attitudes towards Berber language and culture can disadvantage Berbers, but a statement like this one is frankly dishonest.

"young Moroccans either listen to Western music, or to rap in Amazigh" - I won't swear this is wrong, but that sure isn't the impression I got last time I was in Morocco. As far as I could tell, most popular Moroccan Berber music is not rap (thankfully), and certainly much (probably most) Moroccan popular music - including rap - is in Arabic.

Also, they quote Abdallah Aourik saying "Most Moroccans grow up speaking Berber" - this is possible, but is probably no longer true. Most recent-ish estimates on Berber speakers for Morocco (like within the past 50 years) hover around a third. (Wikipedia, for once giving reasonable references.)

For a more opinionated/less polite takedown, try Lounsbury. I guess the lesson is the usual one that the past decade has really drummed in: treat all reporting with scepticism.

Child language acquisition and constructions

A memorable line from a talk by Ewa Dabrowska that I went to recently:

"It is generally agreed that the representations assumed by generative theories cannot be learned from the input. For generative linguists, this fact is a fundamental premise of arguments for the innateness of at least some aspects of these representations: since they cannot have been learned from the input, they must be available a priori. An alternative conclusion, of course, is that we need a better theory - one that does not assume representations that are unlearnable."

Her answer is construction grammar: kids first learn individual low-level constructions like "What's ___ doing?" as unanalysed units, and only later come up with higher-level schemas of which these constructions are special cases (the next stage in this case would be "What's ___ ___ing?") Judging from the evidence she presented, showing that the vast majority of a 3 year old's utterances could be accounted for solely on the basis of simple substitutions within sentences they are known to have already heard, "children's [linguistic] creativity seems to involve superimposing and juxtaposing memorised chunks." This view of language more or less inverts the usual grammarian's perspective: the most general rules are developed only after specific cases have been learned, and the specific cases presumably continue to be stored independently. It strikes me as a rather promising way of thinking about historical syntax.

The Piraha discussion continues

Via Language Log/John Cowan: Dan Everett's finally gotten around to publishing a few more examples of his claims about Piraha - notably, that they have no recursion, and in particular no subordinate clauses. Even quoted speech and conditionals, he claims, are not embedded. Here it is: Pirahã culture and grammar: A response to some criticisms.

Now, recursion means being able to embed a given kind of phrase within another example of the same kind of phrase, as many times as you want. In "the door of the house", one noun phrase ("the house") is embedded within another one ("the door of the house"); in "I will visit you when it stops raining", a clause ("it stops raining") is embedded within a larger one ("I will visit you when it stops raining"). You can also keep doing this ("the edge of the handle of the door of the house", "I will visit you when I know whether Khaled said that James is right about the forecast that it will rain tomorrow.") In Piraha, Everett reports that for noun phrases you can only do this once (no more than one possessor), and for clauses that you can't do it at all (he insists that all the examples that look like subordinate or adverbial clauses are actually separate sentences whose linkage is left for the listener to interpret, and in this paper presents some arguments for this.)
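
For readers who think more easily in code, here is a toy Python sketch of what "embedding a phrase inside a phrase of the same kind" amounts to. The function names are made up for illustration, and the one-possessor cap is only a caricature of what Everett reports, expressed with English "of" phrases rather than real Piraha data.

```python
def nest_np(nouns):
    """Build 'the N1 of the N2 of ...': each recursive call is an embedded noun phrase."""
    head, rest = nouns[0], nouns[1:]
    if not rest:
        return f"the {head}"
    # The noun phrase built from `rest` is embedded inside the one headed by `head`.
    return f"the {head} of {nest_np(rest)}"

def capped_np(nouns, max_embedded=1):
    """Same as nest_np, but refuses more than `max_embedded` embedded noun phrases."""
    if len(nouns) - 1 > max_embedded:
        raise ValueError("too many embedded noun phrases for this toy grammar")
    return nest_np(nouns)

print(nest_np(["edge", "handle", "door", "house"]))
print(capped_np(["door", "house"]))
```

The first function has no built-in limit on depth; the second is the same grammar with a cap bolted on, which is roughly the difference at issue for noun phrases.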

The thing is, a language with such properties has obvious potential to be expanded into a language like English or Arabic. For possessors, all it would take is a little analogical expansion - that's what allows us to interpret a phrase like "my brother's wife's cousin's friend's cat's teeth" as grammatical, even though you may well never have heard a noun phrase with six possessors before. For subordinate clauses, all it would take is grammaticalising some kind of erstwhile adverb or intonation pattern or quotative marker into a signal that these two clauses are more closely bound than others; such changes occur all the time in languages that already have subordinate clauses (eg "with what" > "in order to" in Algerian Arabic.) If the Piraha haven't done this, then why not? If they used to speak a language with multiple possessors and subordinate clauses in the past, why and how did they abandon these features - and if they never have, then why have most languages gained these features? In short, what motivates the expansion of grammar, and how does it happen?

One place (doubtless not the only one) where I think you can see expansion of grammar in action is technical terminology; consider mathematics. "The set of all p/q such that q!=0 and p, q are integers" is perfectly clear mathematical English, but is rather unlikely to be heard in everyday English (? "the set of all couples such that the husband is not an accountant and both the husband and wife are from Belgium"). The needs of mathematical communication have motivated the use of a kind of relative clause, with a complementiser and neither a gap nor a resumptive pronoun nor a relative pronoun, which is at best marginal in normal English; if enough people were trained as mathematicians, it might get used more widely. Maybe multiple possessors and subordinate clauses are technical features to cope with the demands of socialising with large numbers of people. Or maybe Piraha has a little more embedding than Everett reports. Speculation is fun, but a nice big, searchable, publicly available corpus would be a lot more convincing.

More on Nile Valley Berber [?]

I finally got around to borrowing Bechhaus-Gerst's Sprachwandel durch Sprachkontakt am Beispiel des Nubischen im Niltal. It's tough going because I don't really speak German, but she briefly suggests (p. 37) that the C-Group Culture of 2200 BC-1500 BC in lower Nubia, known as Temehu to the Egyptians, were Berbers (referencing Behrens 1984/5), and that Nobiin-speaking Nubians came in about 1500 BC and replaced them. This would explain the possible Berber loanwords in Nobiin, notably aman "water". Apparently, the archeology shows a change of cultures and of body types around 1500 BC, and ancient Egyptian paintings first begin depicting their southern neighbours as black around this period, while the Egyptian loanwords in Nobiin seem to date to the New Kingdom or later.

The identification of the Temehu with the Berbers is not based on linguistic evidence, as far as I know, and the small inventory of possible Berber loans in Nubian is neither conclusively established nor necessarily as old as 1500 BC. So I don't know how much confidence to put in this scenario. However, it points to an interesting avenue for studies of Berber to explore. A lot of evidence suggests that Afroasiatic originated further east than North Africa, so it would make sense for there to have been Berber speakers in the Nile Valley - that could even be where Berber spread from in the first place. I previously discussed this issue in The Berbers of Southern Egypt.

The book is interesting for other reasons, incidentally - if her scenario for the development of Kenzi/Dongolawi is correct, it has borrowed an astonishing amount of grammatical material from Nobiin.

Behrens, P. 1984/5. "Wanderungsbewegungen und Sprache der frühen saharanischen Viehzüchter", SUGIA 6:135-216.

Open to interpretation

Songhay's lexical economy - the way it keeps its lexicon rather smaller than its neighbours' by using a single word to fulfill the functions of what in most languages would be several different words - has attracted the attention of several of those who have written about the language from the 1850s onwards. While Kwarandzyey (Korandje) is so full of Berber and Arabic loanwords that the size issue probably no longer applies, it still has many striking examples of polysemy. Take "open", for example.

fya (from Songhay *feeri) is best translated as "open" (its commonest sense). Of course, to open one's mouth can be to start eating - hence the frozen compound fya-mmi "open-mouth" means "breakfast". But opening is also what you do to release something from an enclosed space; hence to "open water (for something)" (fya iri), or just "open", is to irrigate, and to "open for an animal or person" is to release them. Likewise, to "open a rope (for something)" is to untie it. To release something from your grasp is to let it fall - hence to "open for something" is also to drop it. And for a man to release his wife from her obligations towards him is to end the marriage - hence to "open for a woman" is to divorce her.

We can map the connections between these easily enough, making it clear that they form a coherent network of meaning:

breakfast      untie
         \    /     \
          open - release
              \    /    \
            irrigate     divorce

But not only will any single English translation applied literally and consistently yield ludicrous results for at least some of these cases - translating it differently in different circumstances will force you to choose a single meaning in cases where the text is ambiguous. "He opened for the woman" probably means he divorced her, but in principle it could mean he released her (eg from prison), or untied her, or (literally) dropped her; in fact, since Songhay has no gender distinctions in pronouns, it should even be able to mean "It (eg an automatic door) opened for her". And of course, this kind of ambiguity can be deliberately exploited for effect, as in puns.

In Kwarandzyey, this is never likely to cause serious ambiguity - the language is almost never written down, and it's a small enough community that the context is usually known to everyone anyway. But imagine worrying about this kind of thing in a millennia-old text in a language that no one today speaks natively, and you can really see why even the most literal translation of such a text is unavoidably an act of interpretation.

Why dead snakes are like clothes

What would you say if, in some science-fiction novel, you read of a language where the situations that in English would be described as "The clothes blew down from the clothesline", "Push that dead snake away with a stick", and "I see where he's carrying the rabbits he killed hung from his belt" were all naturally expressed with the same root, plus nothing more than different affixes? What about "I slammed together the hunks of clay I held in either hand", "I slung away the rotten tomatoes, sluicing them off the pan they were in", and "I picked up in my mouth the already chewed gum from where it was stuck on the table"? My inclination would have been to dismiss it as a neat but implausible idea, placing some strain on the reader's suspension of disbelief. But - until no more than thirty years ago - such a language existed right in California. Go to Part III of Leonard Talmy's dissertation Semantic Structures in English and Atsugewi to get the data; here's a slightly less surprising example as a taster:

Subject=I, Object=3rd person | from a linear object moving axially [with one end] non-obliquely against the FIGURE | for a small shiny spherical object to move | out of a snug enclosure/a socket | factual
I poked his eye out (with a stick.)

Subject=I, Object=3rd person | from the mouth/interior of a person, working ingressively, acting on the FIGURE | for a small shiny spherical object to move | all about, here and there, back and forth | factual
I rolled the round candy around in my mouth.

Of course, people are people; after explanation, the similarities are easy enough to make out, and presumably given enough time anyone can learn to look at a situation and decompose it into elements like these, rather than the elements that "leap out" at an English speaker. In fact, I suspect that having to learn to see things the way the people you talk to do is one of the subtler drivers behind contact-induced language change. But cases like this provoke thought: just how much can the attributes of a situation most relevant to formulating a sentence vary from language to language?

Eastern Berber vocabularies on Google Books

Some digitised Eastern Berber vocabularies from the first half of the 19th century for your perusal, if you're into that sort of thing. I was particularly impressed to find a Sokna vocabulary - I haven't yet read any other source on that language, though admittedly I haven't looked that hard.

* Lyon's vocabulary of the Berber of Sokna, from 1820
* Hornemann's vocabulary of Siwi, from 1798 (at my homepage)
* Caillaud's vocabulary of Siwi, from 1826
* Minutoli's vocabulary of Siwi, from 1827
* Koenig's vocabulary of Siwi, from 1839 (lots of other vocabularies in here - Somali, for example, and Nubian and even Fur)

Some Zenaga (Mauritanian Berber) words

Zenaga is the barely surviving Berber language of southwestern Mauritania around Boutilimit. Here are a few words I think are found only in Zenaga (and in some cases Tetserret), all from Taine-Cheikh. Unfortunately, I haven't found any really comprehensive dictionaries of (for example) Tashelhit, so I could well be wrong. If I am (as I was with agwəḍ), I'd love to hear it!

  • ämkän "young herd animal (eg sheep, goat)" - p. 308
  • ārwiy "scorpion" (< *arwəl) - p. 452
  • täygaḌ "young she-goat" (< *talgaḍ) - p. 577
  • agaḏ̣iy "Moor, bidani (white man)" - p. 181
  • täššänḍuḌ "mirror" - p. 129
  • taʔgaṛḏ̣aS "paper". (Other varieties have similar forms, but without any final s.) - p. 24
  • tämärwuS "bride" (Ahaggar Tuareg has rwəs "to be in rut" - obviously related, but not quite the same sense!) - p. 451

French among Algeria's elite

The key issue in Algerian linguistic politics - substantially overshadowing the question of the role of Berber - is what should be the language of bureaucracy and education: Standard Arabic (the official language, and the primary pre-colonial language of literacy for all Algeria) or French (the colonial language, and hence ironically the language which most of the few educated Algerians at independence had studied in.) In practice, it's settled on the one setup most certain to minimise social mobility: Standard Arabic is the primary language of education and symbolism, and French of bureaucracy and social climbing. On top of that, the language of everyday life is Algerian Arabic or Berber, from either of which reaching fluency even in Standard Arabic, let alone the far more different French, is an uphill struggle.

I recently came across a very illustrative quote from a survey specifically focusing on minor political actors in Algeria - party cadres, journalists, bureaucrats, businessmen, trade unionists, etc:
"To a limited extent, the only space open to [political] actors with little or no knowledge of French were independent unions, independent NGOs, the Arabic press and Islamist parties. This tendency was illustrated by the fact that third-generation elites barely speaking French - only one out of ten interviewees - came from one of these domains. Most other interviewees were either Francophone or bilingual, the latter having difficulties determining which language they considered to be their mother tongue [a footnote suggests she means "primary language"]. The same interviewee often gave different answers depending on whether he filled in this author's questionnaire prior to the interview, or whether he was asked in the course of an interview what language he felt most comfortable speaking and writing. A huge majority of the third-generation interviewees according to their own assessment were better with written French than Standard Arabic. As far as oral skills went, a third of the interviewees said they spoke Standard Arabic as well as or better than French. Over half the interviewees put their oral French skills at the same level as their command of Algerian Arabic or Kabyle Berber dialect, and one out ten claimed to speak French better than anything else." (Isabelle Werenfels, Managing Instability in Algeria, pp. 85-6)
This kind of situation is a recipe for resentment. The government has spent years educating people to be better at Standard Arabic and telling them that it was everyone's duty to use it rather than French; but unfortunately their passion for reform, after creating legions of eager Standard Arabic-using job-seekers, stopped at the gates of the Civil Service. Check out Algerian government websites sometime - many of them don't so much as have Arabic versions (eg Energy, Health, CNRC Finance), and most default to French.

As always, I think language skills should be a barrier only when they're necessary in themselves, not merely as a badge of class membership (and regionalism - people from Algiers or Kabylie are enormously more likely to speak good French than people from, say, the Sahara.) I'd certainly prefer Standard Arabic to French - it's much more like Algerian Arabic than French is, and more a part of Algeria's identity - but in the long run it would be better to create a situation where people could use their own mother tongue for official purposes.

Healed by the right words

We all know that placebos can be surprisingly effective. But - though it's not exactly surprising - I hadn't realised that there is experimental evidence that simply saying the right thing can have a curative effect.

"Two hundred patients with abnormal symptoms, but no signs of any concrete medical diagnosis, were divided randomly into two groups. The patients in one group were told 'I cannot be certain what is the matter with you', and two weeks later only 39% were better; the other group were given a firm diagnosis, with no messing about, and confidently told they would be better within a few weeks. 64% of that group got better in two weeks." (Bad Science, p. 75, citing Thomas 1987)
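
Out of curiosity, here is a back-of-the-envelope check in Python of how unlikely a gap like that would be by chance. It is a standard two-proportion z-test, and it assumes the two hundred patients were split evenly into groups of 100, which the quote implies but does not state outright; the function name is my own.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability under the normal approximation.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Assumption: 100 patients per group, so 39% and 64% become 39 and 64 patients.
z, p = two_proportion_z(39, 100, 64, 100)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

On those assumptions z comes out at roughly 3.5, so the difference is not the sort of thing one would put down to an unlucky split of patients.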

I can imagine a lot of factors that could affect the effectiveness of the doctor's words here - mainly anthropological, but some of them would certainly fall within the domain of linguistics. For example, the intonation pattern will affect the patient's perception of the doctor's confidence; does that affect the efficacy? Likewise, the accent and the choice of vocabulary could both affect comprehension and perceived competence, and hence presumably the efficacy. Not really my field, but it could be a line of research with unusually clear-cut potential benefits. The obvious problem with this example is that it involves doctors lying to patients, but if the effect could be reproduced without that it would certainly be worth doing.

Thomas KB. General practice consultations: is there any point in being positive? BMJ (Clin Res ed) (9 May 1987); 294 (6581): 1200-2.

"Political complexity predicts the spread of ethnolinguistic groups"

An interesting paper: Political complexity predicts the spread of ethnolinguistic groups. Two basically unsurprising claims that it's good to have calculations supporting: "pastoralists were found to have larger language areas than agriculturalists" and "languages associated with more politically complex societies cover significantly larger areas than those of less complex societies". They also present arguments that "although regions of high biological and cultural diversity do overlap to a striking degree, it is unlikely that biological diversity has any direct effect on cultural diversity on a global scale." Surprisingly, mountainousness was found to correlate with larger language areas, not smaller ones - seems a little suspicious that, though some mountainous areas are pretty un-diverse. Flaws: well, it relies on Ethnologue data and GMI maps, both of which are often unreliable, and systematically more splittist in some areas than in others; but it's not obvious that that would substantially affect the result. Also, ethnic groups, languages, and political units very often don't match up, and their measure of political complexity is based on data for ethnic groups rather than for languages.

(Via GNXP.)

A Fulani village in Algeria

Anyone acquainted with West African history will be aware of the remarkable extent of the Fulani diaspora, stretching from their original homeland in Senegal all the way to Sudan. However, I was surprised to read the following note in a history of the Tidikelt region of southern Algeria (around In-Salah):
"Le village actuel de Sahel a été créé en 1779 par Sidi Abd el Malek des Foullanes, venu à Akabli dans l'intention de se joindre à une pèlerinage, dont le départ n'eut pas lieu... Les Foullanes sont des Arabes originaires du Macena (Soudan); il y a encore des Foullanes au Sokoto; Si Hamza, le cadi d'Akabli appartient à cette tribu." (L. Voinot, Le Tidikelt, Oran:Fouque 1909, p. 63)
(The current village of Sahel was created in 1779 by Sidi Abd el Malek of the Fulani, who had come to Akabli with the intention of joining a pilgrimage whose departure never occurred... The Fulani are Arabs originating from Macina (Sudan [modern-day Mali]); there are still Fulani at Sokoto; Si Hamza, the qadi of Akabli, belongs to this tribe.)

I very much doubt there would be any traces of the language left - even assuming that Sidi Abd el Malek came with a large enough entourage to make a difference - but wouldn't it be interesting to check?

How many words are there in a language?

In a recent discussion, the question came up of whether a language's vocabulary could be tallied (briefly addressed at Language Log a while back, and at FEL.) I have no firm answer to that (and it's logically independent of whether or not you can estimate the proportion of the vocabulary coming from a given language - that's a sampling problem.) But, notwithstanding the bizarre if occasionally entertaining acrimony of that discussion, it's actually a rather interesting question.

Clearly, any given speaker of a language - and hence any finite set of speakers - can know only a finite number of morphemes, even if you include proper names, nonce borrowings, etc. ("Words" is a different matter - if you choose to define compounds as words, some languages in principle have productive systems defining potentially infinitely many words. The technical vocabulary of chemists in English is one such case, if I recall rightly.) Equally clearly, it's practically impossible to be sure that you've enumerated all the morphemes known by even a single speaker, let alone a whole community; even if you trust (say) the OED to have done that for some subset of English speakers (which you probably shouldn't), you're certainly not likely to find any dictionary that comprehensive for most languages. Does that mean you can't count them?

Not necessarily. You don't always have to enumerate things to estimate how many of them there are, any more than a biologist has to count every single earthworm to come up with an earthworm population estimate. Here's one quick and dirty method off the top of my head (obviously indebted to Mandelbrot's discussion of coastline measurement):
  • Get a nice big corpus representative of the speech community in question. ("Representative" is a difficult problem right there, but let's assume for the sake of argument that it can be done.)
  • Find the lexicon size required to account for the 1st page, then the first 2 pages, then the first 3, and so on.
  • Graph the lexicon size for the first n pages against n.
  • Find a model that fits the observed distribution.
  • See what the limit as n tends to infinity of the lexicon size, if any, would be according to this model.
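
The first three steps above can be sketched in a few lines of Python. This is only a rough illustration: it counts whitespace-separated word forms rather than morphemes (a real count would need a morphological analysis), and the three "pages" are an invented toy corpus, not real data. The name `vocabulary_growth` is made up for this sketch.

```python
def vocabulary_growth(pages):
    """Return the lexicon size needed for the first 1, 2, 3... pages.

    Assumption: a "lexical item" is just a lower-cased whitespace-separated
    token, standing in for a morpheme.
    """
    seen = set()
    curve = []
    for page in pages:
        seen.update(page.lower().split())
        curve.append(len(seen))
    return curve


# Invented toy corpus, one string per "page".
pages = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog met on the mat",
]

print(vocabulary_growth(pages))  # [5, 7, 10]
```

Plotting that list against n = 1, 2, 3... gives the growth curve; the model-fitting and limit-taking steps are where the real difficulty lies.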

A bit of Googling reveals that this rather simplistic idea is not original. On p. 20 of An Introduction to Lexical Statistics, you can see just such a graph. An article behind a pay wall (Fan 2006) has an abstract indicating that for large enough corpora you get a power law.

But if it's a power law, then (since the power obviously has to be positive) that would predict no limit as n tends to infinity. How can that be, if, for the reasons discussed above, the lexicon of any finite group of speakers must be finite? My first reaction was that that would mean the model must be inapplicable for sufficiently large corpus sizes. But actually, it doesn't imply that necessarily: any finite group of speakers can also only generate a finite corpus. If the lexicon size tends to infinity as the corpus size does, then that just means your model predicts that, if they could talk for infinitely long, your speaker community would eventually make up infinitely many new morphemes - which might in some sense be a true counterfactual, but wouldn't help you estimate what the speakers actually know at any given time. In that case, we're back to the drawing board: you could substitute in a corpus size corresponding to the estimated number of morphemes that all speakers in a given generation would use in their lifetimes, but you're not going to be able to estimate that with much precision.
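
To make the "no limit" point concrete, here is a minimal sketch of fitting a power law V = K·n^β to a growth curve by least squares on the logs, then extrapolating. The data points are synthetic (generated from V = 50·√n purely for illustration, not taken from any corpus), and the function names are invented for this example.

```python
import math


def fit_power_law(ns, vs):
    """Least-squares fit of log V = log K + beta * log n; returns (K, beta)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in vs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


def predict(k, beta, n):
    """Lexicon size the fitted model predicts for a corpus of n pages."""
    return k * n ** beta


# Synthetic points lying exactly on V = 50 * n**0.5 (an assumption for the demo).
ns = [1, 4, 9, 16]
vs = [50, 100, 150, 200]

k, beta = fit_power_law(ns, vs)
for n in (10**2, 10**4, 10**6):
    print(n, round(predict(k, beta, n)))
```

With any positive β the predicted lexicon size keeps climbing as n grows, so the fitted curve itself never yields a finite total; all it can give is a predicted size for a corpus of some specified length.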

The main application for a lexicon size estimate - let's face it - is for language chauvinists to be able to boast about how "ours is bigger than yours". Does this result dash their hopes? Not necessarily! If the vocabulary growth curve for Language A turns out to increase faster with corpus size than the vocabulary growth curve for Language B, then for any large enough comparable pair of samples, the Language A sample will normally have a bigger vocabulary than the Language B one, and speakers of Language A can assuage their insecurities with the knowledge that, in this sense, Language A's vocabulary is larger than Language B's, even if no finite estimate is available for either of them. Of course, the number of morphemes in a language says nothing about its expressive power anyway - a language with a separate morpheme for "not to know", like ancient Egyptian, has a morpheme for which English has no equivalent morpheme, but that doesn't let it express anything English can't - but that's a separate issue.

OK, that's enough musing for tonight. Over to you, if you like this sort of thing.

Houhou yentakheb rouhou

(Warning: this post contains no significant linguistic content.)

The results are in: Bouteflika has been “re-elected” as President of Algeria with a staggering 90.24% of votes cast. According to Government figures, 74.54% of eligible voters voted (although oddly enough, the polling booths looked deserted in all the main towns.) He had already served two terms, which had been the limit, so, to let himself run for re-election, he had had the constitution changed shortly beforehand. I would start mocking the guy, but why bother? With figures like that, he's making a fool of himself with no help from me. Time was when he was willing to settle for figures that naive observers might be capable of taking seriously; as he turns senile either his intelligence or his capacity for shame must be declining. The best measure of the glory of his achievements is the 50% of Algerian youths who intend to try to leave the country.

In case you were wondering how this result was achieved, here's my best somewhat informed guess: In the countryside, especially in areas like the Sahara where tribalism is still present, the local patriarchs simply tell everyone to vote en masse for the President, on the basis that he will stay in power no matter what they do and a conspicuous display of loyalty will earn them government investment (although even that wouldn't be enough to produce things like the 97% turnout in Tissemsilt without further fraud.) In the cities or the larger towns of the north, practically nobody bothers to vote apart from people on government payrolls, so they simply exaggerate the participation figures. In Kabylie, uniquely, we have a largely rural, somewhat tribal region fed up enough with the government that even the villages have organised themselves to refuse it legitimacy, so conspicuously that even government figures acknowledge a much lower turnout. If we assume that the government figures are broadly accurate regarding relative turnout (though certainly not absolute), then the situation shows up in the negative slope on this plot of population against turnout (participation); the two 30% wilayas are Tizi-Ouzou and Bejaia, the main Kabyle regions.

Another post on this worth looking at: Victory over the People.

When goals create blind spots

You're watching a ball game attentively. A person in a gorilla suit walks right through the middle, remaining visible for 5 seconds. Can you imagine not noticing the gorilla guy? Well, it turns out that nearly half of all undergraduate volunteers don't notice him, if they're busy trying to count passes - and the authors of that study cite 7 other experiments confirming the same principle.

It strikes me that there's a lesson there for linguists. Often linguists study a language for a specific theoretical goal - looking at Malagasy primarily to see what VOS syntax is like, or Oneida primarily to learn how polysynthesis works, or Songhay primarily to see whether it's related to Nilo-Saharan or not. That's fair enough; no one can focus on everything at once. But we can miss some really interesting stuff by focusing on one aspect of the language to the exclusion of others. For example, when Laoust studied Siwi, he was interested almost exclusively in its Berber origins - and as a result, his generally excellent study somehow ignored the vowels e and o (which are found even in Berber words, but are not phonemic in the Moroccan Berber varieties he was more familiar with), and mistakenly attributed the Arabic elements of Siwi to the adjacent Bedouin dialects, when in fact they show some very distinctive non-Bedouin characteristics. This is something we all need to watch out for.

Flora of the Central Sahara and elsewhere

Ever found yourself trying to sort out a plant name you've elicited, not knowing any botany worth mentioning? Well, it turns out the botanists are a step ahead of the linguists on the digital libraries game, at least in Spain: the Digital Library del Real Jardín Botánico CSIC has a pretty remarkable array of books to browse online. The one that just saved my etymology of the Kwarandzyey plant name tsifəṛfəẓ is Etudes sur la flore et la végétation de la Sahara centrale. Vol. III: Hoggar, which gives both Tamasheq and binomial names for each plant mentioned. Unfortunately it's clear that not all the works give translations of the names, but it's still worth a look.

On a similar note, I've found Sahara-Nature handy sometimes.

Beni-Snous: Two unrelated phonetic forms for every noun?

I got flabbergasted recently by a casual statement in Destaing's (1907:212) grammar of the Berber dialect of Beni Snous in western Algeria (near Tlemcen). I nearly missed it as I skimmed it; see if you can spot it. (The translation is mine, as are the bits in brackets.) All the numerals above 1 are from Arabic here, but that's nothing surprising - the same is true in Tarifit, and few Berber varieties have retained the numbers above 3.
"The numbers from 2 to 9 inclusive are followed by the Berber noun in the plural [eg]:

two men ..... θnāịẹ́n ịírgǟzĕn
six women ... sttá n tsénnạ̄n

From "10" to "19" inclusive, the number is followed by the Arabic singular substantive:

eleven women ... aḥdăɛâš ĕrmra (Algerian Arabic mṛa "woman" مرة; contrast Beni Snous Berber θä́mĕṭṭūθ "woman")
fifteen cows ... ḫamstaɛâš ĕrbégra (Algerian Arabic bəgṛa "cow" بڨرة)
sixteen mares ... sttɛâš ĕrɛấuda (Algerian Arabic `əwda "mare" عودة; contrast Beni Snous Berber θáimārθ "mare")

After the number nouns "twenty, thirty, forty" etc., one uses the Arabic substantive[...]

twenty women ... ɛašrîn ĕmra
fifty mules ... ḫamsîn beγla (Algerian Arabic bəγla بغلة "mule")

a thousand rams: âlĕf kebš (Algerian Arabic kəbš كبش "ram"; contrast Beni Snous Berber išérri "ram")"

If I thought it were remotely possible for Destaing's claim to be true of counting every noun in the language - rather than, say, just the six nouns he gives appropriate examples for - I would be putting together an application to head out to Tlemcen instead of making this posting. (I might still do that anyway some time, mind you.) But for rather a lot of minority languages, all or nearly all speakers are bilingual. And if all speakers are bilingual, what in principle is there to prevent the grammar from containing a rule like this?

So I ask: have you ever come across anything similar elsewhere?

Scanned Multi-Alphabet Arabic Manuscript Online

The Princeton Digital Library of Islamic Manuscripts has put a large number of Arabic, Persian, and Turkish scanned manuscripts online. Plenty of interesting stuff there, but one that particularly stood out for me was the untitled Treatise on ancient, alchemical and magical alphabets. Behold the Omniglot of its day! (Well, it's apparently only from the 1700s, but probably a copy of an older work.) It gives tables for the supposed alphabets of each prophet, with the letter names on one page and the letter forms on the next. I'll just point you to a few of the highlights:

Knowing my readers, I suspect I'll have identifications of several of the alphabets I didn't recognise coming soon - although many, perhaps most, of them are certainly made up. Extra points for anyone who can come up with a picture of a magic bowl or something actually using one of the made-up alphabets.

Two other Arabic manuscripts there of potential interest: The conquest of Africa, from Qayrawan to Zab; Book of the Roman months.

išni: a Berber ovine, or a Songhay goat?

In Kwarandzyey (Tabelbala), the non-specific word for a sheep or goat is išni. It looks kind of Berber, and the words for different ages or sexes of sheep and goat are definitely from Berber, so I had assumed it must be Berber. But I've never found a term like it in any Berber dictionary. Maybe some reader will tell me that the word is familiar from his/her own hometown, but I just realised that there's an alternative explanation...

The word for "(female) goat" across Songhay may be reconstructed as *hìnčìnì (Nicolai 1981 gives *hìnkìnì, but in all the Songhay languages he cites except Kwarandzyey, original *k and *č both turn into the same sound before front vowels.) Nicolai 1981 gives amkkən "male goat" as the Kwarandzyey reflex of this word, but in fact (as Kossmann first pointed out to me) that turns out to be another one of the Berber etymologies that only Zenaga seems to explain: ämkän "jeune bête (tout animal de pâturage)" (Taine-Cheikh 2008). Instead, I'd like to propose that išni is the Kwarandzyey reflex.

*n is occasionally lost in Kwarandzyey (eg gwa "see" < *guna); I don't know any rule for this so far, but here it might be motivated by dissimilation. Initial *h is lost fairly commonly (at least "water", "man", "two", "three", "hunger"), so that's not necessarily a problem. Short vowels, most commonly (but not always) *i and *u, are frequently deleted, according to a rule whose conditioning I've been investigating lately. *č regularly becomes ts, but when immediately followed by a consonant regularly simplifies to s for all but some of the most conservative speakers. And s and š are not phonologically distinct (except for younger speakers, under heavy Arabic influence); the consistent use of š here would be explained by the i's flanking it. So that would yield *hìnčìnì > *inčni > *itsni > isni = išni.
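
The proposed chain can be replayed mechanically as a sanity check. The sketch below is a toy: it uses a toneless transcription, hard-codes the one vowel deletion and the one *n loss needed here (the real conditioning of both is unknown, as noted above), and applies the rules in an order chosen purely for illustration, so it shows more intermediate stages than the summary chain above.

```python
import re

# (label, regex pattern, replacement), applied once each, in this order.
# The ordering and the narrow patterns are assumptions made for this sketch.
RULES = [
    ("loss of initial *h", r"^h", ""),
    ("deletion of short i between č and n", r"čin", "čn"),
    ("loss of *n before č (dissimilation?)", r"nč", "č"),
    ("*č > ts", r"č", "ts"),
    ("ts > s before a consonant", r"ts(?=[^aeiou])", "s"),
    ("s realised as š after i", r"(?<=i)s", "š"),
]


def derive(form):
    """Apply each rule in turn, recording every stage that changes the form."""
    steps = [form]
    for _label, pattern, replacement in RULES:
        new_form = re.sub(pattern, replacement, steps[-1])
        if new_form != steps[-1]:
            steps.append(new_form)
    return steps


print(" > ".join(derive("hinčini")))
# hinčini > inčini > inčni > ični > itsni > isni > išni
```

A set of rules tailored to one word proves nothing by itself, of course; the test is whether the same rules give the right output for the rest of the inherited vocabulary.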

Of course, if išni is attested in Berber then all this reasoning may have to be rethought - so if you speak Berber and have heard the word before, please tell me now!

Arabic (and Berber?) loanwords in southern Italy

Just came across a little monograph on Arabic and Berber loanwords in the dialects of the Basilicata (southern Italy): Sopravvivenze lessicali arabe e berbere in un'area dell'Italia meridionale, la Basilicata by Luigi Serra. Most of the loans listed are from Arabic, some quite obvious (eg taūt "coffin" < تابوت, źir "a copper or terracotta container for liquids" < زير, zammîl "big pannier with which various goods are transported on a beast of burden's back" < زنبيل), others rather less clear-cut.

Only three loans (and one placename) are claimed as from Berber. Two of them look acceptable at first glance, but all of them are open to question, and they all refer to objects that there would have been no obvious reason to borrow terms for. It's possible that Berber influence can be found in southern Italian dialects, but this doesn't present a terribly convincing argument. Still, here they are:
  • źembr / źimbr / zimr / źimmr "billy-goat" (caprone, becco) < pan-Berber izimmər "ram", p. 39. (Looks good, but why the shift in species? - Also, see comments for an alternative Greek etymology.)
  • aččáta "big meal" (scorpacciata, mangiata, spanciata) < pan-Berber əčč "eat", p. 11. (The semantic and phonetic match are great, but the word is so short that coincidence seems hard to rule out.)
  • šéḍḍa "wing" (ala) < Zenati Berber "bird", eg Siwi ašṭiṭ, p. 26. The author mentions an alternative possibility - deriving it from Italian ascella "armpit" - that seems much more plausible.
  • Zaza (placename) < Berber azəzzu "thorny broom (plant sp.)" - not discussed in any detail (author cites Renisio), p. 41.

Tawalt closing down

Tawalt is a nine-year-old Libya-focused Amazigh/Berber website with a remarkable collection of audio recordings, sketch grammars, vocabularies, and resources for some of the least well documented Berber languages - those of Tunisia, Libya, and Egypt. It is thus rather a shame that Tawalt is shutting down - updates stopping immediately, and site to go down by the end of the year. Sure, the Wayback Machine should preserve all the texts on it - but not its remarkable audio archives (which have already disappeared from the main page.) Their plans are probably related to political problems - the site's political postings had gotten rather outspoken. If you have any interest in Berber linguistics, I suggest looking around now before it disappears...

No, Berber isn't descended from Arabic

A few days ago I got lent a copy of a recent book in Arabic by Othmane Saadi: Dictionary of the Arabic Roots of Amazigh (Berber) Words معجم الجذور العربية للكلمات الأمازيغية (البربرية) (Tripoli: Academy of Arabic Language 2007.) My reaction, in brief, is that it's unscientific jingoistic claptrap. But I happen to have friends (not linguists, of course) who take it seriously; and I am told that the author, a proud member of the Chaoui Berber Nememcha (Nmamša) tribe, genuinely believes his own theory. I will therefore try to explain as simply as possible where the book goes wrong.

His starting point is noting the existence of strong similarities between Arabic and Berber in the vocabulary and grammar (p. C: “90% of Amazigh Berber words are pure or Arabised Arabic, and the grammar of Berber agrees with the grammar of Arabic.”) This is substantially correct, and has been known for a long time (see, for example, Igor Diakonoff's Afrasian Languages, Moscow: Nauka 1988, or at a more basic level one of my first posts), except that 90% is a substantial exaggeration – many of the comparisons he puts forward are at best questionable, as will be seen below. But he claims that the explanation for these similarities is that Berber descends from Arabic. Not just Berber either, as he says on p. B: “The term Arabitic عروبية means the ancient Arabic languages which are wrongly called the Semitic languages and which branched out from the source language Arabic thousands of years ago, such as Babylonian, and Assyrian, and Akkadian, and Phoenician Canaanite, and Aramaic, and Himyaritic, and Sabaean, and Thamudic, and Lihyanite, and Ma'inic, and ancient Egyptian, and Berber, and others.” Linguists subscribe to a rather different explanation for the observed similarities: that Berber and Arabic (and all the other languages he listed, and many he doesn't list such as Hausa and Somali) are all descended from a single language, called for convenience Proto-Afroasiatic (Greenberg 1950), which was different (and probably about equally different) from any of them.

How would you choose between these two hypotheses? Well, if the original language was different from Arabic, then you would expect some original forms to have been lost in Arabic but kept in other languages. Oddly enough, Saadi himself gives evidence for exactly that: he links the Berber ur “not” to Akkadian ul (p. 12), and the Berber -as “to him/her” to Akkadian -šu (p. 12), and the Berber nəkk “I” to Ancient Egyptian ink and Akkadian 'anāku, none of which are attested in Arabic. Unless you believe that Akkadian and Berber each independently invented the same new forms, or that they are more closely related to each other than to Arabic – which Saadi (correctly) does not claim – you have to conclude that the common ancestor of Arabic and Berber included words like ur/ul for “not”, and 'anāku for “I”, and so on, and hence was different from what we know as Arabic, just as it was different from Berber.

So maybe this common ancestor was Arabic in a different sense: Saadi argues that it was originally spoken in Arabia, so Arabic would be the one language that stayed at home, and presumably got less affected by foreign influence. Unfortunately, he doesn't have much of a case. His first argument (p. 1) is frankly risible: “Europe and North Africa were covered with ice before [18000 BC], whereas the Arabian peninsula enjoyed a climate similar to that of southern Europe now. The ice melted in the former and drought hit the latter, so mankind left the Arabian peninsula and settled North Africa and southern Europe.” The quote he cites on this actually says nothing about North Africa, and for good reason: even at the last glacial maximum North Africa was never covered by ice (see map), and was if anything more habitable before 18000 BC than it is now. He also notes (p. 2) that Berber princes have long claimed Yemenite origins. Such claims are questionable for many reasons (the desire for prestige, the originally matrilineal traditions of many Berber tribes, and no pre-Islamic attestations) – but even if true, it would prove nothing about the language: people change their language all the time without changing their ancestry, as any emigrant can tell you. The rest of his argument is a hotchpotch of miscellaneous quotes which at best claim that various early North African peoples or languages or cultures originated in the Middle East; in a particularly ludicrous case, he blithely quotes Bousquet (1957) to the effect that the Berber language “came from Asia Minor” [Turkey!] None of these quotes so much as mention the Arabian peninsula.

In fact, the linguistic evidence means that Proto-Semitic may well have been spoken in Arabia and certainly was spoken in the Middle East, but the common ancestor of Berber, Egyptian, and Semitic was most likely located in Africa. You see, as noted above, these three language families are also quite closely related to Chadic (spoken mainly in Nigeria and Chad) and Cushitic (spoken around the Horn of Africa) – which means that 4 out of 5 branches of this family are native to Africa. It is more likely that one branch left Africa than that 4 branches each separately followed the same narrow path across Sinai or crossed the Red Sea. (For theoretical background, see Campbell 2004.)

In other words: whether the similarities this book gathers between Arabic and Berber are valid or not, they don't do anything to support the author's claim that Berber descends from Arabic. Do they at least have the merit of being valid comparisons? Sometimes, but not with any consistency. Many of his comparisons look rather far-fetched, eg on p. D:

taməṭṭuṯ “woman” < Ar. ṭāmiṯ طامث “menstruator”
argaz “man” < Ar. rakīza(tu l-'usra) ركيزة الأسرة “pillar (of the family)”
ixəf “head” < Ar. xf' خفأ “appear”, because the head stands out
tadaγt “armpit” < Ar. daγdaγah دغدغة “tickling”
alγəm “camel” < Ar. luγām لغام “the foam that comes out of camels' mouths”

Many others are clearly genuine loanwords, often featuring sounds that cannot be reconstructed for Proto-Berber, though I don't think many of these are original suggestions, eg:

(p. D) axərraz “cobbler” < Ar. xaraza خرز “to sew leather”
(p. H) abrid “road” < Ar. barīd بريد (confirmed by the Tuareg pronunciation of this word, abărid)
(p. 38) ləbṣəl “onion” < Ar. baṣal بصل (Siwi happens to preserve an older word for "onion": afəllu)
(p. 78) taħzamt “belt” < Ar. ħizām حزام

A couple are known Phoenician loanwords:

(p. 57) agadir, ažadir "wall" - Ar. jidār جدار

A few are well-known Afroasiatic cognates, and scattered among them may be other valid cognates:

(p. 250) iləs “tongue” - Ar. lisān لسان
(p. 110) iđammən “blood” - Ar. dam دم
(p. 292) tiqqad “burning” - Ar. wqd وقد

But the book makes no attempt to distinguish between words taken from Arabic comparatively recently and words inherited from the common ancestor of Berber and Arabic, and seems to assume that any word found in both dialectal Arabic (Darja) and Berber must automatically be originally Arabic, rather than possibly being a borrowing from Berber into Arabic. There is a well-known technique for sorting out inherited cognates from loanwords from coincidental similarities: sound correspondences. Sounds don't usually change at random: they change systematically, just as all j's in Egyptian Arabic become g. You establish which Berber sounds normally correspond to which Arabic ones under what circumstances, based on looking at what happens in the clearest cases; that gives you a standard by which to judge the doubtful ones. Saadi has made no effort to do this, and the unfortunate result is that in his comparisons the chaff far outweighs the wheat.
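
As a rough illustration of the bookkeeping involved, here is a toy sketch using the Egyptian Arabic j > g example just mentioned. It tallies segment-by-segment correspondences over a few clear pairs (jamal/gamal "camel", jabal/gabal "mountain", jild/gild "skin", in simplified transcription), then checks a doubtful pair against that tally. The sketch assumes the two forms are already aligned and of equal length, which real data rarely is, and the "doubtful" forms are invented strings, not real words.

```python
from collections import Counter


def tally_correspondences(pairs):
    """Count how often each (sound_a, sound_b) pairing occurs in aligned pairs."""
    counts = Counter()
    for form_a, form_b in pairs:
        if len(form_a) != len(form_b):
            raise ValueError("this sketch only handles pre-aligned, equal-length forms")
        counts.update(zip(form_a, form_b))
    return counts


def unattested(pair, counts):
    """Return the correspondences in a doubtful pair never seen in the clear cases."""
    form_a, form_b = pair
    return [(a, b) for a, b in zip(form_a, form_b) if counts[(a, b)] == 0]


# Clear cases: other Arabic varieties' j corresponding to Egyptian g.
clear_cases = [("jamal", "gamal"), ("jabal", "gabal"), ("jild", "gild")]
counts = tally_correspondences(clear_cases)

print(counts[("j", "g")])                      # 3
print(unattested(("jalab", "galab"), counts))  # []  (invented forms)
print(unattested(("jalab", "jalab"), counts))  # [('j', 'j')]
```

A pair that throws up unattested correspondences is not thereby disproved, but it needs some other explanation, such as borrowing or coincidence; that is precisely the sorting that the book never attempts.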

Berber and Arabic both descend from the same language, but that language was neither Berber nor Arabic, and probably didn't come from Arabia - and if you want to know about that common source, then you'll learn more from the works of Diakonoff or Greenberg, or even from more problematic sources like Orel and Stolbova 1999 or Militarev's online database, than from Saadi 2007.

Endangered Languages Week

It's half-over already, but I really ought to mention: Endangered Languages Week is happening at SOAS this week, and may be of interest to readers in London.

Also, an interesting news story, a reminder that many countries still have legal restrictions on what language you can speak where: A prominent Kurdish lawmaker gave a speech in his native Kurdish in Turkey’s Parliament on Tuesday, breaking taboos and also the law in Turkey.

`baskundza igwạḍən!

I don't suppose there are more than about two or three people on earth who care, but I just figured out an etymology that's been puzzling me for a while. In Kwaṛandzyəy, the word for "genie" is agwəḍ, plural igwạḍən. It looks Berber from its form alone, but I had never found it in any dictionary - until now, going through Taine-Cheikh's new Zenaga dictionary, when I came across ugṛuđ̣an (original singular *ugṛuḍ) "démons, diables (plus dangereux, plus forts que les autres)". It turns out to have been borrowed into Hassaniya too - īgṛäwṭən. The loss of ṛ is more or less regular in Kwarandzyey (usually it's restricted to intervocalic positions, but there are a few other examples like this); so is the shortening of a long vowel to ə in a final closed syllable, with a w remaining to indicate its former quality. Quite possibly the next commenter will tell me that actually this word is well-known in Kabylie or Morocco or something, but for now it's another piece of evidence for my claim that Kwarandzyey includes a number of loanwords specifically from the Zenaga branch of Berber.

UPDATE: see comments - it wasn't the next commenter, but the third one who established that this word is attested in southern Morocco too, which makes sense both since that region is also fairly close to Tabelbala and since it tends to be easier to find Zenaga cognates there than further north or east.

The Tyranny of Morphology

Coming out of an airport, you have to pick one of two exits: "Goods to Declare" or "Nothing to Declare". You have to go through one to get out; but (at least in Customs' eyes), by going through either exit, you state whether or not the contents of your luggage are legally subject to import duties. If you feel so scrupulously honest and so intensely secretive that you decide you have to leave that question unanswered - your only option is to stay inside.

Often your language does that too (Whorf said it first.) Just like the airports, the trick is to set things up in such a way that trying not to answer the question is either unacceptable (ungrammatical) or automatically interpreted as implying a particular answer. If you're talking about a friend in English, you don't have to indicate whether the friend is male or female until you refer back to the friend with "he" or "she"; in Arabic or Spanish, you have to state which it is from the start; and in Chinese or Songhay you can get away with never saying it at all. If you believe something definitely happens at some point, but don't want to say whether it's already happened or not yet, there's no simple way to say that. At best, you end up having to use cumbersome disjunctions like, if you're into apocalyptic prophecies, "The Antichrist either will be born some day or already has been"; and disjunctions like that will always be interpreted as meaning that you don't know which, not that you know but don't feel it's relevant.

In Korean (according to a talk by Peter Sells I heard today), a special verbal affix -si- (one among many, many politeness indicators) is used to indicate that the human subject of the verb (loosely speaking - it may also be a possessor of the subject, or a topic) is notionally of higher social status than the speaker. Thus:

sensayng-nim-i ka-si-ess-ta
"The teacher went."


koyangi-i ka-ess-ta
"The cat went."

The thing is, this means you can't be neutral about the subject. If you don't use this suffix with a subject that would normally take it, like "teacher" or "pastor", your listener will assume that you don't respect them so highly. You can't even get away with being ambiguous - I'm told that a disjunction of politeness levels, like *"The teacher went(honorific) or went(unmarked) away", is totally unacceptable. There are genres, such as academic writing or journalism, where politeness morphology is not normally used, allowing you to be neutral on this; but in a face-to-face conversation, as far as I understand, no such solution is available. (Any Korean readers should feel free to correct me!)

No language is likely to be able to stop you from saying what you want to say, if you try hard enough. But things like this can make it a lot harder to avoid saying what you don't necessarily want to say.

"Written in Islamic"

I don't usually do current events posts, but this one was cute enough to warrant a micro-post: egregious ex-Senator Rick Santorum declares that Muslims think that “The Quran is perfect just the way it is, that’s why it is only written in Islamic.” In most speeches, a sentence like that would be a major embarrassment; in this one, it's merely his only linguistics-related blooper.

(Via Angry Arab.)

Fusha: the Straussian choice?

I came across a review of a book called Why are the Arabs not Free? The Politics of Writing, by an Egyptian psychoanalyst. I haven't read it (nor Adonis, whom he discusses below) but the quote presents an interesting perspective on Arabic diglossia:
My understanding of the political significance of this divorce between political and demotic Arabic and the key place of writing in the perpetuation of despotism crystallised when I read the work of our great poet Adonis, entitled The Book. It is one of the most revolutionary books I've read in Arabic literature. Apart from its provocative title, it lays bare the truth of our political history as having been a series of assassinations in a struggle for power. But it's written in such a high style that it's a difficult text even for the educated, without taking into account the vast majority of illiterate folk. So, it's no wonder that The Book has remained a 'dead letter'. I may say that I once heard Adonis declare that he won't ever write except in 'grammatical' Arabic because he prefers writing in a 'dead language'. One may wonder if his choice doesn't also represent his method for dealing with the condition [the German-born American political philosopher] Leo Strauss describes in his Persecution and the Art of Writing. The authorities are happy to ignore such books because in the unlikely event that they themselves have understood them, they know that their message will only reach a very limited number of people.
A tempting hypothesis in some ways, this idea that Fusha acts to insulate the majority of the population from the debates of intellectuals, keeping the powers that be safer from ideologically-inspired opposition and the intellectuals themselves safer (in the short term!) from popular reactions to their speculations. But is the issue really that people have trouble with the language, or just don't read much? Both are true to some degree, but in an era where TV shows and news programs in standard Arabic command large audiences across the Arab world, it's not plausible to blame everything on the difficulty of the language.

Elsewhere in the article he is said to imply that giving the colloquial greater status will "reduce any feeling of powerlessness as a result of a lack of formal linguistic expertise". That seems harder to argue with, given that many (probably most) people who can understand standard Arabic fine can't put together more than a sentence or two without mistakes, and certainly can't sound as eloquent or clear or at ease in it as in their colloquial language. But then again, what power does speaking standard Arabic well actually entail, when plenty of ministers and millionaires can't? Only the power to take part in debates that seem to have remarkably little effect on the society around them?

Why do historical linguistics?

Unraveling the details of a given language family's history is painstaking, detail-oriented work - comparing hundreds or thousands of words to each other, looking through different languages' grammars, coming up with hypotheses to explain what you see and hoping the next language you look at doesn't disprove them... Why do it?

Well, for one thing, you end up showing interesting things about the history of the relevant part of the world, often things it would be hard or impossible to show any other way - that Madagascar was settled by people from Borneo, for example, or that Ijo slaves from Nigeria ended up on the Berbice River in Guyana, or that Persians and Swedes (along with a lot of other people!) ultimately both got their language from a common source. But that depends on your being interested in a particular region; why would a person working on the historical linguistics of (say) the Sahara care about the historical linguistics of New Guinea, or Alaska, or even Europe?

It's because people are pretty similar everywhere - we all have roughly the same mouths and the same brains, and as a result we all tend to make roughly the same kinds of changes. Looking at changes in the languages of Europe, and at which direction they went, turns out to give you a pretty good idea of what kind of changes to expect in New Guinea - and vice versa; wherever you go, k is much more likely to change to g than to n, and a word meaning "want" is much more likely to become a future tense marker than a word meaning "jump".

That means that all these individual small-scale studies are so many pieces fitting together to form a map of how language works. Describing a language (no mean challenge in itself) shows you one set of possibilities; typology tells you the possible states of a language; but historical linguistics relates them to one another, showing you which states are closely linked and which are not. You can't predict what will happen to a language, but you can see in advance what kind of changes are likely and what kind are unlikely.

For sounds, this map of changes - this network linking different states of a language to one another - will seem familiar; it corresponds closely to articulatory and/or auditory similarity. You can mostly account for it by knowing how different sounds are made (with the lips, the tongue, etc...) and which sounds are hardest to distinguish. The key test for a theory of syntax (as far as I'm concerned) is whether it can account similarly for the attested map of syntactic change.

Oldest Papuan writing?

What are the oldest written documents in a Papuan language (ie a non-Austronesian language of the New Guinea region)? I'm not totally sure, but a strong candidate has to be the court records of Ternate. The islands of Ternate and Tidore in eastern Indonesia speak two closely related languages belonging to the non-Austronesian North Halmaheran family. They have been writing using the Jawi Arabic script since at least the 1500s; in fact, some of the earliest surviving Malay manuscripts are letters from the sultan of Ternate from about 1521.

Recently I came across an 1890 book on Ternate online: Ternate: The Residency and its Sultanate. The book includes a brief introduction to the language and a word list; it also gives reproductions of several manuscripts whose originals date back to the mid-1800s, along with translations. So if you want to try your hand at deciphering them, or just see what a Papuan language looks like in Arabic script, have a look! The page I've linked to (Arabic interpolation de-italicised) starts:

ma-dero toma hijratu-nnabiyy ṣallī `alayhi wa-sallim nyonyohi pariama calamoi si-raturomdidi si-nyagisio si-rara, tahun alif, toma-arah Sawal, i-fani futu nyagimoi si-tomodi, malam Jumaatu...

"In the year Alif of the Moslem era 1296, during the month of Sawal, on a Thursday night, the seventeenth night of the moon..."

Verbal adjectives in English

It may seem pretty exotic to English-speakers that in some languages adjectives behave almost exactly like verbs, but this strategy is not as un-English as it looks. Consider the following colloquial American English sentences, with more formal approximate "translations":

This rocks! - This is good. (and not *This is rocking!)
That would rock! - That would be good.
That rocked! - That was good.
That was a rockin' day. - That was a good day.

This sucks! - This is bad. (and not *This is sucking!)
That would suck! - That would be bad.
That sucked! - That was bad.
That was a sucky day. - That was a bad day.

In these, a property usually expressed with an adjective ("good", "bad") is being expressed using a stative verb, but only in predicative constructions (that is, to form a sentence.) In attributive function (that is, modifying a noun) an adjective derived from this verb is used. Like other stative verbs ("know", "be") but unlike non-stative verbs, it uses the simple present form to express a current situation, not the present continuous.

Within English, this pattern may seem pretty odd. But it corresponds rather well to how adjectives are expressed in Songhay languages, eg Koyra Chiini (Heath 1999:73). There, properties are expressed in predicative contexts just like verbs, with the same mood/aspect/negation particles, and in attributive contexts usually take a suffix:

ni beer - you are big (like ni koy - you went)
hal a ma beer - until it gets big (like a ma koy - he will go)
har beer - a big man

ni futu - you are bad
har futu-nte - a bad man

In Songhay the perfect aspect is used with stative verbs to express a current situation; but, like the English simple present tense, this is the simplest indicative verb form. The chief difference is that in Songhay the predicative verbs are used for inchoative senses too, as if "That rocks!" could mean "That is becoming good" as well as "That is good".

Typologically, I find it kind of interesting that what looks like a couple of verbal adjectives should be lurking in the recesses of the English lexicon. But it also has practical applications: if I were trying to teach Songhay or a typologically similar language to Americans, I would certainly start by discussing the example of these two English words.

Coptic adjectives

A little follow-up on the previous post, based mainly on Reintges' Coptic Egyptian (Sahidic Dialect): A Learner's Grammar:

In Coptic, predication of properties is handled exactly as for nouns, including the use of a determiner with the adjective:

hen-noc gar ne neu-polytia: DET-great for are their-labours
For their labours are great.

In attribution, the structure is Determiner - A - n - B, where A can be the noun and B the adjective, or vice versa:

ou-kohi n-soouhs: a-small n convent
t-parthenos n-sabê: the-virgin n prudent

To express the material of which something is made, you use the same structure, except that only B can be the material:

t-kloole n-ouein: the-cloud n light "the cloud of light"

Note that this is separate from the attributive construction:

ntof pe-iôt pahôm "He, our father Pahom"

So can adjectives be distinguished as a separate word class, when they behave so much like nouns? The answer is yes: an adjective is an item that can occupy either A or B in the attributive structure without a change in referential meaning. (See Coptic Grammatical Categories, Shisha-Halevy, p. 53.) If you reverse the constituents of a genitive or material construction, you change the referential meaning: "a vessel of wood" vs. "vessel wood (ie wood for vessels.)" If you do so for an adjective-noun attributive construction, the referential meaning stays the same: ou-noc n-polis or ou-polis n-noc both refer to the same entity, "a big city". So for this case, Dixon's hypothesis scrapes through.
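The reversibility test can be written out as a small check. This is only a toy sketch in Python: the table of referents is entered by hand from the examples above, the English words "vessel" and "wood" are placeholders standing in for the Coptic nouns (which are not given here), and the function name is my own.

```python
# Toy illustration of the reversibility test for "Det-A n-B" constructions.
# The referent labels are hand-entered from the examples in the post;
# "vessel" and "wood" are English placeholders, not Coptic forms.
REFERENTS = {
    ("noc", "polis"): "big city",             # ou-noc n-polis
    ("polis", "noc"): "big city",             # ou-polis n-noc
    ("vessel", "wood"): "vessel made of wood",
    ("wood", "vessel"): "wood for vessels",
}


def reversible(a, b, referents=REFERENTS):
    """Return True if swapping A and B leaves the referent unchanged."""
    return referents[(a, b)] == referents[(b, a)]


print(reversible("noc", "polis"))    # adjective-noun pair
print(reversible("vessel", "wood"))  # material construction
```

On this test, a pair counts as adjective plus noun only when the check comes out true; which member of the pair is the adjective still has to be decided by seeing which word passes the test with many different partners.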

Adjectives - who needs 'em?

Most languages have a class of words that express properties and behave differently from other words. These are called adjectives. In English, for example, words like "red" or "old" or "tall" behave differently from nouns or verbs. For example, you add -s to verbs in the present tense if their subject is 3rd person singular, like "he sings" or "she eats"; but you can't add -s to an adjective, so you say "he is red" rather than *"he reds". You can put "very" before an adjective ("very red"), but not usually before a noun (you can't say *"very food".) Verbs can't be placed between "the" and the noun (unless you add an ending like -ing or -ed), but adjectives can (you can say "the red car", but not *"the move car").

It turns out, according to Dixon 2004, that practically every language - perhaps every language - has at least one separate class of words, definable purely on the grounds of their (morphosyntactic) behaviour rather than their meaning, that refer to properties. This class typically includes words expressing size, age, value, and colour, and sometimes more.

But often, a concept expressed using an adjective in one language is expressed only by a verb or a noun in another. For example, in Kwarandzyəy adjectives come between the noun and the plural marker:

ạdṛạ kədda yu
mountain small PL
"little mountains" (hills)

But there is no adjective "happy" in Kwarandzyəy; instead, you use a verb, yəfṛəħ "be happy, rejoice". And to say "the happy people", you say "the people who are happy/have rejoiced":

bạ γ i-ba-yəfṛəħ
person who they-PF-happy

Moreover, though adjectives may always be distinguishable by some test, they usually behave very much like another word class. In fact, Stassen 1997:30 (link goes to 2003) postulates that in every language adjectives handle predication (saying "X is red", for example) in the same way as either verbs, nouns, or locations. For example, in English or Arabic, adjectives handle predication like nouns (you say "He is tall", just like "He is a footballer"); in Korean or Tamasheq, they do it like verbs; and some languages, like Japanese, have both verb-like and noun-like adjectives.

So clearly people can do without some adjectives, and clearly the behaviour of adjectives tends to be very similar to the behaviour of some other word class. Why not do without them altogether? It would be easy enough to construct a language where no morphological or syntactic tests could distinguish adjectives from verbs, or from nouns. So if practically every language does take the trouble to distinguish them, there must be some pretty powerful cognitive motivation for it - and some pretty powerful historical tendencies acting to separate adjectives from verbs and/or nouns. The question isn't directly relevant to my current work, but it's worth thinking about.