Kalusa

Kalusa is a collaborative, corpus-driven, unplanned constructed language. The project was created by retired video game programmer Gary Shannon and launched online in May 2006. It is most notable for its distributed, anonymous contribution process.

The definitive source of the language is a corpus of Kalusa-to-English translations. The only way to contribute to the language was to add to this corpus anonymously, through a web interface hosted by Shannon. Due to social tensions the project was abandoned by September 2006, but not before it had accumulated over 4,200 corpus sentences, 1,300 contributor comments, and 200 mailing list messages.

Overview
The project started with four seed sentences, posted by Shannon, with accompanying English translations. Collaborators were then encouraged to add their own sentences and translations, based on the existing material. All sentences were submitted anonymously, and could be anonymously rated up or down by the community.

The rating was quantified as a Correctness Quotient (CQ). The CQ could range from zero to 200, with a score of 100 denoting average correctness. Sentences above that score were considered by the community to be good examples of Kalusa, and those below it bad examples. Sentences that dropped below a certain level for a certain period of time were deleted entirely from the corpus.

As the language grew, the corpus always remained its primary source. Several reference works were nevertheless created, chiefly to document the vocabulary. There were also out-of-band discussions concerning the language's features, and these gave rise to an impromptu Kalusa community.

History and community
In June 2004, Shannon had created a corpus-driven language called Madjal. The collaborators on that project were Andrew W. Soukup, Roger Mills, David Peterson, Sally Caves, and a further pseudonymous contributor. The similarity of Kalusa to Madjal is evident from a section describing the latter's rules: "The only rule of Madjal is that there are no written grammatical rules. All that is known about Madjal grammar and vocabulary is found in the corpus which includes almost everything that has ever been written in the language."

Madjal is cited as a direct influence on Kalusa by David Peterson. On 22 May 2006, Shannon announced the new Kalusa project to the CONLANG-L mailing list. In the announcement he says that some years earlier on the same list he "tried to start up a collaborative conlang project that turned out to be impractical", but that he has now "found a way to make it work". By 28 May 2006, the new language was growing at a rate of 110 new words per day.

Though submissions to the corpus were anonymous, the web interface also housed a comments system. Usernames were required to comment, but there was no login system, and pseudonyms were used almost exclusively. The comments system was the primary means of communicating about the language, and more was written about the language there than in any other forum. As the project was first announced on a constructed language mailing list, some community discussion inevitably arose there as well. Once the comments had proved so popular a feature, a Kalusa mailing list was set up. It did not achieve the popularity of the comments system, but still proved a lively forum for more detailed interchange and debate.

Writing system
The writing system of Kalusa is alphabetic, with all letters taken from the ISO basic Latin alphabet. Only twenty-four of the twenty-six letters are used, with "c" and "x" being omitted.

According to community analysis, the approximate order of frequency of letters used in Kalusa grouped into frequency tiers is: a, i e, k o s u r, m t n z, d l y, p v h g b, q f, w j. The top twenty digraphs are ki, es, ia, sa, ya, za, ay, ka, is, ku, zi, ha, ko, ze, au, go, ok, se, az, iz.
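A frequency count of this kind can be reproduced from any copy of the corpus. The following is a minimal sketch in Python, not the community's actual method: the function name `letter_and_digraph_counts` is hypothetical, and it assumes that digraphs are counted only within words, never across a space or punctuation mark.

```python
from collections import Counter
import re


def letter_and_digraph_counts(sentences):
    """Count single letters and within-word digraphs across sentences.

    Assumption: a digraph is a pair of adjacent letters inside one word.
    """
    letters = Counter()
    digraphs = Counter()
    for sentence in sentences:
        # Lowercase the text and treat each run of basic Latin letters as a word.
        for word in re.findall(r"[a-z]+", sentence.lower()):
            letters.update(word)
            # Adjacent letter pairs inside the word; pairs never span a gap.
            digraphs.update(a + b for a, b in zip(word, word[1:]))
    return letters, digraphs


# Two sentences quoted in the Examples section, used here as a tiny sample.
sample = [
    "Ma ziresh es awan kia elamu.",
    "Ma dun gada lok mung kia kauno.",
]
letters, digraphs = letter_and_digraph_counts(sample)
print(letters.most_common(5))
print(digraphs.most_common(5))
```

On a sample this small the ranking will not match the tiers above; the published figures were derived from the full corpus.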

Pronunciation and phonotactics
As a primarily corpus-driven language, Kalusa did not originate as a spoken language, and therefore has no definitive pronunciation. There are only a few community comments on pronunciation, such as an early one proposing that letters should have their IPA values apart from y, sh, zh, ng and (maybe more contentiously) c, q. Perhaps facetiously, there was a suggestion that it would be good if both tt and th were pronounced.

Though there are few comments on the pronunciation and phonology of the language, there are several on phonotactics. Since new words were created based on corpus samples, contributors needed at least an intuitive grasp of the phonotactics of existing words in order for new words to appear consistent. Some contributors went to some lengths to get a more than intuitive grasp, as in the following comment by Jim Henry:

There are many long-established words that begin with consonant + r [...]. There are also many long-established words that have two final vowels. E.g., "krevo", "kia", "trosu", "kua". I agree that words with two final consonants, or two initial consonants where the second is not |r|, |y| ("nyava", "pyanezres"), or |w| ("kwa"), violate Kalusa phonotactics. I would prefer to have fewer words ending in a consonant [...] but words ending in consonants have been there from the beginning, such as "es".

Grammar and morphology
Kalusa is a moderately isolating language, with a predominantly SVO word order.

Adverbs can be created by adding an -at or -rat suffix to an adjective.

Vocabulary
As of 16 June 2006 there were 464 words in Kalusa. In Shannon's Dictionary of Modern Kalusa, which lists only 181 Kalusa words, there are 77 nouns, 40 verbs, 26 adjectives, ten adverbs, eight pronouns, six conjunctions, five prepositions, five particles, three names, one suffix, one honorific, and one colloquialism. The dictionary came under criticism, however, for being too "definite" in its interpretations, and "perhaps premature". It was apparently abandoned early on, possibly due to this criticism.

The words occurring over 1% of the time in the corpus, as analysed by Jim Henry, were, in decreasing order of frequency: ma, es, kia, da, dun, za, ira, lok, goro, kome, taya, sam, pe, bogi, kisa, and ib. Their meanings are given here as stated in the Dictionary of Modern Kalusa. Another drawback of this dictionary is that five of these sixteen most frequent words are not described in it.


 * ma: first person singular (pron.)
 * es: accusative case marker (part.)
 * kia: of or belonging to, used to join nouns to adjectives (part.)
 * da: noun conversion (part.)
 * dun: past tense (part.)
 * za: no definition
 * ira: third person singular (pron.)
 * lok: be at or located at (noun)
 * goro: the copula be, is, am, or are (verb)
 * kome: to eat (verb)
 * taya: no definition
 * sam: no definition
 * pe: no definition
 * bogi: to have (verb)
 * kisa: no definition
 * ib: and (conj.)

Criticism
The lack of a standard phonology was identified as a source of problems. "Not being a spoken language, apparently little attention is being paid to the sound of the language, and words and sentences that are either unpleasant tongue twisters, or frankly childish sing-song constructions have found their way into the language."

The voting system used to set CQ values was initially open: any person could vote any number of times. This resulted in a large attack on the corpus on 13 June, referred to as "the massacre", which caused many sentences to be deleted. The loophole was subsequently closed when Shannon "later modified the Kalusa software to disallow multiple votes on the same sentence from the same IP address." The "massacre" may have prompted the jocular message from Shannon on 16 June describing how a devastating eruption on the Island of Kalu had destroyed much of the Great Library.
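The restriction of one vote per sentence per IP address can be illustrated with a short sketch. This is not Shannon's software, whose implementation is not documented here: the class `VoteRegistry` and its methods are hypothetical, and because the way votes fed into the CQ is not described, a simple net tally stands in for it.

```python
class VoteRegistry:
    """Accepts at most one vote per (sentence, IP address) pair.

    Hypothetical illustration; the net tally is a stand-in for the CQ,
    whose actual formula is not known.
    """

    def __init__(self):
        self._seen = set()   # (sentence_id, ip) pairs that have already voted
        self._tally = {}     # sentence_id -> up-votes minus down-votes

    def vote(self, sentence_id, ip, up):
        """Record a vote; return False if this address already voted here."""
        key = (sentence_id, ip)
        if key in self._seen:
            return False     # repeat vote from the same address is rejected
        self._seen.add(key)
        delta = 1 if up else -1
        self._tally[sentence_id] = self._tally.get(sentence_id, 0) + delta
        return True

    def net_votes(self, sentence_id):
        """Return the net tally for a sentence (0 if it has no votes)."""
        return self._tally.get(sentence_id, 0)


registry = VoteRegistry()
print(registry.vote(536, "203.0.113.7", up=False))  # first vote is accepted
print(registry.vote(536, "203.0.113.7", up=False))  # repeat vote is rejected
```

Keying on the pair rather than on the address alone still lets one person vote on many different sentences, which matches the quoted description; it does not stop a voter who can change IP address.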

Given the intense activity at the beginning of the project, it is not surprising to also find many positive comments about the language. Speaking of collaborative constructed languages, Jim Henry said that Kalusa "was by far the most interesting of the ones I've been involved in". David Peterson described Kalusa as "one of the most interesting collaborative conlangs I've ever seen".

Examples
Representative example sentences with a high CQ include:


 * Elamu kisa dun kome au miqi teset. (Sentence 536)
 * This apple was eaten by a mouse.


 * Ma ziresh es awan kia elamu. (Sentence 564)
 * I want a slice of apple.


 * Ma dun gada lok mung kia kauno. (Sentence 587)
 * I trod in dog feces.


 * Za da leota-ruba. Za goro orgon. (Sentence 610)
 * You are orange. You are an orange.


 * Ma dun goro zhati kia biti. Za dun kome es zhati kia biti. (Sentence 685)
 * I was a small bird. You ate a small bird.


 * Teya trin teset vige. (Sentence 724)
 * Tea can be drunk.


 * Zhati biti terehe poi sepahuwe, ira dun kevuzi. (Sentence 758)
 * The little bird tried to fly over the rainbow.

Earliest corpus sentences
The first five retained sentences in the early 29 May corpus, with Correctness Quotients, are:

Sentence 5 was deleted due to downvoting.

Literature
The Saga of Malia or the Saga of Malia and Kuana, a surreal modernesque folk story about a milkmaid and her calf, is cited as the first and perhaps only example of Kalusa literature. Extended uses of Kalusa outside of the corpus were rare, though the occasional use of sentences or phrases such as "Ka Kalusa da vezya!", the imperative of "Kalusa is strong!", was comparatively common.