This page lists methods for self-segregating morphology: ways you can design a conlang so that morpheme boundaries, word boundaries, or both are always obvious and unambiguous.
Here are a couple of threads about the topic:
- A thread started by Rex May in March 2006 on the AUXLANG mailing list
- A thread started by Jim Henry in April 2006 on the CONLANG mailing list
- All morphemes are the same length (all are one syllable, or perhaps all are two syllables, and the phonotactics are such that syllable boundaries are unambiguous).
- A subset of the phonemes of the language are designated as an initial set (a), and the rest of the phonemes as the subsequent set (b). A word must begin with one or more phonemes from the initial set and end with one or more from the subsequent set. (Tceqli uses this method, with plosives and fricatives in the initial set, and vowels, nasals and liquids in the subsequent set.) Words might have the forms ab, aab, abb, aaab, abbb, etc., and the morpheme boundaries are wherever a b phoneme is followed by an a phoneme. Some variations of this method are:
- You could divide up the phonological segments into the following classes: a. Segments that can be the first segment of a morpheme, but can't be any non-first segment. b. Segments that can't be the first segment of a morpheme, but can be any non-first segment. Then the morphemes will look like a, ab, abb, abbb, abbbb, ... etc. Morpheme boundaries would occur just before each a.
- You could divide up the phonological segments into the following classes: c. Segments that can be the last segment of a morpheme, but can't be any non-last segment. d. Segments that can't be the last segment of a morpheme, but can be any non-last segment. Then the morphemes will look like c, dc, ddc, dddc, ddddc, ... etc. Morpheme boundaries would occur just after each c.
- If you require every morpheme to contain at least two segments, you could divide up the phonological segments into the following classes: e. Segments that can be the first or last segment of a morpheme, but can't be any non-first non-last segment. f. Segments that can be neither the first nor the last segment of a morpheme, but can be any non-first non-last segment. Then the morphemes will look like ee, efe, effe, efffe, effffe, ... etc. (Without the two-segment minimum, ee might be "e, e" or might be "ee".) Morpheme boundaries would occur just after each fe and just before each ef, but a string of ee morphemes would have to be parsed globally; you couldn't tell how to parse it unless you had the whole utterance.
- If you require every morpheme to contain at least two segments, you could divide up the phonological segments into the following classes: b. Segments that, as before, can't be the first segment of a morpheme, but can be any non-first segment. d. Segments that, as before, can't be the last segment of a morpheme, but can be any non-last segment. Then the morphemes will look like db, ddb, dbb, ddbb, dddb, dddbb, ddbbb, ... etc. Morpheme boundaries would occur just before the d in bd.
- Divide phonological segments into three sets: a (initials), b (medials) and c (finals). A word boundary only occurs at a c-a pair. This means that any type of letter can occur immediately after a b and still be part of the same word.
- A subset of vowels is used only in initial or final syllables, while the remaining vowels are used in all other syllables. Konya did this, with /e i o u/ in initial syllables and /a/ in second and subsequent syllables of a polysyllable. Or one could use pure vowels except in final syllables, which must have a diphthong; or ditto with front and back, or rounded and unrounded, or nasal and oral vowels...
- The initial phoneme indicates the number of syllables to follow (as in Jeff Prothero's Plan B).
- Require the last segment of each morpheme to code the length of the morpheme. This has the disadvantage of requiring you to parse from the end of an utterance backward.
- All morphemes begin and end in a consonant and have no consonant clusters within them. A consonant cluster therefore marks a morpheme boundary.
- Inverse of above: all morphemes/words begin and end in a vowel, and have no sequences of two vowels within them. Two vowels in a row mark a morpheme boundary. (Ilomi uses a variation of this, with two vowels in a row marking a word boundary and /n/ between two vowels marking a morpheme boundary within a compound word.)
- Modification of either of the above methods: To avoid adjacent vowels slurring into diphthongs, or possibly difficult consonant clusters at word/morpheme boundaries, reserve a particular consonant (perhaps /?/ or /n/ or /l/) to mark boundaries between VCV... morphemes or a particular vowel (perhaps schwa) to mark boundaries between CVC... morphemes.
- Tone or stress marking to distinguish an initial or final syllable from following or preceding ones, and maybe to distinguish monosyllables from initial syllables of polysyllabic words.
- The Ilomi method mentioned above, with /n/ marking morpheme boundaries within compound words and a sequence of two vowels marking a word boundary; or its consonantal inverse, with CVC... morphemes, a schwa or some such unstressed vowel marking morpheme boundaries within words, and consonant clusters marking boundaries between words.
- A variation on the above, with multiple intra-word conjunctions reserved for specifying the particular manner and/or order in which morphemes within a compound modify each other. There could be high and low precedence joiner morphemes, such that /ipeNahumafi/ could be parsed into /ipe/, /ahu/, and /afi/, and then the joiner morphemes /N/ and /m/ specify that /ahu/ + /afi/ modifies /ipe/ rather than /ipe/ + /ahu/ being modified by /afi/, to avoid ambiguity within compound words. Or the different joiner morphemes could specify the way the modifier morpheme applies to its head (quality, source, purpose, admixture, equal mixture, etc.); or with a larger set there could be high and low precedence versions of each manner-conjoiner morpheme.
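Two of the methods listed above are simple enough to sketch in code: the initial-set/subsequent-set method and the consonant-cluster method. The following is only a minimal illustration, assuming one character per phoneme; the function names and the phoneme inventories are hypothetical and are not taken from Tceqli or any other language mentioned here.

```python
# Hypothetical inventories, one character per phoneme (an assumption for
# illustration only; not the real inventory of any language named above).
INITIALS = set("ptkbdgfvsz")   # "a" set: plosives and fricatives
VOWELS = set("aeiou")


def split_before_initials(utterance, initials=INITIALS):
    """Initial-set/subsequent-set method.

    A new unit starts wherever a subsequent-set segment is followed by
    an initial-set segment (a "b" followed by an "a").
    """
    units = []
    for i, seg in enumerate(utterance):
        starts_unit = i == 0 or (
            seg in initials and utterance[i - 1] not in initials
        )
        if starts_unit:
            units.append(seg)
        else:
            units[-1] += seg
    return units


def split_at_clusters(utterance, vowels=VOWELS):
    """Consonant-cluster method.

    Every morpheme begins and ends in a consonant and has no internal
    clusters, so a boundary falls between any two adjacent consonants.
    """
    units = []
    for i, seg in enumerate(utterance):
        in_cluster = (
            i > 0 and seg not in vowels and utterance[i - 1] not in vowels
        )
        if in_cluster or not units:
            units.append(seg)   # second consonant of a cluster opens a new morpheme
        else:
            units[-1] += seg
    return units


if __name__ == "__main__":
    print(split_before_initials("tanoskelipa"))  # ['tano', 'skeli', 'pa']
    print(split_at_clusters("katmelos"))         # ['kat', 'melos']
```

Neither function needs a lexicon: both work from the phonological class of each segment alone, which is what makes these schemes self-segregating in the strict sense.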
And Rosta's Livagian uses another method which, though not a self-segregating morphology in the strict sense, partly serves the same purpose with fewer restrictions on the phonological shape of words. It requires a full knowledge of the lexicon to parse unambiguously, however. The key is that no morpheme may be identical to the beginning (a prefix substring) or the end (a suffix substring) of another morpheme. So, for instance, if in a string "kesumalipe" you recognize "kesu" and "pe" as familiar morphemes, you know that this must be "kesu" followed by "ma li" or "mali" followed by "pe"; the fact that "kesu" is a real morpheme in a language meeting this criterion means that there cannot be another morpheme "kesuma" or "kesumali", and the fact that "pe" is a real morpheme means that there can't be any morpheme like "lipe" or "malipe". But if you have only learned the phonology of the language and don't know much vocabulary yet, you can't deduce the morpheme boundaries from the phonotactics of the word; you would have to start by looking up "k" in the lexicon, then "ke", then "kes", until you find "kesu"; then start looking for "m", "ma", etc.
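The lookup procedure just described can be sketched as follows. This is a rough illustration under stated assumptions, not Livagian's actual parsing rule: the function names are hypothetical, the toy lexicon is built from the example string above, and each character is treated as one segment.

```python
def is_affix_free(lexicon):
    """True if no morpheme is a prefix or suffix substring of another."""
    return not any(
        a != b and (b.startswith(a) or b.endswith(a))
        for a in lexicon
        for b in lexicon
    )


def parse_with_lexicon(utterance, lexicon):
    """Parse left to right, growing each candidate one segment at a time.

    Because no morpheme is a prefix of another, the first candidate found
    in the lexicon is the only possible morpheme at that position.
    """
    morphemes = []
    start = 0
    while start < len(utterance):
        for end in range(start + 1, len(utterance) + 1):
            candidate = utterance[start:end]
            if candidate in lexicon:
                morphemes.append(candidate)
                start = end
                break
        else:
            # No lexicon entry starts at this position.
            raise ValueError(f"cannot parse from position {start}")
    return morphemes


if __name__ == "__main__":
    toy_lexicon = {"kesu", "mali", "pe"}   # hypothetical toy lexicon
    print(is_affix_free(toy_lexicon))                     # True
    print(parse_with_lexicon("kesumalipe", toy_lexicon))  # ['kesu', 'mali', 'pe']
```

A lexicon containing both "ma" and "mali" would fail the `is_affix_free` check, which is why the example string can contain either "ma li" or "mali" but a single language could not have both readings available.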