This language is partly an experiment in a 'minimalist' phonology. It is inspired by Austronesian and Japonic languages, as well as some North American influences, although it is not designed to be too similar to any of them. I will call this language 'M' for now.

The Phonology

The phonology of M is characterised by its relatively small number of phonemes, a simple syllable structure, pitch accent, and a large amount of allophony.


A concise analysis gives M 13 individual consonantal phonemes. Notably, only the velar series distinguishes plain, palatalised and labialised forms. It is also lacking in any true labial phonemes, although these do reappear as allophones in certain contexts.

Stops: /t k kʷ kʲ/
Nasals: /n ŋ ŋʷ ŋʲ/
Fricatives: /s x xʷ xʲ/
Rhotic: /r/

In addition, the archiphonemes */w/ */j/ */ʔ/ are necessary to account for various diachronic and morphological processes, but they never (or very rarely) actually surface as such. Indeed, the labialised and palatalised series can be analysed as /Cw Cj/ clusters, by which analysis M would only possess 10 consonant phonemes.

The vowel inventory is less problematic.

/a e i o u/

/a/ is more accurately the open-central vowel [ä].
/e o/ are more accurately mid [e̞ o̞].
/o/ after labials, and often after palatals, is unrounded [ɤ].
/i/ is close to the cardinal value.
/u/ varies between [u~ɯ~y], with [ɯ] following labial consonants, and [y] following palatal ones.

In addition, /o u/ are unrounded when adjacent to /e i/. /oe eo ue eu/ [ɤe eɤ ɯe eɯ] /oi io ui iu/ [ɤi iɤ ɯi iɯ].

A more recent change is rounding-harmony. That is to say, if a syllable contains [ɤ ɯ], a following syllable with /o u/ will also be [ɤ ɯ].
/ŋʷoro/ > [mɤro] > [mɤrɤ].
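
As a rough illustration, the two vowel rules just described (unrounding next to /e i/, and rounding harmony) can be sketched in Python. The function names and the input shapes (a string of vowels, a list of surface syllables) are assumptions made for the sketch, and harmony is assumed to spread left to right through the whole word, which the description does not state explicitly.

```python
# Unrounded counterparts of the two rounded vowels.
UNROUNDED = {"o": "ɤ", "u": "ɯ"}


def unround_near_front(vowels: str) -> str:
    """Unround /o u/ when directly adjacent to /e i/ in a vowel sequence."""
    out = []
    for i, v in enumerate(vowels):
        # Immediate left and right neighbours (either may be missing).
        neighbours = vowels[max(i - 1, 0):i] + vowels[i + 1:i + 2]
        if v in UNROUNDED and any(n in "ei" for n in neighbours):
            out.append(UNROUNDED[v])
        else:
            out.append(v)
    return "".join(out)


def harmonise(syllables):
    """Rounding harmony: after a syllable with [ɤ ɯ], /o u/ also unround.

    Assumption: harmony keeps spreading rightwards, syllable by syllable.
    """
    out = []
    for syl in syllables:
        if out and any(v in out[-1] for v in "ɤɯ"):
            syl = syl.replace("o", "ɤ").replace("u", "ɯ")
        out.append(syl)
    return out


print(unround_near_front("oe"))   # ɤe
print(harmonise(["mɤ", "ro"]))    # ['mɤ', 'rɤ']
```

The second call reproduces the [mɤro] > [mɤrɤ] step of the /ŋʷoro/ example.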

Sequences of vowels are common, including sequences of the same vowel, although each vowel is treated as a separate syllable and not a long vowel. Diphthongs do not occur.

Allophony and Phonological Processes

The surface realisation of M is much more than just the above, as most consonants (and vowels to a lesser extent) show considerable allophonic variation.

Palatalisation and Labialisation

These two processes are arguably the most important in M's phonology and are responsible for many of the allophonic variations. The processes and irregularities are more easily explained by looking at the Proto-language.

The Proto-Language is believed to have had the following vowels. Although they are generally treated as vowels, phonetically they likely already caused +lab / +pal of the preceding consonant.

*a *ja *wa
*i *wi
*u *ju
*je *wo
*ɨy *ɨw

That is to say, plain */e o/ did not exist.
*ɨy *ɨw later became non-palatal/labialising /e o/.
*/i u/ did not contrast with */ji wu/, but are inherently palatalising/labialising (i.e. they can be thought of as /ji wu/.)
/je/ became /jɤ/ after velar consonants, later becoming /jo/ [ʲo~ʲɤ].

At some point, the palatal/labial element of the vowel was transferred onto the consonant in M, or lost entirely.

/r n/ de-palatalised/labialised in all positions and do not participate in any of these processes.
/ŋ k x/, however, gained (or maintained) a +lab / +pal distinction.

/s/ is intermediate in that it merged with /x/ before /j/, but lost labialisation.
/t/ is also intermediate in that the outcome of *tj is /ss/, except /tji/ [tsi], and /tju/ [tsu], but lost labialisation.

The palatalised series is thus realised:

/ŋʲa ŋi ŋʲo ŋʲu/ [ɲa ɲi ɲo~ɲɤ ɲy]
/kʲa ki kʲo kʲu/ [cça tɕi tɕo~tɕɤ tɕy]
/xʲa xi xʲo xʲu/ [ça ɕi ɕo~ɕɤ ɕy]

*/tʲa ti tʲo tʲu/ [ssa tsi sso tsu]

[cça ça] can also be [tɕa ɕa], with the former being a more conservative realisation.
[o] alternates with [ɤ] after palatals, with [o] being more common in adjacent syllables containing [o u]. E.g. /rukʲo/ [dutɕo] but /nikʲo/ [litɕɤ].

Only the velars participate in labialisation, and have more variable realisations.
At some point after the palatalisation process was complete, /wi/ became a non-palatalising /i/.
The labialised series is notable for its labial allophones after /wo u/. After/simultaneous with this change, /wo u/ lost rounding becoming [ɤ ɯ] after labial allophones.

The labialised series is thus realised:

/ŋʷa ŋʷi ŋʷo ŋu/ [ŋʷa ŋi mɤ mɯ]
/kʷa kʷi kʷo ku/ [kʷa ki pɤ pɯ]
/xʷa xʷi xʷo xu/ [xʷa xi ɸɤ ɸɯ]

Labialisation in /ŋʷa kʷa xʷa/ does not involve rounding. They may be more accurately [ŋᶭa kᶭa xᶭa].
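
For quick reference, the palatalised and labialised realisation tables can be turned into a lookup. This is only a sketch: the dictionary layout and the names `build_lookup` and `REALISATION` are assumptions, and free variants such as [ɲo~ɲɤ] are kept as a single string rather than modelled separately.

```python
# Each key lists phonemic CV sequences; the value lists their surface forms
# in the same order, copied from the two realisation tables.
PALATALISED = {
    "ŋʲa ŋi ŋʲo ŋʲu": "ɲa ɲi ɲo~ɲɤ ɲy",
    "kʲa ki kʲo kʲu": "cça tɕi tɕo~tɕɤ tɕy",
    "xʲa xi xʲo xʲu": "ça ɕi ɕo~ɕɤ ɕy",
}
LABIALISED = {
    "ŋʷa ŋʷi ŋʷo ŋu": "ŋʷa ŋi mɤ mɯ",
    "kʷa kʷi kʷo ku": "kʷa ki pɤ pɯ",
    "xʷa xʷi xʷo xu": "xʷa xi ɸɤ ɸɯ",
}


def build_lookup(*tables):
    """Pair every phonemic sequence with its surface realisation."""
    lookup = {}
    for table in tables:
        for phonemic, surface in table.items():
            lookup.update(zip(phonemic.split(), surface.split()))
    return lookup


REALISATION = build_lookup(PALATALISED, LABIALISED)

print(REALISATION["kʷo"])  # pɤ
print(REALISATION["ki"])   # tɕi
```

Note that /ki/ and /kʷi/ stay distinct keys, which mirrors the point that surface [ki] comes from /kʷi/ while /ki/ is [tɕi].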

/tu/ is notable in that it is either realised as [tsu] or [ku], with [tsu] appearing in high-pitched syllables, and [ku] with a low-pitch. (See further posts on pitch-accent system and interaction with consonants.)

Other Allophony

Some consonants have further allophones depending on where they occur in a word. This is somewhat further complicated by pitch-accent, but this will be discussed later.

/k kʷ kʲ/
Word-Initially, the /k/ series are all lightly aspirated.

/k/ voices and lenites to [ɣ~β] intervocalically, where [β] corresponds to intervocalic [p]: /aka/ [aɣa], /aku/ [aβɯ].
Similarly, /kʷ/ voices and lenites to [β] between vowels /akʷa/ [aβa].
/kʷi/ [ki] may be either [ki]~[gi].
The sequence /iku/ elides the /k/ entirely: [iɯ].

/kʲ/ does not voice intervocalically, remaining unvoiced-unaspirated [cç~tɕ]: /akʲa/ [acça], /akʲu/ [atɕy]

/ŋ ŋʷ ŋʲ/
Word-initially, the /ŋ ŋʷ/ may fortify to [g gʷ] and /ŋʲ/ may lose its nasal element becoming [j].

The sequence /uŋV/ is regularly [umV]. This is one of the few instances of regressive assimilation in M: /tuŋi/ [tsumi].

Intervocalically, /ŋʷ ŋʲ/ are commonly realised as nasalisation on the preceding vowel, followed by the corresponding glide [ɰ j].
The sequences /ŋʷi ŋi ŋʷo ŋu/ [ŋi ɲi mɤ mɯ] are exempt, retaining the consonantal value.

/aŋʷa/ [ãɰa]
/aŋʲa/ [ãja]
/aŋʷo/ [amɤ]

Where the /x/-series is not realised [ɸ ɕ], /x/ is in free variation with [h~ɦ]. [x xʷ] are more common before /a/. E.g. /axʷa/ [axʷa]~[ahʷa].

/t/ is lightly aspirated word initially, and unvoiced-unaspirated medially. /ti/ is always [tsi].
As mentioned, /tu/ can be [tsu]~[ku] depending on pitch.

/r~d~l/ and /n~l/
/ri ru/ are [di du] word-initially; otherwise /r/ is most commonly a single tap [ɾ], although variations of [l] may be heard, especially before /i/.
/ni/ is usually [li]. This means non-initial /ri-ni/ are often merged into [li] for many speakers. /n/ may be [l] elsewhere for some speakers.

To come...

Other allophony
Pitch Accent

Pitch accent is a badly defined term, but I use it here to mean several things usually associated with "pitch accent":
Syllables are all pronounced with equal weight, that is to say, vowels do not reduce and syllable prominence is more associated with pitch than loudness/reduction/length.
There is a distinction between H and L pitches, which is not reducible to stress. The exact realisation of H and L is not absolute, however.

Pitch accent was likely present in the proto-language, but M has developed further contrasts, mostly from H-clusters and Q-clusters.

H is represented by an acute (á), and L by a grave (à). When M (mid) is indicated, although this is not phonemic, a macron is used (ā).

This section is a work in progress, and will probably change.

Disyllabic Words

Disyllabic (or dimoraic, if you will) words can take one of three accentual patterns: /HL/, /LH/, or /HH/ (flat). For HH words, the importance is that the accent is 'flat' across the word, without a large rise/drop in pitch. It can therefore also be realised /LL/ or /MM/.

HL tága [táɣà]
LH tagá [tàɣá]
HH tágá [táɣá]

As each vowel occupies a syllable, the same applies to adjacent vowels:

HL káa [ká.à]
LH kaá [kà.á]
HH káá [ká.á]

Disyllabic words with CvH or CvQ in the first syllable usually give rise to LH and HH patterns. Although the final syllable may be marked for +L:

LH sìHka /ɕìʰká/
HH síQká /ɕíkká/

sìHkàH > LL > HH /ɕíʰká/ (HH, LL merge into 'flat' HH)
síQkàH > HL > HL /ɕíkkà/

(I am still undecided whether to allow a HH LL distinction in disyllabic words).

Geminate initials count as a single mora, but such words can only be (L)H or (H)L as the pitch is mostly heard on the vowel (or only on the vowel in the case of /ss/):

ńnà / ǹná
śsà / s̀sá

Trisyllabic Words

Trisyllabic words can have one of four accentual patterns: HLH, LHL, HHL, LLH.

HHL and LLH are usually realised as a gradual fall and rise across the word, i.e. HML, LMH. These are called falling and rising accents.

HLH táròxʲá [táròɕá]
LHL àórà [àórà]

HHL tároxʲa [tárōɕà]
LLH taroxʲá [tàrōɕá]

HLH will likely become HHH in some contexts.

Phrasal Dynamics

M generally disallows a phrase to end with a marked rise (i.e. no LH). When a LH, HLH, or LLH word ends a phrase, a few changes take place.

/LH/ words become > /LH.L/, with the L vowel being lengthened to accommodate a LH rise:

tagá > LH [tàɣá] but phrase-finally: [taáɣà] LH.L
siHká > LH [ɕìʰká] but phrase-finally: [ɕiíʰkà] LH.L

Words with identical adjacent vowels do not lengthen; instead they become LL, e.g. [kà.á] > [kà.à]. These may be marginally distinguished from original /HH/ words by a lower overall pitch.
Non-identical vowels can either lengthen or become LL, however, e.g.: [kà.é] > [kaá.è] or [kà.è].

To be continued...

Mora Madness

The mora (μ) is a smaller and more important unit of analysis than the syllable in M for determining pitch-accent and other morpho-phonological processes.

(Note that the pitch-accent system from the previous post will likely be overhauled, and so pitch-accent marking here is only partial.)

A mora can take the following forms:
μ = CV, V, C

V-morae and C-morae are known as vocalic morae (Vμ) and consonantal morae (Cμ).

The canonical mora is CV:
ta μ
tɕi μ

Vμ and Cμ behave slightly differently in certain contexts, but are nonetheless treated as a full mora.

a μ
taa μμ (ta.a)
na μ
nna μμ (n.na)

Consonantal Morae

Cμ are divided into 4 types: Qμ, Hμ, Nμ, and 'defective' or 'vowel-less' morae.

Qμ can be analysed as underlying moraic /ʔ/.
Hμ can be analysed as underlying moraic /h/.
Nμ can be analysed as an underlying moraic nasal unspecified for place of articulation.
Defective morae are consonants which lack a vowel, always leading to a geminate consonant.

The surface realisation of these is quite variable, but the above can explain most pitch-accent and morpho-phonological processes which might otherwise seem opaque. Their surface realisation represents a major dialectal isogloss: namely, geminating dialects, in which Qμ Hμ tend to cause gemination, and vocalising dialects, in which Qμ Hμ tend to cause vocalisation. The main dialect described here is more geminating than vocalising.

Qμ and Hμ rarely surface as [ʔ h], but retain their moraic value. Qμ always causes high pitch on the preceding vowel, and Hμ causes low pitch.

Qμ Hμ + Vowel or /r/

When followed by a vowel or /r/, a Qμ vocalises with the corresponding high pitch:
taQ + i [táái]
taQ + ra [táára]

Hμ followed by a vowel is realised as [h.hV], where gemination preserves the mora:
taH + a [tàhha]

Before /r/, it vocalises with low pitch:
taH + ra [tààra]

Hμ in vocalising dialects repeats the previous vowel with low pitch:
taH + a [tàhha] - [tààha]

Qμ Hμ + Stop / Fricative

Before stops and /s x/, Qμ and Hμ cause gemination, with the corresponding pitch:

taQ + ka [tá.k.ka] || taH + ka [tà.k.ka]
taQ + ti [tá.t.tsi] || taH + ti [tà.t.tsi]
taQ + kʷo [tá.p.po] || taH + kʷo [tà.p.po]
taQ + sa [tá.s.sa] || taH + sa [tà.s.sa]
taQ + hu [tá.ɸ.ɸɯ] || taH + hu [tà.ɸ.ɸɯ]
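
The gemination pattern before stops and fricatives is regular enough to sketch in a few lines of Python. This is a minimal illustration, not a full model: the function name `attach` is an assumption, the following syllable must be given in its already-realised surface form (e.g. "tsi" for /ti/), and the geminate is assumed to be simply the first segment of that surface syllable.

```python
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"  # combining acute (high) and grave (low)


def attach(stem: str, mora: str, surface_syllable: str) -> str:
    """Join stem + Qμ/Hμ + a stop- or fricative-initial syllable.

    Qμ puts high pitch on the stem's final vowel, Hμ puts low pitch on it,
    and in both cases the onset of the next syllable is doubled.
    """
    if mora not in ("Q", "H"):
        raise ValueError("mora must be 'Q' or 'H'")
    tone = ACUTE if mora == "Q" else GRAVE
    # The stem is assumed to end in a plain vowel, so the diacritic lands on it.
    marked = unicodedata.normalize("NFC", stem + tone)
    onset = surface_syllable[0]  # assumed: the first segment is the geminate
    return f"{marked}.{onset}.{surface_syllable}"


print(attach("ta", "Q", "ka"))   # tá.k.ka
print(attach("ta", "H", "tsi"))  # tà.t.tsi
```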

Qμ Hμ + Nasal

Before nasals, Qμ assimilates to the following nasal, where the moraic nasal gains a high-pitch:

taQ + na [tá.ń.na]
taQ + ŋʷo [tá.ḿ.mɤ]

Hμ behaves somewhat differently before nasals. HN sequences metathesise to /Ǹh/, where /h/ gains the place of articulation of the metathesised N, but N retains the low pitch caused by Hμ.

taH + na > taN + sa [tà.ǹ.za] na > sa
taH + ŋʲa > taN + xʲa [tà.ɲ̀.ɕa] ŋʲa > xʲa
taH + ŋa > taN + xa [tà.ŋ̀.xa] ŋa > xa

Vocalising dialects simply have a low-pitch repeated vowel: taH + ŋa [tà.ŋ̀.xa] vs [tààŋa].

Nμ represents a nasal unspecified for place of articulation. Unlike Qμ Hμ, it can bear its own pitch.

Before a vowel, it is treated as geminate [ŋ], with its usual allophones:

kaŃ + a > [ka.ŋ́.ŋa]
kaŃ + u > [ka.ḿ.mɯ]

Before consonants, it assimilates to the place of articulation of the following consonant, causing voicing of stops and /s/.
/Ns/ > [nz], /Nki/ > [ɲdʑi], etc. but /xʲ/ does not voice /Nxʲ/ > [ɲɕ].

kaŃ + ta > [ka.ń.da]
kaŃ + ka > [ka.ŋ́.ga]
kaŃ + xʲo > [ka.ɲ́.ɕo]

Word-finally, it is often realised as a repeated nasalised vowel/glide, or moraic [ŋ].

kaŃ > [ká.ã́] ~ [káɰ̃́] ~ [ká.ŋ́]
kaǸ > [kà.ã̀] ~ [kàɰ̃̀] ~ [kà.ŋ̀]

See the section in Post-Lexical Rules regarding bimoraic words in Nμ which suppress tone.


All consonants may appear geminated except /r/ medially. Many of these are caused by Qμ, Hμ, but the proto-language likely also had geminates.

Nasals and fricatives can also appear geminated word initially: /nn ŋŋ ŋŋʷ ŋŋʲ ss xx xxʷ xxʲ/. In these cases, the first element of the geminate is treated as a single defective mora, lacking a vowel. With nasals, the moraic N may bear pitch. I.e. [ń.nà] and [ǹ.ná] are distinct.

Many word-initial geminates derive from sequences such as /sá.sá/ > /s.sá/, which explains their moraic value. This is further evidenced by words such as /s.sá/ not undergoing the vowel lengthening which monomoraic words undergo (/sá/ > [sáá]).

This process is still somewhat productive in M.

These geminates may also occur after Qμ Hμ Nμ. Qμ Hμ vocalise to preserve the moraic value, with the corresponding pitch. Nμ is variable in this circumstance, being either a nasalised vowel, glide, or nasal consonant.

taH + ssa /tàà.ssa/
taŃ + xxʷa [taã́.xxʷa] ~ [taɰ̃́.xxʷa] ~ [taŋ́.xxʷa]

This occasionally leads to a sequence of 3 identical vowels:

taáQnna /taáánna/ [taʔáánna]

(See following section for how these are treated).

Post-Lexical Rules

Monomoraic Lengthening
Many words are underlyingly monomoraic in M, but in isolation or at the end of an intonation unit they become bimoraic. This is achieved by lengthening the vowel, with the pitch maintained:

/tá/ ‘mud, earth’ is realised [táá] in isolation.
/ŋà/ ‘duck’ is realised [ŋàà] in isolation.

Alternatively, in careful speech such words might gain [ʔ] and [h], as these are associated with H and L pitch:

/tá/ ~ [táá] ~ [táʔ]
/ŋà/ ~ [ŋàà] ~ [ŋàh]
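
A small sketch of the lengthening rule, assuming a word is passed in as a list of morae and that the last character of a CV mora is its (pitch-marked) vowel. The name `in_isolation` is hypothetical, and the careful-speech [ʔ]/[h] variants are not modelled.

```python
import unicodedata


def in_isolation(morae):
    """Return the isolation form: monomoraic words get a copied vowel."""
    # NFC keeps vowel + pitch diacritic as one character, so [-1] is safe.
    morae = [unicodedata.normalize("NFC", m) for m in morae]
    if len(morae) == 1:
        morae.append(morae[0][-1])  # repeat the vowel with the same pitch
    return "".join(morae)


print(in_isolation(["tá"]))        # táá
print(in_isolation(["s", "sá"]))   # ssá  (already bimoraic, so unchanged)
```

The second call shows why /s.sá/ counts as evidence for the defective mora: with two morae already present, no lengthening applies.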

This makes /tá/ and /táQ/, and likewise /tà/ and /tàH/, functionally homophonous in isolation or at the end of intonation units, but the distinction reappears in all other contexts.

Bimoraic words in N appear to have drifted into the same treatment as monomoraic words, in that they only distinguish HH-LL pitches. This is better considered an allophonic realisation which only occurs when they are in isolation or at the end of an intonation unit, as the pitch of N can reappear with suffixes.

/kàŃ/ > [káã́] ~ [káɰ̃́] underlying LH, but realised HH at end of IU
but /kaŃ + a/ > /kà.ŋ́.ŋà/ LHL, N retains its high pitch

Additionally, words which are bimoraic in their underlying form distinguish HL, LH and HH as usual:
tá.à, tà.á, táá

There is an increasing tendency for flat-pitch VN sequences to be realised as /VṼ/:

káŃxu > [káã́.ɸɯ]
kàŃxu > [kà.ɰ̃́.ɸɯ] ~ [kà.ḿ.ɸɯ]

Epenthetic /ʔ/
Epenthetic /ʔ/ between vowels is non-phonemic, but relatively persistent. It mostly occurs to signal a pitch rise in sequences of identical vowels, and also possibly due to CV being the preferred type of mora.

It is most often inserted in sequences of 3 identical vowels. The general rule is that /ʔ/ occurs before a rise or fall in pitch in LHH and HHL sequences:
kàáá [kàʔáá] L.ʔHH
kááà [kááʔà] HH.ʔL

Otherwise it is generally attracted to LH rises, where it co-occurs with the H mora:
káàá [káaʔá] H.LʔH
kàáa [kà.ʔáa] LʔH.L

A sequence of three identical vowels with the same pitch is rare. In these cases, it is usually inserted before the final mora, or elided:
kaaHxxu [kàà.ʔà.ɸɸɯ] where /ʔà/ corresponds to the Hμ.

Where the final-vowel of a 3-identical-vowel sequence occurs before an unvoiced consonant, this vowel is often unvoiced unless subject to LH:
kààHssa [kààʔḁssa]
kááQxxu [kááʔḁɸɸɯ]

In fast speech where /ʔ/ is elided/not inserted, a 3-way length distinction may be perceived, although speakers of M would consider each instance of a vowel its own unit:
/kàssa/ [kàssa]
/kaHssa/ [kààssa]
/kàaHssa/ [kàààssa]

Sequences of 3 non-identical vowels often do not insert /ʔ/:
/íua/ [íua]
/eói/ [eói]

Two identical vowels followed by another vowel with a rise do insert /ʔ/:
kààí [kààʔí]

Sequences of four vowels regularly insert [ʔ] between VV.VV, regardless of pitch.
kaéaé [kaéʔaé]

Sequences of 5 or more vowels can be unpredictable regarding [ʔ], with the LH and VVʔVV rule usually being predominant.
kaéaéa [kaéʔaéa]
tíoaéa [tsíoaʔéa]

Having more than one epenthetic [ʔ] in a word or in close proximity is generally avoided.

Relatedly, in some dialects, the /k/ series remains unvoiced in LH sequences where H = /k/μ.
uká [uɣá] ~ [uká]

/tu/ is realised [tsu] when H in a LH contour, otherwise [ku~ɣu]. This varies somewhat across dialects.
/atú/ [atsú]
/átu/ [áɣù]

Debuccalisation of /x/
/x xʷ/ are more commonly [h~ɦ] between vowels. A clearly velar /x/ is considered 'masculine'. Geminate /xx xxʷ/ are in free variation: [xx~hh, xxʷ~hhʷ].

So…where does this leave the syllable?
It seems that the syllable is largely irrelevant in M, where the mora is the more important unit of analysis.

Syllabification in certain contexts is unclear, given the many VV sequences, initial geminates, and epenthetic /ʔ/.
Is /kaéaé/ 4 syllables [ka.é.a.é], or 2 [kaé.ʔaé]? This has little bearing on the actual realisations, however, where each vowel is given full moraic value.

In terms of morae, the only restriction is that Qμ, Hμ, Nμ are never adjacent.
E.g. *taHN
or *taQH
are impermissible.
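
This adjacency restriction is easy to state as a check. A minimal sketch, assuming a word is represented as a list of morae in which the special morae appear as the bare labels "Q", "H" and "N" (the representation and the function name are assumptions):

```python
SPECIAL = {"Q", "H", "N"}  # Qμ, Hμ, Nμ


def valid_mora_sequence(morae):
    """True unless two of Qμ, Hμ, Nμ stand next to each other."""
    return not any(
        a in SPECIAL and b in SPECIAL for a, b in zip(morae, morae[1:])
    )


print(valid_mora_sequence(["ta", "H", "N"]))             # False
print(valid_mora_sequence(["ka", "a", "N", "s", "sa"]))  # True
```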

There are no restrictions on 'superheavy syllables' (CVVC), which occur freely:
kaáŃssa /kaáńssa/ [kaʔáã́ssa]

Well, I've probably made all of that seem far more complicated than it is, but I don't have a more elegant analysis (yet) so hopefully it's clear enough. [:D]

Finally, some actual words:

/ína/ [ínà] mother
/íxa/ [íhà] father
/ínaíha/ [ínàíhà] parents

/aé/ [àé] younger sibling
/kaé/ [kàé] older sibling
/kaéaé/ [kàéʔàé] siblings

/ŋà/ [ŋàà] duck
/tá/ [táá] mud, earth
/uá/ [uá] lake
/ssàh/ [ssàà] snake

To come:

Pitch Accent, Intonation Units, and related stuff
Last edited by Davush on 29 Dec 2019 20:10, edited 1 time in total.

I have wanted to make a language with a detailed but original tonal/pitch accent system for a long time, but it always seemed like a large undertaking if I didn't want to just recreate Japanese/Bantu/Chinese languages. This section is a bit abstract, but I'm quite happy with it and hopefully following sections will actually show the language 'in action' a bit more. I should also probably think of giving MiniLang a name...

Pitch Accent

MiniLang likely falls somewhere between having register tone and pitch accent. That is to say, pitch or tone has a higher functional load in M than in a language such as Japanese. However, only two phonemic tones (High and Low) are needed to explain tonal/pitch-related phenomena in M.

Unlike Japanese, simply knowing where a H tone is located is not enough to predict the word’s accent pattern in M, as sequences such as LHLH and HLHL occur. Pitch also interacts with intonation or phrasal units.

An intonation unit (IU) is roughly defined as a word spoken in isolation, a noun phrase comprising more than a single noun (e.g. Genitive phrases, Noun + Adj phrases) and Verb-Subject phrases. Intonation units affect the surface realisation of lexical pitch at the right-hand edge of the unit: namely, a LH sequence is blocked from occurring at the end of an IU.

Bimoraic units seem to be the preferred pitch-bearing unit in M, as longer sequences lend well to being analysed as sequences of /μμ/ units.
In this analysis, a /μμ/ unit distinguishes 4 pitch patterns, with some restrictions:

Rising (LH)
Falling (HL)
High-Flat (HH)
Low-Flat (LL)

HH and LL are conditioned by surrounding pitch in words longer than two morae.
Bimoraic LL words are rare, and are merged with HH by many speakers.

Where a LH rise would occur over two adjacent vowels or a VN sequence, this always becomes HH.

LH /nòó/ > /nóó/ HH
LH /kàŃ/ > /káŃ/ HH
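
The LH > HH rule over VV and VN sequences can be sketched as follows. The representation is an assumption: a word is a list of morae written without pitch marks (with "N" for the moraic nasal) plus a separate string of H/L tones, one per mora.

```python
VOWELS = set("aeiou")


def flatten_rises(morae, pitch):
    """Turn LH into HH wherever the H mora is a bare vowel or N after a vowel."""
    if len(morae) != len(pitch):
        raise ValueError("need exactly one tone per mora")
    tones = list(pitch)
    for i in range(len(morae) - 1):
        nxt = morae[i + 1]
        vowel_or_nasal = nxt == "N" or (len(nxt) == 1 and nxt in VOWELS)
        ends_in_vowel = morae[i][-1] in VOWELS
        if tones[i] == "L" and tones[i + 1] == "H" and ends_in_vowel and vowel_or_nasal:
            tones[i] = "H"
    return "".join(tones)


print(flatten_rises(["no", "o"], "LH"))   # HH
print(flatten_rises(["ka", "N"], "LH"))   # HH
print(flatten_rises(["ta", "ga"], "LH"))  # LH (a consonant intervenes)
```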

A phrase-final (or IU-final) LH sequence is not permitted, becoming either LL, or the final syllable becoming lengthened and gaining a HL fall. The latter is used for emphasis.

For example:
HL: hákà [háɣà]
HH: hágá [háɣá]

LH hàgá [hàɣá] (phrase-medial)
LH >LL hàkà [hàɣà] (phrase-final)
LH > L.HL hàkáà [hàɣáà] (phrase final)
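
The two phrase-final outcomes can be written as one small function over tone strings. A sketch only: the name `phrase_final` and the `emphatic` flag are assumptions standing in for "the latter is used for emphasis".

```python
def phrase_final(pitch: str, emphatic: bool = False) -> str:
    """Repair a word-final LH rise at the end of a phrase or IU."""
    if not pitch.endswith("LH"):
        return pitch            # nothing to repair
    if emphatic:
        return pitch + "L"      # final mora lengthened, giving L.HL
    return pitch[:-1] + "L"     # rise suppressed, giving LL


print(phrase_final("LH"))                 # LL
print(phrase_final("LH", emphatic=True))  # LHL
print(phrase_final("HL"))                 # HL
```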

Trimoraic words do not allow a rise or fall to occur on the second mora.
That is to say, trimoraic words can be analysed as /μμ.μ/ or /μ.μμ/ where the pitch must be flat (i.e. HH, LL) on the two morae adjacent to a rise or fall:

Functionally this means the following trimoraic sequences are permitted:

with LLL being a phrase-final allophone of LLH.

The proto-language likely permitted HLH and LHL sequences, but these became re-analysed as L.LH and H.HL (or LH.H, HL.L depending), possibly due to accent spreading of the /μμ/ unit.
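
The trimoraic constraint (no rise or fall on the second mora, i.e. the word must parse as /μμ.μ/ or /μ.μμ/ with a flat pair) can be checked mechanically. The sketch below simply enumerates all H/L strings of length three and keeps those with a flat pair; it is derived from the rule as stated, not from a list supplied by the description.

```python
from itertools import product


def permitted_trimoraic():
    """All three-mora H/L patterns whose first or last two morae are flat."""
    patterns = ("".join(p) for p in product("HL", repeat=3))
    return [p for p in patterns if p[0] == p[1] or p[1] == p[2]]


print(permitted_trimoraic())
# ['HHH', 'HHL', 'HLL', 'LHH', 'LLH', 'LLL']
```

Only HLH and LHL are excluded, which are exactly the two patterns said to have been re-analysed. LLL appears in the output because it is flat, but per the text it only occurs as a phrase-final allophone of LLH.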

Quadrimoraic words more clearly demonstrate /μμ/ as the pitch-bearing unit in which HL, LH, HH and LL, are the main patterns, with the following permissible sequences:


(HH.LH and HH.HH only occur as post-lexical allophones of LH.LH, where the LH is on a VV or VN sequence, and is therefore not phonemic.)

A sequence of LH or HL followed/preceded by a flat pitch assimilate the flat-pitch:

I.e. LHLL, and LLHL become LH.FF (LH.HH) and FF.HL (HH.HL)
and HLHH and HHLH become HL.FF (HL.LL) and FF.LH (LL.LH)
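
These four changes follow one pattern: a flat unit copies the tone of the contour unit's nearest mora. A minimal sketch over four-mora tone strings (the function name is an assumption, and words with two flat or two contour units are left untouched):

```python
def assimilate_flat(pitch: str) -> str:
    """Assimilate a flat /μμ/ unit to an adjacent LH or HL unit."""
    if len(pitch) != 4 or set(pitch) - set("HL"):
        raise ValueError("expected four tones drawn from H and L")
    first, second = pitch[:2], pitch[2:]

    def is_flat(unit):
        return unit[0] == unit[1]

    if is_flat(first) and not is_flat(second):
        first = second[0] * 2   # flat unit copies the contour's first tone
    elif is_flat(second) and not is_flat(first):
        second = first[1] * 2   # flat unit copies the contour's last tone
    return first + second


for word in ("LHLL", "LLHL", "HLHH", "HHLH"):
    print(word, ">", assimilate_flat(word))
# LHLL > LHHH
# LLHL > HHHL
# HLHH > HLLL
# HHLH > LLLH
```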

This again indicates that /μμ/ is the tone-bearing unit. In words of more than two morae, the distinguishing feature of a flat-pitch /μμ/ is the lack of a rise/fall, with a flat HH or LL assimilating to the following/preceding tone.

This means the following changes have been made to the words in the previous post:

/aé/ > [áé] HH due to no LH on adjacent vowels. Similarly, /kaé/ > [káé] and /kaéʔaé/ > [káéʔáé].
/uá/ > [úá] HH

/ína.íha/ > [ínáíhà] HHHL due to /aí/ > HH.

I haven't considered how words of more than 4 morae behave yet, but hopefully this is enough for now... [:D]

The next section will likely consider orthography...


Re: Unnamed Minimalist Lang

Post by Davush » 30 Dec 2019 13:56


Given that an orthography based on underlying forms would not give much visual clue to how words are actually spoken, MiniLang's orthography is kind of a 'halfway' house in that it's not fully phonemic, but nor does it represent every single possible allophone distinctly.

Phoneme : Realisation : Orthography

/t/ [t] <t>
/tu/ [tsu~ku] <tsu~ku>
/ti/ [tsi] <tsi>

/k/ [k~ɣ] <k>

/kʷ/ [kʷ~β] <kw~v>
/kʷo/ [pɤ~βɤ] <po~vo>
/ku/ [pɯ~βɯ] <pu~vu>
/kʷi/ [ki~ɣi] <ki>

/kʲ/ [cç~tɕ] <cy>
/ki/ [tɕi] <ci>

/n/ [n] <n>

/ŋ/ [ŋ] <g>

/ŋʷ/ [ŋʷ~gʷ] <gw>
/ŋʷo/ [mɤ] <mo>
/ŋu/ [mɯ] <mu>
/ŋʷi/ [ŋi] <gi>

/ŋʲ/ [ɲ~j] <gy>
/ŋi/ [ɲi~ji] <gyi>

/x/ [x~h] <h>

/xʷ/ [xʷ~hʷ] <hw>
/xʷo/ [ɸɤ] <fo>
/xu/ [ɸɯ] <fu>
/xʷi/ [xi~hi] <hi>

/xʲ/ [ç~ɕ] <sy>
/xi~si/ [ɕi] <si>

/s/ <s>
/r/ [ɾ~d] <r>

/N/ followed by a non-nasal consonant is always just <n>, with voicing represented if present:

/Nt Nk Nx/ [nd ŋg ŋh] <nd ng nh>
/Nkʷ Nkʲ/ [ŋgʷ~mb ŋdʑ] <ngw~nb nj>
/Nxʷ Nxʲ/ [ŋxʷ~mɸ ɲɕ] <nhw~nf nsy>
/Ns/ [nz] <nz>
/Ntu Nti/ [ndzu~ŋgu ndzi] <ndzu~ngu ndzi>

When /N/ is followed by a nasal, it is written as geminate:
/Nn Nŋ/ [nn ŋŋ] <nn gg>
/Nŋʷ Nŋʲ/ [ŋŋʷ~mm ɲɲ] <ggw~mm ggy>

The main 'weirdness' is /ŋ/ being represented by <g> also in labialised and palatalised forms: /ŋʷ ŋʲ/ <gw gy>. Therefore <ng> is only [ŋg], but <g gg> are [ŋ ŋŋ].

Word-finally, /N/ [ŋ~ɰ̃] is just <n>.

Where Hμ and Qμ cause geminates, they are written as such. Word-finally they are written <h> and <'>.

/taH-ka/ [takka] <takka>
/taQ-ka/ [takka] <takka>
/taH/ [tàà~tàh] <tah>
/taQ/ [táá~táʔ] <ta'>

Word-initial geminates are also written as such:

/xxa/ [xxa~hha] <hha>
/ŋŋʷo/ [mmɤ] <mmo>

Pitch-Accent in the Orthography

Tone is only represented on the lexical level. Phrase/Intonation Unit-final changes are not marked in the orthography, but speakers apply them as necessary. A general rule is that flat sequences (HH, LL) are unmarked, and an upstep to H is marked with acute, and downstep to L with a grave.

In monomoraic words, both H and L are always marked with acute and grave respectively. Monomoraic lengthening at the end of an IU is not represented.
/nó/ <nó> vs /nò/ <nò>

Bimoraic words distinguish HL, LH and HH, with HH being unmarked:
HL: háta /xátà/
LH: hatá /xàtá/
HH: hata /xátá/

Few bimoraic words are lexically LL, which most speakers merge with HH. These can be indicated with two graves if necessary:
LL: hàtà

Trimoraic words mark the rise or fall following/preceding two flat tones:

HHL hakatà
HLL hákata
LLH hakatá
LHH hàkata
HHH hakata
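
The trimoraic marking convention amounts to putting a diacritic on the one mora that differs from the flat pair. A sketch under stated assumptions: morae are given unmarked, each mora ends in its vowel, and the name `mark_trimoraic` is hypothetical. All-flat input is returned unmarked, which is what the table shows for HHH; treating LLL the same way is an assumption.

```python
import unicodedata

DIACRITIC = {"H": "\u0301", "L": "\u0300"}  # combining acute / grave


def mark_trimoraic(morae, pattern):
    """Spell a three-mora word, marking only the tone that breaks the flat pair."""
    if len(morae) != 3 or len(pattern) != 3 or set(pattern) - set("HL"):
        raise ValueError("expected three morae and three H/L tones")
    a, b, c = pattern
    if a == b == c:
        odd = None              # flat throughout: nothing is marked
    elif a == b:
        odd = 2                 # HHL / LLH: mark the final mora
    elif b == c:
        odd = 0                 # HLL / LHH: mark the initial mora
    else:
        raise ValueError(f"{pattern} has a rise or fall on the second mora")
    out = list(morae)
    if odd is not None:
        # Assumes the mora ends in its vowel, so the diacritic attaches to it.
        out[odd] = unicodedata.normalize("NFC", out[odd] + DIACRITIC[pattern[odd]])
    return "".join(out)


for tones in ("HHL", "HLL", "LLH", "LHH", "HHH"):
    print(tones, mark_trimoraic(["ha", "ka", "ta"], tones))
# HHL hakatà
# HLL hákata
# LLH hakatá
# LHH hàkata
# HHH hakata
```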

Quadrimoraic words

In HL.HL and LH.LH sequences, H is marked both times:
háta.káta, hatá.katá

HLLL and LHHH only mark the first H or L:
háta.kata, hàta.kata

HH.LL and LL.HH mark the initial H and L of each bimoraic unit:
háta.kàta, hàta.káta

HH.HL and LL.LH only mark the H or L:
hata.katà hata.katá

Epenthetic /ʔ/
This is generally written where a speaker would insert it to break up long chains of vowels, especially where an upstep occurs. It is represented by <'>

So, the words from previous posts can be romanised thus:

/ína/ [ína] <ína>
/íxa/ [íha] <íha>
/ínaíxa/ [ínáíha] <ínaíha> (HL.HL but realised HH.HL. The LH > HH rule across vowels is not applied in the orthography)

/aé/ [áé] <aé>
/kaéaé/ [káéʔáé] <kaé'aé> (again, LH > HH across vowels is not written, but is realised as such)

/ŋà/ [ŋà] <gà>
/ssàh/ [ssàà] <ssàh>

MiniLang now has a name (or rather, several)!

It is variously known as Ssírúito / Ssírúiko / Ssíko. Ssírúi itself means something like 'familiar speech', with -to being a 1pl. exclusive possessive, and -ko inclusive. Therefore between speakers it is known as Ssírúiko, or simply Ssíko (Our Speech), but to outsiders Ssírúito. The exclusive form was used for the Anglicisation: Shiruitoan, corresponding roughly to the pronunciation of Ssírúito as [ɕɕírɯ́ìtò].

Typologically, it is head-initial but I am unsure whether VSO or VOS will be the 'neutral' word-order. In any case, topicalisation will be a feature, with topics moving to the pre-verbal position. So SVO or even OVS may occur.
Following head-initial trends, adjectives follow nouns, and TAM marking etc. are usually pre-verbal particles. The language also uses sentence final particles and honorifics.

Some pitch accent and other stuff will likely change as the grammar progresses.

Common Nouns and Proper Nouns

Shiruitoan distinguishes two main types of noun: Common and Proper, which are marked differently. This was partly inspired by some Austronesian systems.

Proper nouns include personal names, place names, kinship terms, pronouns. Perhaps some particular events are also proper nouns.
Common nouns include everything else (objects, abstract nouns, etc.). Kinship terms can also be common nouns when they are used with a more general sense, not referring to a specific person.

I may expand this, but it is enough for now.

An interesting feature of this system is that the 3rd person pronouns are essentially an open class, which are formed by transforming a common noun into a proper noun, so 'she' could be represented variously by words such as 'woman, girl, lady, queen' etc. with usage depending on context/politeness/honorifics.

Copula Clauses

Shiruitoan does not have a copula per se, but rather a way of marking predicate and subject in such clauses, which occur in various types. This post will look at identifying clauses, which are most commonly N = N type phrases.

Copula clauses are predicate initial, and predicates are marked by a particle/proclitic. Many particles are essentially 'accentless', with their accent being determined entirely by the word to which they are bound.

Tsu marks a Proper Noun predicate:

ína [ína] '(my) mother'
tsu ína [tsú ína] 'PRED (my) mother'

íha [íha] '(my) father'
tsu íha [tsú íha] 'PRED (my) father'

Subjects of Copula Clauses

Common nouns are marked as Subject by a particle which has several realisations depending on the noun it is attached to.

If the noun begins with a vowel or a consonant which cannot be geminated, the particle is a, i or u: /i/ is used for nouns whose initial vowel is /i, e/, and /u/ for nouns in /u, o/. The particle is written with a following <'> if the word begins with a vowel.

If the consonant can be geminated, this becomes the subject form.

When attached to monomoraic H words, the pattern is HH. With L words it becomes HL.
With bimoraic HH words, the particle is L. With LH words the result is LLH, and with HL words HHL. I need to figure out what is going on systematically with unaccented morphemes/particles.

te /té/ 'man'
ite /íté/ 'SUBJ man'

no /nó/ 'woman'
nno /ńnó/ 'SUBJ woman' (initial /n/ can be geminated)

ua /úá/ 'lake'
u'úá /ùʔúá/ 'SUBJ lake'
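
The choice of subject form can be sketched on the orthography. Several things here are assumptions rather than stated rules: input is romanised and written without pitch marks, the set of geminable initials is taken to be the orthographic nasals and fricatives, and the particle vowel for nouns in /a/ is taken to be a. Pitch on the particle is not modelled.

```python
VOWELS = "aeiou"
# /i/ for nouns in i, e; /u/ for nouns in u, o; a otherwise (inferred).
PARTICLE_VOWEL = {"a": "a", "i": "i", "e": "i", "u": "u", "o": "u"}
# Assumed: initials that may geminate, i.e. nasals and fricatives as spelled.
GEMINABLE = set("ngmshf")


def subject_form(noun: str) -> str:
    """Return the subject-marked form of an unaccented romanised common noun."""
    first = noun[0]
    if first in GEMINABLE:
        return first + noun                      # gemination marks the subject
    first_vowel = next(ch for ch in noun if ch in VOWELS)
    particle = PARTICLE_VOWEL[first_vowel]
    if first in VOWELS:
        return f"{particle}'{noun}"              # <'> separates the two vowels
    return particle + noun


print(subject_form("te"))  # ite
print(subject_form("no"))  # nno
print(subject_form("ua"))  # u'ua
```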

Sentence Final Particle 'a'

a is a neutral sentence final particle common in copula clauses. It is always low after HL, HH. After LH it remains H.


Tsu íha ite a
[tsú íhà íté à] H HL HH L
PRED father SUBJ=man a
'(The) man is (my) father'

Tsu ína nno a
[tsú ína ńnó à] H HL HH L
PRED mother SUBJ=woman a
'(The) woman is (my) mother'

Tsu kae ite a
[tsú káé íté à] H HH MM L (Long sequences of H show a gradual descent, so final HH is realised with mid-tone before final L)
PRED elder.sibling SUBJ=man a
'(The) man is (my) elder sibling'

Diachronics and Sister Language(s)

I've decided to make a sister-language alongside Shiruitoan. I don't usually pay much attention to diachronics, but the relatively simple syllable structure means there (hopefully) won't be too much sound-change induced chaos that more inflecting languages seem to undergo. There will also be a third language, which diverged earlier.

Shiruitoan and its sister-language are spoken on an island roughly the size of New Guinea, which will have a geographic boundary clearly 'splitting' the island (and the languages) in two. Most likely a mountain range. I'm unsure of the climate, but it will probably be Cfa (Humid Subtropical) or thereabouts.

Each language will have about 5-6 million speakers, with the earlier-diverged language having about 2 million and maybe spoken in a more isolated region/outlying island(s). Shiruitoan and its sister-language, although separate, will have been in close contact, leading to quite a lot of cross-borrowings which can make some sound changes seem more 'sporadic' than consistent.

Nonetheless, there are clear correspondences as the following tables show:

Proto-Language –– Shiruitoan –– Sister-Language

*a *ya *wa –– a ʲa ʷa –– a ɛ ʌ
*ye *wo –– ʲɤ ʷɤ –– ɪ~iɛ̯ ʊ~uɔ̯
*i *wi –– i i –– i uɨ~uː
*u *yu –– u ʲu –– iɨ~iː
*ɨ –– e~o –– ɨ

*ɨyɨ *ɨwɨ –– e o –– ɨː
*ɨye *ɨwo –– oe eo –– ɨyɪ ɨwʊ
*aya *awa –– ea oa –– ɛː ʌː

The main difference with the vowels is that Shiruitoan moved the glide-vowels into palatalised/labialised series on velars, while the sister-language developed more vowel contrasts, giving the inventories:

Shiruitoan: /a e i o u/
Sister-Language: /a ɛ ʌ i ɪ ɨ u ʊ/ (ignoring length for now)

Proto Language –– Shiruitoan –– Sister-Language

*p –– ∅~h –– p~b
*t –– t~ts~s –– t~d
*k –– k~p –– k~ʔ

*m –– ∅ –– m~b
*n –– n –– n
*ŋ –– ŋ~m –– g

*s –– s~ɕ –– s
*x –– h~ɕ~ɸ –– h~∅

*r –– r –– r
*y –– ∅ –– y~∅
*w –– ∅ –– w~∅

The main difference is the development of a voiced series in the sister-language, due to voicing of intervocalic stops and also word-initial /m ŋ/ > /b g/ in some contexts. /g/ also underwent a further change to /dʑ/ before *y in the sister-language, giving a stop inventory of:

/p b t~d k g~dʑ ʔ/ in comparison to Shiruitoan's /t k kʷ kʲ/.

Sister-lang might also have /w j/ fortify between vowels > /v dʑ/

An example:

*ŋye-kɨ means 'our side/region', a common way of referring to the two language regions.

In Shiruitoan: *ŋyé-kɨ > *ŋʲɤ-ko > /ɲɤɣɤ~jɤɣɤ/
In Sister-Lang: *ŋye-kɨ > *gjɪ-ʔɨ > /dʑɪ́ʔɨ/

The two languages can also be referred to in this way:

*xíxí-ŋyé-kɨ 'Speech of Our Side' becoming /ɕɕíɲɤ́ɣɤ/ <Ssínyóko> and /hiidʑɪʔɨ/ respectively.

Similarly, the cognate of Ssírúiko from *xíxí-rúwi-kɨ would be /hiiruvɨʔɨ/, and using the exclusive suffix -tɨ: /híírúvɨdɨ/ so possibly Hiruvidian.

Post by Davush » 06 Jan 2020 12:35

VaptuantaDoi wrote:
03 Jan 2020 23:03
This is all really interesting and looks much more naturalistic than most minimalist phonologies. The level of phonological depth is very impressive.

Thanks! I've probably made it sound far more complicated than it needs to be, and pitch accent isn't settled yet, so hopefully I'll have a more concise description...

I've also changed the sound changes to the sister-language somewhat, so it is now probably called Hirovuruan, from *xíxí-ruwi-tɨ > /hiiruvɨɾɨ/.


Demonstratives come after the noun. Both Shirui and Hirovu have 4 demonstratives: proximal, medial, distal and neutral.

(Shirui / Hirovu)
Prox: -kkwa / -ka
Med: -kki / -ku
Dist: -ppo / -bu
Neut: -ha / -he~ha

/té/ 'man'
tékkwa /tékkwa/ 'this man'
tékki /tékki/ 'that man'
téppo /téppɤ/ 'that man over there'
téha /téha/ 'this/that/the man'

túu /tɨɨ/ 'man'
túuqa /tɨɨqa/ 'this man'
túuqu /tɨɨqɨ/ 'that man'
túubu /tɨɨbɨ/ 'that man over there'
túuhe /tɨɨhæ/ 'this/that/the man'

These point to something like *kwa, *kwi, *kwo and *ha in the Proto-Language, although *kwo is unclear, as Hirovu has -bu, which may reflect influence from Shirui -ppo, which would be a regular outcome of *kwo. The expected Hirovu outcome would be -qo.

They can combine with pre-nominal determiners.
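
As a quick illustration of the suffixing pattern in the examples above, here is a minimal sketch. The function and dictionary names are made up for the example, and the Hirovu suffixes are taken from the example words (túuqa, túuqu, túubu, túuhe) rather than from the summary list, so treat them as provisional.

```python
# Demonstrative suffixes as they appear in the example words.
# Names and structure are illustrative only.
SHIRUI_DEMONSTRATIVES = {
    "proximal": "kkwa",
    "medial": "kki",
    "distal": "ppo",
    "neutral": "ha",
}

# Assumption: spelled as in the Hirovu examples (túuqa etc.).
HIROVU_DEMONSTRATIVES = {
    "proximal": "qa",
    "medial": "qu",
    "distal": "bu",
    "neutral": "he",
}


def add_demonstrative(noun, distance, suffixes):
    """Attach the post-nominal demonstrative for the given distance."""
    return noun + suffixes[distance]


shirui_forms = [
    add_demonstrative("té", d, SHIRUI_DEMONSTRATIVES)
    for d in ("proximal", "medial", "distal", "neutral")
]
hirovu_forms = [
    add_demonstrative("túu", d, HIROVU_DEMONSTRATIVES)
    for d in ("proximal", "medial", "distal", "neutral")
]
```

This only handles the orthographic form; the surface realisations (e.g. téppo as /téppɤ/) would need the allophony rules on top.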

The Focus Marker

Noun phrases can be marked for focus by a pre-nominal particle. Focused phrases are similar to cleft phrases in English in that they highlight new or contrastive information.

Shirui: ko /ko/
Hirovu: káa /káa/

Copular Phrases

I have changed how copular phrases are handled. Predicates are introduced by 'ha' in S and 'a' in H. There is also a polite alternative 'sa' (se in H) which occurs with the polite sentence-final particle 'ga' in Shirui, and 'je' in Hirovu. This perhaps derives from the 3rd person pronoun 'sása / sáha'.

Ha íhang itékkwa a /a íhaŋ ítékkwa a/
This man is my father

In H, this is exactly the same:
A íbang u túuqa va. /a ibaŋ ɨ tɨɨka va/

Polite forms:
S: Sa íhaka itékki ga /sa íhaɣa ítékki ŋa/
That man is your father

H: Se íbe'e u túuku je. /sæ íbæʔæ ɨ tɨɨkɨ dʑæ/

Either element can take focus marking, but it replaces the predicate marker 'a/sa' in Shirui. The polite predicate marker 'se' can occur with Káa in H:

S: Kou tékkwa íhang a
H: Káa túuka a íbeng va
Polite H: Káase túuka e íbe'e je. /kɑɑsæ tɨɨkɑ æ ibæʔæ dʑæ/
THIS MAN is my father

S: Kou íhang itékkwa a
H: Káa íbeng u túuka va.
This man is MY FATHER
Last edited by Davush on 10 Jan 2020 12:55, edited 1 time in total.

Aspect Markers and Other Stuff

Hirovuruan has yet again changed, and now likely has a vowel system of /a i u ɨ/ <a i o u>.

I am only going to focus on Shiruitoan for this post however. Default word order is either VSO or VOS. SVO and OVS are also common when the focus marker 'ko' is used. Verbs in the initial position are obligatorily preceded by an aspect marker.

The predicate marker 'ha' is considered 'neutral' in terms of aspect and may better be thought of as a 'dummy' marker. Aspect particles will all have polite forms, which usually also combine with polite sentence-final particles. I will only focus on the informal forms here.

Arguments of a verb usually require the default determiner if no other determiners are present. This isn't necessarily a definite article or subject marker however.

Common nouns prefix a/i/u, or geminate the initial consonant where possible. E.g. te 'man' > ite, no 'woman' > nno, hwà 'village' > hhwà. In casual speech, a/i/u may also be realised as gemination, e.g. ite > tte.

Proper nouns (which include personal names, place names, kinship terms, and certain events) take the prefix (v)H, where (v) is an echo vowel that is dropped if the word begins with a vowel. H- assimilates to hh/ff/ss depending on the following sound. E.g., íha 'father' > ssíha, káé 'older sibling' > àkkáé.

agá /aŋá/ 'to eat'

Ha agá ite a.
/hà àŋá íté à/
'The man eats/ate/is/was eating.'

Ha agá ssíha a.
/hà àŋá ɕɕíhà à/
(My) father eats/ate/etc.

súfu /súɸɯ/ 'to drink'

Ha súfu nno a.
/há súɸɯ nnó à/
'The woman drinks/drank/etc.'

Ha súfu àkkae a.
/há súɸɯ àkkáé à/
(My/Your) elder sibling drinks/drank/etc.

I'm not sure what to do regarding pitch marking of particles whose pitch is determined by the following word. For example, ha is H before an HL word but L before an LH word: ha agá > LLH, but ha súfu > HHL.
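
To make the dependency concrete, here is a small sketch of the two cases stated above. It only covers HL and LH words, since those are the only patterns the rule is given for; anything else raises an error rather than guessing. The names `HA_PITCH` and `phrase_pitch` are purely illustrative.

```python
# Pitch of the particle 'ha' as a function of the following word's
# pitch pattern. Only the two patterns described above are included.
HA_PITCH = {
    "HL": "H",  # ha súfu > HHL
    "LH": "L",  # ha agá > LLH
}


def phrase_pitch(word_pattern):
    """Return the pitch pattern of 'ha' plus the following word."""
    particle = HA_PITCH.get(word_pattern)
    if particle is None:
        # The rule for other patterns is not settled, so do not guess.
        raise ValueError(f"no rule stated for pattern {word_pattern!r}")
    return particle + word_pattern
```

One option for the orthography would be to leave such particles unmarked, since a lookup like this shows their pitch is fully predictable in these two cases.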

The focus marker ko moves the noun phrase to the pre-verbal position. In this case, ha is absent, compare:

Ha agá ite a. A/The man eats.
Ko te agá a. It is the man who eats.

'Simple' Objects
This isn't properly worked out yet, but I think simple objects will attach directly to the verb without any determiner and may refer more often to indefinite or non-prominent objects. This could be considered a form of incorporation. There might be a requirement that 'Simple Objects' be mono/bimoraic.

'The man sees/saw a woman' (any woman, unspecified)
Ha taka no ite a. /hà táɣá nó íté à/ L HH H HH L
ASP see woman DET.man a (see-woman man)

'My father eats/ate food'
Hà agá éa ssíhang a. /hà aŋá éa ɕɕíhàŋ à/ L LH HL HL L
ASP eat thing DET.father a (eat-thing father)

This could also lead to a kind of Mandarin type situation, where verbs commonly have a 'dummy' object, as in the above sentence where éa (thing) fills the object role. This probably has something to do with transitivity.

If VSO word order is used, perhaps this gives more referentiality to the object or indicates the object will become a topic in following discourse. In this case, a determiner must be present.

Ha taka ite nno a.
The man saw the/a certain woman

Ha agá ssíhang íéa a.
My father ate something.
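
The two object patterns can be sketched as a simple word-order template. This is only an illustration built from the example sentences: the function name `clause` and its parameters are hypothetical, it always uses the neutral marker 'ha' and the sentence-final 'a', and the caller is expected to pass in nouns that already carry (or lack) their determiner.

```python
def clause(verb, subject, obj=None, simple_object=True):
    """Assemble a verb-initial clause with the neutral marker 'ha'.

    subject: noun already carrying its determiner (e.g. 'ite').
    obj: bare noun for a simple object (e.g. 'no'), or a noun with
         its determiner for a referential object (e.g. 'nno').
    """
    words = ["Ha", verb]
    if obj is None:
        words.append(subject)
    elif simple_object:
        # VOS: the bare object attaches directly after the verb.
        words += [obj, subject]
    else:
        # VSO: the object follows the subject and needs a determiner.
        words += [subject, obj]
    words.append("a")
    return " ".join(words)


intransitive = clause("agá", "ite")
vos = clause("taka", "ite", "no")
vso = clause("taka", "ite", "nno", simple_object=False)
```

Whether the mono/bimoraic restriction on simple objects holds would be an extra check on `obj`, which I have left out because it isn't decided yet.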

