Internal reconstruction is a method of recovering information about a language's past from the characteristics of the language at a later date. Whereas the comparative method compares variations between languages—such as in sets of cognates—under the assumption that they descend from a single proto-language, internal reconstruction compares variant forms within a single language under the assumption that they descend from a single, regular form. For example, these could take the form of allomorphs of the same morpheme.

The basic premise of internal reconstruction is that a meaning-bearing element that alternates between two or more similar forms in different environments was probably a single form in the past, into which alternation was introduced by the usual mechanisms of sound change and analogy.

Language forms reconstructed by means of internal reconstruction are denoted with the pre- prefix, similar to the use of proto- to indicate a language reconstructed by means of the comparative method, as in Proto-Indo-European. For example, an internally reconstructed earlier form of Old English (OE for short) could be referred to as pre-OE, an unattested stage intermediate between comparatively reconstructed Proto-Germanic and the earliest attestation of Old English. (As it happens, in actual practice this particular internally reconstructed language is often referred to as prehistoric Old English, preh. OE or prehOE for short; were it not for this old usage, however, pre-OE would serve well, as it conforms precisely to the modern convention.)

It is even possible to apply internal reconstruction to proto-languages reconstructed by the comparative method. For example, performing internal reconstruction on proto-Mayan would yield pre-proto-Mayan. In some cases it is also desirable to use internal reconstruction to uncover an earlier form of various languages, and then submit those pre- languages to the comparative method. Care must be taken, however, because internal reconstruction performed on languages before applying the comparative method can remove significant evidence of the earlier state of the language and thus reduce the accuracy of the reconstructed proto-language.

Role in historical linguistics

When undertaking a comparative study of a hitherto underanalyzed family of languages, it is worthwhile to get an understanding of their systems of alternations, if any, before tackling the greater complexities of analyzing entire linguistic structures. For example, the Type A forms of verbs in Samoan (as in the example below) are the citation forms, i.e., the forms in dictionaries and word lists, but when making historical comparisons with other Austronesian languages it would be a blunder to use Samoan citation forms with parts missing. (And an analysis of the verb sets would alert the researcher to the certainty that many other words in Samoan have lost a final consonant.) Another way of looking at it is that internal reconstruction gives access to an earlier historical stage, at least in some details, of the languages being compared, and this can be valuable: the more time that passes, the more changes accumulate in the structure of a (living) language, and for this reason we always try to use the earliest known attestations of languages when working with the comparative method.

Internal reconstruction, when not a sort of preliminary to the application of the comparative method, is most useful in cases where the analytic power of the comparative method is unavailable.

Internal reconstruction can also draw limited inferences from peculiarities of distribution. Even before comparative investigations had sorted out the true history of Indo-Iranian phonology, some scholars had wondered if the extraordinary frequency of the phoneme /a/ in Sanskrit (20% of all phonemes together, an astonishing total) might point to some historical fusion of two or more vowels. (In fact, it represents the final outcome of five different Proto-Indo-European syllabics, two of which, the syllabic states of /m/ and /n/, can be discerned by the application of internal reconstruction.) But in such cases, internal analysis is better at raising questions than at answering them. The extraordinary frequency of /a/ in Sanskrit hints at some sort of historical event, but does not and cannot lead to any specific theory.

Issues and shortcomings

Neutralizing environments

One issue in internal reconstruction is neutralizing environments, which can be an obstacle to historically correct analysis. Consider the following forms from Spanish, spelled phonemically rather than orthographically:

infinitive 3rd person sg

One pattern of inflection here shows alternation between /o/ and /ue/; the other type has /o/ throughout. The lexical items are all basic, i.e. not technical or high register or obvious borrowings, so their behavior is likely to be a matter of inheritance from an earlier system rather than the result of some native pattern overlaid by a borrowed one. (An example of such an overlay would be the non-alternating English privative prefix un- next to the privative prefix in borrowed Latinate forms, which alternates between in-, im-, ir-, il-.)

One might guess that the difference between the two sets can be explained by two aboriginally different markers of the 3rd person singular, but a basic principle of linguistic analysis is that one cannot (and should not try to) analyze data that one does not have. Besides that, positing such a history violates the principle of parsimony (Occam's Razor): it adds a complication to the analysis unnecessarily, a complication moreover whose chief result is to restate the observed data as a sort of historical fact. That is, the result of the analysis is the same as the input. And as it happens, the forms as given yield readily to real analysis, so there is no reason to look elsewhere.

The first assumption is that in pairs like bolbér/buélbe the root vowels were originally the same. We have two choices if we stick to the data: either something happened to make an original */o/ turn into two different sounds in the 3rd person singular, or else the distinction in the 3rd sg. is original, and the vowels of the infinitives are in what is called a neutralizing environment (i.e., where an original contrast is lost because two or more elements "fall together", i.e., coalesce into one). There is no way of telling ("predicting" as the jargon has it) when /o/ will break to /ué/ and when it will remain /ó/ in the 3rd sg. On the other hand, starting with /ó/ and /ué/ as givens, we can write an unambiguous rule for the infinitive forms: /ué/ becomes /o/. And one might notice further, upon looking around in Spanish, that the nucleus /ue/ is found only in tonic syllables anywhere, not just in verb forms.

This analysis gains plausibility from the observation that the neutralizing environment is atonic, whereas the nuclei are different in tonic syllables. This fits with the commonplace that vowel contrasts are often preserved differently in tonic and atonic environments, and further that the usual relationship is that there are more contrasts in tonic syllables than in atonic ones (owing to previously distinctive vowels having fallen together in the atonic environment).

The idea that original */ue/ might fall together with original */o/ is unproblematic; so we internally reconstruct a complex nucleus *ue which remains distinct when tonic and coalesces with *o when atonic.
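The internally reconstructed hypothesis can be sketched as a small program. This is only an illustration of the rule as stated above (*ue stays distinct when tonic and merges with *o when atonic), not a claim about the true history. The function names and the split of each root into onset, nucleus, and coda are assumptions made for the sketch, and the non-alternating pair korrér/kórre is supplied here as an illustrative item in the same phonemic spelling.

```python
def realize_nucleus(nucleus, tonic):
    """Surface form of a reconstructed root nucleus (*ue or *o)."""
    if nucleus == "ue":
        # *ue survives under stress and merges with *o when atonic.
        return "ué" if tonic else "o"
    if nucleus == "o":
        # *o is the same vowel in both environments (stress mark aside).
        return "ó" if tonic else "o"
    raise ValueError(f"unexpected nucleus: {nucleus!r}")


def inflect(onset, nucleus, coda, inf_ending, third_sg_ending):
    """Return (infinitive, 3rd sg) for a root split as onset + nucleus + coda."""
    # Infinitive: the stress is on the ending, so the root is atonic.
    infinitive = onset + realize_nucleus(nucleus, tonic=False) + coda + inf_ending
    # 3rd person singular: the stress is on the root.
    third_sg = onset + realize_nucleus(nucleus, tonic=True) + coda + third_sg_ending
    return infinitive, third_sg


# Alternating type, reconstructed with *ue in the root.
print(inflect("b", "ue", "lb", "ér", "e"))
# Non-alternating type, reconstructed with *o in the root.
print(inflect("k", "o", "rr", "ér", "e"))
```

The point of the sketch is that the rule runs in one direction only: given the reconstructed nucleus, both surface forms follow, whereas nothing in the infinitive alone says which nucleus a root has.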

However, the true history is quite different: there were no diphthongs in Proto-Romance; rather, there was an *o (reflecting Latin ŭ and ō) and an *ɔ (reflecting Latin ŏ). In Spanish, as in most Romance languages, the two fall together in atonic syllables, but in stressed syllables *ɔ breaks into the complex nucleus /ue/. In sum, internal reconstruction accurately points to two different historical nuclei in atonic /o/, but gets the details wrong.

Lyle Campbell (who devotes a whole chapter in the book cited below to internal reconstruction) raises an interesting caution: if internal reconstruction is applied to members of a compact subgroup prior to the exercise of comparative analysis, there is a risk that a shared innovation definitive of the subgroup itself will be analyzed out of existence. His example is consonant gradation in Finnish, Estonian, and Lapp (Saami). A pre-gradation phonology can be discerned for each of the three groups via internal reconstruction, but in fact it was manifestly an innovation in the Finnic branch of Uralic, not of the individual languages, and indeed it was one of the innovations defining that branch. This fact would be missed if the comparanda of the Uralic family included as primary data the "de-graded" (if you will) states of Finnish, Estonian, and Lapp.

This is an interesting point, and an insightful one, but it does not portend any serious problems. Even if such a mistake were to be made, sooner rather than later a historian would notice the result: that nearly identical sound laws were being formulated for each of several closely related languages. Such things do happen in fact, with the spread of areal features, or with commonplaces (say, devoicing in word-final position), but the whole point of setting up subgroups, branches, and so on, in the first place, is that it is more plausible that a phonological (or morphological) innovation, particularly a complex or unobvious one, took place only once in the history of the group—i.e., in the speech community of a proto-branch—rather than separately and repeatedly in a whole array of daughter languages. (And Finnic consonant gradation is in the character of a complex and unobvious innovation.) That is, the blunder warned of by Campbell is harmless enough, given that its mischief would necessarily be temporary because soon noticed and corrected.

Not all synchronic alternation is amenable to internal reconstruction. Even though cases of secondary split (see phonological change) often result in alternations that signal a historical split, the conditions involved are usually immune to recovery by internal reconstruction. For example, the alternation of voiced and voiceless fricatives in Germanic languages, as described in Verner's law, cannot be explained only by examination of the Germanic forms themselves. This is in fact a general characteristic of secondary split, though occasionally internal reconstruction can work. Primary split in principle is recoverable by internal reconstruction whenever it results in alternations, but later changes can render the conditioning irrecoverable.



English has two patterns for forming the past tense in roots ending in apical stops, i.e. /t d/.

Type I
Present Past
wait waited
reflect reflected
greet greeted
fret fretted
rent rented
note noted
waste wasted
adapt adapted
regret regretted
fund funded
found founded
grade graded
abide abided
plod plodded
blend blended
end ended

Type II
Present Past
put put
set set
cut cut
cast cast 'throw'
meet met
bleed bled
rid rid
shed shed
send sent
bend bent
lend lent

Although modern English has very little affixal morphology, its number includes a marker of the preterite, apart from verbs with vowel changes of the find/found sort, and the great majority of verbs that end in /t d/ take /əd/ as the marker of the preterite, as seen in Type I.

Can we make any generalizations about the membership of verbs in Types I and II? Most obviously, the Type II verbs all end in /t/ and /d/, though that is just like the members of Type I. Less obviously they are all without exception basic vocabulary. Note well that this is a claim about Type II verbs and not a claim about basic vocabulary: there are basic home-and-hearth verbs in Type I, too. But there are no denominative verbs in Type II, that is, verbs like to gut, to braid, to hoard, to bed, to court, to head, to hand. There are no verbs of Latin or (a little harder to spot) of French origin; all stems like depict, enact, denote, elude, preclude, convict are Type I. Furthermore, all novel forms are inflected as Type I: all native speakers of English would presumably agree that the preterites of to sned and to absquatulate would most likely be snedded and absquatulated.

The inference from these considerations is that the absence of a "dental preterite" marker on roots ending in apical stops in Type II reflects a more original state of affairs, i.e., that in the early history of the language the "dental preterite" marker was in a sense absorbed into the root-final consonant when it was /t/ or /d/; the affix /əd/ after word-final apical stops then belongs to a later stratum in the evolution of the language. The same suffix is involved in both types, but with a 180° reversal of "strategy": other exercises of internal reconstruction would point to the conclusion that the aboriginal affix of the dental preterites was /Vd/ (where V = a vowel of uncertain phonetics, and of course an inspection of Old English directly would reveal several different stem-vowels in the mix). In modern formations, it is stems ending in /t d/ that preserve the vowel of the preterite marker; in an earlier day, odd as it might seem, the loss of the stem vowel had taken place already prehistorically whenever the root ended in an apical stop.


In Latin there are many examples of "word families" showing vowel alternations. Some of these are examples of Indo-European ablaut, that is, they are inherited from the proto-language: pendō "weigh", pondus "a weight"; dōnum "gift", datum "a given"; caedō "cut", perf. ce-cid-; dīcō "speak", participle dictus. (Note: all unmarked vowels in these examples are short.) But some, involving only short vowels, clearly arose within Latin. Examples:

faciō "do", participle factus, but perficiō, perfectus "complete, accomplish"; amīcus "friend" but inimīcus "unfriendly, hostile"; legō "gather", but colligō "bind, tie together", participle collectus; emō "take; buy", but redimō "buy back", participle redemptus; locus "place" but īlicō "on the spot" (< *stloc-/*instloc-); capiō "take, seize", participle captus but percipiō "lay hold of", perceptus; arma "weapon" but inermis "unarmed"; causa "lawsuit, quarrel" but incūsō "accuse, blame"; claudō "shut", inclūdō "shut in"; caedō "fell, cut", but concīdō "cut to pieces"; damnō "find guilty" but condemnō "sentence" (verb); and many, many more of the same sort. Briefly: vowels in initial syllables never alternate in this way, but in non-initial syllables (omitting some details) short vowels of the simplex forms become -i- before a single consonant and -e- before two consonants; the diphthongs -ae- and -au- of initial syllables alternate with medial -ī- and -ū-, respectively.
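The brief statement of the alternation can be tried out mechanically. The following is a minimal sketch, assuming that only the first vowel of the simplex is affected, that only short a, e, o and the diphthongs ae, au take part, and that the prefix is supplied in the shape it actually has in the compound (assimilations such as con- to col- are not modelled). The names weaken and compound are hypothetical, and the "details omitted" in the text are omitted here too.

```python
VOWELS = set("aeiouāēīōū")
WEAKENING_SHORT_VOWELS = set("aeo")  # assumption: short vowels covered by this sketch


def weaken(simplex):
    """Weaken the first vowel of a simplex form as it would appear non-initially."""
    # Find the first vowel of the simplex.
    for i, ch in enumerate(simplex):
        if ch in VOWELS:
            break
    else:
        return simplex  # no vowel found, nothing to do

    # Diphthongs of the simplex alternate with long vowels.
    if simplex[i:i + 2] == "ae":
        return simplex[:i] + "ī" + simplex[i + 2:]
    if simplex[i:i + 2] == "au":
        return simplex[:i] + "ū" + simplex[i + 2:]

    if simplex[i] not in WEAKENING_SHORT_VOWELS:
        return simplex  # long vowels (and short i, u) are left alone here

    # Count the consonants between this vowel and the next vowel (or word end).
    j = i + 1
    while j < len(simplex) and simplex[j] not in VOWELS:
        j += 1
    consonants = j - (i + 1)

    if consonants == 1:
        return simplex[:i] + "i" + simplex[i + 1:]
    if consonants >= 2:
        return simplex[:i] + "e" + simplex[i + 1:]
    return simplex  # vowel in hiatus: not covered by the stated rule


def compound(prefix, simplex):
    """Attach a prefix, putting the simplex vowel into a non-initial syllable."""
    return prefix + weaken(simplex)


for prefix, simplex in [("per", "faciō"), ("per", "factus"), ("red", "emō"),
                        ("in", "claudō"), ("con", "caedō"), ("con", "damnō")]:
    print(simplex, "->", compound(prefix, simplex))
```

Run on the pairs cited above, the sketch gives the attested compound forms, which is the sense in which the alternation is regular and conditioned only by position in the word and by the number of following consonants.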

Now, reduction in contrast in a vowel system (for that is what has happened here) is very commonly associated with position in atonic (unaccented) syllables, but in Latin the tonic accent of say reficiō and refectus is on the same syllable as simplex faciō, factus, and that is true of almost all of the examples given (cólligō, rédimō, īlicō (initial-syllable accent) are the only exceptions), and indeed for most of the examples of these alternations throughout the language. Obviously the reduction of contrast points in the vowel system (-a- and -o- fall together with -i- before a single consonant, with -e- before two consonants; long vowels replace diphthongs) cannot have anything to do with the location of the accent in attested Latin:

The accentual system of Latin is well-known, partly from statements by Roman grammarians and partly from agreements among the Romance languages on the location of tonic accent: the tonic accent in Latin fell on the third syllable from the end (the antepenult) of any word with three or more syllables, unless the second-to-last syllable (called the penult in classical linguistics) was "heavy", i.e., contained a diphthong or a long vowel, or was followed by two or more consonants, in which case the penult has the tonic accent. Thus perfíciō, perféctus, rédimō, condémnō, inérmis.
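The attested accent rule is simple enough to state as a function. This is a minimal sketch: the syllable division and the weight of the penult are given by hand rather than computed, the name accent_index is hypothetical, and the treatment of words of one or two syllables (accent on the first syllable) is an assumption added for completeness, since the passage only covers words of three or more syllables.

```python
from typing import Sequence


def accent_index(syllables: Sequence[str], penult_heavy: bool) -> int:
    """Return the 0-based index of the syllable bearing the tonic accent."""
    n = len(syllables)
    if n == 0:
        raise ValueError("a word needs at least one syllable")
    if n <= 2:
        # Assumption: short words are accented on their first syllable.
        return 0
    # Three or more syllables: heavy penult takes the accent,
    # otherwise the accent falls on the antepenult.
    return n - 2 if penult_heavy else n - 3


examples = [
    (["per", "fi", "ci", "ō"], False),  # perfíciō
    (["per", "fec", "tus"], True),      # perféctus
    (["re", "di", "mō"], False),        # rédimō
    (["con", "dem", "nō"], True),       # condémnō
    (["in", "er", "mis"], True),        # inérmis
]
for syllables, heavy in examples:
    print("".join(syllables), "-> accent on", syllables[accent_index(syllables, heavy)])
```

Note that the function looks only at the end of the word, so adding a prefix never moves the accent onto the first syllable unless the word is short enough. That is why the attested accent cannot be what protects initial syllables from weakening.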

If there is any connection, then, between word-accent and vowel-weakening, the accent in question cannot be that of Classical Latin. Since the vowels of initial syllables never show this weakening (to oversimplify a bit), the obvious inference is that at some point in prehistory, the tonic accent must have been a "stationary" accent always falling on the first syllable of a word. Such an accentual system is very common in the world's languages (Czech, Latvian, Finnish, Hungarian, and, with certain complications, High German and Old English, etc., etc.), but was definitely not the accentual system of Proto-Indo-European. Therefore, on the basis of internal reconstruction within Latin, we discover a prehistoric sound-law that replaced the inherited accentual system with an automatic initial-syllable accent which, in turn, was replaced by the attested accentual system. As it happens, Celtic languages likewise have an automatic word-initial accent (subject, like the Germanic languages, to certain exceptions, mainly certain pretonic prefixes). Celtic, Germanic, and Italic languages share some other features as well, and it is tempting to think that the word-initial accent system was an "areal feature" as it is called, but that would be more speculative than the inference of a prehistoric word-initial accent for Latin specifically.

There is a very similar set of givens in English, but with very different consequences for internal reconstruction. There is pervasive alternation between long and short vowels (the former now phonetically diphthongs): between /aɪ/ and /ɪ/ in words like divide, division; decide, decision; between /oʊ/ and /ɒ/ in words like provoke, provocative; pose, positive; between /aʊ/ and /ʌ/ in words like pronounce, pronunciation; renounce, renunciation; profound, profundity; and many other examples. As in the Latin example, the tonic accent of present-day English is often on the syllable showing the vowel alternation. In the Latin case, it was possible to frame an explicit hypothesis regarding the location of word-accent in prehistoric Latin that would account for both the vowel alternations and the attested system of accent. Indeed, such a hypothesis is hard to avoid. By contrast, the alternations in English do not point to any specific hypothesis, only a general suspicion that word accent must be the explanation, and that the accent in question must have been different from that of modern-day English. Where the accent used to be, and what the rules (if any) are for its relocation in present-day English, cannot be recovered by internal reconstruction. In fact, even the givens are uncertain: it is not even possible to tell whether we are dealing with lengthening in tonic syllables or shortening in atonic ones. (And in historical fact, both are involved.) Part of the problem is that English has alternations between diphthongs and monophthongs (i.e., between Middle English long and short vowels, respectively) from no fewer than six different sources, the oldest (for example write, written) dating all the way back to Proto-Indo-European. But even if it were possible (it is not) to sort out the corpus of affected words, sound changes subsequent to the relocation of tonic accent have eliminated the necessary conditions for framing accurate sound laws.

It is, in fact, possible to reconstruct the history of the English vowel system with great accuracy, but not by internal reconstruction. (In a nutshell, at the time of the atonic shortening, the tonic accent lay two syllables to the right of the affected vowel and was subsequently retracted to its present-day position. But words like division and vicious (cf. vice) have lost a syllable in the first place, which would be an insuperable obstacle to correct analysis.)


  • T. Givón, Internal reconstruction: As method, as theory, Typological Studies in Language (2000).
  • J. Kuryłowicz, On the Methods of Internal Reconstruction, Proceedings of the Ninth International Congress of Linguists (1964).
  • Anthony Fox, Linguistic Reconstruction: An Introduction to Theory and Method, Oxford University Press (1995), ISBN 0-19-870001-6.