Printed version = Tapani Salminen: The rise of the Finno-Ugric language family. — Early contacts between Uralic and Indo-European: linguistic and archaeological considerations. Edited by Christian Carpelan, Asko Parpola and Petteri Koskikallio. Mémoires de la Société Finno-Ougrienne 242; Helsinki 2001. 385–396.

There exist a good number of often radically different scenarios about the early history of the Finno-Ugrian (Uralic) language family. The crucial questions can be formulated as follows. Firstly, how are the Finno-Ugrian languages related to each other, or more specifically, how are they properly classified? Secondly, where was the oldest centre of expansion of the Finno-Ugrian family? Thirdly, when did the first contacts between Finno-Ugrian and Indo-European take place? Fourthly, what are the prospects for a distant genetic relationship between Finno-Ugrian and Indo-European? It may be said that to all these questions there is one standard answer but in each case both the standard and the competing views require critical evaluation. This essay attempts to give a general overview of some of the problems that scholars need to tackle in the future, without going into details of various controversial issues or referring to all important publications in the field.

Defining features

If we try, as we should, to keep the concepts ‘proto-language’ and ‘reconstruction level’ apart, it is self-evident that proto-languages have been natural languages, and typical features of a natural language are variation and change, which are connected with both internal contacts promoting unity of the language area and external contacts leading to differentiation. A natural proto-language, so to speak, must have been a dynamic dialect continuum. Changes frequently result in the increase of dialectal differences, which is a necessary but not a sufficient condition for an actual break-up of the proto-language into a number of daughter languages. Rather, new languages are created so that the transitional dialects between the main dialects of the proto-language disappear through assimilation to the main dialects or other languages, which yields clear-cut units that can no longer profoundly influence each other but continue to change independently. Paradoxically, then, the extinction of transitional dialects changes the status of dialects to languages.

The outcome of the recurrent divisions within a language family can well be captured in the classical tree model, although it is important to keep in mind that the tree model is not a theory of genetic relationship but a means of illustrating it. There is also no need to assume that the structure of the tree model must always be binary, with every node divided into two branches; rather, a tree with several equal branches is natural and even expected in the case of a language family whose subsequent diversity appears to be the result of a rapid expansion from the original language area in different directions. Finno-Ugrian and Indo-European are prime examples of such highly expansive language families, but there is a difference in the traditions of classification. The Finno-Ugrian family, which is the name that is provocatively used here instead of Uralic, is almost always classified according to a binary tree model, which is based on the status of the Finnish language as the focal point of the classificatory scheme.

In other words, it is a grave error to assume that a single innovation equals a break-up of the proto-language, and that an established isogloss within the family corresponds to an early language boundary. Such an approach is typical of scholars who insist on a binary classification, but do not recognize that the starting-point of any changes must have been a proto-language which was already characterized by variation. Furthermore, the choice of decisive innovations in a binary classification is generally quite random, because true language boundaries must have become established much later than the oldest isoglosses dividing the language area.

Finno-Ugrian languages, in the widest sense of the word, share a few core vocabulary items, though when critically examined, the number of satisfactory etymologies appears smaller than was thought earlier (Janhunen 1981; Sammallahti 1988). Whether or not there were borrowings from Indo-European that spread to all branches of Finno-Ugrian including Samoyed, is still being argued but the case for such borrowings seems quite strong (Koivulehto 1991; Rédei 1986, 1988). While scholars agree on many details of the Proto-Finno-Ugrian sound system, there are also different views about several crucial questions. The basic morphological structure is, hopefully, better understood, and three types of suffixal markers can be quite reliably reconstructed in Proto-Finno-Ugrian, namely a set of case markers and two sets of personal suffixes (Janhunen 1982). The personal suffixes, in particular, can be regarded as defining features of the Finno-Ugrian language family, because on the one hand, they are transparent enough to be recognized as products of agglutination processes of personal pronouns, and on the other hand, certain morphophonological alternations can be reconstructed in the system of possessive suffixes at least (Janhunen 1982), so they had already had time to lose some of their original agglutinative character before any major disintegration of the proto-language. Curiously, Indo-European is characterized by a set of personal suffixes with a similar background, and it might prove interesting to study the possible connections.


In the traditional binary classification of the Finno-Ugrian (Uralic) family (for an illustration, see Häkkinen 1983: 83, 1984: 8), there are two kinds of proto-language. Nine of them (Proto-Saami, Proto-Finnic, Proto-Mordvin, Proto-Mari, Proto-Permian, Proto-Hungarian, Proto-Mansi, Proto-Khanty, and Proto-Samoyed), often given as the lowest nodes in a tree graph, clearly stand apart from each other and from their common predecessor (Proto-Uralic), with a large number of characteristic innovations. The other alleged proto-languages (Proto-Finno-Ugrian, Proto-Finno-Permian, Proto-Finno-Volgaic, Proto-Finno-Saami, Proto-Ugrian, Proto-Ob-Ugrian, and Proto-Volgaic) are supergroups of the nine well-founded branches, and little substantiation has ever been presented for them. Häkkinen (1984) presents a detailed critique of the larger groupings (Finno-Ugrian, Finno-Permian, and Finno-Volgaic) as they are still based on obsolete criteria formulated in the 19th century, and Salminen (2002) evaluates the controversies over the narrower subgroups. Helimski (1982), while critical of such intermediate proto-languages as Finno-Permian and Finno-Volgaic, calling them areal genetic units instead, tacitly assumes the primary division between Finno-Ugrian and Samoyed, but it is difficult to see how Finno-Ugrian should earn a different treatment and why, for instance, Ugro-Samoyed (including Hungarian, Mansi, Khanty, and Samoyed) would not be an areal genetic unit exactly like Finno-Permian and Finno-Volgaic.

As to the factual basis of the suggested primary division, no sound changes were assigned to the intermediate Finno-Ugrian level (and the same was largely true about the other binary nodes as well) until the recent studies by Janhunen (1981) and Sammallahti (1988) who have actually presented a couple of tentative sound changes characteristic of the intermediate levels. A good summary is provided by Sammallahti (1998: 119–122) in his presentation of the historical background of the Saami languages: according to him, there are two Finno-Ugrian, two Finno-Permian, one Finno-Volgaic, and three Finno-Saami sound changes; the actual number is, however, lower, because some of them represent different phases or effects of the same process. It is very difficult to see these results as conclusive: in some cases it may be a question of an illusion created by reconstruction techniques; in other cases there are too few etymologies to establish the actual distribution of the innovation; and with regard to the Finno-Saami sound changes (1) and (2) as posited by Sammallahti (1998: 122), there are no grounds for arguing that they covered Finnic (cf. Sammallahti 1998: 190). The few sound changes involve the history of vowels, and while it is true that Janhunen and Sammallahti have made notable progress in this field, no systematic patterns of innovations have been established as yet, and scholars like Abondolo (1996) have pursued an entirely new picture of the development of vowels.

The established basis for the primary division is, however, not sound changes but the amount of shared vocabulary: it is an undeniable fact that with regard to their lexicon, the Samoyed languages form an aberrant branch within the family. However, shared vocabulary is not a criterion for classification since only innovations count, and it is straightforward to assume that a wave of lexical innovations met Proto-Samoyed in the eastern periphery of the area. The other logical option would be that Samoyed had retained the bulk of the original Uralic lexicon which would make Finno-Ugrian the innovative branch, but a situation with massive changes in vocabulary but with no or very few changes in grammar, phonology included, can hardly be expected.

There are two special groups of lexical items used as supporting evidence for the primary division, namely the numerals and the Indo-European loanwords, supposed to be exclusively Finno-Ugrian. By now it should be clear that Samoyed shares two numerals with the rest of the family, ‘two’ and ‘five’ (cf. Rédei 1986–1991), the cognate of the latter meaning ‘ten’ in Samoyed, which appears to be a semantic innovation, especially since the Samoyed word for ‘five’, being four syllables long, looks very much like an innovation. However, the numerals for ‘three’, ‘four’ and ‘six’, if we assume that they were not simply replaced by other words in Proto-Samoyed which seems the likeliest possibility, speak undeniably for the unity of the traditional Finno-Ugrian branch. Whether the distribution of these numerals constitutes a sufficient basis for establishing a proto-language remains an open question.

The argument concerning the Indo-European loanwords, on the other hand, has become largely obsolete because there are a number of words with cognates in Samoyed that are now recognized as being of Indo-European origin, cf. Koivulehto (1991) and Rédei (1986, 1988). There are vocal critics of this idea (cf. Helimski 1995; Napolskikh 1997), but their assessments seem to derive from somewhat outdated views about Finno-Ugrian and Indo-European historical phonology (cf. Anttila 1993).

Whatever the value of the proposed innovations is, the crucial thing is that they are very few; so few that not even their cumulative effect is sufficient to make a lowest-level intermediate proto-language (e.g., Proto-Finno-Saami) different from the highest-level one (i.e., Proto-Uralic). In other words, by comparing Saami and Finnic alone we reach phonological and morphological reconstructions that are supposed to be distinctly Finno-Saami (also known as “early Proto-Finnic” which is an unfortunate misnomer) but turn out to be virtually identical with Proto-Uralic reconstructions. This state of affairs is, incidentally, evident from the tables including reconstructions on each intermediate level in Sammallahti (1998: 189, 198–202). It must be concluded that in comparison with the well-established proto-languages, the intermediate proto-languages represent a different kind of theoretical construct and, consequently, another taxonomic category. Calling them ‘areal genetic units’ in the Helimskian sense seems an appropriate terminological choice.

There is no need to claim that the intermediate proto-languages in each case lack foundation altogether, only that the evidence for them is scanty, and that in fact, it is possible to draw competing binary trees with as much substance to the alternative intermediate nodes as to the nodes in the traditional binary tree. Creating conflicting binary trees is not difficult, and one serious proposal has been made by Viitso (1997: 921; cf. also Viitso 1995). Furthermore, if a tree following similar standards but from a Samoyed point of view were drawn, it would probably include branches such as Khanty-Samoyed and Ugro-Samoyed, contradicting both the traditional and Viitso’s alternative scheme.

Consequently, it would be a wise move to disqualify the ill-founded intermediate proto-languages in the basic taxonomic description of the Finno-Ugrian (Uralic) language family, and be content with a flat family tree consisting of the nine basic branches (for an illustration, see Häkkinen 1983: 384). This is not to say that higher groups would not require extensive study, quite the contrary, but it would actually be a more fruitful approach from every point of view to treat them as results of areal inter-branch connections rather than properly defined proto-languages. Binary classification as such is a valid possibility, so its apparent invalidity when applied to Finno-Ugrian is simply due to the lack of substantial evidence. The actual historical and linguistic developments that led to the establishment of the nine uncontested branches must have been a highly complex process rather than a neat nine-fold division of the language area, but sticking to a single untenable hypothesis to explain this process does not help but hampers serious study in the field.
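
The contrast between the two kinds of tree can be made concrete with a small sketch. It encodes the nine well-founded branches and the seven disputed supergroups as a nested structure and then collapses it into the flat alternative. The names of the nodes are those used in this essay; the particular nesting follows the conventional binary presentation and is an assumption made for illustration, as are the function names.

```python
# Illustrative sketch only. The nesting reproduces the conventional binary
# presentation of the family (an assumption, not spelled out in this essay).
BINARY_TREE = {
    "Proto-Uralic": {
        "Proto-Finno-Ugrian": {
            "Proto-Finno-Permian": {
                "Proto-Finno-Volgaic": {
                    "Proto-Finno-Saami": {
                        "Proto-Saami": {},
                        "Proto-Finnic": {},
                    },
                    "Proto-Volgaic": {
                        "Proto-Mordvin": {},
                        "Proto-Mari": {},
                    },
                },
                "Proto-Permian": {},
            },
            "Proto-Ugrian": {
                "Proto-Hungarian": {},
                "Proto-Ob-Ugrian": {
                    "Proto-Mansi": {},
                    "Proto-Khanty": {},
                },
            },
        },
        "Proto-Samoyed": {},
    }
}


def leaves(tree):
    """Return the names of all terminal nodes, in left-to-right order."""
    result = []
    for name, children in tree.items():
        if children:
            result.extend(leaves(children))
        else:
            result.append(name)
    return result


def intermediate_nodes(tree, is_root=True):
    """Return every non-terminal node except the root."""
    result = []
    for name, children in tree.items():
        if children:
            if not is_root:
                result.append(name)
            result.extend(intermediate_nodes(children, is_root=False))
    return result


def flatten(tree):
    """Attach every terminal node directly to the root."""
    (root, children), = tree.items()
    return {root: {leaf: {} for leaf in leaves(children)}}


FLAT_TREE = flatten(BINARY_TREE)
```

In this encoding the binary tree carries seven intermediate nodes between the root and the nine branches, while the flat tree carries none; nothing about the nine terminal nodes themselves changes when the intermediate levels are removed.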

The multi-level hierarchy typical of a binary tree also obscures the obvious chain-like structure of the Finno-Ugrian language family, or the network-like structure of Indo-European, for that matter. The problem is that while binary trees look interesting, their non-binary alternative is flat both literally and figuratively. It has one obvious and unquestionable merit, however, namely that it only includes well-founded units representing valid, historically significant proto-languages. In technical terms, non-binary trees need not be called anything else but trees, although sometimes ‘bushes’ and other makeshift terms are used to refer to them.

The lack of hierarchy in a non-binary tree means a lack of predictive power. Nevertheless, since the predictions based on any of the possible higher-level intermediate proto-languages in a binary tree are few, controversial and conflicting, it can be maintained that a non-binary tree is the only version of the tree model that properly and realistically reflects the relations between the well-established branches. Of course, there are other possible models to describe the structure of a language family, most notably the wave model. One model that could be called a circle model is a kind of a compromise between the tree and wave models in that it superficially looks like the wave model and the arrangement of the circles has a similar function, but fundamentally it is a graphic variant of the tree model because it recognizes a number of intermediate proto-languages which have developed from a single parent language and would be further divided into a number of daughter languages (for an illustration, see Salminen 1999: 20). It is richer than a tree only because it can include information about areal connections between branches, and the empty space between circles can be interpreted iconically as representing the transitional dialects whose extinction created the primary language boundaries. It would also be possible to give distances between circles an indicative value, and one place for greater distance might well be between Khanty and Samoyed.

Both the tree and circle models resemble a map, which in the case of the non-binary tree model is, however, not dictated by any principal factors but the geographically based order of the nodes is simply a mnemonic device. The circle model, by contrast, is designed to reflect both genetic and areal connections and it is therefore expected that in most (but not necessarily in all) cases it does form a map-like pattern.


The chain-like distribution of the Finno-Ugrian branches suggests that the original dialect continuum has been created through a rather rapid expansion along a particular ecological zone (expansion in this context only refers to a linguistic phenomenon which can occur with or without large-scale migrations). It is difficult to think that the centre of the expansion could have been very close to either periphery so the safest assumption is still that the homeland was located near to the present nucleus of the language family, that is the area where Mordvin, Mari and Udmurt are spoken, in other words between the Volga and the Urals. The alternatives are a Siberian homeland supported by Napolskikh (1997), and a homeland extending far to the west as described by Sammallahti (1995). Sammallahti finds it possible to connect archaeological and linguistic evidence to support the idea that Proto-Uralic was spoken among the first settlers of the Baltic region, but this seems truly hubristic because early cultural boundaries need not have corresponded to linguistic boundaries any more than they do in historical times, for instance in Siberia, and because a language can spread through diffusion as well as migration.

Some indications about the Urheimat can presumably be found in the oldest and most widely known common lexicon, though there is a great risk of jumping to conclusions in this context. Whatever paleolinguistic evidence is presented in the discussion about Urheimat, one thing is certain: the etymological material must be reliable and well-established. Luckily, Janhunen (1981) and Sammallahti (1988) have critically examined the stock of proposed Uralic etymologies and at least as far as Samoyed material is involved, their etymological word-lists must be regarded as highly conclusive.

To see what happens if the rule of the reliability of etymological material is not respected, we may take a brief look at an etymology concerning a fish-name playing a crucial role in Napolskikh’s famous article (Napolskikh 1993: 49–50). The fish in question is known as ‘round-nosed whitefish’, and Napolskikh himself calls his main point “the round-nosed whitefish argument”. It is not the only fish-name he discusses but, as he readily admits himself, it is the only one pointing to a specifically Siberian homeland, the hypothesis Napolskikh strives to prove. The fish-names in question appear, firstly, locally in Saami with the meaning ‘a little whitefish’, secondly, very scantily attested in Finnish with the meaning ‘a salmon with a hooked nose’, thirdly, in two old records of Northern Khanty in compounds whose meaning is given as ‘round-nosed whitefish’, and, finally, in a single ancient attestation of a compound in Tundra Nenets with the same meaning. Starting from the compounds, the Khanty word may be understood transparently as “a stone fish”. While this is acknowledged by Napolskikh, it can be added that the Tundra Nenets record may well be seen as a temporary formation referring either to “a whirlpool fish” or “a nearby fish”. Itkonen (1956), in his critique of Collinder (1955), regards the comparison of the Finnish and Nenets words as questionable (60), and refers to the Finnish word as a possible Saami loan (63), a conclusion, it may be added, strongly suggested by its phonotactics. Häkkinen (1996: 70) points out that while being critical of earlier treatments of the topic, Napolskikh has failed to take account of the latest results of etymological research. Notably, this etymology is absent from the Uralic etymological dictionary (Rédei 1986–1991) as well as both of the Finnish etymological dictionaries (cf. Joki 1973: 197). Furthermore, while Napolskikh (1993: 49–50) talks about five or six Proto-Uralic fish-names, there are no fish-names at all in the Uralic word-lists by Janhunen (1981) and Sammallahti (1988) and in Sammallahti’s Finno-Ugrian list there are only ‘ide’ and ‘tench’, two fish that occur in a wide territory in northern Eurasia.

On the other hand, Napolskikh (1993: 41–44) may well be right in his claim that the Baltic origin of two important fish-names in Finnic, those for ‘eel’ and ‘salmon’, indicates that the Finno-Ugrian language spread to the Baltic area in a relatively late period. We can simultaneously assume that Proto-Finno-Ugrian, while a natural language with internal variation, was still relatively uniform at the time of its expansion, because these and other distinctly Baltic loans have gone through all Finnic sound changes. After the expansion, Proto-Finno-Ugrian began to disintegrate quickly under the pressure of contact languages.

Viitso (1997) is highly critical of the use of names of animals and plants in defining the Urheimat, and he wisely keeps quiet about details. What we can safely say on the basis of widely attested and etymologically sound material is that the Urheimat was quite far from the sea, and in deep forests rather than tundra or steppe environment, but such reasoning does not narrow down the possibilities very much.

Indo-European contacts

From the point of view of the earliest contacts between Finno-Ugrian and Indo-European it does not matter too much if the primeval Finno-Ugrian and Indo-European centres of expansion are thought to have been located next to each other or not, because even at the time of a relatively late first contact the dialects within the proto-language continuums had not differentiated much. Some Indo-European loanwords have been used as evidence either in classifying Finno-Ugrian languages and locating their Urheimat or for the Indo-Uralic hypothesis. Three cases may be briefly dealt with here, namely the words for ‘bee’ and ‘honey’, the word for ‘copper’, and the word for ‘water’.

Words for ‘bee’ and ‘honey’ of Indo-European origin occur in most Finno-Ugrian languages (e.g. Hungarian méh and méz), but not in Samoyed, which is seen as evidence for a secondary Finno-Ugrian proto-language after Samoyed had split off. The alternative hypotheses, namely that these Indo-European words were once known in the entire proto-language area but were subsequently lost in Samoyed, or, more likely, that the words have spread from one branch to another within an already disintegrated Finno-Ugrian language chain are often rejected without proper consideration. More interestingly, perhaps, the words for ‘bee’ and ‘honey’ have been used for postulating a Proto-Finno-Ugrian homeland subsequent to the primeval Proto-Uralic one. Napolskikh (1997: 137–138) sees as the only possibility that the insect itself was unknown to the speakers of Proto-Uralic and it was therefore borrowed when Proto-Uralic, in the form of Proto-Finno-Ugrian, began to be spoken in the Volga region. Against this view it can be argued that even in an exclusively Siberian Urheimat bees or closely related insects would not have been unknown, and even if the proto-language speakers had encountered bees for the first time after expansion to the Volga region, the natural source of borrowing would have been the alleged aboriginal language of the Volga region rather than Proto-Indo-European which according to everyone was spoken further away. The most probable explanation is therefore that the words for ‘bee’ and ‘honey’ were borrowed because they represented a cultural innovation.

Interestingly, while there appear to be no common Finno-Ugrian words related to agriculture, there is a word which refers to a metal, in some languages meaning ‘copper’ (e.g. Finnish vaski) and in some others ‘iron’ (e.g. Forest Nenets wyesya), and which can be regularly reconstructed to the earliest proto-language on the basis of cognates from the westernmost and easternmost branches as *wäśkä (Sammallahti 1988: 541); for a competing view, see Napolskikh (1997: 123, 154–155). It is tempting to regard the early use and trade of copper as the defining cultural innovation behind the expansion of the Finno-Ugrian language area.

The so-called Uralic word for ‘water’ (e.g. Hungarian víz), with well-known reflexes in all branches except Saami and Khanty, is one of the most widely used pieces of evidence for either ancient contacts or Urverwandtschaft between Indo-European and Finno-Ugrian (Joki 1973). It would be difficult to think that the resemblance between the Indo-European and Finno-Ugrian roots could be a plain coincidence, and indeed, the absence of this root in Saami and Khanty clearly points to a secondary nature of the Finno-Ugrian root. The common Saami word for ‘water’ (e.g. North Saami čáhci) has a cognate nowhere else but in Khanty, where its meaning is ‘tide, flood’. The most plausible scenario involves a semantic shift from ‘water’ to ‘tide, flood’ in Khanty, which is consistent with the fact that the common Khanty word for ‘water’ is based on the root meaning ‘ice’. The Saami word and its Khanty cognate can therefore be regarded as reflexes of the original Uralic word *śäčä ‘water’ (cf. Sammallahti 1988: 549; Rédei 1986–1991: 469), retained in the northern periphery of the Finno-Ugrian language area but replaced by an Indo-European borrowing elsewhere. The assumption, based on the application of the traditional binary model, that cognates of Hungarian víz etc. must have existed in pre-Saami or pre-Khanty is axiomatic and only leads to circular argumentation.

Most similarities between Indo-European and Finno-Ugrian can be easily explained on the basis of language contacts. The only notable exceptions are the basic pronominal stems, widespread in northern Eurasia and beyond. The striking thing about the common nominal and verbal roots is that their independently reconstructed Indo-European and Finno-Ugrian forms are so similar to each other, a situation which must be seen as an indicator of contacts rather than Urverwandtschaft. In most of these cases it is also semantically plausible to explain them as borrowings, because they often belong to the field of trade relations, and in the few instances without semantic motivation, like ‘water’ discussed above, other arguments in favour of a contact explanation can be presented.

If scholars want to pursue the search for evidence of genetic affinity between Indo-European and Finno-Ugrian, it must be kept in mind that the relationship between these language families must be much more remote than that amongst their branches. To illustrate this quantitative difference, it may be estimated that any speaker of a Finno-Ugrian language shares 50 to 100 common lexical items with a speaker of any Finno-Ugrian language of another branch, while even if we take a most positive stand on the genetic affinity hypothesis, we can count that any speaker of a Finno-Ugrian language shares no more than 5 to 10 common lexical items, pronouns included, with a speaker of any Indo-European language. It seems reasonable to interpret such a major quantitative gap qualitatively as well, which means the rejection of the so-called Indo-Uralic hypothesis. The one thing that seems certain about Neolithic communities is that they were characterized by widespread multilingualism, and in such conditions language contacts were at least as extensive as they are known to have been more recently. There is no need to assume that the primary Finno-Ugrian and Indo-European homelands were adjacent to each other but secondary expansions must have brought them into contact relatively early, spreading the knowledge of Indo-European among Finno-Ugrian speakers.

In the study of ancient prehistoric developments, a high level of source criticism is required, and intuitive or authoritative methods must be avoided. If, for instance, evidence suggesting ancient contacts between Finno-Ugrian and Indo-European is disregarded without proper consideration because of an underlying hypothesis of a Siberian Urheimat for Finno-Ugrian, or vice versa, the results are bound to be biased and circular. While comparing archaeological and linguistic data in general, it should be remembered that the correlation between language and culture has always been weak at best, as can be seen from historically attested cases, for example, the complex linguistic and cultural patterns found in Siberia. To sum up the basic, perhaps rather discouraging message of this paper, the development of the field depends, more than anything else, on getting away from preconceived notions, which means that scholars must welcome rather than deny or ignore information that seriously challenges their preconceptions.


