Cognitive interaction of verbal and nonverbal signs (prosodio-kinetic complexes)

Natalya V. Sukhova, Chair of Foreign Languages, School of Public Administration, Lomonosov Moscow State University


This article investigates verbal / nonverbal interaction from a cognitive point of view. There is a certain zone of that interaction that can be considered within a cognitive framework. The cognitive approach deals with the process of the production and perception of verbal and nonverbal signs. The process starts from a general intention, proceeds to the meaning of a future utterance, and then employs forms of mental representation (verbal and nonverbal) that emerge as concrete means of expression (words and gestural movements) within a communicative act. Accordingly, the physiological, psychological and cognitive mechanisms of speech production and perception are analysed with a special emphasis on the gestural aspect. These mechanisms operate on four large functional planes (orientation, utterance formation, realisation, control). The verbal / nonverbal interrelation starts when the aim of a future utterance is being established, that is, when one meaningful cognitive entity is formed. The programming stage then follows: meaning is embodied in a gesture-speech utterance; moreover, sometimes a gestural phrase surpasses the verbal one already in the zone of symbolic representation. The motor programming stage is characterised by the formation of a common meaning and by packing it into verbal and gestural formats, which emerge as speech signals and gestural movements. Then the realised utterance is corrected in accordance with the initial model if necessary. This article presents a detailed and parallel investigation of all cognitive processes going on at the production / perception stage, with both gestural and phonetic elements of prosodio-kinetic complexes.

1. Introduction

The present study has been conducted within the cognitive approach to gesture-speech interaction. This approach deals with different stages of speech production, consisting of intention / meaning formation, the process of mental representation (verbal and nonverbal) and the realisation of concrete words and gestural movements. Thus, the cognitive approach addresses two major problems: the production of language and gestures in a broad sense, and their perception. Generally speaking, the focus here is on the physiological, psychological and cognitive mechanisms of speech production and speech perception and on the role gestures play in this process.

This study investigates the relationships between gestural movements and speech units, particularly prosodic elements, on different planes of the speech production process. More precisely, the emphasis is on both gesture-speech intentional / semantic relations being formed in the initial production stages and final realisation of gesture and speech in the act of communication.

To succeed in such an investigation, it is necessary to distinguish three important areas: gesture-speech production, gesture-speech perception and gesture-speech functioning. Sometimes all three work together to uncover gesture-speech interrelations. I will consider each of them in turn and then jointly.

2. Production / perception process

Psychological and psycholinguistic investigations have shown that the processes of production and perception of any behavioural act, speech acts included, develop in similar stages. This means that the way we produce speech units is practically the same as the way we perceive them. The only difference is the direction of the process. Such similarity enables researchers to study the process of production, which is more complicated to investigate, through perception and thereby obtain data about the production process.

There are a number of works devoted to the process of speech production and its different stages (Leontyev 1974, Ryabova 1992, Anokhin 1975 and others). It has been established that there are four large functional blocks or planes in any speech act production:

  1. the plane of orientation;
  2. the plane of utterance formation;
  3. the plane of realisation;
  4. the plane of control.

For a long time the plane of realisation has been at the forefront of psycholinguistics, and I would say that it still is. Nevertheless, it is the mechanisms of orientation and utterance formation that are the most important in the production process.

It has not gone unnoticed, however, and it should be emphasised here, that people always use gestures while speaking. Thus, both speech and gestures are produced during these stages. I shall consider the basic planes and the most important mechanisms of utterance production with gestural involvement and shall concentrate on the role of gesture in the speech production process.

3. Plane of orientation

Firstly, a speech mechanism is triggered by a motive and a communicative intention. Speech intention, or the aim of the utterance, is based on memory, motivation, afferent elements and a starting stimulus. Thus, a motive triggers an idea that develops and takes shape, and then the whole process results in an interior / exterior utterance (verbal and nonverbal). The aim of a speech act forms the meaning of a future utterance.

For example, Stone (2004) specifies conversational agency by describing agents' coordinated reasoning about communicative intentions. He says that "the speaker produces each utterance by formulating a suitable communicative intention. The hearer understands it by recognizing the communicative intention behind it. When this coordination is successful, interlocutors succeed in considering the same intentions - that is, isomorphic representations of utterance meaning - as the dialogue proceeds" (Stone 2004:791).

This stage then may serve as a source for the meaning of both gesture and speech units. Some scholars argue that speech and gesture convey one and the same meaning (McNeill et al. 1994, Kendon 1997, etc.); however, speech and gestures are not semantically redundant in their co-functioning. Gestures are produced as an integral part of the same plan of action as the spoken utterance, and they are but another manifestation of the same underlying process. Krauss et al. argue that communicative intention is the "common origin of gesture and speech" (Krauss et al. 1991:752). The communicative intention activates both an abstract propositional representation and a motoric representation that may be reflected in a gestural movement. These ideas seem reasonable.

There is also the representational theory of mind (Fodor 2000), which seeks to naturalise common sense intentional explanations of human action. According to this theory, there are common sense characterisations in semantic terms and scientific characterisations in physical terms. The theory postulates symbolic representations and algorithms as a bridge between the two types of characterisations. People have a large number of beliefs, commitments and desires, which are considered whenever a choice is made. We appeal to this information while explaining our actions semantically, that is, when we make a common sense explanation. We also appeal to computation and view this body of information operationally. "Any of our beliefs, commitments and desires can be represented physically as symbolic structures, and we can give precise algorithms for mechanically deriving symbolic structures that motivate our actions from such representations" (Stone 2004:785). Then semantic entailment and computational inference are brought together by exact logical correspondence. Therefore, our actions can be regarded as the exact consequences of a physical mechanism and as an exact manifestation of our identity.

Even if the representational theory of mind and other agent theories have some shortcomings, they have highlighted the dimensions of intentions, collaboration and possible analysis of symbol-processing realisations.

Specifically, for example, Gerwing and Bavelas (2004) examined hand gestures in face-to-face dialogues and concluded that such gestures are symbolic acts, integrated into speech. They demonstrated that the 'immediate communicative function', that is, the ability to convey information that is common ground or that is new, plays a major role in determining the physical form of the gestures.

In light of the intentions in establishing the aim, it is vitally important to have a close look at the programming plane when the gesture-speech utterance is being formed.

4. Plane of utterance formation

Secondly, after establishing the aim, the mechanisms of symbolic representation are activated. Both phenomena - speech and gesture - are forms of mental representation.

In computational linguistics it is assumed that there are two types of resources. The first type is linguistic: it describes the form and meaning of utterances in abstraction from specific connections to the subject matter of a conversation. The second type is communicative: it captures the mechanisms that connect these utterances appropriately to that subject matter (Stone 2004). Evidently, in reality there is also a third type, which can either be a subtype of a linguistic resource or an independent resource itself. This is the resource of gestural vocabulary.

However they are characterised, gesture and speech form a single entity on the deep representational level, but then they part ways. Meanings are not transformed into a gestural unit through linguistic formats; they are transmitted directly and independently (Kendon 1987). The production of a gestural and a linguistic sign is understood as two aspects of one and the same representational process, although the two are organised separately from each other. Moreover, the channels of transmission are subsequently different.

Here I would like to survey some research in neurobiology, neurolinguistics and cognitive science to examine the relationships between gesture and speech in the early stages of speech production.

E.V. Bobrova and A. Bobrov (2005) assume a double dichotomy in the right and left hemispheres.

The authors argue that, on the one hand, each hemisphere uses mainly one of the two types of specialised neural nets, yet on the other hand, primary visual information in each hemisphere is divided into two streams, namely, dorsal and ventral streams. The dorsal stream (the posterior parietal cortex) is presumably the "where" system. The ventral stream (inferior temporal cortex) is the "what" system. The "what" system of the right hemisphere creates subimages, which are constructed of details. It employs the mechanism of primitives, performed by non-linear neurons. Thereafter, the "where" system of the right hemisphere creates a visual image out of the subimages supplied by the "what" system. Here the mechanism of frames is at work. The "what" system of the left hemisphere creates a truncated abstract description of the visual image. Here the mechanism of spatial frequency analysis is performed by linear neurons. The "where" system of the left hemisphere describes a visual scene that is constructed from images described by the "what" system. It also uses the mechanism of frames.

The joint work of these two systems in both hemispheres results in a sort of "labour differentiation", which seems relevant to explore in terms of the relations between gesture and speech in the brain. There is a very illustrative chart (Bobrova and Bobrov 2005) dealing with action and perception in both hemispheres (see Table 1).

Table 1. Functions of the right and left hemispheres (Bobrova and Bobrov 2005).

- Vision

  Right hemisphere:
  • visual image perception;
  • frames are filled by subimages constructed from primitives;
  • sensory learning.

  Left hemisphere:
  • visual scene perception;
  • frames are filled by truncated codes of visual images.

- Language

  Right hemisphere:
  • word perception;
  • frames are filled by elements of words;
  • language acquisition.

  Left hemisphere:
  • speech perception;
  • frames are filled by truncated codes of words.

- Movement

  Right hemisphere:
  • postural control;
  • control of precise parameters of movements;
  • movement learning.

  Left hemisphere:
  • control of successive coordinated movements for action;
  • control of timing of successive movements.

- Speech

  Right hemisphere:
  • control of word production during language acquisition.

  Left hemisphere:
  • control of successive coordinated movements of muscles participating in speech production.

As the chart shows, gesture and speech are undoubtedly connected. However, there are two groups of scholars, each of which regards that connection differently. Some scientists (McNeill et al. 1994, Kelly et al. 2004, and others) think that gesture is communication and furthermore that gesture is tightly integrated with speech and influences "speech processing even at the earliest stages of comprehension" (Kelly et al. 2004:253). The other group of researchers, for whom gesture is non-communication, argues that gesture and speech are independent systems and that gesture does not influence language comprehension in a significant way (e.g., Krauss et al. 1991). The opinion of the first group seems to be more reliable and valid.

To support this idea, several experiments were conducted to investigate, for example, the relationship of gesture and speech in language comprehension (Kelly et al. 2004) and the motor functions of Broca's region, which is known to mediate the production of language and contribute to comprehension (Binkofski & Buccino 2004; see also Müller & Basho 2004).

It was concluded that at the early stages of processing, "the high-level visuospatial information conveyed through hand gestures may have an early cross-modal effect on speech processing. Specifically, gestures may create a visuospatial context that subsequently influences the sensory processing of the linguistic information that follows" (Kelly et al. 2004:258). More importantly, the authors go on to say that "at late stages of processing, the semantic content of the complementary gestures were treated as partially consistent with the semantic content of the accompanying speech" (ibid.).

Moreover, it was recently shown that manual gesturing while speaking improves cognitive efficiency as measured by memory capacity, suggesting an interaction between the language system and the motor system for cognitive functions (see Small & Nusbaum 2004). In addition, as Small and Nusbaum point out, the presence of visuo-motor information changed the laterality of the activity in the superior temporal cortex, which demonstrates the interaction in processing between face information and acoustic speech in more traditional speech perception areas (2004:306).

Interesting findings about Broca's region, which might be critical not only for speech, give evidence that it may play a more general role in motor control by interfacing external information about biological motion with internal motor representation of hand, arm and mouth actions (see Binkofski & Buccino 2004). The scientists showed that "left-hemispheric activation of Broca's region reflected 'pragmatic' motor processing, while the right-hemispheric activation of Broca's homologue was related to explicit motor processing motion" (ibid.:365).

5. Plane of realisation

The plane of realisation triggers first interior and then exterior speech. Thus, a mechanism of interior programming of an utterance is activated. After that, the transition from the programme to the syntactic / grammatical structure of a sentence is accomplished through mechanisms of grammatical prognosis. In addition, there is a search for an appropriate word by semantic and phonetic attributes. The next stage is a motor prognosis with phonetic operations and the filling in of a programmed form. The result is realised, audible exterior speech.

The process of realisation is very complicated to study. However, some attempts have been made. For instance, Schegloff (1986) suggested that there is a "projection space" in which a word appears for the first time (the programming stage). The word stays there at least "as early as the thrust / acme or perhaps even the onset of the gesture selected or constructed by reference to it" (Schegloff 1986:278). Hence, a gesture can be finished before the appearance of an affiliated word; word and gesture can also coincide. However, there are cases when the word takes the lead over the gesture. This study has shown that there is a well-organised programme of actions to produce an utterance with gestural accompaniment. Moreover, the semantic prognosis is realised more quickly on a gestural level than on a verbal one; at that moment words are still at the stage of deep motor programming.

There is strong experimental evidence for that idea. Kelly and his colleagues found that "gestures appear to influence how speech acoustically encoded several hundred milliseconds prior to any semantic analysis of the speech content" (Kelly et al. 2004:257).

There are some interesting findings about the inferior frontal cortex, which activates not only during observation of meaningful visual stimuli, but also "when subjects passively listen to meaningful auditory stimuli, such as speech" (Müller & Basho 2004:333). These authors argue that this convergence of visuo-motor coordination and object perception implies that semantic information on objects perceived during word acquisition and motor behaviour crucial for word production are processed in close proximity to the inferior frontal lobe.

That process may well concern the interior speech-gesture organisation. However, the clearest and most illustrative process of gesture-speech interaction takes place during utterance realisations in the act of communication.

It has been argued that messages actually consist of a "mixed syntax", that is, speech and gestures are elements combined to form the overall message (e.g., Slama-Cazacu 1976). Chovil (1991/1992) conducted an experiment in which facial displays were analysed according to the type of information they conveyed. Both speakers and listeners were examined and were found to use facial displays in a variety of ways. The results of Chovil's study showed that facial displays and linguistic elements distinguish features of discourse and illustrate or add semantic content.

It is also worth noting that the speaker monitors the recipient's responses and adjusts his or her gestures to them (see Streeck 1994). Thus, the recipient participates as a co-author of the gestures.

In 2004 I completed an investigation dealing with prosodio-kinetic complexes in monologue speech acts (Sukhova 2004). The hypothesis was that different aims of utterances and different significative and pragmatic meanings give rise to different prosodio-kinetic complexes.

The study took place within both cognitive and functional frameworks: 1) the initial stage of utterance orientation was hypothetically considered to be a decisive moment for the appearance of a certain complex; 2) the stage of final realisation was taken as an experimental site to prove the first position.

The communicative and pragmatic types of utterances and the aims of those utterances were defined as the starting point for investigating the process of monologue speech act production. These acts contain individual meanings and a semantic structure. More concretely, the production process may be viewed as follows: all information in the course of communication is transformed into a system. The elements of the system are the objective meanings of linguistic and kinetic units employed by a speaker and also the subjective meanings of a speaker and the subjective meanings of a listener.

The behavioural act, both its verbal and nonverbal parts, is performed with a definite aim, which forms a holistic meaning. In the course of the production process there is a semantic structure (the meaning of the sentence) as input, and there is also a certain pragmatic performance (the meaning of an utterance) as output. While perceiving the communicative act the listener engages in a reverse process: 1) the input is a pragmatic performance; 2) the output is a semantic structure (for details, see Semenenko 1996). The goal of carrying out a communicative act is an indispensable part of a pragmatic performance and, significantly, pragmatic performance is correlated with some elements of a semantic structure. Furthermore, in a monologue situation, communicative and intentional aspects of the utterance (illocutionary and other intentions) interact with a speaker's intention to adjust dialogically-oriented utterance semantics to the conditions of a monologue situation.

Thus, the pragmatic performance, based on a semantic structure, includes different objectives: 1) the communicative act; 2) the utterance proper; 3) the monologue adjustment. The most important fact is that all those objectives are expressed through various means, prosodic and kinetic being among them. This suggests that study of gesture-speech interaction should proceed, on the one hand, from the expression of the objectives in a speech act in general, and on the other hand, from speech-gesture realisations in particular, which can prove their deep links on a cognitive level. I shall dwell upon the expression of different objectives in a speech act, which are conveyed through the speech act constituents.

All three components of our study, namely, a prosodic nucleus, a kinetic gesture and an utterance proper, possess a form (morphology), a meaning (semantics), syntagmatics, paradigmatics and pragmatics. Accordingly, the following meanings can be distinguished:

  1. significative - the relationship of the element to the meaning and communicative essence of an utterance;
  2. syntagmatic - the syntagmatic relations between elements and their functions in speech; pragmatic meaning also belongs here, that is, the impact the elements have on a listener and the relationship of a speaker towards the sign;
  3. paradigmatic - the contraposition of semantically homogeneous elements to other elements of the same class on the same paradigmatic axis;
  4. sigmatic - the relationship of the elements to the situation, and the actualisation of a meaning in a particular situation.

The most relevant meanings for our study are significative, pragmatic and, as a variant, sigmatic meanings. Significative meaning plays a major role in defining the aim of a pragmatic and communicative type of utterance. Pragmatic and sigmatic meanings indicate the relationship between the participants in a situation and the elements of the situation, and between an element and the situation. It is worth emphasising that in the communication process different meanings of an utterance are formed due to the meanings of its constituents, while conversely, the meaning of a bigger component influences the meanings of smaller ones. Hence, the significative and pragmatic meaning of gestures accompanying a nuclear word and the significative and pragmatic meaning of the nuclear word correlate directly with the meaning of the utterance. This fact justifies the investigation of the significative and pragmatic meanings of a prosodio-kinetic complex, containing a prosodic nucleus and a concomitant kinetic phrase. This meaning of a complex is based on the communicative and pragmatic intention of the whole utterance.

This strategy 'from above' proved to be successful. We considered large speech episodes in which the meaning blocks were singled out. According to this intention-based grouping, we obtained two major communicative and pragmatic types of utterances, namely, statements and statement-estimation utterances. Statements convey new information which a listener does not yet have. Statement-estimation means that the speaker intends to express his / her attitude towards people, situations, actions, events, etc. and at the same time convey some new information about the subject, which can be the people, situation, action, event, etc.

Nevertheless, the strategy 'from below' also gave us some evidence that prosodio-kinetic complexes alone contribute to the general significative and pragmatic meanings of the utterance. On the one hand, prosody fulfils distinctive functions (cf. constitutive functions, which form general characteristics of a prosodic structure, independent of the expression of this or that meaning). These prosodic elements contribute to the significative meaning of an utterance, establish the utterance objectives and influence the pragmatic meaning of an utterance as a whole.

On the other hand, the kinetic gestures also activate the significative meaning of an utterance. Undoubtedly, gestures affect the relationship between a speaker and a listener; gestures manage the relationships of a speaker towards his / her utterance. Therefore, gestures reveal the pragmatic meaning of an utterance as well. Thus, the two strategies were combined, and we obtained the following results.

Table 2. Realisation of kinetic forms in different communicative and pragmatic types of utterances.

[Table values not recoverable. Columns: Kinetic gestures; Usage of the kinetic forms in episodes (%); Usage of the kinetic forms in different communicative and pragmatic types of utterances (%). Surviving row labels: Facial expressions; Body movements.]

Thus, the interaction between prosodic nuclei and kinetic phrases is revealed through their joint functioning to convey the significative and pragmatic meanings of the utterance. Accordingly, different meanings give rise to different prosodio-kinetic complexes.

6. Conclusion

The four planes involved in the processes of speech production were discussed here. In summary, the interrelation between the gestural and verbal parts of an utterance starts when the process of establishing a goal begins and the meaningful cognitive entity is being formed. Then follows the programming stage: meaning is embodied in an utterance; moreover, sometimes a gestural phrase overrides the verbal one already in the zone of symbolic representation. The motor programming stage is characterised by the formation of a common meaning and by packing it into verbal and gestural formats, which emerge as speech signals and certain gestural movements. Then, if necessary, the realised utterance is corrected in accordance with the initial model.


References

Anokhin, P.K. 1975. Ocherki po fiziologii funktsional'nykh sistem. (Essays on Physiology of Functional Systems.) Moscow: Meditsina.

Bavelas, J.B. 1994. "Gestures as part of speech: Methodological implications". Research on Language and Social Interaction 27(3): 201-221. doi:10.1207/s15327973rlsi2703_3

Binkofski, F. & G. Buccino. 2004. "Motor functions of the Broca's region". Brain and Language 89(2): 362-369. doi:10.1016/S0093-934X(03)00358-4

Bobrova, E.V. & A. Bobrov. 2005. "Neurophysiological aspects of language birth: Vertical posture and explosion". Presentation at Imatra International Summer Institute for Semiotics and Structural Studies.

Chovil, N. 1991/1992. "Discourse-oriented facial displays in conversation". Research on Language and Social Interaction 25: 163-194.

Fodor, J.A. 2000. The Mind Doesn't Work That Way: The Scope and Limits of Computational Psychology. Cambridge, Mass.: MIT Press.

Gerwing, J. & J.B. Bavelas. 2004. "Linguistic influences on gesture's form". Gesture 4(2): 157-195. doi:10.1075/gest.4.2.04ger

Kelly, S.D., C. Kravitz & M. Hopkins. 2004. "Neural correlates of bimodal speech and gesture comprehension". Brain and Language 89(1): 253-260. doi:10.1016/S0093-934X(03)00335-3

Kendon, A. 1987. "On gesture: Its complementary relationship with speech". Nonverbal Behavior and Communication, 2nd edition, ed. by A.W. Siegman & S. Feldstein, 65-97. London: Lawrence Erlbaum Associates.

Kendon, A. 1997. "Gesture". Annual Review of Anthropology 26: 109-128. doi:10.1146/annurev.anthro.26.1.109

Krauss, R.M., P. Morrel-Samuels & C. Colasante. 1991. "Do conversational hand gestures communicate?" Journal of Personality and Social Psychology 61(5): 743-754. doi:10.1037/0022-3514.61.5.743

Leontyev, A.A. 1974. "Rechevaya deyatel'nost' [Speech activity]". Osnovy teorii rechevoj deyatel'nosti, 21-29. Moscow: Nauka.

McNeill, D., J. Cassell & K-E. McCullough. 1994. "Communicative effects of speech-mismatched gestures". Research on Language and Social Interaction 27(3): 223-237. doi:10.1207/s15327973rlsi2703_4

Müller, R-A. & S. Basho. 2004. "Are nonlinguistic functions in 'Broca's area' prerequisites for language acquisition? FMRI findings from an ontogenetic viewpoint". Brain and Language 89(2): 329-336. doi:10.1016/S0093-934X(03)00346-8

Ryabova, O.V. 1992. Rol' intonatsii v semanticheskoj interpretatsii kosvennogo rechevogo akta. (The Role of Intonation in the Interpretation of Indirect Speech Act.) Dissertatsia kandidatskaj. Moscow: MSLU.

Schegloff, E.A. 1986. "On some gestures' relation to talk". Structures of Social Action: Studies in Conversation Analysis, ed. by J.M. Atkinson & J. Heritage, 266-296. Cambridge: Cambridge University Press.

Semenenko, L.P. 1996. Aspekty lingvisticheskoj teorii monologa. (Aspects of Linguistic Theory of Monologue.) Moscow: MSLU.

Slama-Cazacu, T. 1976. "Nonverbal components in message sequence: 'Mixed syntax'". Language and Man: Anthropological Issues, ed. by W.C. McCormack & S.A. Wurm, 217-222. The Hague: Mouton.

Small, S.L. & H.C. Nusbaum. 2004. "On the neurobiological investigation of language understanding in context". Brain and Language 89(2): 300-311. doi:10.1016/S0093-934X(03)00344-4

Stone, M. 2004. "Intention, interpretation and the computational structure of language". Cognitive Science 28(5): 781-809.

Streeck, J. 1994. "Gesture as communication II: The audience as co-author". Research on Language and Social Interaction 27(3): 239-267. doi:10.1207/s15327973rlsi2703_5

Sukhova, N.V. 2004. Vzaimodeistvie prosodii i neverbal'nykh sredstv v monologe (na materiale angliiskikh dokumental'nykh filmov). (Interaction Between Prosody and Nonverbal Means in Monologue Speech (on the English Documentaries).) Ph.D. dissertation, Moscow State Linguistic University.