Looking for an EIL Pronunciation Standard: A Literature Review and Classroom Experience from the Russian L1 Perspective
Tatiana Skopintseva

This article identifies the language units essential to the intelligibility of non-native English speakers (NNESs) in international settings, that is, in English as an international language (EIL) communication. It focuses on a seemingly narrow but nevertheless significant area of speech production and reception: pronunciation. Drawing on the work of pronunciation scholars and on classroom experience, we outline areas of concern for NNES training and suggest pronunciation foci for Russian learners of English as a foreign language (EFL). We specifically examine areas where the goals of academic discourse overlap with the goals of developing NNES pronunciation fluency and rhetorical competence, targeting those features that, if improved, would make NNES speech sound intelligible, educated and cultured, as the academic environment requires. We consider these features in view of their importance for two emerging pedagogical domains, English as a lingua franca (ELF) and English as a medium of instruction (EMI), taking particular account of how these domains approach NNESs' identity and attitudes.

In the 21st century, English has become a widely recognized and well-established global lingua franca. Besides being used for travel and everyday communication, it serves as a means of exchange at international forums and gatherings, and in research, academia and business. The number of interactions in international settings among NNESs, that is, speakers from the so-called outer and expanding circles who use English as a medium of communication, by far exceeds the number of interactions between NNESs and NESs (native English speakers) (Crystal, 1997; Kachru, 1985). Owing to the continuing spread of English and the proliferation of language contacts, a great variety of "Englishes" has emerged, each modifying the canonical version in its own way at the grammatical, lexical and phonological levels. As several linguists and English Language Teaching (ELT) professionals contend, the rapid globalization of English has also affected "inter-English" intelligibility. According to Martin Dewey, the globalization of English has shifted the focus from a traditional orientation toward varieties of English to English as a multilingual activity that is deeply intercultural and flexible (Dewey, 2007, p. 335). This centrifugal development of English worldwide in many ways runs counter to well-established EFL pedagogies, which are mainly centered on teaching one of the two most prestigious NES varieties, British or American, with most textbooks focusing on one or the other. Nowhere is this dichotomy more apparent than in English pronunciation, and the discrepancy has triggered vigorous debate on two as yet unresolved ELT questions: which English pronunciation standard should be chosen as a learning goal toward the ultimate realization of EIL intelligibility, and which segmental and suprasegmental elements of pronunciation hinder international communication because of L1/L2 transfer.
These two linguistic and pedagogical issues are closely connected to the question of the sociocultural status of NNES accents. NNES accents still face stereotyping and discrimination both locally and internationally, and the prejudice persists that to speak good English one must get rid of an L1 accent entirely. This bias is particularly one-sided when non-native English language teachers' professional skills and expertise are judged by how closely their accent approximates a native one, which puts them at a considerable disadvantage in the worldwide ELT community. Meanwhile, as a result of regional and social mobility, it is next to impossible nowadays to find a native speaker who speaks a 'pure' standard variety, not to mention that some regional native accents, for example Scottish or Irish, are stigmatized too. This is another reason why ELT scholars, besides weighing linguistic and pragmatic factors, are calling for a more comprehensive approach to defining the pronunciation elements of an EIL standard, one that also takes the L1 sociocultural context into consideration (Seidlhofer, 2000; Walker, 2010).

Materials and Methods
To meet the communication needs of small-scale and large-scale multicultural interaction, the ELT research and pedagogical paradigm has shifted from the concepts of EFL and English as a Second Language (ESL) to EIL, or ELF. Although ELF remains one of the most controversial approaches to ELT, it is widely recognized that it now embodies the conceptual and practical contrast with EFL and that "some of the ELF ideas are likely to influence mainstream teaching and assessment practices" (Graddol, 2006, p. 87).
For over a decade now, ELF, traditionally referred to as EIL, has attracted significant interest from applied linguists and ELT professionals, and this field of English language studies is now supported by ample theoretical and empirical data in the works of A. Cogo, K. Bolton, M. Dewey, J. Jenkins, A. Kirkpatrick, B. Seidlhofer, S. Shen, R. Walker and H. Widdowson, to name just a few.
Unlike conventional EFL and ESL approaches, EIL/ELF focuses on pragmatic competence and international communication strategies, placing more importance on discourse intelligibility than on native-like fluency as the condition for successful communication. The pragmatics of international discourse require that participants from different linguistic and sociocultural backgrounds build common ground by promoting discourse solidarity, and that they share equal responsibility for the success or failure of communication by adjusting their linguistic and extra-linguistic behavior to each other. Both parties therefore need to be intelligible to each other, that is, to understand and communicate the message clearly. According to the EIL approach, both NESs and NNESs have to aspire to a mutually intelligible variety of English, one with distinctive linguistic and pragmatic features that NNESs use to express their sociocultural identities (Seidlhofer & Berns, 2009, p. 190).
Recently, linguists' focus has shifted from the language features that characterize ELF interactions toward the processes and practices through which these features develop (Jenkins et al., 2011, p. 292); this is why ELF research is primarily aimed at highlighting the pragmatic strategies employed by speakers (Cogo, 2012; Sewell, 2013). Of the specific areas of ELF research, two domains have particularly captured the attention of ELF scholars, namely business communication (as cited in Jenkins et al., 2011: e.g., Bjorkman, 2010; Ehrenreich, 2009; Erling, 2007; Pulling and Stark, 2009) and higher education (as cited in Jenkins et al., 2011: e.g., Bjorkman, 2010; Erling, 2007; Smit, 2010). Research into academic settings has become particularly important with the emergence of EMI, that is, English used for instruction and communication in educational institutions in countries where English is a foreign language.
The so-called "internationalization" of education in these countries has spurred further intensive research into academic discourse and has created new tasks: fostering teacher development, training lecturers, and designing course materials according to new pragmatic and sociocultural standards.
One of the most widely cited ELF proponents is Jennifer Jenkins, who, drawing on extensive classroom experience, proposed an ELF pronunciation core that appears crucial for intelligibility in ELF communication. Jenkins selected pronunciation units from the point of view of their "teachability" and "learnability", which makes her approach invaluable for ELT pedagogy (Jenkins, 2000). She leaves outside the "core" some pronunciation features that occur in different varieties of English and suggests teaching them receptively rather than productively, so that non-native learners can understand other accents while keeping some of their own accent, both to retain their identity and to make learning goals teachable and achievable. Her proposals are still debated among linguists and ELT professionals, but there is no denying that Jenkins's findings have triggered intensive classroom research in both cross-cultural and multicultural ELT contexts. Her innovative approach to acquiring a native-like accent is particularly worth mentioning: she proposed acquiring only as much of a native speaker accent as is needed to ensure intelligibility. This compromise saves EFL/ESL teachers and learners precious time and effort while allowing NNESs to retain their national identity in terms of accent and securing the intelligibility of EIL communication.
Before proceeding further, a few words must be said about which communicative situations should be included in academic discourse. Academic discourse can be "planned, organized by a pre-determined set of topics or informational bits intended to be addressed, as in the genres of lectures, sermons, legal proceedings" (Strauss and Feiz, 2014, p. 65). Lectures and presentations, the focus of this article, can be either read from a script, an increasingly obsolete mode of delivery, or delivered extemporaneously. Extemporaneous delivery, although it gives the appearance of spontaneous talk, in most cases requires scripted preparation and numerous rehearsals, which places this type of oral discourse among the planned, organized and predetermined types. On the other hand, to win over the audience and get the message across as clearly and unambiguously as possible, speakers increasingly include conversational passages and interaction with the audience in a semi-formal or informal style, including in academic lectures and presentations. This more democratic style of public speaking has drastically changed the academic lecture, once the most formal of genres, whose speaking characteristics are now similar to those of more conversational genres such as panel discussions, debates, negotiations and interviews. This approach looks particularly relevant from the pragmatic point of view, since all of these communication situations pursue similar goals: they are primarily aimed at persuading listeners and, in most cases, at changing their beliefs or actions.
Taking all the above factors into account, we will consider oral academic discourse as a planned or semi-planned rhetorical performance delivered in a formal or semi-formal style, and we assume that debates, negotiations, meetings and job interviews require pronunciation patterns similar to those of academic lectures and presentations, namely those typical of cultured voices.
In support of an ELF-oriented approach to teaching pronunciation to NNESs, Simon Andrewes has put forward the idea that NNESs should aim to acquire a pronunciation model approximating that of public speaking. He claims that to fit multicultural professional settings (e.g., presentations, debates, negotiations and meetings), NNESs need to develop their rhetorical competence in such areas as clarity of enunciation, speed of delivery, appropriate pausing and nuclear stress patterns (Andrewes, 2011). Many other EIL and EFL researchers also consider these elements of pronunciation to be of primary importance for intelligibility in ELF communication (see, for example, Jenkins, 2007; McKay, 2002; Walker, 2010). Intensive sociolinguistic research by Russian phoneticians who have investigated British and American standard pronunciation in relation to socially and regionally marked speech has shown that pitch range is a distinctive and reliable sociocultural factor that differentiates the pronunciation of educated middle-class urban speakers from that of regional native speakers of lower social status. In addition, a narrower pitch range signals an informal, conversational variety of discourse (Shevchenko, 2006, 2015). Well-developed rhetorical skills will help NNESs achieve the pragmatic goals of academic communication and get the message across logically, clearly, intelligibly, accurately and persuasively, as the academic environment requires. By accuracy in academic discourse we primarily mean correct word stress, and we consider pitch change and pitch range among the main prosodic characteristics that comply with the norms of cultured speech and help build the image of an educated speaker (Shevchenko, 2006; Skopintseva, 2013).
To designate the key segmental and suprasegmental elements of a pronunciation model for academic discourse, we propose selecting elements according to their value for the three main goals of academic discourse: elements that are crucial for EIL intelligibility, those that are important for building a speaker's rhetorical competence, and those that comply with the sociocultural expectations of academic discourse. The pronunciation elements listed below are the result of six years of classroom experience teaching English pronunciation in a public speaking course for Russian ESAP students, and of the findings of the school of English sociophonetics headed by Professor T.I. Shevchenko at Moscow State Linguistic University, where the author taught and conducted research for about 20 years.
From our point of view, the key pronunciation elements are:
• Clear and distinct articulation of stressed vowels
• Accurate articulation of consonants and consonant clusters in both word-initial and word-final positions
• Word stress
• Slower pace
• Meaningful division of the stream of speech into shorter word groups
• Appropriate placement of nuclear stress to distinguish between old and new information and for rhetorical purposes (e.g., in contrasts and repetitions)
• Register and pitch range to highlight the logical structure of academic discourse and to add rhetorical emphasis

Suprasegmentals
Russian students tend to complain that they find Standard British speakers more difficult to understand than Standard American speakers, and the British accent is typically harder for them to acquire than the American one. Having investigated the prosodic errors that hinder communication in Russian learners' academic presentations, we concluded that the intonational preferences of Russian speakers stem largely from differences in phonotactics, which in turn affect prominence and rhythm. The basic difference is that in actual speech open syllables (consonant-vowel, or CV) predominate in Russian, whereas closed syllables (consonant-vowel-consonant, or CVC) predominate in English. According to research data from Russian scholars, 78% of Russian syllables are open (as cited in Shevchenko, 2015).
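The open/closed distinction, and the kind of percentage quoted above, can be made concrete with a short sketch. It assumes that syllables have already been segmented and hand-transcribed as consonant/vowel skeletons; the helper names are hypothetical and the sample skeletons are illustrative, not corpus data. Syllabification itself, the hard part of any such count, is not handled here.

```python
def is_open(skeleton: str) -> bool:
    """Return True if a syllable's CV skeleton ends in a vowel (an open syllable)."""
    return skeleton.strip().upper().endswith("V")


def open_syllable_share(skeletons: list[str]) -> float:
    """Return the percentage of open syllables in a list of CV skeletons."""
    if not skeletons:
        raise ValueError("at least one syllable is required")
    open_count = sum(is_open(s) for s in skeletons)
    return 100.0 * open_count / len(skeletons)


# Illustrative, hand-coded skeletons (not corpus data):
# Russian "мо-ло-ко" (moloko, 'milk') -> CV CV CV
# English "cat" -> CVC
print(open_syllable_share(["CV", "CV", "CV"]))  # 100.0
print(open_syllable_share(["CVC"]))             # 0.0
```

Applied to a large, consistently transcribed sample of running speech, the same count would yield figures comparable to the 78% reported for Russian.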
Another important feature, also supported by experimental evidence, concerns the phonotactics of syllable division and articulation. In English, as in other Germanic languages, there is close contact between a vowel and the following coda consonant (VC), which affects vowel length. The hold (retention) stage of a medial consonant belongs to the preceding short vowel, while its release belongs to the next syllable. As a result, the boundary between the two syllables runs within the medial consonant, as in [sit-ti] and [hap-pi], which never happens in Russian (Lukina, 2003). In English, articulatory tension peaks between the consonant and the vowel, so the consonant is perceived as more accented than the vowel. In Russian, by contrast, articulatory tension grows during the transition from the consonant to the vowel and peaks on the vowel. Because this transition runs more smoothly in Russian than in English, Russian places more prominence on the vowel than on the consonant, which results in consonant palatalization before front vowels (Lukina, 2003).
Although both English and Russian are known to be stress-timed languages, their rhythmic patterns differ significantly. The difference can be illustrated by the ratio of stressed to unstressed syllable duration. In British English this ratio was found to be 1.8:1 in reading and 1.5:1 in speaking (Shevchenko, 1999, 2011, 2012, 2013, as cited in Shevchenko, 2015), whereas the average for Russian in formal and semi-formal discourse was 1.3:1 (Savina, 1996). As a result, the overall articulatory effort in English is stronger than in Russian, and the rhythm of English, particularly British English, is sometimes compared to staccato (Shevchenko, 2006).
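A duration ratio such as 1.8:1 is mean stressed-syllable duration divided by mean unstressed-syllable duration. The sketch below shows that arithmetic, assuming hand-segmented syllable durations in milliseconds; the function name is hypothetical and the durations are invented for illustration, not measurements from the studies cited.

```python
from statistics import mean


def stress_ratio(durations_ms: list[float], stressed: list[bool]) -> float:
    """Mean stressed-syllable duration divided by mean unstressed-syllable duration."""
    if len(durations_ms) != len(stressed):
        raise ValueError("each duration needs a matching stress flag")
    on = [d for d, s in zip(durations_ms, stressed) if s]
    off = [d for d, s in zip(durations_ms, stressed) if not s]
    if not on or not off:
        raise ValueError("need at least one stressed and one unstressed syllable")
    return mean(on) / mean(off)


# Invented durations (ms): stressed, unstressed, unstressed, stressed.
ratio = stress_ratio([270, 150, 150, 270], [True, False, False, True])
print(f"{ratio:.1f}:1")  # 1.8:1
```

With the same unstressed syllables at 150 ms, a Russian-like 1.3:1 ratio would correspond to stressed syllables of only about 195 ms, which illustrates why English stressed syllables sound so much more prominent to Russian listeners.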
Another phonotactic feature, constraints on sound sequences, should be considered in relation to initial consonant clusters, which Jenkins includes in her ELF pronunciation core because in some languages (e.g., Japanese and some Turkic languages) such long consonant sequences are not possible; speakers of these languages therefore insert vowels between the consonants or drop sounds altogether, both of which diminish intelligibility. In Russian, consonants in clusters are not as closely assimilated as they are in English and therefore tend to be pronounced more as a sequence of separate sounds. This does not so much impair intelligibility as disrupt the fluency and smoothness of the speech stream, which could to some extent undermine rhetorical competence.
Chunking speech into word groups has also been found to affect listeners' comprehension and attitude. A comprehensible, pleasant-sounding word group typically lasts two to three seconds and contains two or three accented words, which tentatively correlates with the normal breathing rhythm. If the tempo is faster and the speaker packs more accented words into a word group, the listener perceives the speech as pushy and overexcited. A message presented in this way is also harder to follow, and in the long run listening to such a performance becomes irritating (Morov, 2005).
The differences between a speaker's L1 and English in syllable duration, word group length and phonotactic rules have a great impact on accentuation and on vowel reduction in the weak forms of words in connected speech. Because of sentence prominence and rhythm, small structural items (i.e., auxiliary verbs, articles, prepositions and pronouns) are reduced in quantity or quality and are pronounced in their weak forms. It has been claimed that, for clarity's sake, NNESs should retain the full, unreduced pronunciation of non-notional words (Jenkins, 2000). Jenkins considers weak forms "unteachable" and therefore excludes them from the Lingua Franca Core (LFC); however, to accommodate their pronunciation to EIL, NNESs need to be taught weak forms for receptive purposes (ibid.). According to Russian classroom experience, reduced vowel quality comes naturally to Russian learners through the teaching of rhythm, and it would be fairly unnatural to teach them specifically to articulate full vowels in unstressed positions. Dauer, in "The Lingua Franca Core: A New Model for Pronunciation Instruction?" (2005), also claims that it would be almost impossible for anyone to speak fluently without using weak forms, since pronouncing all sounds in their full quality at natural speed would be unfeasible (Dauer, 2005, pp. 547–548).
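The word-group guideline discussed in this section (two or three accented words per group) can be expressed as a simple greedy rule. The sketch below is a hypothetical classroom illustration, not a method from the cited sources: it assumes accented words have already been marked by hand, ignores duration, syntax and sense, and the function name is invented.

```python
def chunk_by_accents(words: list[tuple[str, bool]], max_accents: int = 3) -> list[list[str]]:
    """Greedily split (word, is_accented) pairs into word groups containing at
    most max_accents accented words each. Unaccented words stay in the group
    they follow. A toy heuristic: it ignores syntax, sense and duration."""
    if max_accents < 1:
        raise ValueError("max_accents must be at least 1")
    groups: list[list[str]] = []
    current: list[str] = []
    accents = 0
    for word, accented in words:
        # Close the current group before an accented word that would exceed the limit.
        if accented and accents == max_accents:
            groups.append(current)
            current, accents = [], 0
        current.append(word)
        accents += int(accented)
    if current:
        groups.append(current)
    return groups


# Hypothetical, hand-marked accents (True = accented word).
sentence = [("pitch", True), ("is", False), ("a", False), ("key", True),
            ("variable", True), ("in", False), ("public", True), ("speaking", True)]
print(chunk_by_accents(sentence))
# [['pitch', 'is', 'a', 'key', 'variable', 'in'], ['public', 'speaking']]
print(chunk_by_accents(sentence, max_accents=2))
# [['pitch', 'is', 'a', 'key'], ['variable', 'in', 'public'], ['speaking']]
```

A purely count-based rule can leave function words such as "in" stranded at the end of a group; in practice a teacher would move such boundaries to syntactically and semantically meaningful points.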
According to Brown (2012), connected speech is used in English at all levels of formality, even in very formal speech, because weak forms play an important 'accentuation' role (Gimson, 2001, p. 249, as cited in Brown, 2012). Weak forms are important for decoding English speech, and Brown also makes it clear that students with a syllable-timed L1 have considerable difficulty both speaking and comprehending oral English (Brown, 2012).
To sum up, we suggest that NNESs, particularly those whose L1 is a syllable-timed language, should become more aware of L1/L2 phonotactic transfer. They should acquire accentuation and connected speech primarily through the teaching of rhythm. They should also be taught to exaggerate their articulatory effort in order to acquire English rhythm and to meet specific rhetorical goals, namely clear articulation and appropriate prominence. For both speech production and reception, it is important for teachers to segment speech into shorter chunks of two or three accented syllables and to slow down the pace of delivery.
Accurate word stress is essential both for intelligibility and for orthoepy. NNESs repeatedly make certain word stress errors, which vary widely across languages. Our classroom experience shows that the words typically misstressed by Russian ESP/EAP students include consequence, access, control (n.), recognize and innovate, in which they tend to shift the stress toward the end of the word in accordance with Russian stress rules.
Many ELT specialists have noted that NNESs' oral performance sounds monotonous to a native English speaker's ear. Speakers of some languages (Cantonese, for example) are described as having a 'sing-song' pattern: their voice goes up and down to the same levels as they speak. Others (e.g., speakers of Korean, Japanese and Castilian Spanish) have a so-called 'monotone', in which the voice varies very little (Mayers & Holt, 2002). Besides L1/L2 intonation transfer, this can be partly accounted for by NNESs' linguistic insecurity on the one hand and the natural stage fright of public speaking on the other, which makes pitch change a feature of triple value for academic discourse. Because they typically speak too fast and too loudly while emphasizing too many syllables, Russian speakers of English tend to strike NESs as "pushy". It has also been observed that Russian presenters sound "incomplete" in that they lack adequate persuasiveness and style (Savina and Skopintseva, 2005b).
Pitch is undeniably a key prosodic variable in public speaking discourse and an essential tool for signposting discourse structure: it is typically used to highlight high-key and low-key information and logical contrasts, and to signal a shift to a new topic. Mastering so-called "step-ups" and "step-downs" (Hewings, 2010) is therefore significant for rhetorically competent voices. According to the findings presented below, pitch change and pitch range can be considered suprasegmental features of double value for academic discourse.
Contrastive analyses of intonation variation among NESs with Standard British and Standard American accents showed that the speech of educated middle-class speakers is marked by a richer repertoire of tones and a wider pitch range (Shevchenko, 2015). In numerous sociolinguistic experiments that excluded the factor of conflicting identity, pitch range was found to be a contrastive sociocultural factor distinguishing standard from regionally accented speech, residents of a metropolis from those of a small town, and middle-class from working-class speakers (Shevchenko, 2014). Pitch range was also found to vary with gender, age and style, differentiating contrasting types of discourse. Some experiments also revealed that higher-pitched beginnings, a wide pitch range and a varied melodic (tonal) contour are relevant for expressing friendliness and empathy (Glochkina and Shevchenko, 2010).
Besides its rhetorical and sociocultural importance, pitch was also found to play a greater role in syllable prominence in English than in Russian. Of the three correlates of prominence (loudness, pitch and length), pitch was identified as the leading factor in syllable and nuclear stress prominence in English: whereas the main prosodic cue for Russian speakers would be intensity (loudness), English speakers would rather vary the pitch under the same circumstances (Savina and Skopintseva, 2005a, p. 74).

Segmentals
Clear and distinct pronunciation of segmentals (i.e., vowels and consonants) is essential for the intelligibility of English as a means of international communication. Forming the distinctive articulatory imprint of an English sound through a specific tongue shape, degree of tension, and lip and jaw posture benefits both NNES speech production and reception once the articulatory habit is automatized, or at least retained. As the segmental elements of the ELF core crucial for EIL intelligibility, Jenkins names the contrast between long and short vowels, the [ə:] vowel, most consonants, and word-initial consonant clusters. The phoneme inventories of English and Russian differ: English has 20 vowels and 24 consonants, while Russian has only 6 vowels and 34 consonants. In accordance with Jenkins's principle of acquiring only some degree of a NES accent, we suggest considering only those phonemes whose misarticulation will cause communication breakdown and undermine oral performance.
First, […] [o] respectively. Thus, the word pairs bag – bug, stuff – staff, much – March and walk – work, born – burn, course – curse will sound the same. Although teaching practices are beyond the scope of this article, we should mention that articulating [æ] and [ə:] is essential for enunciation practice, because these vowels require mastering an English-specific jaw movement and tongue position that are not typical of the Russian articulatory setting.
Practicing these two English-specific articulatory gestures will also improve diction and lead to clearer pronunciation.
(Shevchenko, 2015, p. 169)
Second, there are no diphthongs in Russian, so Russian speakers tend to confuse word pairs such as want – won't, sells – sales, lawn – loan and beer – bear.
Rhetorical competence includes the distinct enunciation of word endings. Consonants in word-initial positions are well known to play an important role in decoding the meaning of words, while clear articulation of word endings is regarded as a sign of educated speech and is a vital feature of rhetorical competence. In addition, distinct articulation of endings adds to the overall articulatory effort, so that words do not run together in a "mumble-jumble" and thoughts are finalized and nicely paced. "Eating" sounds at the end of words gives the impression of hurried speech, and the audience soon makes a snap judgement about the speaker's education and speech culture.
The English inflections that mark the grammatical forms of verbs and nouns (i.e., -s and -ed) are particularly important for phonological reasons, since voiced consonants are always devoiced word-finally in Russian. For example, played often sounds like plate, and plays like place, when uttered by Russian NNESs.
The -ing ending is known to be socially marked in English: pronouncing the alveolar nasal [n] instead of the velar nasal [ŋ] is more typical of informal speech and of the speech of young people (Crystal, 2003; Lychanaya, 2000; Shevchenko, 2006).
Overall, then, clear enunciation of word endings, in particular of the -ed, -s and -ing inflections, might be considered an achievement with two advantages for academic discourse: intelligibility and speech culture. According to Hancock (2013) […] (Jenkins, 2000; Seidlhofer, 2011).
Although, to our knowledge, none of the proponents of the ELF pronunciation core consider interdental fricatives worth practicing, since they do not hamper intelligibility, we suggest including them in the pronunciation "core" on the grounds that words with th, such as theory, hypothesis, thought and think and their derivatives, abound in English academic discourse, and mispronouncing them might irritate or distract NES interlocutors. This assumption would, of course, need to be supported by further research on NES attitudes toward NNES accents.

Conclusion
We have reviewed and analyzed extensive research and classroom data on NNES (Russian) and NES (Standard British and Standard American) pronunciation and identified the key segmental and suprasegmental elements crucial for the intelligibility, rhetorical competence and sociocultural competence of NNES oral performance in EIL communication in academic settings. The units were selected on the basis of their importance for developing all three competences considered essential for formal and semi-formal academic discourse. In our classroom experience, an ELF-oriented approach that builds an English accent upon the L1 accent has proven to be an effective teaching method: it economizes on teaching and learning time, helps to overcome negative attitudes toward the NNES accent, and makes English pronunciation learning goals achievable. Although more classroom data is needed to support these hypotheses, which were born of theory and practice, we recommend that the following segmental and suprasegmental features be taught to Russian learners to ensure success in EIL academic communication:
• More energetic articulation, particularly when emphasizing prominent syllables
• Quality of key vowels, with a special focus on the /æ/ – /ʌ/ – /ɑ:/ contrast, as in cat – cut, stuff – staff, much – March, and the /ɔ:/ – /ə:/ contrast, as in walk – work, born – burn, course – curse
• Articulation of diphthongs, as in want – won't, sells – sales, bare – beer
• Word stress, as in contrast (n.) – contrast (v.), import (n.) – import (v.)
• Acquiring weak forms through rhythm
• Pitch change and pitch range to emphasize academic discourse structure, to break the monotony of L1-specific intonation patterns in public speaking, and to conform to the speech norms of higher-status NESs