Intelligent Approaches to Computer Testing of Perception and Production Skills of Russian EFL Speakers
Abstract
Background: This study addresses a gap in applied phonetics by developing an evidence-based, neural network-driven computer phonetic test for Russian EFL learners. Integrating interdisciplinary methods, the system targets Russian-specific pronunciation deviations and delivers adaptive feedback, thereby advancing both perception and production skills while aligning technological innovation with pedagogical effectiveness.
Purpose: The purpose of the study is twofold: (1) to design and develop a computer-based system employing deep learning neural networks for objectively assessing Russian EFL students’ perception and production skills, and (2) to evaluate the effectiveness and reliability of this system through repeated testing, statistical analysis of learner performance, and user feedback.
Methods: A pre-test identified frequent segmental deviations, informing a targeted item pool. The software was developed in Microsoft Visual Studio 2022 (C#) using the Microsoft Speech Recognition Engine. The perception module used randomized audio stimuli (WAV files), while the speech recognition module recorded responses via built-in microphones for automated accuracy evaluation. Twenty-five Russian EFL students (B1–B2 CEFR, aged 19–22) completed three test iterations at one-week intervals. A post-test questionnaire assessed usability and perceived learning gains. Data were analysed using descriptive statistics and correlation analysis.
Results: We designed a computer-based system employing deep learning neural networks and assessed its effectiveness with Russian EFL learners. Participant performance improved by 14.5% overall across the three iterations, following a clear linear increase supported by a high R² value. Students performed better in perception tasks than in production tasks. Pearson correlation analysis indicated consistent performance between consecutive attempts, supporting robust test-retest reliability. Both modules showed high internal consistency (α = 0.90 for perception, α = 0.88 for production). Participants rated the tool as useful and interesting, although they suggested improving the speech recognition function because of minor technical flaws.
Conclusion: The perception-testing module can serve as an effective and engaging learning tool. The pronunciation control component shows potential, but its speech recognition accuracy should be refined through additional testing with high-sensitivity microphones. Overall, continued exploration of computer-assisted pronunciation training (CAPT) systems is a promising direction for future research and innovation.
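The reliability statistics named in the Results (Pearson correlation between consecutive attempts, Cronbach's α for internal consistency, and R² for a linear trend across iterations) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names and the sample scores are hypothetical and serve only to show how such figures are typically computed from per-participant data.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: one row per participant, each row a list of item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(item_scores[0])
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def linear_r_squared(means_per_iteration):
    """R^2 of a straight-line fit of mean score against iteration number.

    For a simple linear regression with one predictor, R^2 equals the
    squared Pearson correlation between predictor and outcome.
    """
    iterations = list(range(1, len(means_per_iteration) + 1))
    return pearson_r(iterations, means_per_iteration) ** 2


if __name__ == "__main__":
    # Hypothetical scores for five participants (NOT data from the study).
    attempt_1 = [60, 72, 55, 80, 68]
    attempt_2 = [66, 75, 61, 84, 70]
    print("test-retest r:", round(pearson_r(attempt_1, attempt_2), 3))

    # Hypothetical item-level scores (1 = correct, 0 = incorrect).
    items = [
        [1, 1, 1, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 0, 1],
    ]
    print("Cronbach's alpha:", round(cronbach_alpha(items), 3))

    # Hypothetical mean scores over three weekly iterations.
    print("R^2 of linear trend:", round(linear_r_squared([67.0, 71.2, 76.5]), 3))
```

A high correlation between consecutive attempts is read as test-retest reliability, while α summarises how consistently the items within one module measure the same skill; both are computed here with the sample (n − 1) variance.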
Copyright (c) 2025 National Research University Higher School of Economics

This work is licensed under a Creative Commons Attribution 4.0 International License.