Experienced and Novice L2 Raters’ Cognitive Processes while Rating Integrated and Independent Writing Tasks
Abstract
Background. Recently, there has been growing interest in the personal attributes of raters that determine the quality of the cognitive processes involved in their writing assessment practice.
Purpose. Accordingly, this study explored how L2 raters' rating experience might affect their rating of integrated and independent writing tasks.
Methods. To pursue this aim, 13 experienced and 14 novice Iranian raters were selected through criterion sampling. After attending a training course on rating writing tasks, both groups produced introspective verbal protocols while rating integrated and independent writing tasks written by an Iranian EFL learner. The verbal protocols were recorded and transcribed, and their content was analyzed by the researchers.
Results. The content analysis yielded six major themes: content, formal requirement, general linguistic range, language use, mechanics of writing, and organization. The results indicated that the type of writing task (integrated vs. independent) was a determining factor in the number of references experienced and novice raters made to the TOEFL-iBT rating rubric. Further, the raters' rating experience shaped the proportions of the references they made. Yet, the proportional differences between experienced and novice raters were statistically significant only for language use, mechanics of writing, organization, and the total number of references.
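To make the reported group comparison concrete, below is a minimal sketch (not the authors' actual analysis, whose test statistic the abstract does not specify) of how the significance of a proportional difference in rubric references between two rater groups could be checked. All counts are hypothetical placeholders.

```python
# Minimal sketch: testing whether experienced and novice raters differ
# in the proportion of rubric references devoted to one theme.
# NOTE: all counts below are hypothetical, not data from this study.
from scipy.stats import chi2_contingency

# Rows: experienced vs. novice raters.
# Columns: references coded as "language use" vs. all other themes.
table = [
    [120, 380],  # experienced raters (hypothetical counts)
    [60, 440],   # novice raters (hypothetical counts)
]

chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```

A significant p-value in such a test would indicate that the two groups allocate different proportions of their references to the theme in question, which is the kind of comparison the Results section reports.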
Conclusion. The variation in L2 raters' rating performance on integrated and independent writing tasks underscores the need for professional training for both experienced and novice raters in using and interpreting the components of different writing rating scales.