Lexical Bundles of L1 and L2 English Professional Scholars: A Contrastive Corpus-Driven Study on Applied Linguistics Research Articles

The current study examined the structural and functional types of four-word lexical bundles in two corpora of applied linguistics scientific articles written by L1 English and L1 Indonesian professional writers. The findings show that L2 writers employed a higher number of bundles than L1 writers, but L2 writers underused some of the most typical lexical bundles in L1 English writing. Structurally, unlike previous studies, this study reports the frequent use of prepositional phrase (PP) based bundles in the articles of L2 writers. However, alongside the high frequency of PP-based bundles, L2 authors also used a high number of verb phrase-based bundles, suggesting that these L2 writers were still acquiring more native-like bundles. In terms of functional types, L2 writers employed fewer quantification bundles than their counterparts. This study has potential implications for teaching English for academic writing. Teachers need to raise their students’ awareness of the most frequently used lexical bundles in a specific academic discipline and pay attention to the discourse conventions of academic writing, helping L2 students transition from clausal to phrasal styles.


Introduction
Lexical bundles (henceforth LBs) are defined as "recurrent expressions that usually co-occur in natural language use, regardless of their idiomaticity and their lexical status" (Biber et al., 1999, p. 990), and they can be "identified empirically by running a computer program in a corpus of language texts" (Cortes, 2015, p. 205). The frequency and distribution thresholds used in this automatic identification are set by researchers (Hyland, 2012). LBs play a significant role in improving the quality of scientific writing for both native and non-native speakers. They are seen as a significant aspect of fluent linguistic production and a noticeable feature of academic written texts (Hyland & Jiang, 2018). Hyland (2012, p. 153) emphasises the importance of LBs for writers and speakers in three points: "(1) their repetition offers users (and particularly students) ready-made sets of words to work with; (2) they help define fluent use and therefore expertise and legitimate disciplinary membership; (3) they reveal the lexico-grammatical community-authorized ways of making-meanings". LBs are therefore seen to be very important in the formulation of texts.
Regarding the second point, LBs could help writers claim their membership in a particular discourse community (Ädel & Erman, 2012). Wray (2006) explains that, when speaking, people choose a specific turn of phrase that they consider to be related to certain values, styles, and groups. In other words, they help registered community members show solidarity with other members (Esfandiari & Barbary, 2017) and build a disciplinary experienced voice (Pang, 2010). Thus, LBs tend to reflect an authentic part of users' communicative experiences (Hyland & Jiang, 2018).
LBs have also drawn the interest of linguists exploring their role in teaching and learning academic writing. For instance, in English for Academic Purposes (EAP), exposure to multi-word constructions helps students gain a better understanding of the language style of academic textbooks (Wood & Appel, 2014) since LBs make up 21-52.3% of written discourse (Biber et al., 1999). The absence of multiword expressions thus might indicate a writer's lack of expertise in academic contexts (Wray, 2002). In other words, LBs enable us to distinguish novice and expert users of a certain language in different contexts, in both oral and written forms, which is seemingly useful in teaching and learning activities, especially in enhancing speaking and writing skills. Given the significance of these studies on LBs, this article aims to explore the use of LBs by professional writers for whom English is their native language (L1) and those for whom English is their L2, or foreign language, in academic articles within the discipline of applied linguistics.

Fajri, M. S. A., Kirana, A. W., & Putri, C. I. K. (2020). Lexical Bundles of L1 and L2 English Professional Scholars: A Contrastive Corpus-Driven Study on Applied Linguistics Research Articles. Journal of Language and Education, 6(4), 76-89. https://doi.org/10.17323/jle.2020.11271

Structural and Functional Characteristics of Lexical Bundles
Most LBs are incomplete structural units that comprise two or more words, and they can be categorised into different types of structures as they have strong grammatical correlations (Cortes, 2004). Biber et al. (1999) categorised the grammatical structures of LBs into three common forms: verb-phrase bundles which refer to any word combinations with a verb component such as it is also possible, can be noted that, and it is likely that; noun-phrase fragments which refer to any noun phrases with post-modifier fragments such as the use of the, the nature of the, and the way in which; and prepositional phrase bundles which include any bundles starting with a preposition plus a noun-phrase fragment such as in addition to the, in the context of, and at the end of. Different registers require different grammatical structures. For example, Biber et al. (1999) argued that LBs in a conversation contain mostly clause fragments (60%) and only 15% of them were phrases. In contrast, in academic prose, LBs were mostly in the form of phrases and less than 5% were clausal constructions.
In terms of functions, Biber et al. (2004) divided LBs into three main categories: stance expression (e.g. the fact that the), discourse organizers (e.g. as well as the), and referential expression (e.g. one of the most). As with LB structures, they also found a dramatic difference between oral and written registers in their dependence on LBs' functional types. Conversations or spoken registers mostly use stance expression bundles, while academic writing mostly uses referential expressions. On the basis of Biber et al.'s taxonomy, Hyland (2008a) developed a similar functional taxonomy including research-oriented, text-oriented, and participant-oriented bundles. The two taxonomies differ in that Biber et al.'s (2004) classification was based on both written and oral registers that covered various genres, while Hyland's (2008a) taxonomy was far more specific, focusing on written registers only. Therefore, this study used the functional taxonomy proposed by Hyland (2008a).

Previous Studies on Lexical Bundles
Corpora have been widely employed to examine the use of LBs in various types of texts (written and spoken) in different languages (e.g. Kim, 2009; Ruan, 2017; Wang, 2017; Wright, 2019). Biber, Conrad, and Cortes (2004) investigated LBs of oral and written registers, including conversations, lectures, textbooks, and academic prose. In several other academic writing genres, linguists have investigated LBs in theses, dissertations, and students' academic writing in a range of academic disciplines. For example, Hyland (2008a) examined the forms, structures, and functions of four-word clusters in a corpus of research articles, dissertations, and theses in four academic disciplines: engineering, microbiology, business, and applied linguistics, while Cortes (2004) compared research articles and students' writing within the disciplines of history and biology. Broadening the scope of the field of knowledge, Kwary et al. (2017) analysed the use of LBs in journal articles of four wide academic disciplines: life sciences, physical sciences, health sciences, and social sciences. These previous studies generally suggest that LBs vary in their discourse functions and their use differs from one discipline or register to another.
Three studies have explored the use of LBs in L2 writing at different proficiency levels. Staples et al. (2013) investigated LBs used by non-native English speakers with different proficiency levels in prompted TOEFL writing. Their research shows that learners at lower proficiency levels employed more clusters than those at higher proficiency levels, lending weight to the second language acquisition theory that as students acquire more proficiency in a second language, they tend to use fewer formulaic structures (Ellis, 1996). Chen and Baker (2016) studied second language development by comparing the use of LBs in L2 English (L1 Chinese) rated learner essays across three levels of the Common European Framework of Reference (B1, B2 & C1). Their findings indicate that learners' writing at lower levels is likely to share more features with conversation, relying more on colloquial quantifiers, while the discourse of more advanced writing has a more impersonal tone, closer to that of academic prose. The findings also show that the CEFR-B2 level seems to be a transition stage in which learners start to recognise the differences between formal and informal writing. In the field of applied linguistics, LB studies have compared English learner writing with professional writing. For instance, Wei and Lei (2011) examined LBs used by advanced Chinese EFL learners and professional writers in the field of applied linguistics, and Qin (2014) compared clusters used by non-native English graduate students at different levels of study and authors of applied linguistics journal articles. These comparisons may make results difficult to interpret because learners and professional writers have different English and writing proficiency levels and specific writing purposes for unique audiences.
Other studies have compared the use of LBs by L1 English and L2 English writers. For instance, Chen and Baker (2010) compared the use of LBs in academic writing by non-native English students with native peer students and native expert writers. They concluded that the structures and functions of LBs in L1 and L2 student writing are similar. However, L2 students tended to underuse some typical bundles in professional academic writing. In a similar vein, Ädel and Erman (2012) analysed the use and the functions of English LBs in advanced undergraduate writing by L1 English and L1 Swedish students. The findings indicated that L1 writers deployed a larger number of LBs with a wider variety, which is generally similar to the results of the phraseological analysis tradition in Second Language Acquisition (SLA) (Ädel & Erman, 2012).
There are only a few studies that investigated the use of LBs by L1 and L2 English academic professionals in international journals. Perez-Llantada (2014), for example, analysed the convergent and divergent usage in academic articles from twelve disciplines. However, the wide variety of scientific disciplines is likely to skew the results of the study since registers, genres, and disciplines all affect the structure and function of LBs (Esfandiari & Barbary, 2017). Pan, Reppen, and Biber (2016) compared the use of LBs by L1 and L2 English professional academics in telecommunications articles. Esfandiari and Barbary (2017) analysed the use of LBs by L1 and L2 English professional academics in psychology research articles. However, there is little work that has been devoted to the use of LBs by L1 English and L2 English professional writers, especially L1 Indonesian writers, in the field of applied linguistics. This study tried to fill this gap by analysing and comparing the use, structure, and function of LBs in applied linguistics academic articles written by English professional writers (L1 English) and Indonesian professional writers (L2 English). Examining applied linguistics articles is significant since journal article authors in this field, whose expertise is related to language, seem to be more aware of the use of formulaic language or bundles, which may affect the use of LBs in their writing. This study therefore can contribute to the ongoing discussion regarding the influence of academic disciplines on the use of LBs in professional academic writing.

Methodology

Corpus Construction
The corpora of the present study are collections of applied linguistics scientific articles published in a three-year period from 2016 to 2018. The decision to include articles from a three-year period was intended to mitigate the overuse of LBs in special issue publications that may occur in a certain journal, thus avoiding the idiosyncrasies of specific issues or topics. The two corpora are the English corpus (EC), comprising articles written by L1 English academics, and the Indonesian corpus (IC), consisting of research articles written by L1 Indonesian professional authors.
The EC was collected from scientific research articles published by internationally reputable journals in the field of applied linguistics that have high Impact Factors (IF) and are indexed in the Scopus database and Social Science Citation Index (SSCI). Meanwhile, the IC was taken from research articles published in Indonesian applied linguistics international journals that are indexed by Scopus or the Directory of Open Access Journals (DOAJ) and accredited by the Indonesian Ministry of Research, Technology and Higher Education. The selected Indonesian journals only publish articles in English (see Appendix 1 for the journal list). Both corpora have a similar total number of words, with approximately 1,300,000 words in each corpus. The number of words in each corpus was kept equal since LBs are significantly more sensitive to the number of words in a corpus than the number of articles (Cortes, 2004). As a result, there are fewer articles and journals in the EC, as the EC articles were typically longer than the IC articles (see Table 1). We considered the selected journals for the IC to be equivalent to the journals for the EC for several reasons. First, the journals for the IC are peer-reviewed journals, following the academic conventions of international journals. Second, the journals in the IC are indexed by international research article databases. Both EC and IC articles consisted of Introduction, Methods, Results/Findings, and Discussion sections (Martinez et al., 2009). Additionally, articles published in Indonesia tend to reflect the language produced by Indonesian authors in Indonesian contexts for international audiences.
To ascertain the first language of the author(s), we followed the method proposed by Wood (2001) which defines L1 English writers as those whose first and last name are considered as typical native English speaker names and those who are affiliated with institutions in countries that use English as their first language. Therefore, L1 Indonesian writers are also categorised as all writers whose first and last names are considered typical Indonesian names and those who are affiliated with Indonesian institutions. We, thus, excluded articles from Indonesian journals where any of the authors did not fulfil both criteria.

Identification of Lexical Bundles (LBs)
This research considered two criteria in identifying LBs, namely frequency and dispersion. For these relatively large corpora, the standardised frequency threshold was set at 40 occurrences per million words to identify bundles that are often considered characteristic of the target texts (Pan et al., 2016). The cut-off frequency of 40 per million words was equivalent to a minimum raw frequency of 53 occurrences for both corpora. This was calculated by multiplying the cut-off frequency by the corpus size and then dividing the result by one million (Wood & Appel, 2014).
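The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and rounding up to the next whole occurrence is an assumption, since the text does not say how fractions were handled. With the approximate corpus size of 1,300,000 words the formula gives 52; the reported 53 presumably reflects the exact corpus sizes, which are not given in the text, so the second call uses a purely hypothetical size of 1,320,000 words.

```python
import math


def raw_frequency_cutoff(per_million: float, corpus_size: int) -> int:
    """Convert a normalised cut-off (per million words) into a raw count.

    Follows the description in the text: multiply the cut-off by the
    corpus size, then divide by one million. Rounding up is an assumption.
    """
    return math.ceil(per_million * corpus_size / 1_000_000)


# Approximate corpus size stated in the article.
print(raw_frequency_cutoff(40, 1_300_000))  # 52

# Hypothetical exact size, used only to show how a cut-off of 53 could arise.
print(raw_frequency_cutoff(40, 1_320_000))  # 53
```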
The dispersion threshold plays a significant role in avoiding individual author idiosyncrasies. Thus, it needs to be clearly determined to guarantee that the bundles are not only used by a handful of authors or texts. Following Chen and Baker (2010) and Hyland (2008a), the current study only included those bundles that occurred in at least 10% of the total texts in each corpus. The bundles therefore must occur in at least 27 and 16 articles in the IC and EC respectively. In addition, the length of the word sequences (LBs' length) included in the study must also be determined. This study focused on 4-word bundles "because they are far more common than 5-word strings and offer a clearer range of structures and functions than 3-word bundles" (Hyland, 2008b, p.8).
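As a rough illustration of how the frequency and dispersion criteria combine, the following sketch extracts 4-word sequences from a list of texts and keeps those that pass both thresholds. It is not the procedure used in the study, which relied on AntConc; the tokeniser, the function names, and the choice to let sequences run across sentence boundaries are simplifying assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic words (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def find_bundles(texts, n=4, per_million=40, min_range=0.10):
    """Return {bundle: (raw frequency, number of texts)} for n-word sequences
    that meet a normalised frequency cut-off and a dispersion cut-off."""
    freq = Counter()
    texts_containing = defaultdict(set)
    total_words = 0
    for doc_id, text in enumerate(texts):
        tokens = tokenize(text)
        total_words += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            texts_containing[gram].add(doc_id)
    # Raw frequency cut-off derived from the per-million threshold.
    min_freq = math.ceil(per_million * total_words / 1_000_000)
    # Dispersion cut-off: share of texts; round() guards against float noise.
    min_texts = math.ceil(round(min_range * len(texts), 9))
    return {
        gram: (count, len(texts_containing[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(texts_containing[gram]) >= min_texts
    }


# Toy input; real corpora would contain hundreds of full articles.
toy = [
    "on the other hand it works",
    "on the other hand it fails",
    "nothing shared here at all",
]
print(find_bundles(toy, per_million=0, min_range=0.5))
```

In the toy call the frequency cut-off is switched off and the dispersion cut-off is raised to 50% of texts, simply so that three short strings produce a visible result.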
The corpus software AntConc was used to retrieve the bundles. However, context/content dependent bundles such as teaching and learning process, as a foreign language or in the united states were excluded since "they are not the 'building blocks' which carry a distinct discourse function" (Chen, 2009, p. 58). In addition, overlapping clusters were also checked manually via concordance analyses to avoid inflated results of quantitative analyses (Chen & Baker, 2010). For example, in the IC it can be seen and can be seen that occurred as a subset of the 5-word sequence it can be seen that. Thus, the lower frequency sequence was combined into the higher frequency one: it can be seen (that). After identifying LBs with the above-mentioned criteria, we compared the frequency, structure, and function of our LBs. In analysing LBs' structures, we used Biber et al.'s (1999) classification which includes noun phrase-based (e.g. the use of the), prepositional phrase-based (e.g. in the case of), and verb phrase-based bundles (e.g. it can be seen). For functional analysis, we employed Hyland's classification (2008b) since it is more relevant to the academic writing domain. The classification includes research-oriented bundles, which are used to structure writers' experiences (e.g. the use of the), text-oriented bundles, which are concerned with the organisation of the text or discourse (e.g. in addition to the), and participant-oriented bundles, which are focused on stance and engagement (e.g. are likely to be).
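The overlap check described above was done manually through concordance analysis, but candidate pairs can be flagged automatically before that step. The sketch below is a hypothetical helper, not part of the study: it finds pairs of bundles in which one is the other shifted by a single word, and builds a combined label in the style of it can be seen (that). The frequencies in the example are invented for illustration.

```python
def find_overlaps(bundles):
    """Return (a, b) pairs where b is a shifted one word to the right,
    i.e. the last n-1 words of a equal the first n-1 words of b."""
    pairs = []
    for a in bundles:
        for b in bundles:
            if a != b and a.split()[1:] == b.split()[:-1]:
                pairs.append((a, b))
    return pairs


def merge_label(a, b, freq):
    """Keep the higher-frequency bundle and show the extra word of the
    lower-frequency one in parentheses."""
    words_a, words_b = a.split(), b.split()
    if freq[a] >= freq[b]:
        return f"{a} ({words_b[-1]})"
    return f"({words_a[0]}) {b}"


# Invented frequencies, for illustration only.
freq = {"it can be seen": 120, "can be seen that": 80, "on the other hand": 200}
for a, b in find_overlaps(list(freq)):
    print(merge_label(a, b, freq))  # it can be seen (that)
```

A flagged pair is only a candidate: whether the two sequences really occur together still has to be confirmed in the concordance lines.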
To classify the bundles, each author worked independently and the inter-rater reliability was 97% (structure) and 94% (functions). The discrepancies were then discussed to reach 100% agreement based on their contexts. We acknowledged that there were several multi-functional bundles (e.g. in the present study), which also have been recognised by the previous studies (Güngör & Uysal, 2020; Salazar, 2014). In these cases, the ambiguous bundles were categorised according to their primary function after cross-checking their contexts.
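The text does not state how inter-rater reliability was calculated. Assuming it was simple percentage agreement between two coders, the calculation could look like the sketch below; the function name and the toy labels are hypothetical.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of items to which two coders assigned the same label.

    Simple percentage agreement is an assumption; the article does not
    name the reliability measure it used.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both coders must label the same number of bundles.")
    if not labels_a:
        raise ValueError("At least one labelled bundle is required.")
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return 100 * matches / len(labels_a)


# Toy structural labels for four bundles from two coders.
coder_1 = ["NP", "PP", "VP", "PP"]
coder_2 = ["NP", "PP", "VP", "NP"]
print(percent_agreement(coder_1, coder_2))  # 75.0
```

Percentage agreement does not correct for chance agreement, which is one reason chance-corrected measures are often reported alongside it.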

Comparison of Frequency
After excluding the content/context dependent and overlapping bundles, we identified 2,700 and 4,874 lexical bundle tokens in the EC and IC respectively, which comprised 31 different bundle types in the EC, and 51 bundle types in the IC (see Appendix 2 for the list of LBs). This finding is congruent with several previous studies including Pan et al.'s (2016) research, which found that L2 professional academic writers in telecommunications used LBs more frequently than their L1 counterparts, and Güngör and Uysal's (2016) study, which showed that L2 English academic authors in educational sciences used a larger number of LBs than L1 English writers. This is, however, contrary to previous studies that compared the use of LBs in the corpora of native and non-native student writing (e.g. Ädel & Erman, 2012; DeCock, 2004), in which L2 learners employed a lower number of bundles than native English students. One of the reasons for the lower number of LBs in L2 student writing is the learners' incorrect use of articles (e.g. the omission of required definite articles within LBs) (Shin, Cortes, & Yoo, 2018), which may not apply to L2 professional writing since academics are comparatively competent writers. Thus, it seems that when it comes to professional academic writing, L2 writers including Indonesians are likely to employ a substantially higher frequency of bundle types and tokens than L1 authors. Hyland (2008a) points out that expert writers use the fewest clusters compared to master's and doctoral students. This indicates that L2 expert writers still rely on formulaic expressions to some extent. This greater reliance on LBs might also suggest the comparatively smaller vocabulary of L2 writers, while L1 professional writers might be able to present their arguments in a more flexible manner.
Additionally, a comparative analysis showed that 14 bundles were shared between both groups (e.g. on the other hand, at the same time, and as well as the), indicating that nearly half of the LBs used by native writers (14 of 31) were also employed by non-native writers. Most of the shared bundles (9 of 14) are text-oriented ones, which might not be surprising as text-oriented bundles are common in soft sciences research articles (Hyland, 2008b). However, only two LBs (on the other hand and the results of the) were shared among the top ten most frequently used bundles. The top two most frequent LBs in the EC (the extent to which, in the present study) were not used by IC writers. Qin (2014) also reported the absence and low frequency of the extent to which in the master's and doctoral student writing in applied linguistics. This indicates that these L2 advanced writers were still not aware of some typical academic LBs in the field of applied linguistics.

Comparison of Structural Types
Table 2 shows the distribution of structural subcategories of the LBs used by both English and Indonesian writers. Log-likelihood tests comparing the number of tokens in each category were conducted to test for significant differences across the corpora. The results demonstrated that IC writers used considerably more VP-based and PP-based bundle tokens than EC writers. For NP-based bundles, although this category does not indicate substantial differences, the subcategories show a different pattern. L1 English authors employed significantly more NP with other post-modifier fragments, while L1 Indonesian authors used significantly more NP with of-phrase. Examples of NP-based bundles are presented below:
This paper explores the extent to which corpus linguistics can contribute to the study of language ideology in both explicit and implicit forms in news media. (EC: NP-other)
Moreover, the use of the nomination strategy also indicates that the Jakarta Post tries to avoid ambiguity. (IC: NP-of)
In terms of the structural distribution of LBs (see Table 3), the comparison of the percentages of the main structural categories in both corpora shows that EC writers mostly employed phrasal bundles (NP- and PP-based bundles), accounting for 84% of bundle types and tokens. This finding is congruent with previous studies that point out that the frequency and percentage of phrasal bundles are higher than clausal bundles in English academic prose, and that L1 English professional writers used considerably more NP- and PP-based bundles than VP-based bundles in academic research articles (Pan et al., 2016). On the other hand, IC writers predominantly used PP-based and VP-based clusters, accounting for 77% of the types and 78% of the tokens. This result contrasts somewhat with previous studies on writing from other disciplines that revealed that VP-based bundles were more frequent than the other categories (NP and PP) in L2 writing. For example, Pan et al. (2016) found more VP-based bundles (58%) than NP- and PP-based bundles (34%) in L1 Chinese professional writing in telecommunications journals, and Chen and Baker (2010) found more VP-based bundles (52.3%) than NP- and PP-based bundles (47.5%) in L2 learners' writing. The high frequency of PP-based bundles in the IC therefore suggests that L2 professional Indonesian writers in the field of applied linguistics demonstrate relatively higher academic writing proficiency, since both L1 and L2 writers shift from the clausal to the phrasal style as their writing skills increase (Bychkovska & Lee, 2017; Pan et al., 2016; Staples et al., 2013).
Wei and Lei (2011) also found that advanced Chinese EFL learners in the discipline of applied linguistics used more NP- and PP-based four-word bundles than VP-based formulaic sequences, which is in contrast to previous research on Chinese EFL learner writing in other disciplines (e.g. Bychkovska & Lee, 2017; Pang, 2009). The use of more phrasal bundles may be due to the fact that applied linguistics majors require a more advanced level of English even at the undergraduate level so students and researchers in this field might be more aware of the conventions of English academic register. However, it should be noted that the considerable use of clausal constructions or VP-based bundles in the IC (37% of the types and 38% of the tokens) indicates that these L1 Indonesian expert writers were still in the process of obtaining more appropriate academic English or acquiring more native-like LBs.

Comparison of Functional Types
As shown in Table 4, the two corpora contained a relatively similar proportion of the three main functional categories. Text-oriented bundles (types and tokens) comprised the largest proportion in both the EC and IC, with similar percentages at 55% and 51% respectively for types, and 57% and 55% respectively for tokens, while participant-oriented bundles constituted the smallest proportion, accounting for 6% of types and tokens in the EC, and 8% of types and 9% of tokens in the IC. Text-oriented bundles are widely used in applied linguistics journal articles, and more generally in the social sciences, to "provide familiar and shorthand ways of engaging with a literature, providing warrants, connecting ideas, directing readers around the text, and specifying limitations", representing the more discursive and evaluative patterns of arguments in the soft sciences (Hyland, 2008b, p. 16). This finding echoes previous related research on LBs used by Persian writers in psychology research articles (Esfandiari & Barbary, 2017), and LBs used by Chinese writers in telecommunications journal articles (Pan et al., 2016). This indicates that L1 and L2 English professional writers do not differ much in the proportion of the main functional distributions of LBs. However, Table 5 shows profound differences in the frequency of bundle tokens across the two corpora in nearly all subcategories of functional types. The results of log-likelihood tests comparing the number of tokens in each category demonstrate that IC (L2) writers used LBs significantly more frequently than EC (L1) writers in most functional subcategories (procedure, description, transition, resultative, structuring, framing, and engagement), but less frequently in the subcategories of location and quantification. While L2 writers used fewer bundle tokens of location than L1 writers, the difference in the number of tokens was not significant.
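The article does not give the formula behind its log-likelihood tests. A minimal sketch, assuming the two-corpus log-likelihood statistic commonly used in corpus comparison (observed frequencies compared with frequencies expected from the corpus sizes), is shown below. The function name is hypothetical, and the example call uses the overall token counts reported earlier (2,700 in the EC and 4,874 in the IC) together with the approximate corpus size of 1,300,000 words rather than the per-category counts in Table 5.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Two-corpus log-likelihood (G2) for one item or category.

    freq1, freq2: observed token counts in corpus 1 and corpus 2.
    size1, size2: total number of words in each corpus.
    """
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Overall bundle tokens in the EC and IC, with approximate corpus sizes.
g2 = log_likelihood(2700, 1_300_000, 4874, 1_300_000)
# With one degree of freedom, values above 3.84 are significant at p < 0.05.
print(round(g2, 1), g2 > 3.84)
```

The statistic only says that a difference is unlikely to be due to chance; the direction of the difference has to be read from the frequencies themselves.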
Quantification was the only subcategory in which L2 writers employed bundle tokens significantly less frequently than L1 writers did. EC writers used four types of quantification bundles (the extent to which, the total number of, a wide range of, and the degree to which), which were not used by IC writers. Cortes (2004, p. 415) found the absence of quantification LBs in learner academic writing in biology. Chen and Baker (2010) also noticed the absence of extent/degree modifiers such as the extent to which and the degree to which (quantifying bundles) in learner writing. This suggests that L2 writers, including professional academic writers, pay less attention to the use of quantifying bundles in their academic writing. Framing is the subcategory that makes up the largest proportion of bundles in both corpora, which is congruent with the findings of Hyland's (2008a) and Hyland and Jiang's (2018) studies on applied linguistics research articles. This subcategory is used to elaborate arguments by specifying cases (in the case of) and pointing to limitations (with the exception of) (Hyland, 2008a). For example:
Moreover, students needed to draw on similar skills and knowledge to those required for their disciplinary assignments, especially in the case of students from Applied Linguistics/ TESOL and Education. (EC)
It will also benefit those teaching Computer-Assisted Language Learning (CALL) courses in EFL contexts, particularly in the context of higher education in Indonesia. (IC)
However, unlike previous studies that suggest that L2 expert writers employed four-word framing bundles significantly less frequently than their counterparts (e.g. Esfandiari & Barbary, 2017; Pan et al., 2016), this study shows the opposite. This may be due to the higher frequency of PP-based bundles in the IC, since framing clusters consist mainly of PP-based bundles (Pan et al., 2016).
In the participant-oriented category, we did not find any stance bundles in either corpus. Stance refers to how writers explicitly convey their attitudes, epistemic and affective judgments, and evaluations (e.g. it is obvious that and are likely to be) (Hyland, 2008a, 2008b). This finding might lend weight to Hyland and Jiang's (2018) analysis that showed a dramatic decline (-38.2%) in the stance bundle tokens of applied linguistics journal articles over a 50-year period from 1965 to 2015. Engagement bundles also decreased, though not significantly, by around 9.2% (Hyland & Jiang, 2018). In the present study, similar to Esfandiari and Barbary's (2017) and Pan et al.'s (2016) studies, we found that L2 English writers used relatively more engagement bundle types and tokens than L1 English writers to engage readers and guide them to particular interpretations. For instance:
It is important to note that this study uses the term reader knowledge rather than reader characteristics to specifically refer to linguistic knowledge in terms of grammatical knowledge. (IC)
From the teachers' responses, it can be seen that teachers in this study share similar beliefs in grading the students although they come from different schools. . . (IC)
It is important to note that three of the four engagement clusters used by IC writers are in the form of anticipatory it and passive constructions (it can be seen, it can be concluded, it can be said), which indicate an impersonal tone. This might be influenced by the writers' preference for impersonality in their academic writing. As in Hong Kong (Hyland, 2008a) and mainland China (Wei & Lei, 2011), impersonality devices such as passive patterns are also recommended for academic writing in Indonesian universities, which may account for the overuse of passive structure bundles. This might support Li, Franken, and Wu's (2018) study, which argued that one of the reasons for the different bundle selections of L2 learners is classroom learning.

Conclusion
In the present study, we compared the frequency, structure, and function of four-word LBs of L1 and L2 professional writers in research articles published in applied linguistics journals. In terms of frequency, L2 writers employed a higher number of bundles than L1 writers, but L2 writers underused some of the most typical LBs in native writing, such as the extent to which. Structurally, unlike previous studies (e.g. Chen & Baker, 2010; Pan et al., 2016), in this study, L2 writers, more specifically Indonesian writers, predominantly used PP-based clusters, which may indicate that L2 professional writers in the field of applied linguistics demonstrated relatively high academic writing proficiency. However, despite the high frequency of PP-based bundles, L2 authors still used a considerable number of VP-based bundles, suggesting that these Indonesian professional writers (L2) were still acquiring more native-like LBs, such as NP- and PP-based formulaic sequences. Functionally, L1 English and L2 English professional writers did not differ much in the proportion of the main functional distributions of LBs, but showed marked differences in nearly all subcategories of functional types. L2 writers employed quantification bundles significantly less frequently than their counterparts, which had also been reported in previous studies (Chen & Baker, 2010; Cortes, 2004). These findings suggest that L2 English journal article authors in the discipline of applied linguistics may be more aware of the use of LBs than those in other disciplines. This lends weight to the argument that academic disciplines influence the use of LBs in professional academic writing.
It is important to point out two limitations of this research. First, the method of determining L1 and L2 writers may not be a satisfactory approach since it may not be accurate to identify the first language of the author(s) by means of their names and institutions. Second, log-likelihood tests have recently been described as problematic for measuring lexical variations by Bestgen (2017). However, the test is still commonly used by many lexical bundle researchers including Hyland and Jiang (2018), so we chose this test together with the calculation of percentages for easier comparisons with other bundle studies. Future studies can therefore employ different methods of determining L1 and L2 writers, such as looking at their profiles and educational backgrounds, and different lexical variation measures, such as proportions. They could also examine the cross-linguistic influence of L1 Indonesian on the use of L2 English bundles.
Despite these limitations, the findings of this study have potential implications for academic writing pedagogy. Teachers are encouraged to raise learners' awareness of the most frequently used LBs in a specific academic discipline and to pay attention to the discourse style of academic writing, helping L2 students shift from clausal to phrasal bundle use. Focusing on the most typical bundles in native writing that are underused by L2 writers, such as quantification bundles, is also recommended.