A New Type of Lexicographic Product: Thesaurus of Text Strings. Field of EFL/ESL

The “Thesaurus of Text Strings: The field of EFL/ESL” (TTS) is a structured collection of text fragments extracted from various printed and digital texts that deal with teaching and learning English as a foreign or second language. While the sublanguage of EFL/ESL has been widely discussed in the literature, the TTS is a radically new type of dictionary because of the nature of its constituent objects, the text strings. A text string (TS) is a lexicographic object of unique status that has not been used in dictionaries before. It differs from all the objects traditionally treated in dictionaries of various types, such as words, collocations, idioms, proverbs and other reproducible linguistic units. Since TSs have been extracted from specialized texts, they are expected to reflect the various aspects, even minute ones, of the referential situations presented in those texts. The TSs in the Thesaurus are arranged mostly according to the conceptual structure of Foreign Languages Teaching Methodology (a deductive, ‘top-to-bottom’ logical procedure), but at the lower, more concrete levels of analysis the TSs have to be grouped in the opposite logical direction, ‘bottom-up’, because the Teaching Methodology concepts prove too general to differentiate the finer shades of meaning among numerous TSs. The TTS supplies a considerable amount of carefully structured professional information in language form. It is aimed primarily at teachers of English who are not native speakers of the language and who wish to make their professional communication in English more authentic. It can also be used in classroom activities with students who are preparing for teaching careers. Thus, it may be concluded that the TTS has both theoretical significance for lexicography and practical value as good professional teaching material. The TTS may also be meaningfully considered against the background of today’s Corpus Linguistics: though not a ‘true’ corpus per se, it has certain features that are essentially similar to those of contemporary linguistic corpora.

Keywords: thesaurus, text string, ideographic dictionary structure, English teaching, corpus linguistics
The "Thesaurus of Text Strings: The field of EFL/ESL" (henceforth, TTS) was conceived and developed in the 2000s-2010s at the English Department of Orel State University (Russia). At various stages of the work, the author was helped by successive groups of undergraduate students (twenty-two young people), whose efforts were of immeasurable service and to whom sincere gratitude is due.
The TTS is a collection of minimally independent fragments of texts. The texts are specialized in that they all deal with a special field in the language, a so-called 'sublanguage' (Gorodetski, 1988; Gorodetski & Raskin, 1971, pp. 13-24), namely, the practice and theory of teaching English as a foreign or second language. The status of the text fragments, or strings, is discussed later in this article.
The concept of the TTS originated as a response to the challenge of supplying English teachers (ETs) who are non-native English speakers with numerous systematized samples of professional speech that might make their speech performance more authentic. True, there are numerous textbooks, teaching aids and websites that supply ETs with ideas, tips and recommendations for their professional activities. These may also help those who aspire to improve their knowledge and use of the language. Some ETs do study them, just as they may study fiction, the media and other English-language sources to improve their language proficiency. However, few ETs, especially those who teach children in schools, can afford the time and effort for regular and systematic study on top of their busy work schedules. Besides, regular study may give a person knowledge and skills, while a reference aid can answer an immediate question quickly and with less effort. This is one of the many functions of good dictionaries, and the TTS has been developed with this function in mind. ETs are expected to be able to find ready-made phrases that represent elements of pedagogical situations arising in the context of teaching or learning EFL/ESL. Generally, the TTS functions like many other reference books and dictionaries. What makes it different is the kind of linguistic objects that constitute its content, the text strings (TSs). A TS is a new lexicographic notion that has never been used before in dictionary making.
Keselman, I. (2016). A New Type of Lexicographic Product: Thesaurus of Text Strings. Field of EFL/ESL. Journal of Language and Education, 2(3), 82-89. doi:10.17323/2411-7390-2016-2-3-82-89

Lexicographic Properties of Text Strings
The usual item given in dictionaries is the word. In many languages the term for 'dictionary' includes a reference to that for 'word'. Compare English 'wordbook' and its counterparts in other Germanic languages, e.g., German 'Woerterbuch', Dutch 'woordenboek', Norwegian 'ordbok'; also, among Slavic languages, Russian 'slovarj' from 'slovo', Czech, Slovak and Ukrainian 'slovnik', Polish 'słownik', and Bulgarian and Serbian 'rechnik' from 'rech' meaning word. Other languages show evidence of a similar tendency, for instance, Hungarian 'szótár' from 'szó' or Tatar 'suzlek' from 'suz'. The notions of a dictionary and a word are closely related in the human mind, which is not surprising, as the first dictionaries were in fact just collections of words.
Dictionaries can also comprise units longer than a word. There are dictionaries of collocations, idioms, proverbs, quotations, etc. The units presented in such dictionaries are basically unchangeable when used in speech or texts (although they lend themselves to grammatical alterations in particular contexts); they are intrinsically associated with the dictionary 'head words' and are also reproducible with different degrees of probability. TSs, which are also combinations of two or more words, have only a potential ability to be reused in texts (reproducibility); their structure and composition are far less rigid; besides, they retain certain remnants of a semantic connection with the type of texts they have been drawn from (intertextuality).
In terms of traditionally viewed structure, a TS may be a 'free' combination of words, minimal (class discussion, choose the topic) or expanded (determine the frequency of assessment); it may coincide with collocations (give homework) or their expansions (give homework orally; form debate teams in class); it may coincide with a full sentence (it is impossible to entirely eliminate the possibility of different answers; attention is paid to the nuances of speech delivery) or be a part of a grammatically complete sentence (be comfortable reading the dialogue; require that the content be related to the current lesson). Thus, the 'traditional' principles of classification for combinations of words may be considered irrelevant for TSs. What is relevant is their relation to various aspects of the chosen sphere of language usage, the EFL/ESL teaching and learning. This relation is thematic in essence, which is ensured by the choice of source texts for extracting TSs. All these texts deal with EFL/ESL or with the broader theme of teaching foreign languages in general.
To give a better idea of the nature of the TSs in the TTS, an example is given: a description of a typical way the TSs were extracted from a text. The procedure is not formalized and the choices are mostly made intuitively, the main criterion being the thematic property of TSs, which is supported by the thematic properties of the analyzed texts. To illustrate the process, the sample excerpt below is taken from an article published by David Petrie on the teachingenglish.org.uk site. The likely "candidates" are numbered. The collected TSs refer to different areas of the referential field of EFL/ESL and represent them unevenly. There are approximately seven times as many TSs denoting teaching materials and equipment as those denoting features of the education system or results and achievements. This difference in numbers is not a deficiency of the collected corpus of TSs; it should rather be viewed as an indication of how popular the different aspects of the referential field are in the specialized literature.

The Macrostructure of the Thesaurus
It is quite obvious that the extracted TSs cannot be arranged alphabetically; they have to be grouped together on the basis of semantic and/or thematic proximity (Keselman, 2003, p. 55). This requires working out classification principles, called ideographic, that embrace the concepts comprising the field. In fact, attempts have been made to analyze 'the whole world', break it into component concepts and see how language reflects them by building 'a synoptic map of the world' (Karaulov, 1976, pp. 242-274). A similar ideographic arrangement of the language units that represent the field of EFL/ESL became the foundation of the TTS structure.
Broadly speaking, the 'synoptic map' of our field may be borrowed from the system of concepts of the academic discipline of Foreign Languages Teaching Methodology (FLTM). The collection of all TSs is subdivided into 12 large classes. Each class is further divided into subclasses, which are in turn subdivided into groups. This deductive, 'top-to-bottom' classification corresponds to the conceptual structure of FLTM.
The whole classification chart of the TTS can be seen in the Appendix. Here, an example is provided of how one class is further subdivided. Class 7 Knowledge, Skills, Experience (KSE) has three subclasses: 7.1 Basic skills; 7.2 Background KSE; 7.3 KSE progress. Subclass 7.3 consists of two component parts (major groups): 7.3.1 KSE progress according to speech activity types and 7.3.2 KSE progress according to language aspects. The chart in the Appendix does not show the smaller groups of TSs that can be found in the main body of the TTS. A page from the TTS can serve as a good illustration of how major groups are further analyzed and how the TSs are presented in the book:
vocabulary
a lot of "active" vocabulary can be taught
acquire hundreds of thousands of phrasal lexical items
focus on developing vocabulary awareness
guarded vocabulary is somewhat of a catch-all phrase
incidental learning from context during free reading is the major mode of vocabulary acquisition
It should be noted that below the fourth numbered division level there are two more levels. They are important for making the material easy to survey. One can also see that the TSs in some groups are cross-referenced. This is required by the nature of many TSs which, being multi-word units, lend themselves to various interpretations. Compare the above terminal group structure in 7.3.2.1 and the group grammar structures in 3.1.2.1 (class 3. Methodology, subclass Goals and objectives of education):

KSE Progress
grammar structures
get enough practice with the structures
manipulate the meaning of the grammar structure
understand the meaning of the structures to which the students are introduced
correlate the structure with its meaning
teach to compose correct sentences and texts
This cross-referencing should not be understood as a classification deficiency. It is rather a demonstration of the complex interrelations between concepts in the field and their relative non-discreteness.
It should be mentioned that grouping the TSs into clusters of the fifth and especially the sixth level went beyond the possibilities of the FLTM conceptual structure. The latter deals with the most essential and general phenomena in the theory and practice of foreign language teaching and learning. By contrast, thousands of the collected TSs present situations of foreign language teaching in a great variety of minute detail. Such TSs are grouped on the basis of the similarity of their meanings, which means that the grouping was done according to the 'bottom-up' principle and the procedure was inductive.
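The deductive part of the classification can be pictured as a tree of dotted codes. The Python sketch below is only an illustration, assuming that each class, subclass and group is keyed by its code (as in the Appendix chart) and that a TS filed under more than one code counts as cross-referenced. The chart excerpt contains only codes named in this article; the helper names and the `PLACEMENTS` entries are hypothetical and do not describe how the TTS was actually compiled.

```python
from typing import Dict, List, Optional

# Excerpt of the classification chart (assumption: dotted codes as keys).
# Only codes named in the article are listed; intermediate codes such as
# "3.1" and "3.1.2" exist in the full chart but are omitted here.
CHART: Dict[str, str] = {
    "3": "Methodology",
    "3.1.2.1": "grammar structures",
    "7": "Knowledge, Skills, Experience (KSE)",
    "7.1": "Basic skills",
    "7.2": "Background KSE",
    "7.3": "KSE progress",
    "7.3.1": "KSE progress according to speech activity types",
    "7.3.2": "KSE progress according to language aspects",
}


def code_key(code: str) -> tuple:
    """Numeric sort key, so that '7.10' sorts after '7.2'."""
    return tuple(int(part) for part in code.split("."))


def parent(code: str) -> Optional[str]:
    """Code one level up ('7.3.2' -> '7.3'); None for a top-level class."""
    return code.rsplit(".", 1)[0] if "." in code else None


def children(code: str) -> List[str]:
    """Charted codes exactly one level below `code` (the top-to-bottom view)."""
    return sorted((c for c in CHART if parent(c) == code), key=code_key)


def breadcrumb(code: str) -> List[str]:
    """Labels of `code` and its charted ancestors, top level first."""
    trail: List[str] = []
    current: Optional[str] = code
    while current is not None:
        if current in CHART:
            trail.append(CHART[current])
        current = parent(current)
    return trail[::-1]


# Hypothetical placements: a TS filed under more than one code is cross-referenced.
PLACEMENTS: Dict[str, List[str]] = {
    "get enough practice with the structures": ["3.1.2.1"],
    "correlate the structure with its meaning": ["3.1.2.1", "7.3.2"],
}


def cross_referenced() -> List[str]:
    """TSs that appear in more than one group."""
    return [ts for ts, codes in PLACEMENTS.items() if len(codes) > 1]


if __name__ == "__main__":
    print(children("7"))          # ['7.1', '7.2', '7.3']
    print(breadcrumb("3.1.2.1"))  # ['Methodology', 'grammar structures']
    print(cross_referenced())     # ['correlate the structure with its meaning']
```

Walking down the tree with `children()` mirrors the deductive subdivision into classes, subclasses and groups, while `breadcrumb()` yields the kind of path a running head shows. The inductive fifth- and sixth-level groupings rest on the compiler's judgement of meaning similarity and are deliberately not modelled here.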

A View from the Corpus Linguistics Perspective
As mentioned before, the TTS is a collection of text fragments. In today's linguistics, the notion of a collection of language units is normally associated with dictionaries and linguistic corpora.
The work on the TTS started when corpus studies were not widely known, still less practiced, in Russia. Nowadays, though, it is a truism to say that Corpus Linguistics is popular and gaining attention and respect in this country as well as elsewhere in the world. The first modern linguistic corpus, the Brown University Standard Corpus of Present-Day American English, was published in the early 1960s (see Kucera & Francis, 1967). It is usually referred to as the Brown Corpus, or simply Brown. The appearance of the Brown Corpus was a momentous event both in linguistics per se and in many related fields. A number of studies appeared of certain English-language phenomena, mostly grammatical at first, and of their behavior in speech (usage). As a linguistic tool, Brown encouraged the appearance of other corpora of English and other languages, thus favoring the avalanche-like development of Corpus Linguistics. The number and variety of new corpora proliferated spectacularly.
It should be pointed out that computers had been used in English-language lexicography projects even before Brown, but mainly as assistant tools for the most tedious and time-consuming tasks in traditional dictionary making: the preparations for the second edition of the Oxford English Dictionary and for the first edition of the Longman Dictionary of Contemporary English (1978) may be mentioned as two examples. In 1980 a multimillion-word corpus, the Bank of English, was initiated at Birmingham University by John Sinclair with the aim of collecting language-in-context evidence for a new dictionary (Collins COBUILD English Dictionary for Advanced Learners). Since then, dictionary publishers have been using their own corpora, some of which now comprise billions of words (referred to as tokens in Corpus Linguistics). Today, there are thousands of corpora in the world, designed for various aims and varying in size; see Xiao (2008) for a comprehensive bibliography. See also the NOW corpus (2.8 billion words, automatically adding 4 million words each day) at http://corpus.byu.edu/now/ and the CORE corpus (the first carefully categorized corpus of web registers) at http://corpus.byu.edu/core/, both released in May 2016. Whatever their peculiarities and differences, they are all digital, stored in computers and operated by (highly) sophisticated software.
Still, the IT aspect of contemporary corpora, though essential, is not their only important feature. Linguistic corpora are first and foremost collections of language products, i.e. fixed elements of speech/texts (Saussurean langage). Sophisticated software may be instrumental in building and operating corpora, but the aim and result of corpus queries is finding langage evidence for language-usage generalizations, thus gaining insights into the psycho- and sociolinguistic, as well as structural, features of language. From this perspective, there is little, if any, fundamental difference between present-day language-oriented corpus studies and the language studies of the pre-corpus era. The difference is nevertheless essential in that corpus query findings are far more reliable and trustworthy than earlier statements about language, which were made after looking at a number of examples and intuitively evaluating the underlying principles and 'rules'. However bright and credible earlier grammars and dictionaries may appear, they seem less trustworthy than corpus studies with their statistically sound findings.
This is not to be understood as denying any validity to the "previous" language studies. Their significance has been, and still is, great for our understanding of the intricacies of language. Moreover, it is the "previous" statements about language that have been treated as hypotheses to be explored with corpus tools. Pre-corpus work in many linguistic fields was very much like corpus studies, lacking only quantitative support for its intuition-guided conclusions. In fact, many collections of language materials may be regarded as 'pre-corpus corpora'. Various collections of quotations, proverbs, sayings, popular maxims, etc. may be viewed as paper-based corpora. Dictionaries of various types have always been produced using databases (collections of 'slips', cards, etc.) that look even more like raw corpora. No wonder dictionary publishers were the first businesses to join universities in developing linguistic corpora. The foregoing considerations make it possible to look at the TTS from a corpus-linguistic perspective. The italicized words in the following paragraph are specific Corpus Linguistics terms that may produce interesting associations.
The TTS may be understood as a corpus of text fragments. It is specialized in that all the source texts were about teaching or learning English as a foreign or second language. Unlike today's corpora, it is presented as a list structured after the hierarchy of pedagogical notions, namely, the slightly modified notions of FLTM. It is not annotated, but each string (text fragment) has a definite position in the hierarchy (cf. tag). It may be thought of as balanced in that the source texts were drawn from a variety of specialized publications (books, articles, textbooks, internet sites), although it does not represent professional oral speech. It is not run by any corpus software; if it is published digitally, however, it could quite easily be provided with hypertext links. In book form (Keselman, 2016), the TTS can be easily navigated with the help of detailed running heads on every page.
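The corpus analogy can be made concrete with a small Python sketch. It assumes that each TS is stored together with its classification code, which then plays the role a tag plays in an annotated corpus: it supports queries restricted to a class or subclass, plus simple token counts. The class `TextString` and the helpers `tokens`, `under` and `find` are hypothetical, the whitespace tokenizer is a simplification, and the code assigned to the vocabulary strings is assumed for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TextString:
    text: str
    code: str  # position in the classification hierarchy (cf. a corpus tag)


def tokens(ts: TextString) -> List[str]:
    # Naive whitespace tokenization (assumption); corpus software tokenizes more carefully.
    return ts.text.split()


def under(strings: List[TextString], prefix: str) -> List[TextString]:
    """TSs filed under `prefix` or any of its subdivisions (e.g. '7' covers '7.3.2')."""
    return [s for s in strings if s.code == prefix or s.code.startswith(prefix + ".")]


def find(strings: List[TextString], word: str) -> List[Tuple[str, str]]:
    """Exact-token query returning (code, text) pairs; no lemmatization is done."""
    return [(s.code, s.text) for s in strings if word in tokens(s)]


# Sample TSs quoted in the article; the code "7.3.2" for the vocabulary strings is assumed.
STRINGS = [
    TextString("focus on developing vocabulary awareness", "7.3.2"),
    TextString("acquire hundreds of thousands of phrasal lexical items", "7.3.2"),
    TextString("get enough practice with the structures", "3.1.2.1"),
    TextString("correlate the structure with its meaning", "3.1.2.1"),
]

if __name__ == "__main__":
    total = sum(len(tokens(s)) for s in STRINGS)
    print(total, len(STRINGS))                    # 25 4
    print([s.text for s in under(STRINGS, "7")])  # the two vocabulary strings
    print(find(STRINGS, "structure"))             # only the exact token matches
```

Because `find` matches exact tokens, a query for "structure" misses "structures"; many corpus tools offer lemma-based search for this reason. Applied to the whole TTS, the figures given in the Conclusion (more than 114,000 tokens, about 18,000 strings) give an average of just over six tokens per string.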

Conclusion
The described dictionary is a new lexicographic product. It is new because it presents ready-made fragments of speech (langage) as dictionary units, which has never been done before. Its significance may be assessed from two different points of view.
In lexicography, it introduces an essentially new type of unit that may be treated in a dictionary. The completion of the TTS serves as additional proof of the independent status of such units. It has been shown that the lexicographic properties of TSs are best discernible in a limited sphere of language usage (a sublanguage). The ideographic presentation of the material in the TTS also has certain theoretical peculiarities: it has been shown that both principles of classification, deductive and inductive, can be implemented, depending on the number and nature of the classified units. The TTS may also serve as an example of an unconventional approach to dictionary compiling.
From another standpoint, the TTS may have considerable practical applications. It was conceived as a useful reference book for teachers of English as a foreign or second language who are not native speakers of English and who may need to communicate in writing with colleagues or with the staff of professional publications. It may help them make their professional writing and speaking more authentic and fluent.
Still another merit of the TTS is that it can supply abundant language material for classroom tutorials in English and in FLTM at pedagogical departments of universities.
Compared with modern corpora, the TTS is rather small: it contains more than 114,000 tokens and about 18,000 listed units, or text strings. As a printed book, however, it is fairly large in comparison with traditional dictionaries: a volume of 403 pages. Whether small or large in size, the TTS has the merits of a pioneering project, both for the theory of lexicography and for teachers of EFL/ESL who wish to improve their professional language proficiency.