EAP Knowledgebase

corpus reading 3

Krishnamurthy, R., & Kosem, I. (2007). Issues in creating a corpus for EAP pedagogy and research. Journal of English for Academic Purposes, 6(4), 356–373.

This article traces the increasing use of corpora in EAP classrooms and finishes by describing what the most useful kind of corpus for this kind of teaching would be like. Both the interface of the tool for analysis and the structure of the corpus itself are considered.

The motivations for using corpus-based approaches:

  1. data-driven learning (DDL)
  2. discovery rather than repetition of standard examples
  3. learner autonomy

Academic discourse has always been included in more general corpora.

Issues in making an academic corpus.

Should the corpus be general, or divided in some way by subject? If it is divided, the question is how to classify the disciplines.

Classifications of genre are also debatable. The written genres may come from university assignment tasks or IELTS writing questions. A classification of academic speech events is also provided (quoting the MICASE Manual).

There are also problems in processing the texts. Some corpora restrict the types of text that can be submitted. Some strip out parts such as references and quotations, but this may make the text less authentic.

Classification by level is another issue. Some corpora take only staff writing and PhD theses; others take texts only from fourth-year students.

The collection of lower-grade texts would be useful for teachers looking for problem areas to address.

The article mentions the Sketch Engine software as “more research-oriented than pedagogic”.

In the early days all corpus software was called a “concordancer”, and concordancers are well suited to the classroom because concordancing is a simple function. The current tools require a complicated query language.

It gives a favourable mention to the BYU interface (“Mark Davies’s View interface”).

The writers would like one big corpus covering all the categories, with lots of different disciplines and lots of different levels.
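To show how simple the basic concordancing function is, here is a minimal keyword-in-context (KWIC) sketch in Python. It is my own illustration, not something from the article: the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made for the example, and a real concordancer would work on a tokenised corpus rather than a raw string.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for each whole-word match of `keyword`.

    Hypothetical helper for illustration: matching is case-insensitive and
    the context is measured in characters, not words.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text, not taken from any corpus.
sample = ("However, the results were mixed. The authors argue, however, "
          "that the method is sound.")

for line in kwic(sample, "however", width=20):
    print(line)
```

Lining the keyword up in a central column is what lets learners scan the left and right contexts for patterns without needing any query language.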

corpus – extra reading 2

Gilquin, G., Granger, S., & Paquot, M. (2007). Learner corpora: The missing link in EAP pedagogy. Journal of English for Academic Purposes, 6(4), 319–335.

This article is about learner corpora, especially how one collection was used to create the “Macmillan English Dictionary for Advanced Learners” with its special section on academic writing.

The writers note that most research has been about corpora of native speaker English. The aim of the article is to demonstrate how a corpus of student writing can be helpful for dealing with writing problems.

The article cites Flowerdew (2002), who identifies four distinct research paradigms in EAP:

Swalesian genre analysis

contrastive rhetoric

ethnographic approaches

corpus-based analysis.

The first three of these focus on the context or situation of the communication. Corpus-based analysis is distinctive because it provides much more detailed information about language structures. The first three all deal with things that are also problems for native speaker novices at academic writing: “pragmatic appropriacy” and “discourse patterns”.

The article mentions different software for working with corpora, including:


Sketch Engine

Work with such tools shows that “academic discourse is highly conventionalised”.

CIA (“contrastive interlanguage analysis”) is useful for showing L2 differences between learners with different L1s. It also allows comparisons between learner language and that of “natives”, who are supposed not to be learners in the same sense. (There is some research on corpora of “novice native” writers, but there is not much overlap with the problems of non-natives.)

Examples of the kind of things that can be discovered by looking at learner corpora:

Learners are familiar with key EAP verbs but not their lexico-grammatical patterning. Modal verbs and connectors are also problem areas.

An interesting note about Coxhead: the list she produced excluded the 2,000 most commonly used words. These words can be used differently in academic English and so could usefully be studied too.

There is an honourable mention for Milton’s (1998) WordPilot, which was based on learner English in Hong Kong; it is actually a CALL application rather than a coursebook (traditional resources are more conservative).

Why is it hard to make materials for academic writing based on corpora?

Research shows that discourse varies by discipline, but in universities students tend to get English for General Academic Purposes, which is not so specific.

Learners need to be trained to use corpora.

Problems can be L1-specific, which also makes the findings less generalisable.

There may not be a clear link between the results of corpus research and what actually ends up being taught. There are other factors: learners’ needs, teaching objectives and teachability.

MLD stands for monolingual learners’ dictionary. In the Macmillan project 12 rhetorical or organisational functions are identified.

There’s an argument that using a corpus of expert writing is not ideal for teaching language learners. Maybe a corpus of writing by novice native speaker students would be more appropriate? But these might not provide very good models either!

The Macmillan project used the “International Corpus of Learner English” with 6085 essays written by learners with 16 different mother tongues. The essays are untimed and written with the help of reference tools.

The project goes for a compromise about L1 influence: “Only linguistic features shared by at least half of the learner populations under study are discussed in the writing sections.”
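The “at least half of the learner populations” rule is easy to picture as a filter. The sketch below is only my reading of that sentence, not the project’s actual procedure: the function name `widely_shared`, the feature labels and the L1 group labels are all hypothetical placeholders.

```python
def widely_shared(features, n_populations):
    """Keep features attested in at least half of the learner populations.

    `features` maps a feature label to the set of L1 groups in which it
    was observed. Hypothetical helper, written for illustration only.
    """
    return sorted(
        name for name, groups in features.items()
        if 2 * len(groups) >= n_populations
    )

# Placeholder data: four invented L1 groups and three invented features.
observed = {
    "feature_x": {"L1_a", "L1_b", "L1_c"},  # 3 of 4 groups: kept
    "feature_y": {"L1_a", "L1_d"},          # 2 of 4 groups: kept (exactly half)
    "feature_z": {"L1_b"},                  # 1 of 4 groups: dropped
}

print(widely_shared(observed, n_populations=4))
```

With the 16 mother tongues mentioned above, the same rule would require a feature to turn up in at least 8 of the learner populations.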

Examples of learner problems:

Overuse of the phrases “for instance” and “for example”.

Overuse of adverbs of certainty such as really, of course and absolutely.

Underuse of hedging adverbs such as apparently, possibly and presumably.

A tendency to put however at the start of a sentence rather than in the middle.

Using THOUGH in sentence-final position, which is typical of NS speech but not so likely in academic writing.

Wrong use of ON THE CONTRARY (which actually means the opposite is true) to mean simply “by contrast” or “on the other hand”.

Invented phrases like “as a conclusion”, where an NS writes IN conclusion.

This research produced the “get it right” boxes in the dictionary.
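Claims about overuse and underuse rest on comparing normalised frequencies in a learner corpus with those in a reference corpus. Here is a minimal sketch of that comparison in Python. It is my own illustration, not the method used in the article: the function names, the crude regular-expression tokeniser and the two toy texts are assumptions, it only handles single words (not phrases like “for instance”), and it does no significance testing.

```python
import re
from collections import Counter

def per_million(text, words):
    """Frequency of each word in `words` per million tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] * 1_000_000 / total for w in words}

def compare(learner_text, reference_text, words):
    """Label each word as overuse, underuse or similar in the learner text."""
    learner = per_million(learner_text, words)
    reference = per_million(reference_text, words)
    labels = {}
    for w in words:
        if learner[w] > reference[w]:
            labels[w] = "overuse"
        elif learner[w] < reference[w]:
            labels[w] = "underuse"
        else:
            labels[w] = "similar"
    return labels

# Toy texts invented for the example; real corpora are far larger.
learner_sample = "this is really really important and absolutely true"
reference_sample = "this is possibly important and presumably true"

print(compare(learner_sample, reference_sample,
              ["really", "absolutely", "possibly", "presumably"]))
```

Normalising per million tokens is what makes corpora of different sizes comparable; a real study would also check whether the difference is statistically significant.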

corpora – extra reading 01

This week I started an online course with Sheffield University about using corpus tools in EAP. Here are some notes on the extra reading from the first week:

Vyatkina, N., & Boulton, A. (2017). Corpora in Language Teaching and Learning. Language Learning and Technology, 21(3), 1–8. University of Hawaii.

This article uses the abbreviation DDL (“data-driven learning”) to categorise the field. The term covers various strands of research, but they can be grouped as follows:

A – “theoretical underpinnings”: what the corpus data can tell us about the nature of language.

B – “descriptive”: not really explained in the article, but I imagine this covers articles that mainly describe particular practices used in the classroom or for materials writing and that speculate about future developments of the field.

C – empirical evaluations: including learner attitudes and measuring the value of DDL for learners.

Empirical evaluations of the results from DDL as a teaching approach only started around 2000. The authors note that there is relatively little DDL work done in the USA.

The article uses the term “emic” (from the subject’s perspective) to describe research done by asking students to fill in questionnaires, and “etic” (from the researcher’s or an external perspective) to describe research done with pre- or post-intervention tests or other kinds of experimental control.

It notes two trends over time:

From lexico-grammatical studies towards greater interest in the characteristics of discourse.

From corpora as a learning aid towards corpora as a reference resource.