corpus – extra reading 2

Gilquin, G., Granger, S., & Paquot, M. (2007). Learner corpora: The missing link in EAP pedagogy. Journal of English for Academic Purposes, 6(4), 319–335.

The article deals with learner corpora, especially how one collection was used to create the “Macmillan English dictionary for advanced learners” with its special section on academic writing.

The authors note that most research has been based on corpora of native-speaker English. The aim of the article is to show how a corpus of student writing can help in dealing with writing problems.

Cites Flowerdew (2002), who identifies four distinct research paradigms in EAP:

Swalesian genre analysis

contrastive rhetoric

ethnographic approaches

corpus-based analysis.

The first three of these focus on the context or situation of the communication. Corpus-based analysis is distinctive because it yields much more detailed information about language structures. The first three all deal with issues that are also problems for native-speaker novices in academic writing: “pragmatic appropriacy” and “discourse patterns”.

Mentions different software for working with corpora, including:

Sketch Engine

Corpus research has shown that “academic discourse is highly conventionalised”.

CIA (“contrastive interlanguage analysis”) is useful for showing differences in the L2 of learners with different L1s. It also allows comparisons between learner language and the language of “natives”, who are assumed not to be learners in the same sense. (There is some research on corpora of “novice native” writers, but their problems do not overlap much with those of non-natives.)

Examples of the kind of things that can be discovered by looking at learner corpora:

Learners are familiar with key EAP verbs but not with their lexico-grammatical patterning. Modal verbs and connectors are also problem areas.

Interesting note about Coxhead: the Academic Word List she produced excludes the 2,000 most common words of English. These words can be used differently in academic English and so could usefully be studied too.

Honourable mention for Milton (1998), WordPilot, which was based on learner English in Hong Kong; it is a CALL application rather than a coursebook (traditional resources are more conservative).

Why is it hard to make materials for academic writing based on corpora?

Research shows that academic discourse varies according to discipline, but at university students tend to be taught EAP for general academic purposes, which is not discipline-specific.

Learners need to be trained to use corpora.

Problems can be L1-specific, which also makes the findings less generalisable.

There may not be a clear link between the results of corpus research and what actually ends up being taught. There are other factors: learners’ needs, teaching objectives and teachability.

MLD – monolingual learners’ dictionary. In the Macmillan project 12 rhetorical or organisational functions are identified.

There is an argument that a corpus of expert writing is not ideal for teaching language learners. Maybe a corpus of writing by novice native-speaker students would be more appropriate? But such writing may not provide very good models either!

The Macmillan project used the “International Corpus of Learner English” (ICLE): 6,085 essays written by learners with 16 different mother tongues. The essays are untimed and were written with the help of reference tools.

The project adopts a compromise on L1 influence: “Only linguistic features shared by at least half of the learner populations under study are discussed in the writing sections.”

Examples of learner problems:

Overuse of the phrases for instance and for example.

Overuse of adverbs of certainty such as really, of course and absolutely.

Underuse of hedging adverbs such as apparently, possibly and presumably.

A tendency to put however at the start of a sentence, and less often in the middle.

Using THOUGH in sentence-final position is typical of NS speech but much less common in academic writing.

Wrong use of ON THE CONTRARY (which actually means the opposite is true) to mean simply “by contrast” or “on the other hand”.

Invented phrases such as “as a conclusion”, where native speakers write IN CONCLUSION.

This research led to the “get it right” boxes in the dictionary.