Tom Cobb, Université du Québec à Montréal
Chris Greaves, Polytechnic University of Hong Kong
Marlise Horst, Concordia University
Introduction
The field of second language (L2) reading has always seemed a
good opportunity to demonstrate the value of computers in
language learning, especially since the Internet has shown its
capacity to multiply the amount and variety of texts and contexts
available to language learners and teachers. However, the
usefulness of computers to the development of reading ability
remains largely an argument in principle.
In pre-Internet days, computer programs for the development of
reading ability followed mainly a skill-development model. The
reading skills targeted included tracking pronoun reference,
finding main ideas in paragraphs, inferring word meanings from
context, and so on--the type of skills that could be coded in
multiple-choice questions following a brief text. However, there
was little evidence that such skills transferred positively to
the reading of full length texts, as indeed there had not been
when such exercises were done on paper. There was even occasional
evidence that they transferred negatively (Oppenheimer, 1997).
The reading research, whether L1 or L2, was rarely brought to
bear on the development of such courseware, as may not be
surprising given its often inconclusive nature. However, one
reasonably clear finding in the research literature is that the
development of reading ability depends on the learner logging
some volume, whether pages or screens, which was not really
possible under the isolated skills approach where there was
"too little to read" (Reinking & Bridwell-Bowles,
1991).
But things have changed, at least in some ways, with the arrival
of the Internet. "Too little to read" is no longer the
problem it was. A standard web-based reading activity nowadays is
for learners to search from site to site for some specific piece
of information, such as, 'How old was Napoleon when he was exiled
to Elba?' It is assumed this search will pass through numerous
reading opportunities en route to the final disclosure. However,
it is not clear that the type of reading needed for such
activities, unless carefully provisioned, involves more than
scanning at best, string-matching at worst. Some task reduction
may be an inevitable part of reading on the web, given that the
Internet is a vast repository of authentic text, most of it
beyond the unaided reach of most second language learners. In
other words, 'too little to read' has been replaced by 'too much
to read,' but with a similar reduction in the quality of the
text-learner interaction that can be expected.
In this chapter we aim to provide an alternative to these two
approaches to computer-assisted reading. The model we propose is
'resource-assisted reading of extensive authentic documents' or
R-READ. The idea is that learners will be able to Read Extended
and Authentic Documents with comprehension if they
are aided by a carefully chosen suite of helper Resources
of a kind that are becoming increasingly available on the
Internet or capable of Internet delivery. The hope is that R-READ
will allow learners to log the volume they need to clear the
threshold to independent reading in their L2. This account of
R-READ is grounded in research findings, and it involves the
development and testing of a preliminary hands-on
implementation--but it begins in the personal experience of one
of the authors.
Almost 20 years ago, a monolingual Manitoba Anglophone had to
pass a French translation test to finish a Master's degree. With
weak high school French, Tom wondered where a quick reading
ability in the other national language was going to come from. A
bilingual friend proposed that anyone can cross the first and
biggest hurdle into a second language by somehow managing to read
a complete book in it, any decent book being a microcosm of the
language as a whole, by looking up every word, parsing the
syntax, and so on--or in this case by using her as an expert
reading partner (decoder, explainer, pronouncer, hypothesis
confirmer and denier). Together with this resource person Tom
embarked on several readings of Voltaire's Candide,
stopping for discussion and note-taking a good deal in the
beginning but noticeably less within just a few hours. By the end
of the first reading all the words seemed familiar, and by the
end of five days and three readings he was ready for the
translation test. On a subsequent visit to Paris, Tom reported
finding that much of the ambient signage and advertising could be
worked out on the basis of what had been learned. He had
apparently crossed the threshold to independent learning.
However, not everyone is fortunate enough to know a
resource person willing and able at the moment they have the time
and motivation to make their move for a second language. But if
having such a resource person is so effective for the learner, is
there some way it would be possible to recreate a resource person
in a tutorial computer program? This idea was a fantasy
until recently, but now the Internet has the bandwidth,
distribution, and quality of resources including streaming audio
to make such a program possible. In this chapter we describe such
a Web-based reading expert, outline the theoretical basis for
this approach in the reading literature, and then present a case
study of a learner using the program.
The reading experience provided for learners of French is a
parallel version and further development of one that was
developed previously for students learning English. For the
English site, the Jack London novel Call of the Wild was
chosen, for the following reasons: it is a novel rich in
vocabulary, with appeal to children as well as adults, has
extensive but not excessive length, is out of copyright, is
available on the Internet as a text file for download, and has
several good cassette read-aloud renditions. The operation of the
site is quite simple. Readers can read and/or listen to Call
of the Wild in the sequence they choose. When reading, they
can pause the sound recording and click on any word in the text
that they are curious about, and a concordance is generated
showing all the rest of the word's occurrences throughout the
novel. From the concordance frame, readers can then click up a
dictionary definition provided by Princeton University's Wordnet
site. If readers wish to make a note about any word, they can
type or cut and paste any of this information into a personal
database which they can later retrieve and print or save to their
hard disk. The site can be visited at http://132.208.224.131/callwild/.
An alternative version of this implementation may also be seen at
http://vlc.polyu.edu.hk,
which also features adaptations of Alice in Wonderland and other
works implemented using the same model.
The French parallel site is based on de Maupassant's Boule de
Suif, and the interactions envisaged are identical. The
construction of the site was straightforward in that many of the
technical challenges had already been dealt with in constructing
the English site. However, the French site presented other
challenges. First, there are far fewer French novels in text file
format ready for download on the Internet, and fewer still with a
corresponding sound recording whether on-line or off (and none at
all that could be found in the case of French-Canadian authors).
Second, there are far fewer on-line dictionaries to choose from
than there are in English, and the French ones that exist tend to
be both unsuitable for learners and difficult to access by
command line from an outside site. The choice of Boule de Suif
was something of a compromise, in that the text, while extensive
without being excessively long (13,418 words) and very rich in
vocabulary, is neither particularly modern nor suitable for
children (dealing with sexual predation and middle-class
hypocrisy). However, the choice of a de Maupassant story solved
one major problem observed with Call of the Wild. This was that
readers often clicked for further contexts for a word they were
trying to work out the meaning of, only to find that the word
occurred only once in the entire text and hence there were no
other contexts. Indeed, research by Horst (2000) found that for
texts of intermediate size (5,000 to 15,000 words), between
5 and 10 per cent of the lexis is one-off. Kucera (1982)
determined that it was precisely these least used words in a text
that often carry most meaning. The solution to this problem
was to generate the concordances from the entire de Maupassant
oeuvre of more than 1 million words, which had fortuitously been
made available by Thierry Selva, a Belgian colleague. The
concordance engine runs slightly more slowly with such a volume
of text to search through, but to date there has been no case
of a word appearing only once in the entire corpus. Figure 1
gives screen shots of the Boule de Suif website, displaying the
main interactions it proposes.

Figure 1 The main interactions proposed by R-Read
The text is selected from the black sidebar on the left, with a
dramatic recitation of the text if desired (either text or sound
can be selected without the other, so for example the text could
be heard before it was read). Any interesting word (such as lambeaux
in the figure) can be clicked upon to produce a concordance in
the lower window, which as mentioned is drawn from the entire
corpus of de Maupassant texts. Also generated with the
concordance is a link in the upper right corner to a bilingual
learners' dictionary, which will take the learner either to the
exact target word or else to a list of words in the alphabetical
vicinity of the target word. These interactions have produced the
computer screen as it appears in Figure 1.
Three features of the proposed interaction may not be readily
noticeable. First, a classic problem with click-on dictionaries
is that the word clicked on is a plural or other variant which,
when sent to the dictionary's search engine, returns a "Not
Found." In a paper dictionary, the search for chats
brings the reader into the vicinity of chat, and the
problem does not occur. The expensive solution to this problem
on-line is to lemmatize the search process, so that all the
morphologies for each word family are grouped together. The cheap
solution and the one adopted by Coffey, the creator of the
dictionary accessed here, is to present instead of "Not
Found" all the words in the vicinity of the search word (char,
chat, chaud) on the assumption that learners will recognize
one of them as the base form of the word they are looking for.
This is basically a simulation of what happens when looking up a
word in a paper dictionary, where you can see the rest of the
page. In Figure 1, the learner has clicked on lambeaux and
been sent to a list including lambeau, which when clicked
produces the requested information. Second, the keywords in the
middle of the concordance lines are hypertext links, which when
clicked expand the amount of context to roughly the size of a
small paragraph. Third, the site links to a historical
backgrounder on the Franco-Prussian War of 1870, the setting of
the events in the story.
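The "cheap solution" to the inflected-form problem can be sketched in a few lines. This is only an illustration of the idea, not the code behind the dictionary used on the site: the headword list, the function name lookup_with_neighbours, and the window size are assumptions made for the example. Plain code-point ordering is used here, where a real French dictionary would need accent-aware collation.

```python
from bisect import bisect_left

def lookup_with_neighbours(headwords, query, window=2):
    """Return (found, words) for a query against a sorted headword list.

    If the query is a headword, words is [query]. Otherwise words holds
    the entries around the place where the query would sit alphabetically,
    much like the rest of the page in a paper dictionary.
    """
    i = bisect_left(headwords, query)
    if i < len(headwords) and headwords[i] == query:
        return True, [query]
    lo = max(0, i - window)
    hi = min(len(headwords), i + window)
    return False, headwords[lo:hi]

# Hypothetical mini word list; a real dictionary has thousands of entries.
HEADWORDS = sorted(["chant", "char", "chat", "chaud", "lambeau", "lame", "lampe"])

for clicked in ("chat", "chats", "lambeaux"):
    found, words = lookup_with_neighbours(HEADWORDS, clicked)
    print(clicked, "->", "exact" if found else "nearby", words)
```

With this toy list, a click on chats or lambeaux returns a short neighbourhood that includes chat or lambeau, so the learner can pick out the base form without the site having to lemmatize anything.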
When learners have found a word and related information that they
wish to keep, they can record it in a database that they access
by clicking on "Lexique Utilisateur" in the top left
corner. This database is the e-quivalent of those little
notebooks that language learners love so much to write their new
words in, except that this is neater, contains more and richer
information, and occupies much less of their time (since it can
be assembled from the text, the concordances, or the dictionary
on a copy-paste basis). This database can be viewed for all
learners' entries or for just those of the current user, and can
be downloaded for assembly into a personal glossary or lexicon
which can be further assembled, edited, and sorted using Excel on
the learner's machine. Figure 2 shows an example of the
database. In this case "Tom" has requested just his own
entries, but with "Tout Voir" he can see the entries of
all users of the site.

Figure 2 Electronic vocabulary notebook
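The notebook's two views (all learners' entries, or only the current user's) and its download for use in a spreadsheet can be sketched as follows. This is a minimal illustration under stated assumptions rather than the site's implementation: the field names, the sample entries, and the function export_notebook are all hypothetical, and the entry texts are placeholders.

```python
import csv
import io

# Hypothetical record layout for one notebook entry.
FIELDS = ["user", "word", "context", "definition"]

def export_notebook(entries, user=None):
    """Return CSV text for every entry, or only for one user's entries.

    The rows are sorted by word so the download already reads like a
    small glossary when opened in a spreadsheet.
    """
    rows = [e for e in entries if user is None or e["user"] == user]
    rows.sort(key=lambda e: e["word"])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Placeholder entries; real ones would be pasted from the text,
# the concordance, or the dictionary.
ENTRIES = [
    {"user": "Tom", "word": "lambeau", "context": "(pasted concordance line)",
     "definition": "(pasted dictionary entry)"},
    {"user": "Ana", "word": "suif", "context": "(pasted concordance line)",
     "definition": "(pasted dictionary entry)"},
]

print(export_notebook(ENTRIES, user="Tom"))  # one learner's entries
print(export_notebook(ENTRIES))              # the "Tout Voir" view
```

CSV is used here simply because it opens directly in Excel, which the chapter names as the tool for further sorting and editing.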
So here is a website where students who want to read an extended
French text can do so, and in addition listen along, see all
instances of interesting words or phrases ever composed by the
same author, look up words in a learner's dictionary, and keep a
record of interesting findings. But one might ask whether all of
these options are mainly amusements, or whether there is any sign
that they will aid learning in a meaningful way.
3 Research base
It often seems that the development of computerized language
learning materials is dominated by either commercial software
companies or CALL hobbyists, both of whom must keep up with the
complexities of ever-changing technologies, which may limit their
interest in the additional complexities of language acquisition
(LA) research. Still, over the years CALL and LA research have
come together on occasion, usually where acquisition scholars
have undertaken studies with media implications or opined on the
media implications of their other findings. These convergences
will be outlined briefly, categorized according to the main
interactions offered on the Boule de Suif site. The main research
of interest concerns lexical acquisition and handling, which is
uncontroversially the beginning reader's major hurdle.
3.1 Listen and Read
Stanovich is one of the foremost investigators of first
language (L1) reading problems. In his well-known paper on
Matthew Effects (Stanovich, 1986), he ventured some remarks on the type of
CALL programs he thought might aid learners having
difficulty learning to read. He cited some rather old software
(Draper & Moeller, 1971, which is almost certainly not in
existence any longer) which had produced very strong learning
effects simply by giving learners the opportunity to click on
words in a reading text and hear them spoken. The idea was that
in the L1 situation, many words are not recognized in writing
that are in fact known in speech. This would normally be the case
less often in L2, where new words other than very high frequency
items are more likely to be met in text before speech. Still, L2
research supports a strong role for reading and listening along.
Lightbown (1992) looked into the acquisition of English by young
Francophone New Brunswickers who had not been provided with
classroom instruction but instead had read and listened to
cassettes of self selected materials at their own pace. The
surprise finding was that learners (at least in the early stages
of L2 acquisition) seem to gain as much from reading and
listening as they do from being in a classroom. The irony of this
finding is that it comes at a time when many schools are
abandoning their facilities for listening to cassettes in favour
of computer labs. Fortunately, new Internet technologies like
streaming audio can make a listening lab out of any computer lab,
and at the same time deliver additional advantages like on-line
dictionaries.
3.2 Concordance
A major limitation on the instructional use of authentic texts is
that learners apparently are much less able to infer the meanings
of new words from context (Laufer & Sim, 1985; Haynes, 1983;
Huckin, Haynes, & Coady, 1991) than was once believed (e.g., by
Smith, 1971; Goodman, 1976; Krashen, 1989, and their many
followers). However, research by Cobb (1997; 1999) has shown
that contextual inference can be substantially supported by
multiplying the number of contexts available for a given word
with the aid of a computer. The specific program which does this
is called a concordance, which assembles all the contexts
available for a given word or phrase throughout a text or corpus.
The support for learning is as follows. When several contexts are
available, many of them will be opaque, but one or more of them
is likely to have the mix of linguistic and semantic support that
provides the learning conditions needed by a particular learner
to build an initial stable representation for a new word. If
learners can be persuaded to examine several contexts, they will
make better inferences than if they merely examined one. In other
words, we propose concordances as a means to computer-aided
contextual inference. Another benefit of concordances is that
beginners need to meet words with some frequency if they are to
learn them (Zahar, Cobb, & Spada, in preparation), more
frequently than is actually possible for them without
some artificial means of boosting the number of encounters.
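To make the mechanism concrete, here is a minimal keyword-in-context sketch of what a concordance routine does. It is not the concordance engine used on the site: the function name, the tokenizing pattern, the context width, and the sample sentences (which are invented, not drawn from any of the novels) are all assumptions for illustration.

```python
import re

# Runs of letters only (Unicode-aware), so accented French words stay whole.
WORD = re.compile(r"[^\W\d_]+")

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword.

    Matching ignores case; each line shows up to `width` characters of
    context on either side, with the keyword bracketed in the middle.
    """
    target = keyword.casefold()
    lines = []
    for m in WORD.finditer(text):
        if m.group().casefold() != target:
            continue
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        lines.append(f"{left:>{width}}[{m.group()}]{right}")
    return lines

# Invented sample text standing in for a novel or corpus.
SAMPLE = (
    "The old dog slept by the stove all winter.\n"
    "When the sled was ready, every dog strained at the traces.\n"
    "No Dog in the team had seen snow before."
)

for line in concordance(SAMPLE, "dog"):
    print(line)
```

Running the same function over a larger body of text is what multiplies the number of contexts available for a word; a word that yields a single line in one story may yield many when the search covers an author's whole output.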
3.3 Database
Another of Cobb's (1997) findings is that the enormous
time that learners are willing to spend writing down lexical
entries while reading can be made more efficient through the
judicious use of the computer, with look-up and write-down time
redeployed to reading more and hence meeting more new words and
old words more often. Further, it was found that this
computerization could facilitate collaborative use of lexical
look-ups, and that the prospect of having their entries seen by
others encouraged learners to spend more time sifting through
concordances for good examples of words.
3.4 Dictionary
While much research casts doubt on the value of dictionary work
for beginning readers in an L2, there is no point in trying to
prevent the use of dictionaries. One can merely try to encourage
the use of decent dictionaries that learners can comprehend and
yet which do not encourage them in the belief that all L1-L2
mappings are one to one (Bland, Noblitt, Armstrong & Gray's,
1990, "naïve lexical hypothesis"). We should
also try to ensure that concordance work precedes dictionary
work, following the constructivist principle that learning is
more about building generalized knowledge than about receiving it
(Cobb, 1999). This sequence has recently received support from a
study by Fraser (1999), who found that contextual inference
combined with dictionary look-up supported more lexical
acquisition than either alone, but also that the sequence of
these strategies was important: attempted inference first,
dictionary confirmation second, is the more effective sequence.
To her finding we would add computer-aided contextual inference
first, dictionary second.
3.5 Click-on Interface
An important study of resource-based or "instructionally
enhanced" reading has recently been published by Hulstijn,
Hollander, & Greidanus (1996). One of the findings of this study
is that many learners who would be aided by lexical lookups while
reading nevertheless do not take the time to do so, presumably out
of some form of laziness, but that they will do so if the lookup
is made sufficiently easy. The click-on resources in the Boule de
Suif website could hardly be easier to use, and yet type-in resources
are also available for more sophisticated searches or for testing
hypotheses about French not directly stimulated by the immediate
text.
3.6 Approach
The resource-based approach exemplified in the Boule de Suif site
is one of three main approaches to treating the lexical demands
of L2 reading. One approach is that outlined by Krashen (1989),
Nagy (1997), and their followers, to the effect that reading itself will
teach learners all the words they need to know to be able to
read. This approach has run into problems, as already mentioned.
The approach at the other extreme is the direct pre-teaching of
vocabulary learners will need in order to read particular types
of texts successfully, for example by working their way through
the wordlists that Nation and colleagues (Nation, 1990; Nation
& Waring, 1997; Sutarsyah, Nation & Kennedy, 1994) have
identified as comprising the vast majority of lexis in average
texts. Between these two approaches and not really excluding
either is vocabulary-enhanced reading (Hulstijn, Hollander, &
Greidanus, 1996), where learners are left to make their own way
through texts but it is assumed they will need support resources
to do so successfully. R-READ, resource-assisted reading of
extended authentic documents, is intended to be a substantial
test of this middle approach.
4.1 Background and experimental method
To investigate the usefulness of concordancing with easy access
to full concordances, dictionary definitions, and easy data
storage, we asked a learner of French to use the experimental
materials. The research question of this pilot case study was
this:
How do the vocabulary learning results of reading with the online
tools described above compare to the results of reading without
these tools?
The baseline for comparison to "normal" unassisted
reading comes from a series of case studies by Horst (2000). In
one of these, she investigated the amounts of new vocabulary
learners acquired through reading texts resembling Boule de Suif
in that they were nineteenth-century literary classics. R, an
adult intermediate learner of German, agreed not to consult a
dictionary while he read a German novella. A few days later he
rated his knowledge of 300 words that occurred in the story only
once. When he rated his knowledge of these target words again a
few days later, the difference was modest. After reading the
9500-word literary text (which took about three hours), R rated
only five more words "definitely known" than he had on
the pretest. Thus we can conclude that he had learned about two
words per hour of unassisted reading.
Our test of the computerized lexical resources follows the same
design that Horst used in this and a series of similar
experiments. This time the participant was J, an adult
intermediate learner of French. Six weeks before reading the
R-Read version of Boule de Suif, J was pre-tested on 400
words that occurred only once in the text, assigning each word a
knowledge rating according to the following scheme (Horst &
Meara, 1999):
0 = I don't know what this word means
1 = I am not sure what this word means
2 = I think I know what this word means
3 = I definitely know what this word means
She assigned a 0 (don't know) rating to 180 words, so
there was clearly ample opportunity for new learning through use
of the experimental materials.
After a brief training session, J began reading Boule de Suif
following the prescribed procedure of clicking on unknown words
and looking at contexts provided by the concordance. In most
cases, she requested dictionary definitions as well. Progress was
marginally slower than normal or unassisted reading; it took her
about six hours to complete the entire 14,500-word text (compared
to R's three hours for 9,500 words).
4.2 Results
Three days after completing Boule de Suif, J rated her knowledge
of the 400 target words. The number of words rated 3 (definitely
known) amounted to 137, a 59-word increase over her pretest total
of 78 words definitely known. Since J had spent about six hours
using the program, we can conclude that about ten new words per
hour had entered the definitely known category,
considerably more than R's two words per hour.
A week later, J read Boule de Suif for a second time, this time
using the sound option (the pace of the oral narrative proved to
be too fast the first time around). Again, she rated her
knowledge of the target words a few days after completing the
story. Then, seven days later, there followed yet another round
of reading, listening and testing. J spent about four hours using
the materials in each of these later sessions; thus the total
number of hours amounted to 14 (6 + 4 + 4). The numbers of words
assigned to the various knowledge categories after each reading
are shown below in Table 1.
Table 1
Word knowledge ratings before reading and after each of three readings
| | Pretest | Posttest 1 | Posttest 2 | Posttest 3 |
| 0 (not known) | 180 | 74 | 49 | 28 |
| 1 or 2 (unsure) | 142 | 189 | 165 | 170 |
| 3 (known) | 78 | 137 | 186 | 202 |
Table 1 shows that by the end of the
experiment, J "definitely knew" 202 words, up 124 from
her starting point of 78. The table also confirms growth in other
ways: the figures show that the number of words rated 0 (not
known) decreased substantially over the course of the experiment,
and that many unknown words became partially known.
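The arithmetic behind these gains can be checked with a short script. The counts and session lengths are those reported above; the variable and function names are illustrative only, and the per-hour rates for the second and third sessions are simple derivations from the reported figures, not results stated in the study.

```python
# Counts of target words in each rating category, copied from Table 1.
TABLE_1 = {
    "Pretest":    {"unknown": 180, "unsure": 142, "known": 78},
    "Posttest 1": {"unknown": 74,  "unsure": 189, "known": 137},
    "Posttest 2": {"unknown": 49,  "unsure": 165, "known": 186},
    "Posttest 3": {"unknown": 28,  "unsure": 170, "known": 202},
}

# Approximate hours J spent with the materials in each session, as reported.
HOURS = {"Posttest 1": 6, "Posttest 2": 4, "Posttest 3": 4}

def known_gains(table, hours):
    """Return (session, newly_known_words, words_per_hour) per posttest."""
    sessions = list(table)
    results = []
    for prev, cur in zip(sessions, sessions[1:]):
        gain = table[cur]["known"] - table[prev]["known"]
        results.append((cur, gain, gain / hours[cur]))
    return results

# Every column should account for all 400 target words.
assert all(sum(col.values()) == 400 for col in TABLE_1.values())

for session, gain, rate in known_gains(TABLE_1, HOURS):
    print(f"{session}: +{gain} definitely known ({rate:.1f} per hour)")

total = TABLE_1["Posttest 3"]["known"] - TABLE_1["Pretest"]["known"]
print(f"Total: +{total} over {sum(HOURS.values())} hours")
```

The per-session gains come to 59, 49, and 16 words, which sum to the 124-word increase over 14 hours; the first session works out to just under ten words per hour.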
The unassisted reader of German, R, had also read his novella
repeatedly, and this allows for convenient comparison to the
results of J's three sessions with the experimental materials.
The results for both learners after three rounds of reading are
shown in Table 2. Their pretest starting points are very similar:
both participants rated 45% of the targets unknown at the
outset, and they were also similar in their ratings of known words at
the outset (20% of targets for J, 27% for R). After three
reading sessions, the advantage for the assisted process seems
clear. The number of words rated definitely known has
remained constant in the case of R, but has increased
dramatically in the case of J. Although R eventually
learned dozens of new words over the course of several additional
readings, his initial progress was slow, and even after ten
exposures his responses were less accurate than J's. At the end
of the repeated readings (four repetitions in the case of J, ten
in the case of R), both participants were asked to provide
translation equivalents for items they had rated "definitely
known." About 94% of J's responses were judged to be
accurate, while R identified correct translation equivalents in
only 77% of cases. The comparative data is summarized in Table 2.
Table 2
Percentage of targets in each category at outset and after three readings, assisted and unassisted
| | R (unassisted), pretest | R (unassisted), 3rd posttest | J (assisted), pretest | J (assisted), 3rd posttest |
| 0 (not known) | 45 | 38 | 45 | 7 |
| 1 or 2 (unsure) | 28 | 33 | 36 | 43 |
| 3 (known) | 27 | 27 | 20 | 51 |
Conclusion
The comparison between R's and J's rates of vocabulary
acquisition seems to confirm the usefulness of the R-READ
approach, and to indicate a middle way in vocabulary growth
through reading. Resource-based reading seems able to render
irrelevant the choice between incidental acquisition and direct
vocabulary instruction. The pace and accuracy of J's
acquisition is reminiscent of the best results of direct
instruction (Nation, 1982), but with the enjoyability and
possibly deeper learning that comes with meeting words in rich
contexts.
Both readers started their readings with 45 per cent of target
words unknown in their respective texts. Working from unadorned
context, R managed to reduce his share of unknown words by only
7 percentage points, while J reduced hers by 38. At the
definitely known end, R did not manage to increase
his holdings in this category at all with three readings, while J
increased hers roughly two and a half times (from 20 to 51 per
cent). Further, on the translation post-test, a greater
proportion of J's answers were correct
after three readings than R's were after 10 readings.
None of this is surprising in itself, except for the fact that the
time invested to achieve the greater learning was only marginally
greater, thanks to the effective use of on-line tools.
Vocabulary acquisition from reading has always been a major
problem in the development of literacy in a second language.
Contexts are rich but unreliable, definitions are precise but
incomprehensible, and the number of words to be acquired is
daunting. Resource-assisted reading seems a promising approach to
making vocabulary acquisition through reading possible and even
efficient. And given the expected increase in the use of the
World Wide Web in coming years, it may even be an approach that
can reach many of the people who want and need it,
providing an effective linguistic consultant for
those not lucky enough to know one.
Bland, S., Noblitt, J., Armstrong, S., & Gray, G. (1990). The naive
lexical hypothesis: Evidence from computer assisted learning. Modern
Language Journal, 74, 440-450.