Contextual Vocabulary Acquisition: description

Since July 2001, funded by an NSF ROLE pilot project, we have been investigating contextual vocabulary acquisition (CVA): the active, deliberate acquisition of word meanings from text by reasoning from contextual cues, background knowledge, and hypotheses developed from prior encounters with the word, but without external sources of help such as dictionaries or people.

Our ultimate goal is not merely to improve vocabulary acquisition but also to increase students' reading comprehension of science, technology, engineering, and mathematics (STEM) texts, and thereby their learning. CVA serves this goal as a "miniature" (but real) example of the scientific method.

The computational and educational strands of our research are fully integrated and jointly serve this ultimate goal. We are attempting to:

increase our understanding (based on observations of "think-aloud" protocols) of how good readers use CVA to hypothesize a sense for an unknown word encountered in written context,
use these observations to extend our computational theory of CVA,
further develop a computer program that implements and tests this theory, and
create and evaluate a curriculum (based on the computational theory) to improve students' abilities to use CVA.

People know the meanings of more words than they are explicitly taught, so they must have learned most of them as a by-product of reading or listening. Some of this learning results from actively hypothesizing the meanings of unknown words from context.

How do readers do this? Most published strategies are quite vague; one simply advises readers to "look" and "guess". This vagueness stems from a lack of relevant research about how context operates. There is no generally accepted cognitive theory of CVA, nor is there an educational curriculum or set of strategies for teaching it. If we knew more about how context operates, had a better theory of CVA, and knew how to teach it, we could more effectively help students identify context cues and use them well.

AI studies of CVA (including our own) have necessarily gone into much more detail on what underlies the unhelpful advice to "guess", since natural-language-processing (NLP) systems must operate on unconstrained input text independently of humans and can't assume a "fixed complete lexicon". But these studies have largely been designed to improve practical NLP systems. Few, if any, have been applied in an educational setting, and virtually all have been ignored in the reading- and vocabulary-education literature.

AI algorithms for CVA can fill in the details that can turn "guessing" into "computing"; these can then be taught to students.
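
As a purely illustrative sketch of what "computing" rather than "guessing" might look like, the toy Python below folds the cues gathered from successive encounters with an unknown word into a running hypothesis and reports it in a definition-like form. Everything in it is an assumption made for illustration: the class name `WordHypothesis`, the cue categories, and the nonsense word "zorple" are hypothetical, and this is not the project's actual algorithm or program.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordHypothesis:
    """Running hypothesis about one unknown word (hypothetical structure)."""

    word: str
    # Cue category -> values observed so far, e.g. "class" -> {"animal"}.
    cues: dict[str, set[str]] = field(default_factory=dict)
    encounters: int = 0

    def update(self, new_cues: dict[str, str]) -> None:
        """Fold the cues from one more encounter into the hypothesis."""
        self.encounters += 1
        for category, value in new_cues.items():
            self.cues.setdefault(category, set()).add(value)

    def report(self) -> str:
        """Render the current hypothesis as a definition-like summary."""
        parts = [
            f"{category}: {', '.join(sorted(values))}"
            for category, values in sorted(self.cues.items())
        ]
        return f"{self.word} (after {self.encounters} encounters) -> " + "; ".join(parts)


# Hypothetical cues a reader might extract from two different passages.
hypothesis = WordHypothesis("zorple")
hypothesis.update({"class": "animal", "action": "hunts"})
hypothesis.update({"class": "animal", "property": "small"})
print(hypothesis.report())
```

The point of the sketch is only that each step (collect cues, merge them with the prior hypothesis, state the result) is explicit enough to be taught; how cues are actually extracted from text is the hard part and is not modeled here.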

Thus, the importance of our project stems from twin needs: the need for NLP systems that operate independently of human assistance, and the need to improve both the teaching of reading and students' reading ability (especially in STEM).

Hence, our project combines basic and applied research. Its theoretical significance comes from the development of an NLP system that does CVA. Its educational significance lies in whether the knowledge gained by developing this system can be applied to teaching CVA strategies to students so that they are able to use them successfully when they encounter hard words in their regular reading of STEM texts. Our project is also distinctive in its proposed use of mutual feedback between the development of the computational theory and the educational curriculum, making this a true cognitive-science project.

Ongoing projects:

We are continuing the two-way flow of research results between the education and AI teams: the education team provides data for improving the definition algorithms, and the AI team provides the algorithms to be converted into a curriculum.

The AI team is:

developing noun, verb, and adjective algorithms using insights from the think-aloud protocols produced by the education team;
developing an explanation facility for the system;
developing a natural-language input and output system;
using this NL I/O system to investigate the use of "internal" context (morphology) for developing definitions;
investigating whether OpenCYC can serve as one source of general background information; and
improving the definition-revision system.
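
The idea of "internal" context (morphology) in the list above can be illustrated with a minimal sketch, assuming a small hand-written affix table. The tables `PREFIXES` and `SUFFIXES` and the function `morphological_cues` are hypothetical names introduced here; they do not describe the project's NL I/O system, and the matching is deliberately naive (it would wrongly split a word such as "restless" at "re").

```python
from __future__ import annotations

# Hypothetical affix tables; a real system would need far richer morphology.
PREFIXES = {"un": "not", "re": "again", "pre": "before"}
SUFFIXES = {"able": "capable of being", "less": "without", "ness": "state of being"}


def morphological_cues(word: str) -> dict[str, str]:
    """Split off at most one known prefix and one known suffix, with glosses."""
    stem = word.lower()
    cues: dict[str, str] = {}
    for prefix, gloss in PREFIXES.items():
        # Require a few remaining letters so short words are left intact.
        if stem.startswith(prefix) and len(stem) > len(prefix) + 2:
            cues["prefix"] = f"{prefix}- ({gloss})"
            stem = stem[len(prefix):]
            break
    for suffix, gloss in SUFFIXES.items():
        if stem.endswith(suffix) and len(stem) > len(suffix) + 2:
            cues["suffix"] = f"-{suffix} ({gloss})"
            stem = stem[: -len(suffix)]
            break
    cues["stem"] = stem
    return cues


print(morphological_cues("unbreakable"))
```

Cues of this kind could supplement, but not replace, the cues drawn from the surrounding text.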

The education team is building, implementing, and evaluating a curriculum designed to help secondary-school and college students become better able to use CVA processes to increase knowledge of word meanings, thereby leading to increased content learning and reading comprehension in STEM.

The curriculum is based on our algorithms and uses teacher-modeled protocols that are practiced by students with the teacher, in small groups, and alone. We are developing student materials and a teacher's guide that emphasize how our method of CVA is an example of the scientific method "in the small", and we are studying the curriculum's effectiveness.