The Project
The five-year CLASP project is spearheaded by Dr Nan Bernstein Ratner, an applied developmental psycholinguist whose research centres primarily on child language and fluency development and disorders. ‘We have two major concerns’, says Bernstein Ratner. ‘One question is, are any of the measures that we’ve adopted for child language assessment good? But we’re also trying to answer the question whether or not children who speak other dialects, such as African American English, could be misdiagnosed using any of these measures.’ Currently, no large-scale normative scores are available for any of the most commonly used language development measures, and larger, more diverse datasets are needed to evaluate how effective the different methods are.
Towards Inclusive Assessment
The two most frequently used assessment measures are mean length of utterance (MLU) and type-token ratio (TTR), a measure of vocabulary diversity. Other measures, such as Developmental Sentence Scoring (DSS) and the Index of Productive Syntax (IPSyn), are very detailed but tedious to compute by hand, so few clinicians use them. CLASP’s findings so far suggest that some measures have only limited value in isolation, but that a combination is effective. Bernstein Ratner says: ‘MLU turns out to be very good at distinguishing between children with problems and children without problems. It’s a good filter.’ In contrast, CLASP has added to a growing literature suggesting that TTR should not be used in assessment (see article): it does not change as children mature, and children with known disorders actually outscore typically developing children on it. DSS and IPSyn are less effective as filters but more useful for planning therapy, because they show which aspects of language the child does not appear to use, or does not appear to use correctly. A clinician can therefore quickly see what the child needs help with.
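Both MLU and TTR are simple ratios. The Python sketch below shows the arithmetic under simplifying assumptions: utterances are plain strings, words are split on whitespace, and MLU is counted in words (often called MLU-w) rather than in morphemes, as clinical MLU usually is. The function names are hypothetical, and real transcripts would need their transcription codes and punctuation handled before counting.

```python
from typing import List


def mlu_words(utterances: List[str]) -> float:
    """Mean length of utterance in words (MLU-w).

    Simplification: clinical MLU is usually counted in morphemes;
    here each whitespace-separated word counts as one unit.
    """
    if not utterances:
        raise ValueError("need at least one utterance")
    lengths = [len(u.split()) for u in utterances]
    return sum(lengths) / len(lengths)


def type_token_ratio(utterances: List[str]) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    tokens = [w.lower() for u in utterances for w in u.split()]
    if not tokens:
        raise ValueError("need at least one word")
    return len(set(tokens)) / len(tokens)


sample = ["more juice", "I want more juice", "doggie running"]
print(mlu_words(sample))         # 8 words / 3 utterances, about 2.67
print(type_token_ratio(sample))  # 6 types / 8 tokens = 0.75
```

Because TTR divides distinct words by total words, its value depends on how many words are in the sample as well as on the child's vocabulary.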
The measures most commonly used to assess children’s language development are likely to be biased against children who do not speak mainstream American English. CLASP aims to counter such biases, but progress has been slow because very little data are available to address this question. To resolve this, the CLASP team has acquired additional data and is currently re-transcribing them, which will help to create a robust database for exploring the question of bias in assessing preschool-age children in the US.
While acquiring additional data, CLASP found that a much smaller sample than previously thought is enough to obtain a reliable IPSyn score, which may significantly increase clinical uptake of the instrument. Bernstein Ratner says: ‘Clinicians hesitated to use this particular routine, especially if they don’t use computers to assess language. But now we know that half the number of words is enough. Whether or not you use a computer, getting and transcribing 50 utterances rather than 100 will take literally half as long. We think that has great clinical impact.’
Distinguishing Difference from Disorder
Improving the existing data involves back-annotating several datasets, for instance by marking whether a child’s language contains genuine errors or features of a language variety, a distinction that can be difficult to make in samples from very young children.
In addition, the team is working closely with Barbara Zurer Pearson at the University of Massachusetts Amherst, one of the authors of the DELV (Diagnostic Evaluation of Language Variation), a dialect-sensitive evaluation tool. Pearson has been helping the CLASP team develop annotations that are systematic and can be computed. In this way, the team can ensure that all children’s speech samples added in the future are marked for a particular set of features.
Data annotated in this way could provide a ‘warning shot’ to clinicians when they run diagnostic analyses. This may be particularly valuable when clinicians are not aware that a child’s pattern might reflect a language difference rather than a disorder.
Bernstein Ratner says: ‘We're trying to come up with a feature that will look at a child's transcript. For certain things, for instance, absence of the third person singular marker in the present tense, lack of the auxiliary verb in progressive phrases, like “He running” for “He's running”, or the use of some dialect-specific forms, we're trying to develop what we're calling a dialect detector. Something that will pop up if a clinician doesn't annotate, because he or she doesn't know to annotate this feature of a dialect. If they just type what they hear, we're hoping to come up with an alert that essentially says: Have you considered that the child in front of you may not speak your version of American English, but that it's not in itself problematic?’
The dialect detector is envisaged to become part of CLAN’s other large utility programs, such as the KidEval and FluCalc routines, which clinicians and researchers can use without any need for coding. Bernstein Ratner says: ‘Even for technophobic people, this is accessible. Our goal is to have a system where people just type what they hear, and then push a couple of buttons, and then the work is done for them.’
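The dialect detector is still being developed, and the article does not describe how it works. Purely as an illustration, the Python sketch below shows one simple, rule-based way an alert could be raised for the two features Bernstein Ratner mentions: absence of the third person singular marker (‘she want’) and a missing auxiliary in progressive phrases (‘he running’). The rule set, verb list, feature labels and function name are all hypothetical and are not part of CLAN, KidEval or FluCalc; a realistic detector would need part-of-speech information and far more rules.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately tiny verb list for illustration only.
BARE_VERBS = ("want", "like", "go", "have", "do", "need", "eat", "play")

RULES = {
    # He/she/it followed by an uninflected verb, e.g. "she want".
    "no 3rd-person -s": re.compile(
        r"\b(?:he|she|it)\s+(?:" + "|".join(BARE_VERBS) + r")\b"
    ),
    # Pronoun directly followed by an -ing word with no auxiliary, e.g.
    # "he running". Crude: it also fires on verbs such as "bring".
    "no auxiliary in progressive": re.compile(
        r"\b(?:i|you|he|she|it|we|they)\s+\w+ing\b"
    ),
}


def flag_dialect_features(utterances: List[str]) -> List[Tuple[int, str]]:
    """Return (utterance index, feature label) for every matching rule."""
    flags = []
    for i, utt in enumerate(utterances):
        text = utt.lower()
        for feature, pattern in RULES.items():
            if pattern.search(text):
                flags.append((i, feature))
    return flags


transcript = ["he running", "he's running", "she want cookie", "she wants a cookie"]
flags = flag_dialect_features(transcript)
for i, feature in flags:
    print(f"utterance {i} ({transcript[i]!r}): possible dialect feature: {feature}")
if flags:
    print("Have you considered that these patterns may reflect a dialect "
          "difference rather than a disorder?")
```

On this four-utterance sample, only ‘he running’ and ‘she want cookie’ are flagged; the forms ‘he’s running’ and ‘she wants a cookie’ are not. The clinician types the utterances as heard, and the alert appears without any manual annotation, which is the workflow described above.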
Thus far, work citing the CHILDES database has tended to be basic research published in journals. Bernstein Ratner sees CLASP as a bridge between basic research and real-world application: the project takes data originally collected for basic research and turns them into a clinical asset.
Learning from Disfluencies
In addition to her own research and teaching, Bernstein Ratner runs FluencyBank, a large, shared TalkBank database of spoken language resources for clinicians and researchers whose work focuses on stuttering and on disfluency in speech more generally. FluencyBank’s annotations follow a uniform standard (until recently, each lab or practice devised its own), all data are open access, and easy-to-use tools are available to analyse and compare transcripts. The site is used frequently by researchers and clinicians, and its clinical impact is well documented.
Because the data are easily available and shareable and follow a uniform standard, users know how to work with them. Moreover, the data are being used not only to train software systems to recognise people who stutter, but also, for example, to identify speech problems associated with dementia: ‘If you can teach a computer to look past something, you can also teach it to pay attention to something, because they're really flip sides of the same coin. We see a number of emerging research publications that are using TalkBank materials, such as FluencyBank, AphasiaBank and CHILDES, to actually measure things – for instance, trying to catch early signs of dementia by looking at the available data. It's really exciting, […] it's a great use of the data.’
Views on CLARIN and Open Science
Dr Nan Bernstein Ratner, Professor, Department of Hearing and Speech Sciences, University of Maryland, College Park