General Recognition Theory

I wrote in my first and second posts that I plan on writing 'science' posts describing research that is relevant to my own. Well, here's my first post doing just that, although I would like to note quickly that this post will not be as technical as some that follow. This post will be about a problem that I've set up for myself and my initial thoughts on how I plan on going about solving it.

First, a bit of background. I am working on a double major PhD in Linguistics and Cognitive Science at Indiana University. I also plan on getting the certificate in modeling in Cognitive Science (pdf link explaining the requirements, if you're interested). The combination of a double major and the certificate has allowed me to take, if I'm not mistaken, exactly zero elective classes. This is fine, though, as my coursework has for the most part been thoroughly enjoyable and extremely well tailored to my interests.

Anyway, my first exposure to the modeling that now consumes so much of my time occurred when I received an email about a reading group described as an "introduction to general recognition theory (GRT). GRT is a general, multi-dimensional signal detection theory of identification and categorization. We will focus on the original theory (Ashby & Townsend, Psy. Rev. 1986), which was applied to perceptual independence of psychological dimensions and features."

I recall thinking that a model that deals with perceptual independence of features was precisely the kind of model I needed to know about. I'm sure that the mention of signal detection theory influenced my decision to participate in the reading group, as well, as a professor I respect greatly said once, in an offhand manner, that signal detection theory was one of the most important ideas he ever learned about. Having now learned about it, I feel confident saying the same.

The bit about features caught my eye, though, because, within linguistics, I tend toward the phonology and phonetics side. Phonology and, to a lesser extent, phonetics are built on a foundation of distinctive features, yet I couldn't recall anyone in either phonology or phonetics ever having addressed the issue of (in)dependence between two (or more) features. Well, here was a reading group addressing the issue directly. So I joined in.

Here's a grossly over-simplified description of GRT. In the simplest case, the experimental task involves the identification of four stimuli. These four stimuli consist of the factorial combination of two features, each with two values. To provide a simple, concrete example, suppose that the two features are 200 ms pure tones, one at 200 Hz, the other at 500 Hz, and that the two values of each are 'absent' and 'present'. If '0' indicates absence, '1' indicates presence, and the first position represents the lower frequency tone, the four stimuli are 00, 01, 10, and 11. Now, the stimuli in these sorts of experiments have to be confusable, either because they are similar enough that the noise of the perceptual system itself makes it difficult to always identify them correctly, or because they are purposefully embedded in noise. In the present case, it would probably work best to embed them in noise.

Confusable stimuli are important because the data we work with are confusion probabilities. For example, given that the stimulus '01' was presented, we tally the number of times a subject responded '00', '01', '10', and '11'. We then divide each tally by the total number of times that stimulus '01' was presented and find that, say, Pr( response = '10' | stimulus = '01' ) = 0.23. We do this for each stimulus and get a 4x4 confusion matrix (typically arranged so that the columns correspond to responses, rows to stimuli).
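
To make the tallying concrete, here is a minimal sketch in Python. The function name, the label order, and the handful of trials at the bottom are all made up for illustration; they are not data from any experiment.

```python
LABELS = ["00", "01", "10", "11"]

def confusion_matrix(stimuli, responses, labels=LABELS):
    """Return a list of rows: rows are stimuli, columns are responses.

    Entry [i][j] estimates Pr(response = labels[j] | stimulus = labels[i]).
    """
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for s, r in zip(stimuli, responses):
        counts[index[s]][index[r]] += 1

    matrix = []
    for label, row in zip(labels, counts):
        total = sum(row)
        if total == 0:
            # A conditional probability is undefined for an unpresented stimulus.
            raise ValueError(f"stimulus {label!r} was never presented")
        matrix.append([count / total for count in row])
    return matrix

# Hypothetical trials, purely illustrative.
stimuli = ["00", "01", "01", "01", "01", "10", "11"]
responses = ["00", "01", "01", "10", "00", "10", "11"]
for label, row in zip(LABELS, confusion_matrix(stimuli, responses)):
    print(label, row)
```

Each row is normalized separately, so every row sums to one and carries only three free probabilities.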

It is assumed that each physical stimulus corresponds to a probability distribution in perceptual space and that the distributions corresponding to the four stimuli overlap. In the most general form of the theory, no assumptions about the particular character of these distributions are made. In the most common special case of the general theory, it is assumed that each perceptual distribution is bivariate Gaussian. In any case, the perceptual effect of each stimulus presentation is assumed to be a point in the perceptual space. Because the perceptual distributions overlap, the point is ambiguous: there is a non-zero probability that it came from each of the four possible distributions. So, we need the multidimensional analogs of signal detection theory's scalar criterion, which turn out to be decision bounds, that is, curves in the perceptual space that define response regions. If the perceptual effect of a given stimulus presentation falls, say, above the bound separating the 'absence' and 'presence' regions for the 200 Hz tone and below the bound separating the 'absence' and 'presence' regions for the 500 Hz tone, the subject responds '10'.
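
Here is a rough simulation of that story for the Gaussian special case. Everything specific in it is an assumption of mine for illustration: the means, standard deviations, and correlations in PERCEPTS, the bound locations, and the choice of straight bounds parallel to the axes (the general theory allows arbitrary curves).

```python
import math
import random

# Hypothetical parameters per stimulus: (mean_x, mean_y, sd_x, sd_y, rho).
# x is the perceptual dimension for the 200 Hz tone, y for the 500 Hz tone.
PERCEPTS = {
    "00": (0.0, 0.0, 1.0, 1.0, 0.0),
    "01": (0.0, 1.5, 1.0, 1.0, 0.0),
    "10": (1.5, 0.0, 1.0, 1.0, 0.0),
    "11": (1.5, 1.5, 1.0, 1.0, 0.0),
}

def sample_percept(stimulus, rng):
    """Draw one point from the stimulus's bivariate Gaussian."""
    mx, my, sx, sy, rho = PERCEPTS[stimulus]
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x = mx + sx * z1
    # Mixing z1 into y gives the pair (x, y) correlation rho.
    y = my + sy * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y

def respond(x, y, bound_x=0.75, bound_y=0.75):
    """Report 'present' on a dimension when the percept exceeds its bound."""
    return ("1" if x > bound_x else "0") + ("1" if y > bound_y else "0")

def simulate(stimulus, n_trials, seed=0):
    """Estimate one row of the confusion matrix by simulation."""
    rng = random.Random(seed)
    counts = {"00": 0, "01": 0, "10": 0, "11": 0}
    for _ in range(n_trials):
        counts[respond(*sample_percept(stimulus, rng))] += 1
    return {response: n / n_trials for response, n in counts.items()}

print(simulate("10", 2000))
```

Moving bound_x or bound_y changes the response proportions while leaving the perceptual distributions untouched.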

One of the great strengths of GRT is that it allows us to differentiate a number of different kinds of perceptual independence. To make a long and very interesting story very short, GRT provides three key definitions. Perceptual independence holds for a given stimulus if statistical independence holds in the corresponding perceptual distribution; perceptual separability holds for a feature if the perceptual effect of that feature does not depend on the level of the other feature; and decisional separability holds if the decision bounds are parallel to the coordinate axes in perceptual space.
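
In the Gaussian special case, the first two definitions reduce to simple checks on the distribution parameters: independence within a stimulus is a zero correlation, and separability of a feature means its marginal mean and standard deviation are the same at both levels of the other feature. The sketch below assumes a made-up parameter layout, (mean_x, mean_y, sd_x, sd_y, rho) per stimulus, and made-up values. Decisional separability is a property of the bounds rather than the distributions, so it is not checked here.

```python
def perceptually_independent(params, tol=1e-9):
    """Gaussian case: independence within one stimulus means rho is zero."""
    return abs(params[4]) <= tol

def perceptually_separable(percepts, feature, tol=1e-9):
    """Check whether a feature's marginals ignore the other feature's level.

    feature 0 is the first digit of the label (x dimension),
    feature 1 is the second digit (y dimension).
    """
    for level in "01":
        marginals = []
        for other in "01":
            label = level + other if feature == 0 else other + level
            mx, my, sx, sy, _ = percepts[label]
            marginals.append((mx, sx) if feature == 0 else (my, sy))
        (m0, s0), (m1, s1) = marginals
        if abs(m0 - m1) > tol or abs(s0 - s1) > tol:
            return False
    return True

# Hypothetical configuration: stimulus '11' is shifted along x and correlated.
percepts = {
    "00": (0.0, 0.0, 1.0, 1.0, 0.0),
    "01": (0.0, 1.5, 1.0, 1.0, 0.0),
    "10": (1.5, 0.0, 1.0, 1.0, 0.0),
    "11": (2.0, 1.5, 1.0, 1.0, 0.4),
}
print(perceptually_independent(percepts["11"]))
print(perceptually_separable(percepts, feature=0))
print(perceptually_separable(percepts, feature=1))
```

In this configuration the first feature fails separability because its mean shifts when the second tone is present, while the second feature passes.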

Perhaps the greatest weakness of the theory, though, is the fact that a 4x4 confusion matrix does not suffice to determine the properties of the underlying perceptual distributions and decision bounds. Some very smart people have put a lot of time and effort into coaxing useful information out of such a matrix, though, so the theory is, in fact, quite useful. Significantly, if failures of independence and separability occur, observed probabilities can be used to demonstrate this fact. Evidence for independence or separability is much harder to come by.

Which brings me to my problem. Maybe the most important aspect of standard signal detection theory is the ability to transform response probabilities into separate measures of sensitivity and response bias. It is well documented that systematically manipulating the frequencies of the stimuli or the payoff values of the responses can induce changes in subjects' response biases without affecting sensitivity. Well, these same kinds of manipulations ought to have the same kinds of effects in the multidimensional generalization of the standard theory. Changing the stimulus frequencies should induce systematic changes in subjects' decision bounds without changing the relative positions or properties of the perceptual distributions.
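
For reference, here is the standard transformation in the one-dimensional case, assuming the usual equal-variance Gaussian yes/no model. The function names and the particular numbers are mine, chosen only to illustrate that shifting the criterion changes the response rates but not the recovered sensitivity.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian SDT: sensitivity d' and criterion c.

    Both rates must be strictly between 0 and 1.
    """
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)

def rates_from(d_prime, criterion):
    """Inverse mapping: the hit and false alarm rates the model predicts."""
    hit_rate = _STD_NORMAL.cdf(d_prime / 2.0 - criterion)
    false_alarm_rate = _STD_NORMAL.cdf(-d_prime / 2.0 - criterion)
    return hit_rate, false_alarm_rate

# Same hypothetical sensitivity, two different criteria.
for c in (-0.3, 0.3):
    hit, fa = rates_from(1.5, c)
    print(c, hit, fa, dprime_and_criterion(hit, fa))
```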

My problem is to figure out what kinds of manipulations of stimulus frequencies will produce what kinds of usable changes in the observed response probabilities. Each stimulus-frequency condition yields its own confusion matrix, and my intuition tells me that the additional degrees of freedom provided by these additional matrices ought to enable more direct investigation of the underlying distributions. If shifts in stimulus frequencies induce predictable shifts in decision bounds, it ought to be possible to abstract away from the particulars of the decision bounds themselves to more directly get at properties of the perceptual distributions.
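
A back-of-the-envelope count shows why this intuition is at least plausible. The sketch below rests on assumptions of mine, not on anything established above: a Gaussian model with five parameters per distribution, four of them fixed because the location and scale of each perceptual axis are arbitrary, two axis-parallel bounds per condition, and perceptual distributions that stay put across conditions. Having at least as many data degrees of freedom as parameters is only a necessary condition for pinning the model down, not a guarantee.

```python
def data_df(n_conditions, n_stimuli=4, n_responses=4):
    """Free response probabilities: each row of each matrix sums to one."""
    return n_conditions * n_stimuli * (n_responses - 1)

def gaussian_model_params(n_conditions, n_stimuli=4, bounds_per_condition=2):
    """Parameter count under the assumptions stated in the text above."""
    per_distribution = 5   # two means, two standard deviations, one correlation
    fixed_by_convention = 4  # location and scale of each of the two axes
    shared = n_stimuli * per_distribution - fixed_by_convention
    return shared + n_conditions * bounds_per_condition

for k in (1, 2, 3):
    print(k, data_df(k), gaussian_model_params(k))
```

Under these assumptions a single matrix gives fewer data degrees of freedom than parameters, and the balance tips once a second stimulus-frequency condition is added.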

So, the first thing I need to do is figure out exactly what decision bound changes I expect when I change stimulus frequencies. Then I need to figure out what, if anything, this buys me in terms of the relationships between the underlying perceptual representations and the observable response statistics. At the very least, the additional data should allow for some Gaussian model fitting, but I would like to produce some respectable analytic results, too.

[Postscript: Okay, so this post was less about my problem and how I plan to solve it than it was a quick and dirty description of GRT. By quick and dirty, of course, I mean terribly incomplete. Among other things, I almost completely avoided explicit mathematical notation. I also mentioned only indirectly two of the many scholars who have developed the theory, without whom I wouldn't be thinking or writing about this stuff at all. I will rectify both of these shortcomings in the future with more complete presentations of a number of particular papers on GRT.

Also, although I am happy that I am, thus far, upholding yesterday's solemn oath, it seems likely that, by committing myself to daily blog posts, I have implicitly committed myself to less careful proofreading. For example, this post turned out to be longer than I expected it to be, and I don't really feel like proofing it again, but I don't want to withhold publication, so caveat lector.]
