Theoretical frameworks, part 3

The [intlink id="225" type="post"]first[/intlink] and [intlink id="324" type="post"]second[/intlink] instalments of this saga discussed the thinking and writing processes. However, I also need to fess up to reality and do some measuring.

A theoretical framework is not a theory. The point of a theoretical framework is to frame theories – to provide all the concepts and variables that a theory might then make predictions about. (If I were a physicist, these might be things like light and mass.) You can test whether a theory is right or wrong by comparing its predictions to reality. You can’t do that for a theoretical framework, because it makes no predictions; it offers only concepts and variables. The best you can do is determine whether those concepts and variables are useful, which means actually demonstrating some sort of use.

And so it falls to me to prove that there’s a point to all my cogitations, and to do so I need data. In fact, I need quite complex data, and in deference to approaching deadlines and my somewhat fatigued brain, I need someone else’s quite complex data.

The truth is – I’m probably not going to get it; at least, not all of it. Ideally, I need data on:

the length of time programmers take to assimilate specific pieces of knowledge about a piece of software;
the specific types of knowledge required to assimilate other specific types of knowledge;
the probability that programmers will succeed in understanding something, including the probability that they find a defect;
the probability that a given software defect will be judged sufficiently important to correct;
the precise consequences, in terms of subsequent defect removal efforts, of leaving a defect uncorrected;
the cost to the end user of a given software defect;
the propensity of programmers to find higher-cost defects; and
the total number of defects present in a piece of software in the first place.

I need each of these broken down according to some classification scheme for knowledge and software defects, and I need not just ranges of values but entire probability distributions. Such is the pain of a theoretical framework that attempts to connect rudimentary cognitive psychology to economics via software engineering.
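To give a feel for why whole distributions matter rather than just ranges, here is a minimal sketch in Python of how a few of the items on that wish list might fit together. Everything in it is an assumption made up for illustration: the `DefectClass` structure, the `expected_cost_removed` function, the class name and all the numbers are hypothetical, and none of it comes from real data or from the framework itself.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class DefectClass:
    """One category in a hypothetical defect classification scheme."""
    name: str
    detection_probability: float   # chance a programmer finds a defect of this class
    correction_probability: float  # chance a found defect is judged worth correcting
    sample_cost: Callable[[random.Random], float]  # draws one end-user cost


def expected_cost_removed(defect_class: DefectClass, defect_count: int,
                          trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the end-user cost removed per review.

    Each trial walks over `defect_count` defects, draws a cost for each,
    and counts that cost only if the defect is both found and corrected.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        for _ in range(defect_count):
            cost = defect_class.sample_cost(rng)
            found = rng.random() < defect_class.detection_probability
            corrected = found and rng.random() < defect_class.correction_probability
            if corrected:
                total += cost
    return total / trials


# Illustrative values only: a skewed (log-normal) cost distribution,
# which a simple range of values could not describe.
logic_errors = DefectClass(
    name="logic error",
    detection_probability=0.4,
    correction_probability=0.9,
    sample_cost=lambda rng: rng.lognormvariate(2.0, 1.0),
)
print(expected_cost_removed(logic_errors, defect_count=20))
```

The cost is passed in as a sampling function so that any distribution can be swapped in. A real data set would also need the knowledge-related items from the list above, which this sketch leaves out entirely.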

With luck, I may be able to stitch together enough different sources of data to create a usable data set. I hope to demonstrate usefulness by using this data to make recommendations about how best to find defects in software.