The Death of Theory: Kaggle, randomised control trials, and optimisation without understanding

Another effort to draw together some semi-coherent thoughts prompted by a couple of recent-ish items in the media.

Item 1:

“Down with experts”, a New Scientist opinion/interview piece from last month. It looks like Slate has the same interview subscription-free.

In a nutshell: Kaggle is a website for self-declared ‘data wizards’ to compete in developing solutions for data analysis problems. The examples given include marking students’ essays, predicting the properties of molecules being screened for potential use as drugs, and forecasting tourist arrivals and departures at airports. The interviewee is Jeremy Howard, Kaggle’s chief scientist. Anyone can enter these competitions, from amateur analysts to experts in the competition’s subject matter. And— big surprise— it’s not the experts who do well. They, it is inferred, are too hamstrung by the prior assumptions and theoretical frameworks of their specialism to make the big breakthroughs. It is the analysts who triumph: they find creative, unprejudiced ways to abstract information from the data and feed it to clever algorithms, such as the ‘random forest’. These operate blindly to select the best-performing solution.

The competitive nature of the process makes it akin to evolution. This optimisation of predictive models by natural selection requires no actual understanding of the underlying physical mechanisms; indeed, experts’ tedious search for such understanding is actively unhelpful in the process of finding the best solution.
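The ‘random forest’ idea can be sketched in miniature: an ensemble of weak learners, each trained blindly on a random resample of the data, whose majority vote produces good predictions with no model of the underlying mechanism at all. What follows is a deliberately toy illustration (one-level decision stumps rather than full trees, and all names and numbers invented for the example), not Kaggle’s or anyone’s production implementation:

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """One-level 'tree': exhaustively find the threshold on a single
    feature that best separates the class labels."""
    best = None
    for row in X:
        t = row[feature]
        left = [yi for xi, yi in zip(X, y) if xi[feature] <= t]
        right = [yi for xi, yi in zip(X, y) if xi[feature] > t]
        if not left or not right:
            continue
        lmaj = Counter(left).most_common(1)[0][0]
        rmaj = Counter(right).most_common(1)[0][0]
        errors = (sum(yi != lmaj for yi in left)
                  + sum(yi != rmaj for yi in right))
        if best is None or errors < best[0]:
            best = (errors, t, lmaj, rmaj)
    if best is None:  # degenerate resample: no valid split, predict majority
        maj = Counter(y).most_common(1)[0][0]
        return lambda row: maj
    _, t, lmaj, rmaj = best
    return lambda row: lmaj if row[feature] <= t else rmaj

def random_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap resamples, each looking at a randomly
    chosen feature; predict by majority vote. No theory of the domain
    is consulted at any point."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        stumps.append(fit_stump(Xb, yb, rng.randrange(len(X[0]))))
    return lambda row: Counter(s(row) for s in stumps).most_common(1)[0][0]

# Toy data: one feature, two well-separated classes.
predict = random_forest([[0], [1], [2], [10], [11], [12]], [0, 0, 0, 1, 1, 1])
```

The ensemble recovers the training labels (`predict([0])` gives 0, `predict([12])` gives 1) while ‘knowing’ nothing about why the classes separate where they do — which is exactly the point being made in the interview.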

Item 2:

A Radio 4 piece by Ben Goldacre about his new Big Idea: introducing randomised controlled trials into the evaluation of social policy. There are links on his blog.

My initial response to the programme was “of course it’s a great idea; why hasn’t anyone thought of this before?” Randomised controlled trials (RCTs) are frequently cited as the ‘gold standard’ of health research because in an adequately powered study, the randomisation bit should ensure a fair evaluation to a much greater extent than any other research methodology: the trial groups should have almost identical characteristics aside from the intervention under investigation. Or, in epidemiology-speak, both the known and unknown confounders are addressed. The ‘controlled’ bit means that participants are allocated (randomly, as above) into intervention and control groups for ease of comparison, and the ‘trial’ bit covers the rest of the mechanics of the study— finding a relevant study population, developing a clear intervention, and identifying and measuring the most appropriate outcomes. So— why not introduce such methods into social policy?
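The balancing power of randomisation is easy to demonstrate in a few lines: simulate a population carrying a ‘hidden’ confounder, allocate at random, and the two arms end up near-identical on a variable nobody ever measured. A minimal sketch (the cohort, variable names, and numbers here are invented purely for illustration):

```python
import random

def randomise(participants, seed=42):
    """Shuffle the cohort and split it into two equal arms."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Simulated cohort of 1000, each with an unmeasured confounder
# (say, baseline health on an arbitrary 0-100 scale).
rng = random.Random(0)
cohort = [{"id": i, "baseline": rng.gauss(50, 10)} for i in range(1000)]
intervention, control = randomise(cohort)

def mean_baseline(arm):
    return sum(p["baseline"] for p in arm) / len(arm)

# The arms come out balanced on the confounder without anyone measuring it.
imbalance = abs(mean_baseline(intervention) - mean_baseline(control))
```

With a cohort of this size the imbalance is a small fraction of the within-group spread of 10, and the same coin-flip balances every other characteristic, known or unknown, at the same time — which is the whole appeal of the method.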

To cut to the chase, I’m sure Ben Goldacre is right— more RCTs in particular, and more methodologically robust evaluation in general, would certainly be a good thing for social policy development.

That said, I do have some rather vague, nagging reservations about this focus on RCTs, which I think relate to the vague, nagging discomfort I experienced when reading the NS Kaggle piece.

Semi-coherent thoughts:

Going back to the Kaggle article, this is the final question-and-answer:

Can you see any downsides to the data-driven, black-box approach that dominates on Kaggle?
Some people take the view that you don’t end up with a richer understanding of the problem. But that’s just not true: The algorithms tell you what’s important and what’s not. You might ask why those things are important, but I think that’s less interesting. You end up with a predictive model that works. There’s not too much to argue about there.

I suspect a lot of scientists would disagree with the idea that why something works is less interesting than the fact that it does work, or indeed, that this distinction is particularly meaningful. Jeremy Howard does admit that experts— albeit demoted to ‘strategists’— are important early on ‘for when you’re trying to work out what problem you’re trying to solve.’ But at the recognition-of-useful-patterns-in-data stage, he asserts, they are worse than hopeless.

You might also expect experts to be useful in terms of understanding and contextualising the end results of data analysis, but the Kaggle approach happily dispenses with them at that stage too— what’s important is whether the problem is solved, not what the problem means.

I’m sure the Kaggle approach has real benefits in terms of the data analysis/pattern recognition element of how science is done. I can certainly see how too much pre-existing theoretical baggage slows down the identification of what works and what doesn’t. But as portrayed in the interview, Jeremy Howard seems to go way beyond this. He makes sweeping judgements about the necessity of understanding things at all. If we accept a narrow set of assumptions about what constitutes a ‘problem’ and what’s ‘optimal’ in how we judge the outcomes of data analysis, then yes, I suppose he’s right: there’s no need to go beyond the blind, evolutionary processes that efficiently lead to a result; perhaps Theory is Dead. But to most of us living in the real world, and I suspect to most scientists trying to make sense of it, this is an impossibly narrow and incomplete way of living and learning.

Back to Ben Goldacre and using RCTs to judge social policy.

I like RCTs. Within the constraints of how we select the study population, how the intervention is specified and delivered, and how we choose the outcomes with which we evaluate the intervention, they will always deliver the fairest, most unprejudiced answer for what works and what doesn’t. Where we can use them, we should use them more widely, in social policy as in health. But if we are going to use them more widely we need to understand them, and their limitations.

The problem is that there’s an artificiality about RCTs that’s always going to be hard to eliminate completely. When we are talking about complex interventions and cluster RCTs (which we are, if we are talking about social policy) it gets even harder. Binary decisions are rare; there will be complex overlapping and nesting of interacting interventions, variations in how the intervention is delivered, and differences of opinion as to the most important outcomes; and later on, of course, problems in implementing an RCT-proven intervention effectively in the real world.

RCTs are very clearly the gold standard for evaluating the efficacy of a pill that either prolongs life or does not in a healthy person (or an otherwise healthy person with a single disease). And they will be the gold standard in many other situations too, but as you progressively introduce nuance and complexity— from pills and binary treatment decisions, on to decisions about deploying complex social interventions; from healthy adults, on to the multimorbid elderly or the complex needs and interactions of a developing child— the clarity of this benefit becomes progressively muddied.

Like Kaggle data analysis, RCTs in themselves

1/ tell us nothing directly about underlying mechanisms, just whether the intervention works or not; and

2/ require some care and presumably some expert knowledge (whether from specialists or Kaggle-type ‘strategists’) to state the problem effectively and provide the context that renders the process meaningful.

Perhaps (1) is unimportant anyway, and (2) should go without saying. But I have some philosophical objections to dismissing (1), and some practical concerns regarding (2) that I’m reluctant to set aside too readily. The philosophical objections I’ve touched on above, but will otherwise leave for now. I’ve already mentioned some of the pragmatic concerns about implementing RCTs in social policy. As already stated, I don’t think they are a reason not to use RCTs more widely. But narrowly focused, poorly designed and inadequately interpreted RCTs are just as likely to be used for political or corporate distortion and deception as any other study design, with the extra danger that these distortions will come burnished with shiny ‘gold standard’ RCT status.

You could say that RCTs, like more recent innovative data analysis techniques, are just optimisation tools. And optimisation without understanding is meaningless. If we use such tools without awareness of their limitations— while simultaneously denigrating the role of theory— there may be trouble ahead…

