All Watched Over by Machines of Loving Grace

This article tries to draw some distinctions between using management data or business intelligence, and the use of “big data”, with some caveats about the latter and the possible blind trust in numbers from the less-than-numerate.


Last week I went to a demonstration of a piece of software to help with student retention. There are some great things that the tool allows: integration with student information systems (including SITS), access to VLE analytics, and the ability for any member of staff to flag a concern about a student. In addition to that, however, the system looks at the last three years’ worth of retention data, examining who withdraws and why, and then making predictions based on correlations (if not causality).

So far, so good. I’m a big fan of exploiting data that we have available to us, to allow us to perform more effectively and successfully.

For example, looking at national data, we can identify how well we perform as an institution compared with others, either overall or in individual subject areas. From this we could identify how successful we are in recruitment or in degree outcomes.

At a more granular level, we can look internally at portfolio performance information to see how academic awards perform compared with each other, for example how retention rates or good degree outcomes compare between subjects. At a finer level still, we can look at the marks achieved on individual modules, their distribution, and how they compare with each other.

All of this provides simple and useful management information (or at the least granular level, business intelligence) which can help us to improve what we deliver, and improve the outcomes for our students.

What it does not do is provide a “big data” approach to education.

With enhanced student information, linked to personal tutoring or coaching, we could start to look at how to support individuals better, identifying their likely outcomes and helping them to achieve them. This is still a management information approach.

Going to the next stage, though, and profiling students based on their individual characteristics is where the waters start to be muddied.

We could provide tutors with information such as entry qualifications, attendance, engagement with the VLE and marks obtained. In addition, we also hold information on age, ethnicity, gender, socio-economic class, first-generation HE status, distance from home and much else. Individual staff may not be able to draw any inferences from this themselves, but an algorithmic approach could.

Considering retention, a big data approach would look at all of this and use algorithms to calculate a risk score for each student withdrawing. It could use a traffic light system (red, amber and green), with those scoring red being the most likely to withdraw.
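To make the mechanism concrete, here is a minimal sketch of how such a traffic light might be produced. Everything in it is an assumption for illustration: the three indicators, the weights, the bias term and the red/amber thresholds are invented, and no real retention product is being described. It combines a few indicators into a logistic score between 0 and 1 and then cuts that score into bands.

```python
import math
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical indicators; a real system might draw these from SITS, VLE logs, etc.
    attendance: float           # proportion of sessions attended, 0.0 to 1.0
    vle_logins_per_week: float  # average VLE logins per week
    mean_mark: float            # average mark so far, 0 to 100


# Invented weights for illustration only. A real tool would fit these to
# several years of historical withdrawal data, inheriting any bias in it.
WEIGHTS = {"attendance": -4.0, "vle_logins_per_week": -0.3, "mean_mark": -0.05}
BIAS = 5.0


def withdrawal_risk(s: Student) -> float:
    """Logistic score in (0, 1); higher means the model rates withdrawal as more likely."""
    z = BIAS + sum(w * getattr(s, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def traffic_light(risk: float, red: float = 0.6, amber: float = 0.3) -> str:
    # Where the thresholds sit is a policy decision, not a property of the data.
    if risk >= red:
        return "red"
    if risk >= amber:
        return "amber"
    return "green"


for s in [Student(0.95, 5, 65), Student(0.6, 2, 50), Student(0.4, 0.5, 35)]:
    print(s, traffic_light(withdrawal_risk(s)))
```

With these made-up numbers the three students come out green, amber and red respectively. Nudging the `red` threshold down would move students into a different category without anything about the students themselves having changed.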

Kate Crawford of MIT, writing in a blog post for the Harvard Business Review, says:

But can big data really deliver on that promise? Can numbers actually speak for themselves?

Sadly, they can’t. Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.

Depending on how the algorithm has been designed, we would then decide where to focus our interventions. Assuming that there will always be withdrawals, perhaps we wouldn’t intervene with students flagged as red, as their probability of withdrawing is high?

We’d need to look behind the algorithm; algorithms are not as neutral as the purveyors of technology might have us believe. If we found that students with BTEC entry qualifications were more likely to withdraw, we might flag them as a concern. However, we also know that students from a BME background are more likely to have a BTEC qualification. Our algorithm might now have produced the unintended consequence of flagging these students as a high risk of withdrawal, and our policy might even limit the interventions we offer them.
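The proxy effect described above can be shown with a deliberately simple sketch. The cohort is entirely invented, and the 40% and 15% BTEC shares are assumptions chosen only to illustrate the point, not figures from any dataset. The rule never looks at ethnicity, yet the flag rate still differs sharply between groups because entry qualification is correlated with ethnicity.

```python
from collections import Counter

# Entirely invented toy cohort, for illustration only (not real HE data).
# Each record is (ethnic_group, entry_qualification).
cohort = (
    [("BME", "BTEC")] * 40 + [("BME", "A-level")] * 60
    + [("White", "BTEC")] * 15 + [("White", "A-level")] * 85
)


def flag_high_risk(entry_qualification: str) -> bool:
    # Hypothetical rule: it sees only entry qualification, never ethnicity.
    return entry_qualification == "BTEC"


def flag_rate_by_group(records):
    """Proportion of each group that the rule flags as high risk."""
    totals, flagged = Counter(), Counter()
    for group, qual in records:
        totals[group] += 1
        if flag_high_risk(qual):
            flagged[group] += 1
    return {g: flagged[g] / totals[g] for g in totals}


rates = flag_rate_by_group(cohort)
print(rates)  # {'BME': 0.4, 'White': 0.15}
```

Auditing outcomes by group in this way, rather than only inspecting which inputs the algorithm uses, is one way of looking behind it.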

If we adopt a big data approach, even for this one simple aspect of HE, further questions arise for me:

  1. What information do you share with teaching staff – do they see the colour coding?
  2. What do you share with students – do they know how they have been categorised?
  3. How easy is it to change a student’s categorisation?

The HE sector has plenty of data to use, and some of it could be treated as “big data”. Although it might be useful for identifying some correlations, unless we include human agency in our decisions we cede control to a series of computer algorithms. We have to be prepared and able to challenge the outputs, and must not naively trust any set of numbers we are presented with.

I’ll finish with a couple of quotes from David Kernohan of JISC:

After all, if big data can reduce every problem to a bar chart, you don’t need people to choose the option that the machine tells you will make the numbers go up.


those of us who wish to continue being knowledge workers need to start making sense of data (and for that matter finance, but that’s maybe another story). If every policy position is “justified” by a slew of numbers, we need more people that can make sense of these numbers. Maths – naturally – is hard and we’d all rather be shopping or watching cat videos. But if we want to understand the decisions that affect the world around us, we need to learn to read numbers and to be confident in disputing them. Policy is now quantitative – we need to get better at teaching people how to participate.

My title, by the way, comes from a poem by Richard Brautigan, and was used as the title of a series of BBC documentaries in 2011.