Big data and uncertainty
Unprecedented quantities and types of data cannot eradicate uncertainty.
Technological progress and pervasiveness are allowing us to collect new kinds of data, and to collect data on scales that were previously unimaginable. But it should come as no surprise that having more data is not synonymous with having better data.
More importantly, more data will not provide a perfect description of the system or process of interest. There are inescapable uncertainties that arise in the data collection process, in the validation and analysis techniques with which we scrutinise data, and in the models with which we interrogate the data (whether formal, such as mathematical models, or informal, such as mental models).
Accordingly, it is paramount to identify how these uncertainties affect our use of data to come to a conclusion or make a decision. As our data collection, analysis and interpretation methods become ever more sophisticated and nuanced, so too must our understanding and communication of the layers of uncertainty that underpin these processes.
This need is especially clear when engaging with non-experts, who may understandably assume that the vast wealth of data and computing power at our disposal renders any problem well-defined and tractable, and for whom the presentation or visualisation of uncertainty must be handled with painstaking care if research findings are to be translated appropriately and sensibly into engagement and policy. Data wealth cannot eradicate uncertainty.