Image: Google server farm, Oregon. Screen grab from Google Maps by the author.
Before attempting a definition of ‘big data’, it’ll be useful to clarify what we actually mean by the term data in its own right and its relationship to connected concepts such as information. For example, we often refer to ‘personal data’ when we describe information about ourselves on social media platforms, or go ‘data mining’ in information collections stored in a data warehouse or information repository. Luciano Floridi has written extensively on this topic, so I’m going to channel him as a means of unpicking a term which, to use his phrase, ‘enjoys considerable latitude’.
Two of the most commonly understood formulations of data are as a given entity or integer in a mathematical context, or as empirical evidence (‘facts’) gathered to inform a legal process or scientific enquiry. A fact, of course, requires a complex set of contested definitional terms in its own right and, as Floridi points out, might be more correctly understood as an artefact resulting from a chain of processes including data collection, processing and analysis. He gives the example of ‘census data’, which establish ‘facts’ about populations through the organisation of raw data into information sets. For the sake of argument, then, we might say that data pre-exist information, which cannot exist without them (although there is some lively debate on this matter in philosophical circles).
It goes without saying (but I will anyhow) that in the contemporary context data is also inextricably linked to the digital, i.e. understood as collections of strings of alphanumeric characters, symbols, electrical signals, binary switches etc., stored and distributed across a number of different digital platforms. These components are then subject to algorithmic and other processes which organise them into informational forms, e.g. databases or ‘digital media’ such as sound, image, moving image etc. Again we see a move from data to (organised) information allegedly productive of insight or enabling the elaboration of concepts (or, as I’ve argued elsewhere, experiences). Anyone who has mucked about with image or sound editing will also know that digital data is highly manipulable, mobile and capable of being reused in contexts sundered from its original referents. This isn’t unique to digital data, but it does underscore an important characteristic, namely its abstract qualities. This obviously has significant ramifications both for epistemic conceptions of data as signifiers of truth and for creative explorations of the type we attempt in this project.
So, to summarise, ‘data’ is a slippery concept that, depending on context of use, can mean ‘information’; be descriptive of commonly understood digital processes; or, from an epistemic point of view, be a measure of some referent ‘in the world’ with a view to collecting facts toward a reasoned engagement with it.
Floridi has more to say on this subject but that’s for the next post.
Tom Corby 19th May 2014
Floridi, L., “Data”, pre-print article for the International Encyclopedia of the Social Sciences, 2nd edition, ed. W. A. Darity (Detroit: Macmillan, 2008). <www.philosophyofinformation.net/publications/pdf/data.pdf> (accessed 15th May, 2014)