Datafication of society and SSH research is a research initiative of the Helsinki Institute for Social Sciences and Humanities (HSSH) that focuses on the opportunities and challenges of rapidly evolving, data-intensive, digitalized infrastructures. Following the mission of the institute, the programme supports multidisciplinary research that focuses on “datafication”.
This programme outline offers an entry point and a preliminary mapping of a field of questions for social sciences and humanities (SSH) scholars interested and involved in the initiative. Understanding “datafication” as both an interrelated socio-cultural challenge and a new condition for SSH research calls for inquiry into an evolving field of actors, interests and topics. In order to encourage a multidisciplinary effort to study this landscape, the outline aims to describe the research initiative without anchoring itself to any specific disciplinary vocabulary or theoretical perspective.
The aspects, themes and questions raised in the outline point to the core of social sciences and humanities. The Datafication of society and SSH research initiative works toward a wide mobilization of the whole community of scholars to pay attention to the ways in which the methodological, epistemological, legal, ethical and political environment of SSH research is changing. Meeting the challenge of transforming social interaction infrastructures and an increasingly data-intensive environment calls for a broad, collective effort and the full intellectual power of different disciplinary perspectives.
The outline is structured in three interrelated sections, starting with an analytical working definition of datafication, then identifying its general societal challenges, and finally drafting more specific, critical questions for social sciences and humanities.
Digitalization of the infrastructures of social and cultural interaction has steadily gained speed during the past decades. “Datafication” is a term that has surfaced in popular, political and academic discourse in recent years to capture the latest stage of this change.
At its heart lies the exponential growth of the volume of digital data. The amount of digital data has exploded both because of constant data collection built into the interfaces of everyday life and because of efforts to digitalize traces of human activity (texts, audiovisual contents, documentation of public registers, etc.) that were not initially produced in digital form. Digitalized societies produce, collect and archive data on an unprecedented scale (“big data”), reflecting a dominant belief that all data and information is raw material that is or can become valuable. Consequently, the variety of data being harvested and stored has expanded dramatically, now extending from individuals’ opinions, consumer choices, social network relations and mobility to physical information such as heart rate, facial recognition and emotional state, and so on. Living in a datafied world produces data traces constantly and often unconsciously.
The digital compatibility of these different and multiplying information flows presents a qualitatively new condition for identifying new patterns of behavior, interactions or structures of meaning and for following them in time and space. This allows posing questions about society and individuals at a new level of complexity, suggesting enhanced abilities to explain the workings of social and cultural systems and a new level of accuracy in profiling groups and individual actors.
The societal effect of this new volume, variety and compatibility of digital data relies on the growth of computational capacity for data analysis. This capacity entails both a centrifugal and a centripetal element. On the one hand, computational capacity has spread more widely into societies, empowering individuals and groups by offering them (locally) better tools for accessing knowledge and analyzing available and open data sources. On the other hand, the ability to take advantage of global data flows demands massive capacities (collection, storage, computational power) that have become concentrated in the hands of a few actors (transnational corporations and states).
In a datafied social condition, the analysis of data flows is also increasingly automated, relying on complex, sometimes self-evolving algorithms that are intertwined with, and sometimes replace, human decision-making. Such automated analysis has also qualitatively changed the speed and temporal dynamics of information flows, allowing more effective feedback loops in digitalized life. Algorithmically analyzed, complex profiling (based on a variety of data points) can help serve and manage the “needs” of individuals and the information and services they are exposed to or, at least theoretically, open up a terrain of predictive knowledge about individual behavior.
Not surprisingly, the social condition of datafication has raised both high hopes and deep concerns about the kind of social change that these developments entail or enable. In these debates, the ability to harvest and control vast amounts of data is recognized as a new power resource that increasingly shapes the contemporary social order. This is something that both optimists and critics of datafication agree on: extracting and analyzing data is key to managing the complexity of contemporary networked infrastructures (circulation of information, influence, goods and services, etc.).
Thus, the data-intensive environment reorganizes human relations on a systemic scale, disrupting old practices and yielding innovations. The increasing diversity of data collected, sometimes without a clear sense of what it will be used for, in turn reflects a practical belief that essential features of individuals or human action can be captured through the collection and combination of digital data, and that human behavior can be anticipated through complex intersections of such information. This highlights a dominant social trend of tailoring information, services and other measures through profiling based on data analysis. On the one hand, such developments provide functional interfaces between institutions and individuals. On the other hand, such effective mechanisms are never simply benign but always situated in specific contexts and social power relations.
The increasingly intrusive collection of behavioral data also raises critique about the changing boundaries of privacy. It raises, too, a broader question about which aspects and experiences are not captured in the dominant data flows of contemporary society. Such a question can point, for instance, towards studying the troubling effects of biased data in datafied systems, taking stock of people’s refusal to engage with datafied interfaces, or paying attention to people’s attempts to protect their data rights and privacy or to negotiate “discretionary spaces” inside datafied institutional practices.
While data-intensive practices can promise new means to identify, anticipate and address sometimes very specific and contextualized needs, this capacity plays out differently in various situations and fields (commerce, administration, political influence, workforce management, etc.). The intersectional logic of data analysis can also serve as a powerful mechanism that consolidates existing social inequalities, pointing to the need for contextualized evaluation of data use and social justice.
A major sign of datafication is the rise of dominant global data actors. These include private companies that have arguably reached a historically new position, as they provide the core of the (global) infrastructure of political, social and cultural interaction, which has prompted debate and policy responses. In different political regimes, the relationship between state and market actors varies, but power over digital infrastructures everywhere relies not only on the extraction of data but also on control of the opaque (black-boxed) algorithms that enable the management of data flows. Questions about accountability in the use of algorithms and artificial intelligence extend from the level of social systems to specific decision-making contexts, posing difficult dilemmas about the trade-offs between effectiveness, accuracy and fairness.
While networked infrastructures have empowered individuals, communities and movements (or publics) in important ways, consistent concerns have been raised about information loops that enhance political polarization, biased information flows and echo chambers. Discussions about the role of datafication in political, cultural or social polarization point to the importance of setting such research in the context of political divisions rooted in larger social developments and policy choices. Still, the potential effectiveness of the input-feedback loops created by algorithmic data control and decisions has created a new environment where such divisions play out. Taken far enough, the logic of detailed profiling and the possibility of predictive knowledge about the attitudes, reactions and opinions of targeted population groups can indeed undermine the very conceptual base of public opinion.
Datafication raises new methodological, epistemological, ethical and normative questions for SSH scholars. These questions also point to potential changes in the position of SSH research among other sciences and in relation to the societal demand for research-based knowledge.
Data-intensive research suggests new opportunities for testing old theories and developing new, innovative conceptualizations. The computational effectiveness of digital analysis can support more exploratory approaches to large data sets and produce new combinations of qualitative and quantitative analysis, as well as new ways of representing findings. Datafication does not put theory out of business, but actually creates new demand for it.
The growing variety of data sources has offered new kinds of information to SSH scholars, bringing new domains of evidence to social sciences and humanities (e.g. tracking physically “measurable” reactions in social situations, mapping multimodal features of interaction). At the same time, digitalization of old sources can open old materials to new kinds of analysis, also enhancing understanding of the past. Taking full advantage of these possibilities clearly enriches the explanatory power of SSH research. Scholars nevertheless have to remain alert to the quality and relevance of (new) data, and stay connected to findings and interpretations that remain outside computational analysis and behavioral evidence.
The compatibility of different data sources offers a powerful method for recognizing social problems, identifying cultural patterns, social groups and identities, and locating new kinds of correlations and intersections of cultural and social forces. Such intersectional analysis can suggest a new level of societal self-understanding. However, the ethical and political aspects of such knowledge also call for more reflection on the relationship between research and (individual) agency.
It is crucial to bear in mind that large parts of the data accumulating from various applications and interfaces are in practice structured to serve particular kinds of knowledge interests (e.g. more effective reach of specific target groups in the population). In a data-intensive research culture, this can lead to a temptation to craft questions based on what the available data resources make possible and easy to ask. This calls for the counterbalance of autonomous epistemological debate, where SSH research questions are also formulated independently of datafied resources. There is also good cause to resist the risk that complex computational methods become opaque elements in the analysis of large data sets. Hence, developing and formulating good scholarly practices that help secure the validity of complex computational analysis is one of the responsibilities of the SSH community.
Social quantification in modern societies is historically linked to national statistics as a form of public knowledge production. Datafication denotes an era of an increasingly complex, rich and partly privatized infrastructure of quantification, in which the demand for principal accountability of knowledge production also becomes more complex. Developing open data practices can be one way of defending public knowledge in this conjuncture. Sustaining independent critical debate about key SSH research questions in the era of datafication is another core resource.
By bringing together research that highlights the process of datafication and its impacts on SSH research, the research initiative Datafication of society and SSH research works to support the need for social science and humanities inquiry and to enable it to retain its full capacity and usefulness in a data-intensive society.