Connections and Gaps: How to (Better) Understand Societies and their Archives with Letter Metadata?
This group analyzes epistolary metadata (names and dates of senders/receivers of letters) from two major aggregated collections: the correspSearch collection, which comprises German epistolary data from the 17th to the early 20th century, and the Finnish CoCo collection, which concentrates on epistolary material from the long 19th century. These two collections pose different challenges to humanistic and computational enquiry. The correspSearch collection aggregates curated data that has previously been published in epistolary editions, and it thus reflects scholarly choices about which persons are important and interesting. The CoCo corpus casts a socially wider net, as it harmonizes and publishes “raw metadata” acquired directly from archives and museums. This means that the quality of the data varies greatly, and scholars working with the dataset need to be both inventive and careful regarding processing methods and research questions.
The work of this team is inspired by the question: what can we learn about writers, societies, communities or epistolary cultures that has not been achievable with purely qualitative or traditional analogue means? We will reflect on the persons writing and sending letters, on correspondences and on society, but we will also think critically about archival collection practices. What kinds of heritagization processes have contributed to the formation of epistolary collections and, consequently, to our understanding of the past? What kinds of source-critical and data-critical practices and methods do we need to develop in order to use data filled with gaps and absences? From the computational perspective, the datasets provide an interesting opportunity to study history by applying methods and technologies such as Linked Data, social network analysis, knowledge discovery, and data visualization.
We will use a wide range of tools and approaches. The data, tools and supervision will be provided by members of the Constellations of Correspondence project and by experts on network analysis. The group can both study the already existing LOD corpora (correspSearch) and work on harmonizing and enriching the Finnish material (e.g. regarding occupations and social classes).
The letter metadata consists mainly of person and place names and temporal information, which means that specific linguistic skills are not particularly relevant.
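To give a sense of how sender, receiver and date metadata can be turned into a network, here is a minimal sketch in Python. It is not the project's actual pipeline: the record layout, the placeholder names and the helper functions (`build_network`, `degree_summary`, `one_sided`) are all assumptions made for illustration. It counts letters per directed sender-receiver pair and flags pairs where letters survive in one direction only, which may hint at an archival gap rather than a one-sided exchange.

```python
from collections import Counter

# Hypothetical letter metadata records: (sender, receiver, year).
# The names and years are invented placeholders, not real archive data.
letters = [
    ("A", "B", 1820),
    ("A", "B", 1821),
    ("B", "A", 1821),
    ("C", "A", 1835),
    ("A", "D", 1840),
]


def build_network(records):
    """Count letters per directed (sender, receiver) pair."""
    return Counter((sender, receiver) for sender, receiver, _year in records)


def degree_summary(edges):
    """Return two counters: letters sent and letters received per person."""
    sent, received = Counter(), Counter()
    for (sender, receiver), n in edges.items():
        sent[sender] += n
        received[receiver] += n
    return sent, received


def one_sided(edges):
    """List pairs with letters in one direction only (a possible gap)."""
    return sorted(
        (sender, receiver)
        for (sender, receiver) in edges
        if (receiver, sender) not in edges
    )


edges = build_network(letters)
sent, received = degree_summary(edges)
gaps = one_sided(edges)
```

A one-sided pair is only a starting point for source criticism: the replies may never have been written, or they may simply not have been collected.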
Much research within computational social science has been done on what happens when groups espousing conflicting opinions and worldviews interact in online spaces. However, thus far the majority of this research has thrown away the interactional structure already formally encoded in thread and message-reply relationships. In essence, such research has started from a viewpoint where each message only appears as an individual shout into the darkness, instead of a way to participate in an actual ongoing discussion.
As a consequence, researchers using such data have been left no alternative but to try to recover the discourses and communities that interest them through macro-level aggregate techniques, such as network analysis and clustering of retweet or follower networks. In this group, our approach will be to start from the opposite end. Treating the flow and structure of discussion as a core asset alongside the content, the group will focus on finding patterns and commonalities in the micro-level interactions that happen in online debates.
Using multiple case studies, the group will study what rhetorical and structural strategies different participants in these debates use to, for example, form and convey identities, support in-group members in the conversation, and deride and push down outsiders. Tentatively, the case study materials will cover charged discussions around issues ranging from abortion policy through the incel (“involuntary celibacy”) phenomenon to discussion around lynchings in India and the US.
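As an illustration of what keeping the message-reply structure can look like in practice, the following Python sketch rebuilds a thread from parent links. The message fields (`id`, `parent`, `author`) and the example data are hypothetical, since real platforms export this information in different shapes. It computes how deep each message sits in the thread and counts who replies to whom.

```python
from collections import Counter, defaultdict

# Hypothetical thread: each message points to the message it replies to.
# Field names and values are assumptions made for this illustration.
messages = [
    {"id": 1, "parent": None, "author": "u1"},
    {"id": 2, "parent": 1, "author": "u2"},
    {"id": 3, "parent": 2, "author": "u1"},
    {"id": 4, "parent": 1, "author": "u3"},
    {"id": 5, "parent": 3, "author": "u2"},
]


def build_children(msgs):
    """Map each message id to the ids of its direct replies."""
    children = defaultdict(list)
    for m in msgs:
        children[m["parent"]].append(m["id"])
    return children


def depths(msgs):
    """Return the reply depth of each message (0 for the thread starter)."""
    by_id = {m["id"]: m for m in msgs}
    result = {}
    for m in msgs:
        depth, current = 0, m
        # Walk up the parent links until the thread starter is reached.
        while current["parent"] is not None:
            current = by_id[current["parent"]]
            depth += 1
        result[m["id"]] = depth
    return result


def reply_pairs(msgs):
    """Count (replier, replied-to author) pairs across the thread."""
    by_id = {m["id"]: m for m in msgs}
    return Counter(
        (m["author"], by_id[m["parent"]]["author"])
        for m in msgs
        if m["parent"] is not None
    )


children = build_children(messages)
message_depths = depths(messages)
pairs = reply_pairs(messages)
```

Counts like these are only structural signals; interpreting a long back-and-forth between two authors as support or as conflict still requires reading the content.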
Students from a wide variety of backgrounds will find things to do in the group. Students with a qualitative methods background will find work in identifying and teasing out the interactions and framings that interest us. From the computational side, there is room for both quantitative analysis and data mining of the conversation structures, as well as for natural language processing and information extraction in complementing the structural signals with signals derived from the content of the discussions.
The Enlightenment saw a great rise in the printing of scientific publications during the 18th century. Illustrations played significant and varied roles in these works, as they allowed easier communication of information and ideas, from mathematical theories to descriptions of animals. These illustrations are a well-known phenomenon, but they have not previously been studied at scale. This group will employ image processing and machine learning methods to analyze them in a dataset of eighteenth-century publications.
The questions asked can concern the overall role of illustration in scientific publishing. How did the use of illustrations differ between fields? Did the volume, dimensions and types of illustrations change over the 18th century? What kinds of illustrations were used? What was the role of illustrations in the scientific discourse of the period, and how did this change? Alternatively, the group can focus on a narrower front and map the nature of illustrations in, for example, natural history publications in more detail. Other examples of specific categories that can be studied include illustrations of plants or animals, maps, technical drawings in publications documenting arts and trades, and anatomical diagrams in studies on medicine.
The group is suitable for participants with various backgrounds. Participants with an understanding of qualitative methods and/or an interest in literature, history of science and art will find work in formulating and answering the research questions, furthering the understanding of the materials and contextualizing the project in relation to prior research. On the technical side, participants can expect to apply, and deepen their understanding of, machine learning and computer vision methods for categorizing the illustrations and the elements in them, as well as statistical analysis of the results.
The data used will be Eighteenth Century Collections Online (ECCO), a dataset of over 200 000 volumes, approximately half of everything printed in the century. In addition, metadata for identifying the scientific publications in the corpus is available, as is information for locating the illustrations on the raw page images.
This group uses big parliamentary data to explore political polarization in the short and long term. Increasing political polarization has been argued to threaten the future of European and American societies in the 21st century, as liberal democracies require a genuine will among different political groups to discuss, negotiate, and compromise on common issues in parliament (Levitsky & Ziblatt 2018; Mudde & Kaltwasser 2013). In addition to increasing, polarization has allegedly changed in nature in recent decades: the traditional left-right division based on the economy has been replaced by multidimensional identity-political issues such as the rights of sexual minorities, vegetarianism or immigration (Hobsbawm 1996; Fukuyama 2018). Arguments about political polarization have often been based on qualitative close readings of a limited number of contemporary sources. The recent rise of machine-readable parliamentary data allows researchers to test such arguments with computational methods (La Mela et al. 2022). In addition, novel theories can emerge when political phenomena are placed in a longer-term context.
The group focuses on the debates in the British Parliament, one of the oldest representative assemblies in the world, from the 19th century to the present day. The debates from the 2010s and 2020s have been made available with rich metadata and linguistic annotations by the CLARIN ERIC ParlaMint project (Erjavec et al. 2021). The older parts of the debates can be accessed through easy-to-use interfaces for close reading or downloaded as XML files for computational analysis. As supplementary materials we can use, for example, parliamentary debates from other countries, voting data from the House of Commons and House of Lords, and general election data. This group is ideal for anyone who is interested in finding patterns in text data and combining linguistic and network analysis in order to better understand the human mind and societies.
Computational tasks can include but are not limited to:
* representing multidimensional data (e.g. political speeches) as vectors and embeddings
* enriching parliamentary debates with other datasets (e.g. election data)
* comparing the similarities and differences between individual politicians, parties, and historical periods
* analyzing and visualizing time-series data and complex networks
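As a rough illustration of the first and third tasks above, the sketch below represents speeches as simple word-count vectors and compares them with cosine similarity. This is a deliberately basic stand-in for the embeddings the group might use, not the group's chosen method. The speaker labels and speech texts are invented, and `vectorize` and `cosine` are hypothetical helper names.

```python
import math
import re
from collections import Counter

# Invented example speeches; real input would come from the debate corpus.
speeches = {
    "speaker_a": "we must cut taxes and reduce public spending",
    "speaker_b": "we must raise public spending on health",
    "speaker_c": "cut taxes and reduce spending now",
}


def vectorize(text):
    """Represent a speech as a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = {name: vectorize(text) for name, text in speeches.items()}
sim_ab = cosine(vectors["speaker_a"], vectors["speaker_b"])
sim_ac = cosine(vectors["speaker_a"], vectors["speaker_c"])
```

In this toy example speaker_a comes out closer to speaker_c than to speaker_b. Whether such lexical similarity reflects political closeness is exactly the kind of result that needs validation by close reading.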
Humanities and social-science tasks include but are not limited to:
* discovering research questions related to changes in political polarization
* inventing meaningful units of interest that can be measured computationally
* validating the results from computational analysis by manually close reading parliamentary debates
* refining elementary quantitative information into insightful interpretations
Erjavec, T. et al., 2021, Linguistically Annotated Multilingual Comparable Corpora of Parliamentary Debates ParlaMint.ana 2.1, Slovenian Language Resource Repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1431