Introduction to methods for digital humanities

Please see the newer version at gitbook.io/meth4dh/

This page surfaces the material for the "Introduction to methods for digital humanities" course taught at the University of Helsinki (note the definition of ). The intention is for this material to eventually be usable for complete self-study in addition to contact teaching. We are not there yet, so at present you, the reader, are responsible for sifting out the parts that are already usable without an accompanying lecture.

For any enquiries, please contact .

Learning goals

Together with its sister course, this course acts as an introductory signposting course to our digital humanities. While the sister course, , lays out the landscape of all the different digital humanities and charts our place there, this course is aimed at mapping the landscape within that definition. The course thus provides students with the knowledge they need to choose their own focus within computational humanities, also manifesting in the ability to choose from the optional courses in the .

After this course the student understands the multiple ways in which methods benefit work within the digital humanities. She herself is able to use simple tools to work with data. In addition, she has attained knowledge of the fundamental concepts of programming, through which she can start to expand her capabilities, should she so choose. She also learns how open, reproducible research and publishing are done in practice. Further, the student gains a general literacy in advanced computer science methods applicable to digital humanities, and learns when to apply them. Finally, she learns to apply all of the above in practice in a small concrete digital humanities project.

Prerequisites: absolutely none

Course content
  •  ()
    • Easy, ready-made tools for visualization and exploration
    • Fundamentals of programming for data processing
    • Data analysis method literacy
    • Assignments to be completed before proceeding to the next part:
      • Answer the course background
      • Look over the final projects from last year as well as the datasets listed. Select the project and dataset that interest you the most. Post a short message on the #meth4dh channel on the course Slack explaining why you chose those two.
  •  ()
    • Reading assignment for the next part: 
      • sections 2.1-2.4 for a categorization of different uses for visualization
      • & for learning not to trust visualizations blindly
  •  ()
    • Assignments to be completed before proceeding to the next part:
      • the of the
      • the 
      • Experiment with at least one of the following tools described in the slides. Post a message on Slack about your experience with the tool you chose.
  •  ()
    • Assignments to be completed before proceeding to the next part:
      • the , including the of the 
      • the
      • the  and the  (create regexes to match first names, last names, years, birth places etc)
  •  ()
    • Reading assignment: 
      • Try to answer the questions given under the "Reading material" heading
    • Check out the site
  •  ()
    • Reading assignment: 
  •  (, )
    • Assignments:
      •  ()
      • Read on some small, actual work:
        • The  of the DHH15 key concepts of socialism group
        • The of the DHH15 Finnair Blue Wings multimodality group
        • If you understand Finnish, the
    •  (, )
    • Digital humanities project
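
As an illustration of the regular expression exercise mentioned in the course content above (matching first names, last names, years and birth places), here is a minimal sketch using Python's standard `re` module. The sample entries, the entry layout and the helper name `parse_entry` are all invented for this example and are not taken from the course materials; real register data will be messier and will need a more forgiving pattern.

```python
import re

# Invented sample entries in the style of a biographical register
# (hypothetical data, not taken from any course dataset).
entries = [
    "Virtanen, Matti, b. 1887 in Turku",
    "Lindqvist, Anna, b. 1902 in Helsinki",
]

# Last name, first name, birth year and birth place as named groups.
# [^\W\d_]+ matches one or more letters, including non-ASCII ones.
# The pattern assumes the exact "Last, First, b. YEAR in Place" layout.
pattern = re.compile(
    r"^(?P<last>[^\W\d_]+), "
    r"(?P<first>[^\W\d_]+), "
    r"b\. (?P<year>\d{4}) in "
    r"(?P<place>[^\W\d_]+)$"
)


def parse_entry(line):
    """Return a dict of the matched fields, or None if the line does not match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


records = [parse_entry(line) for line in entries]
for record in records:
    print(record)
```

Named groups keep the extracted fields self-describing, which makes it easier to check by eye which entries the pattern fails on.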

Final project

To pass the course, you are required to demonstrate some grasp of actual digital humanities work. Therefore, you are tasked with taking some dataset, and processing it in some way to yield an interesting analysis. 

Potential datasets/APIs are for example:

  • We also have a set of automatic transcriptions from 35,000 Finnish language TV and radio programs courtesy of the Finnish public broadcaster YLE available by asking

Tools for processing and analysis are for example:

  • Preprocessing: , , , , , , , , ...
  • Topic modeling: , , , , ...
  • Dimensionality reduction/clustering: , , , , , ...
  • Social network analysis: , , , , …
  • Simulation: , ...
  • Neural networks: , , ...
  • Association rule learning: , ,
  • Anomaly detection: , ...
  • Visualization: , , , , , , ...

To return the assignment, you will need to upload your data, code and results into a repository, link that repository with Zenodo and give us the Zenodo DOI for your work. Include a description of what you've done, following as best as possible the guidelines for open, reproducible research:

  1. Which data did you use?
  2. What did you do to it (and how can I reproduce it)?
  3. What do the end results show?
  4. How would you continue the work (towards academic meaning)?

Further info: your work doesn't need to be a full-blown pipeline from raw data to interesting results. It can be just some steps towards that. However, if you don't have end results, you need to very explicitly describe what your next steps would be to get those (i.e. a plan for future research). 
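
To give a rough idea of how small "some steps" can be, here is a minimal sketch that compares relative word frequencies between two text collections using only the Python standard library. The two collections, the tokenization rule and the function names are assumptions made for this illustration; they do not come from any of the course datasets or earlier submissions.

```python
import re
from collections import Counter

# Two tiny invented text collections (hypothetical stand-ins for a real dataset).
collection_a = ["The sea was calm and the sea was grey.", "A calm morning."]
collection_b = ["The city was loud.", "A loud and busy city morning."]


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_frequencies(documents):
    """Map each word to its share of all tokens in the documents."""
    counts = Counter(token for doc in documents for token in tokenize(doc))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


freq_a = relative_frequencies(collection_a)
freq_b = relative_frequencies(collection_b)

# Difference in relative frequency; positive values are more typical of collection A.
differences = {
    word: freq_a.get(word, 0.0) - freq_b.get(word, 0.0)
    for word in set(freq_a) | set(freq_b)
}

ranked = sorted(differences.items(), key=lambda item: item[1], reverse=True)
for word, diff in ranked[:3]:
    print(f"{word}: {diff:+.3f}")
```

A step like this produces no finished result on its own, so a submission built on it would still need the explicit plan for future research described above.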

To further aid you in your work, here are some previous submissions for inspiration (for most of them, you should actually click the GitHub link on the right to start to make sense of them): 

  • themes in Hungarian folk love songs - DOI: 
  • extracting and visualizing biographical information from an old bank matricle - DOI: 
  • analysis of a survey on user involvement in software development - DOI: 
  • polite vs casual address form use by Finnish language learners in different situations - DOI: 
  • discovering patterns in Chalcolithic and Early Bronze Age burials in northeast England - DOI: 
  • themes discussed in Helsingin Sanomat in 1905 - DOI: 
  • differences in use between the words maahanmuuttaja and pakolainen in Finnish newspapers from 1970 to the present - DOI: 
  • differences in how frequently Finnish and Swedish newspapers talk about the Romani people - DOI: 
  • contrasting Beck's lyrics to blues lyrics - DOI: 
  • extracting and analysing recipe information in an old cookbook - DOI:
  • a thematic analysis of the discussion around Guggenheim on the Suomi24 forum - DOI: 
  • differences in language between texts dealing with altered states of mind and normal fiction - DOI: 
  • preliminary analysis comparing different Finnish cabinet strategies against each other - DOI: 
  • preliminary analysis of patterns in the holdings of the Finnish National Gallery - DOI: 

Reading material

During the course, you will be given material to read before proceeding to the next part. Typically, these will be academic papers that make use of digital humanities methods. When reading such papers, we ask that you focus on at least:

  • Research questions - What are the humanities research questions? Does the project also target computer science research questions? If so, which ones? What is the relationship between the CS and humanities research questions?
  • Data - How has the data used been gathered? What are the data sources used? How has the data been processed? Is the data available for others to use?
  • Methods - What methods does the project apply? How do the methods support answering the research questions?
  • Partners - What is the make-up of the project? Which disciplines are represented by the participants?

Provisional core reading list:

  • On visualization:
    •  sections 2.1-2.4 for a categorization of different uses for visualization
    •  &  for learning not to trust visualizations blindly
    •  based on what you want to show 
    • As a guide to reading the first paper, look to answer the general questions described above. For the second paper, contrast its style with the first. Also note that you can experiment yourself with the model described in the first paper.

Where to continue?

Here I'll gather some relevant links to further resources. I think these are good, but they're also somewhat of a random selection.

  • Further courses
  • The programming humanist:
    • , a good general purpose book
    • , an excellent introduction to statistical  analysis with interactive Python notebooks
    • , the best introduction to programming for humanists that I could find
    • , lessons and tutorials for doing various DH things
    • , a nicely built general, interactive introduction to programming
  • On data:
    •  (simple introduction to different kinds of data in the digital humanities)
  • On visualizations:
    •  (different uses for visualization: gaining new insight vs communicating said insight to others)
    •  (how visualizations lie if you are not careful)
    •  (a good introductory book on choosing suitable visualizations for highlighting different aspects in data, and avoiding pitfalls in tuning them)
  • On statistics:
    • , an excellent introduction to statistical  analysis with interactive Python notebooks
    •  (chapters 1-5 essential)
    •  at the University of Helsinki
  • On open science:
  • Environments for reproducible research: