ANEE Lexical Networks v.2.0

This page introduces a new way of exploring lexical semantics in Akkadian.

Team 1 of the Centre of Excellence in Ancient Near Eastern Empires (ANEE) has created a lexical portal that functions as a graphic semantic dictionary. Through this portal, users can explore the semantic networks of one or more words of interest. By following the links, one can also trace attestations back to the underlying dataset and from there to the Open Richly Annotated Cuneiform Corpus (Oracc). This page gives a very short introduction to the portal and the research behind it. For more information on the methods, data and processes behind it, see our publications (listed below, with links to electronic offprints and datasets). This is the updated version of the previous “ANEE lexical portal of Akkadian: PMI”. To view the old version, please navigate to the “Archived Previous Versions” part of the page.

Help Page

A User Guide, videos, help, and FAQs are available on the ANEE Lexical Portal Help Page.

How to cite this portal

Cite portal

Aleksi Sahala, Heidi Jauhiainen, Tero Alstola, Sam Hardwick, Ellie Bennett, Tommi Jauhiainen, Krister Lindén and Saana Svärd “ANEE Lexical Networks v.2.0.”. URN: .

Cite specific lexical network

Aleksi Sahala, Heidi Jauhiainen, Tero Alstola, Sam Hardwick, Ellie Bennett, Tommi Jauhiainen, Krister Lindén and Saana Svärd “ANEE Lexical Networks v.2.0.”. See esp. “[title of the network]”. URN: .

Example: The English version of the network for Neo-Assyrian texts in Assyrian should be cited as: Aleksi Sahala, Heidi Jauhiainen, Tero Alstola, Sam Hardwick, Ellie Bennett, Tommi Jauhiainen, Krister Lindén and Saana Svärd “ANEE Lexical Networks v.2.0.”. See esp. “Neo-Assyrian texts in Assyrian with all Akkadian words (in English)”. URN:

Cite data used to create these networks

If you are interested in the data used to create these networks, you can access it via this URN. It will take you to the META-SHARE record, which in turn links to the Zenodo repository that holds the data. If you use this data in future work, please cite it as follows:

Sahala, Aleksi, Jauhiainen, Heidi, Alstola, Tero, Hardwick, Sam, Bennett, Ellie, Jauhiainen, Tommi, Svärd, Saana, & Lindén, Krister. (2022). ANEE Lexical Networks v. 2.0 - the dataset [Data set]. Zenodo.

What is “ANEE Lexical Networks”?

The approach we adopted has its roots in the classic work of Jost Trier and the Saussurean distinction that is commonly made between syntagmatic and paradigmatic relations in the meaning of words. Semantically, there is a paradigmatic connection between words that belong to the same general category. For example, in English, the concept “chair” belongs to the semantic domain “furniture,” together with “table” and “bed.” At the same time, there is a syntagmatic semantic connection between words that co-occur frequently (e.g., “pitch black”). For example, the word “chair” appears in many different contexts, with differing connotations. In the domain HOME the word “chair” could associate syntagmatically with words like “comfort” and “family,” as for example in the sentence: “A comfortable chair is important for the whole family.” At the same time, within the domain COMMERCE, “chair” can associate with “money,” “discount” or “store.” Therefore, in addition to “chair” having paradigmatic connections to “table” and “bed,” it belongs to a multitude of syntagmatic semantic categories.

We have used the methods of language technology to trace paradigmatic and syntagmatic relationships in a large corpus of the Akkadian language. The method called Pointwise Mutual Information (PMI) is able to capture the nuances of syntagmatic relations. PMI detects words that co-occur frequently in the dataset. To continue with the simple example sentence: “A comfortable chair is important for the whole family,” PMI can calculate co-occurrence probabilities for words that occur close to “chair” (e.g., “comfortable” or “family”). These probabilities attest to syntagmatic relationships between lexemes. The visualization of lexemes and their relationships as networks has proved to be the most fruitful approach to analyzing semantic domains created by our methods.
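To make the idea concrete, here is a minimal sketch of a PMI calculation in Python, using the textbook definition PMI(a, b) = log2( p(a, b) / (p(a) p(b)) ). It is not the Pmizer implementation used for the portal. The function name, the forward-looking window, and the way probabilities are estimated are all illustrative assumptions.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=2, placeholder="_"):
    """Return {(a, b): PMI} for unordered word pairs found within `window` tokens.

    Illustrative sketch only: the window handling and probability
    estimates are assumptions, not the portal's actual settings.
    """
    # Placeholders occupy a position in the text but are never scored.
    word_counts = Counter(t for t in tokens if t != placeholder)
    pair_counts = Counter()
    for i, word in enumerate(tokens):
        if word == placeholder:
            continue
        # Look ahead only, so each co-occurrence is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other == placeholder or other == word:
                continue
            pair_counts[tuple(sorted((word, other)))] += 1

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy English "corpus" echoing the chair example above.
example_tokens = [
    "comfortable", "chair", "family", "store", "money",
    "comfortable", "chair", "store", "discount",
]
example_scores = pmi_scores(example_tokens, window=1)
for pair, score in sorted(example_scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy input “comfortable” and “chair” are adjacent twice, so their pair scores higher than “chair” and “store”, which are adjacent only once.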

This short introduction is necessarily an abbreviated and simplified description of our work, but we hope that the portals will give scholars tools for reflecting on the semantic domains of words. The research is ongoing, and the lexical portals presented here are by no means the final result. We welcome feedback from colleagues regarding the portals; please don’t hesitate to get in touch with the team leader.

Networks available to you

We have created a suite of 35 Lexical Networks based on PMI calculations. They are divided into three categories: a network only of proper nouns; those where the language displayed in the network is Akkadian; and those where the language displayed is English.

The Akkadian and English groups each contain three further subsections: 1) networks built upon all texts in Oracc that are tagged as ‘Akkadian’; 2) networks built upon data for the 1st and 2nd Millennia (as tagged in Oracc); and 3) networks based on Neo-Assyrian texts (as tagged in Oracc).

Under each of these subsections are three further versions of the network. The first includes all words, the second excludes proper nouns, and the third only shows proper nouns.

Network for Proper nouns

In this section is a network based upon the proper nouns found in all Akkadian texts.
Number of words in all Akkadian texts on Oracc: 2,068,030
(Number of nodes: 3,625)

Networks in Akkadian

In this section are the networks where the words are presented in Akkadian. There are English translations in the networks, and you can find English versions of these networks on this webpage.

All texts written in Akkadian

These networks explore the entire dataset from Oracc with texts tagged as ‘Akkadian’. We provide a network that includes all Akkadian words, and another that excludes proper nouns.

Number of words in all Akkadian texts on Oracc: 2,068,030
All words (number of nodes: 12,683)
Excluding proper nouns (number of nodes: 7,484)

2nd and 1st Millennia

In this section are networks that allow for some basic diachronic analysis between Akkadian texts in the 2nd and 1st Millennia. For each millennium, we provide a network that includes all Akkadian words, one that does not include proper nouns, and one that only shows proper nouns.

2nd Millennium

Number of words in 2nd Millennium texts on Oracc: 260,251
All words (number of nodes: 3,871)
Excluding proper nouns (number of nodes: 1,868)
Proper nouns only (number of nodes: 724)

1st Millennium

Number of words in 1st Millennium texts on Oracc: 1,789,754
All words (number of nodes: 14,233)
Excluding proper nouns (number of nodes: 6,544)
Proper nouns only (number of nodes: 3,905)

Neo-Assyrian

In this section are networks that allow for some comparison between Neo-Assyrian texts written in two Akkadian dialects: Assyrian and Standard Babylonian. We also include networks that cover all Neo-Assyrian texts. For each dialect, we provide a network that includes all Akkadian words, one that does not include proper nouns, and one that only shows proper nouns.

Number of words in Neo-Assyrian texts on Oracc: 1,228,320
All words (number of nodes: 8,780)
Excluding proper nouns (number of nodes: 4,288)
Proper nouns only (number of nodes: 1,665)

Assyrian texts

Number of words in Neo-Assyrian texts written in Assyrian on Oracc: 439,614
All words (number of nodes: 6,540)
Excluding proper nouns (number of nodes: 2,527)
Proper nouns only (number of nodes: 1,456)

Standard Babylonian

Number of words in Neo-Assyrian texts written in Standard Babylonian on Oracc: 250,648
All words (number of nodes: 3,884)
Excluding proper nouns (number of nodes: 3,046)
Proper nouns only (number of nodes: 366)

Networks in English

In this section are the networks where the words are presented in English. The graphs are identical to the Akkadian versions, except that each word is shown in English translation instead of in Akkadian. You can still search for Akkadian terms in these networks.

All texts written in Akkadian

These networks explore the entire dataset from Oracc with texts tagged as ‘Akkadian’. We provide a network that includes all Akkadian words, and another that excludes proper nouns.

Number of words in all Akkadian texts on Oracc: 2,068,030
All words (number of nodes: 12,683)
Excluding proper nouns (number of nodes: 7,484)

2nd and 1st Millennia

2nd Millennium

Number of words in 2nd Millennium texts on Oracc: 260,251
All words (number of nodes: 3,871)
Excluding proper nouns (number of nodes: 1,868)
Proper nouns only (number of nodes: 724)

1st Millennium

Number of words in 1st Millennium texts on Oracc: 1,789,754
All words (number of nodes: 14,233)
Excluding proper nouns (number of nodes: 6,544)
Proper nouns only (number of nodes: 3,905)

Neo-Assyrian

Number of words in Neo-Assyrian texts on Oracc: 1,228,320
All words (number of nodes: 8,780)
Excluding proper nouns (number of nodes: 4,288)
Proper nouns only (number of nodes: 1,665)

Assyrian texts

Number of words in Neo-Assyrian texts written in Assyrian on Oracc: 439,614
All words (number of nodes: 6,540)
Excluding proper nouns (number of nodes: 2,527)
Proper nouns only (number of nodes: 1,456)

Standard Babylonian

Number of words in Neo-Assyrian texts written in Standard Babylonian on Oracc: 250,648
All words (number of nodes: 3,884)
Excluding proper nouns (number of nodes: 3,046)
Proper nouns only (number of nodes: 366)

A short note on data

The data used for the graphs was downloaded as JSON files from the Open Richly Annotated Cuneiform Corpus (Oracc) in June 2021. For the analysis we used a dataset consisting of 7,346 texts that have been tagged in Oracc as written in “Akkadian.” These texts were written primarily in the Neo-Assyrian period (c. 930–612 BCE) in both Assyria and Babylonia, but earlier and later texts are also included. The texts belong to several genres, with royal inscriptions being the most prominent one in terms of word count.

We standardized the spellings of divine and place names and removed duplicate texts following the procedure explained in Alstola et al. (2019). We used only the dictionary forms of content words (nouns, verbs, and adjectives), as defined in Oracc following the Concise Dictionary of Akkadian; all other words were replaced with an underscore character as a placeholder. Since neither the cuneiform script nor the Oracc metadata indicates sentence endings, the text of each document is handled as one continuous line of text.
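The masking step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual script: the function names are hypothetical, the input is assumed to be a list of (dictionary form, part-of-speech tag) pairs, and the tag set `{"N", "V", "AJ"}` is an assumed encoding of nouns, verbs, and adjectives.

```python
# Assumed part-of-speech tags for content words (noun, verb, adjective).
CONTENT_POS = {"N", "V", "AJ"}


def mask_non_content(tokens, content_pos=CONTENT_POS, placeholder="_"):
    """Keep dictionary forms of content words; replace everything else.

    `tokens` is an iterable of (dictionary_form, pos_tag) pairs.
    """
    return [lemma if pos in content_pos else placeholder for lemma, pos in tokens]


def document_as_line(tokens):
    """Join one document's masked tokens into a single continuous line."""
    return " ".join(mask_non_content(tokens))


# Hypothetical five-word snippet with illustrative tags.
sample = [
    ("šarru", "N"),
    ("ša", "REL"),
    ("mātu", "N"),
    ("ana", "PRP"),
    ("alāku", "V"),
]
print(document_as_line(sample))
```

Keeping a placeholder instead of deleting the word preserves the original distance between the remaining words, which matters for any window-based co-occurrence count.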

From the lexemes in our dataset, we selected all those that appear at least 5 times. We then used PMI to produce lists of the words most semantically similar to each of these 4,930 lexemes. These lists were then visualized with Gephi.
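As a rough sketch of this selection step, the snippet below filters lexemes by a minimum frequency, keeps the highest-scoring neighbours of each remaining lexeme, and writes them as an edge list. Several details are assumptions made for illustration: the function names, the `top_n` cut-off, and the choice of a CSV file with `Source`, `Target`, and `Weight` columns as the hand-over format for Gephi. The source states only that the lists were visualized with Gephi.

```python
import csv
import io
from collections import Counter


def frequent_lexemes(tokens, min_count=5, placeholder="_"):
    """Return the set of lexemes occurring at least `min_count` times."""
    counts = Counter(t for t in tokens if t != placeholder)
    return {word for word, n in counts.items() if n >= min_count}


def top_neighbours(scores, keep, top_n=10):
    """Turn {(a, b): weight} into (source, target, weight) rows.

    Only pairs whose two members are both in `keep` are used, and each
    word retains at most its `top_n` highest-weighted neighbours.
    """
    by_word = {}
    for (a, b), weight in scores.items():
        if a in keep and b in keep:
            by_word.setdefault(a, []).append((b, weight))
            by_word.setdefault(b, []).append((a, weight))
    rows = []
    for word in sorted(by_word):
        best = sorted(by_word[word], key=lambda item: (-item[1], item[0]))[:top_n]
        rows.extend((word, other, weight) for other, weight in best)
    return rows


def write_edge_csv(rows, handle):
    """Write rows as an edge list with Source, Target, Weight columns."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)


# Hypothetical input: "a" and "b" pass the threshold, "c" does not.
tokens = ["a"] * 5 + ["b"] * 5 + ["c"] * 2 + ["_"] * 9
scores = {("a", "b"): 1.5, ("a", "c"): 3.0}
buffer = io.StringIO()
write_edge_csv(top_neighbours(scores, frequent_lexemes(tokens)), buffer)
print(buffer.getvalue())
```

Each retained pair appears once per direction in this sketch, so the resulting file is best read as an undirected graph or deduplicated before further use.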

To produce the networks based on different time periods, we relied upon the metadata provided by Oracc. Texts tagged with the names of periods from the 2nd millennium were collated into one corpus, and texts tagged with periods from the 1st millennium into another. These corpora became the basis for the diachronic analysis. For the Neo-Assyrian data, we collected the texts tagged as ‘Neo-Assyrian’, and then within this corpus divided the texts according to the metadata tags ‘Assyrian’ and ‘Standard Babylonian’.
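The grouping logic can be illustrated with a short sketch. Everything specific in it is hypothetical: the record keys `id`, `period`, and `dialect`, the text identifiers, and the period-to-millennium mapping, which lists only a few example period names and does not claim to reproduce the mapping used for the portal.

```python
from collections import defaultdict

# Illustrative, incomplete mapping from period tags to millennia.
PERIOD_TO_MILLENNIUM = {
    "Old Babylonian": "2nd millennium",
    "Middle Assyrian": "2nd millennium",
    "Neo-Assyrian": "1st millennium",
    "Neo-Babylonian": "1st millennium",
}


def split_corpus(texts):
    """Group text ids into sub-corpora based on metadata tags.

    `texts` is a list of dicts with the hypothetical keys
    "id", "period", and (optionally) "dialect".
    """
    groups = defaultdict(list)
    for text in texts:
        period = text.get("period")
        millennium = PERIOD_TO_MILLENNIUM.get(period)
        if millennium is not None:
            groups[millennium].append(text["id"])
        if period == "Neo-Assyrian":
            groups["Neo-Assyrian"].append(text["id"])
            dialect = text.get("dialect")
            if dialect in ("Assyrian", "Standard Babylonian"):
                groups[f"Neo-Assyrian / {dialect}"].append(text["id"])
    return dict(groups)


# Hypothetical catalogue records.
catalogue = [
    {"id": "T1", "period": "Neo-Assyrian", "dialect": "Assyrian"},
    {"id": "T2", "period": "Neo-Assyrian", "dialect": "Standard Babylonian"},
    {"id": "T3", "period": "Old Babylonian"},
    {"id": "T4", "period": "Neo-Assyrian"},
    {"id": "T5", "period": "Uncertain"},
]
for name, ids in split_corpus(catalogue).items():
    print(name, ids)
```

A text without a recognized period tag falls into no sub-corpus, and a Neo-Assyrian text without one of the two dialect tags is counted only in the general Neo-Assyrian group.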

Please note that the lexical portal is diachronically flat. Even our larger networks based on the 2nd or 1st Millennia, and the Neo-Assyrian period, will not reflect changes that happened within these time periods. In all instances, and in order to provide as much data as possible, we have relied on the metadata labels provided by Oracc. If you are interested in a particular set of data, we recommend using the (Jauhiainen et al 2019) or downloading the full dataset on which these graphs are based and creating more specific networks from that data (Jauhiainen et al 2021).

Annotated bibliography

We have used these approaches in several articles. Selected publications are listed below, most with links to full-text articles.

Tero Alstola, Heidi Jauhiainen, Saana Svärd, Aleksi Sahala, and Krister Lindén. 2023. “Digital Approaches to Analyzing and Translating Emotion: What Is Love?” In Karen Sonik and Ulrike Steinert (eds.), The Routledge Handbook of Emotions in the Ancient Near East. London: Routledge, pp. 88-116.

Saana Svärd, Tero Alstola, Heidi Jauhiainen, Aleksi Sahala, and Krister Lindén. 2021. “.” In Shih-Wei Hsu and Jaume Llop-Raduà (eds.) The Expression of Emotions in Ancient Egypt and Mesopotamia. Culture and History of the Ancient Near East. Leiden: Brill, pp. 470-502. DOI:

Contains detailed information on how lexical networks can be created. Freely available behind DOI.

Tero Alstola, Shana Zaia, Aleksi Sahala, Heidi Jauhiainen, Saana Svärd, and Krister Lindén. 2019. “Aššur and his Friends: A Statistical Analysis of Neo-Assyrian Texts.” Journal of Cuneiform Studies 71: 159–80. Downloadable at .

Saana Svärd, Heidi Jauhiainen, Aleksi Sahala, and Krister Lindén. 2018. “.” In Vanessa Juloux, Amy Gansell, and Alessandro di Ludovico (eds.), CyberResearch on the Ancient Near East and Neighboring Regions: Case Studies on Archaeological Data, Objects, Texts, and Digital Archiving. Digital Biblical Studies 2. Leiden: Brill, pp. 224-256. DOI:

Proof-of-concept article. The first article where our language technological methods were tested with Akkadian texts. Freely available behind DOI.

Datasets and scripts

In this section you can find more detailed information regarding the datasets and scripts used to create the Lexical Portal.

Aleksi Sahala, Heidi Jauhiainen, Tero Alstola, Sam Hardwick, Ellie Bennett, Tommi Jauhiainen, Saana Svärd, and Krister Lindén. 2022. ANEE Lexical Networks v. 2.0 - the dataset. Zenodo. .

The repository contains the full dataset and the scripts used to create the networks and show them online.

Aleksi Sahala, Tero Alstola, and Heidi Jauhiainen. Open Richly Annotated Cuneiform Corpus, Korp Version, June 2021. Kielipankki. .

Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. The Korp version of Oracc allows extensive searches on the texts and presents the results as a KWIC concordance list. Korp also offers statistical information on the search results. Downloading the query results is possible as well. On how to use it, see the . Click to access Oracc in Korp 2021. For all versions of Oracc in Korp, see .

Aleksi Sahala. 2019. Pmizer: A Tool for Calculating Word Association Measures. Github. .

The Pmizer tool was used to calculate PMI scores that are used as edge weights in the network.

Raphaël Velt. 2011. JavaScript GEXF Viewer for Gephi. Available at .

This code (needed to display the networks in the portal page) has been modified and improved by Sam Hardwick.

Acknowledgements to institutions and projects

The project “ANEE Lexical Networks” was part of the Semantic Domains project, which was funded by the Academy of Finland (decision number 298647) and hosted by the University of Helsinki from 2016 to 2020. Since then it has been part of Team 1 of the Centre of Excellence in Ancient Near Eastern Empires (ANEE), funded by the Academy of Finland (decision numbers 312051, 312052, 312053, 336673, 336674, and 336675) and hosted by the University of Helsinki.

Data and research infrastructures were made possible by , Language Bank of Finland.

Valuable help and feedback were offered by the other members of ANEE Team 1: Johannes Bach, Ellie Bennett, Céline Debourse, Kaisa Autere, Evelien Vanderstraeten, Julia Giessler, Mikko Luukko, Sebastian Fink, Gina Konstantopoulos, Lena Tambs, Repekka Uotila, Jonathan Valk, and Shana Zaia.

We are indebted to the work of many scholars who have tirelessly digitized thousands of texts and added them as sub-projects to Oracc. These sub-projects have only been made possible due to the funding graciously provided by many different funding bodies. We used data from June 2021, and due to the nature of these sub-projects, credits may change over time. We therefore recommend you click the links to the relevant sub-projects in order to view the current, full credits for all of these vital digitisation projects.

All projects included in Oracc in June 2021, with information correct as of August 2022:


Abbreviation	PI(s) name	Funding Body
ADsD	Reinhard Pirngruber	Austrian Science Fund (FWF)
AkkLove	Nathan Wasserman
ARIo	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation
blms	Steve Tinney and Mark Geller	National Endowment for the Humanities and the Deutsche Forschungsgemeinschaft
amarna	Shlomo Izre'el
CAMS	Eleanor Robson
CASPo	Alan Lenzi
CKST	Niek Veldhuis
CTIJ		“Ancient Israel” (New Horizons) Research Program, Tel Aviv University and the “Greater Mesopotamia” Research Project, funded by the Belgian Science Policy Office - BELSPO in the framework of the Interuniversity Attraction Poles (IAP), KU Leuven. Tel Aviv University, the Office of the Research Dean.
DCCLT	Niek Veldhuis	Hellman Family Fund
DCCMT	Eleanor Robson	Early Career Fellowship from the University of Cambridge's Centre for Research in Arts, Social Sciences and Humanities
eCUT	Jamie Novotny, Karen Radner	Alexander von Humboldt Foundation (through the establishment of the Alexander von Humboldt Chair for Ancient History of the Near and Middle East) and Ludwig-Maximilians-Universität München (Historisches Seminar - Abteilung Alte Geschichte)
ETCRSI	Gábor Zólyomi	Hungarian Scientific Research Fund (OTKA)
Glass	Eduardo A. Escobar
HBTIN	Laurie Pearce	The France-Berkeley Fund (FBF), American Association of University Women, Unit 18 Professional Development Fund Grant, Humanities and Arts Research Technologies (HART)
Idrimi	Jacob Lauinger
LaOCOST	Ilan Peled
OBMC	Gabriella Spada
OBTA	Eleanor Robson
RIAo	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation
RIBo	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation
rimanum
RINAP	Grant Frame	National Endowment for the Humanities
SAAo	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation
Suhu	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation