ANEE Lexical Networks v.2.0

This page introduces a new way of exploring lexical semantics in Akkadian.

Team 1 of the Centre of Excellence in Ancient Near Eastern Empires (ANEE) has created a lexical portal that functions as a graphic semantic dictionary. Through this portal, users can explore the semantic networks of one or more words of interest. By following the links, users can also trace attestations back to the dataset in Korp and from there to the Open Richly Annotated Cuneiform Corpus (Oracc). This page gives a very short introduction to the portal and the research behind it. For more information on the methods, data and processes involved, see our published works (listed below, with links to electronic offprints and datasets). This is the updated version of the previous “ANEE lexical portal of Akkadian: PMI”. To view the old version, please navigate to the “Archived Previous Versions” part of the page.

Help Page

A User Guide, videos, help, and FAQs are available on the ANEE Lexical Portal Help Page.

How to cite this portal

To cite the portal as a whole:

Aleksi Sahala, Heidi Jauhiainen, Tero Alstola, Sam Hardwick, Ellie Bennett, Tommi Jauhiainen, Krister Lindén and Saana Svärd “ANEE Lexical Networks v.2.0.” URN: http://urn.fi/urn:nbn:fi:lb-2022100301.

To cite a specific network:

Aleksi Sahala, Heidi Jauhiainen, Tero Alstola, Sam Hardwick, Ellie Bennett, Tommi Jauhiainen, Krister Lindén and Saana Svärd “ANEE Lexical Networks v.2.0.” See esp. “[title of the network]”. URN: http://urn.fi/urn:nbn:fi:lb-2022100301.

Example: the English version of the network of Neo-Assyrian texts in Assyrian should be cited as: Aleksi Sahala, Heidi Jauhiainen, Tero Alstola, Sam Hardwick, Ellie Bennett, Tommi Jauhiainen, Krister Lindén and Saana Svärd “ANEE Lexical Networks v.2.0.” See esp. “Neo-Assyrian texts in Assyrian with all Akkadian words (in English)”. URN: http://urn.fi/urn:nbn:fi:lb-2022100301.

If you are interested in the data used to create these networks, you can access it via this URN. It will take you to the META-SHARE record, which in turn links to the Zenodo repository that holds the data. If you use this data in your own work, please cite it as follows:

Sahala, Aleksi, Jauhiainen, Heidi, Alstola, Tero, Hardwick, Sam, Bennett, Ellie, Jauhiainen, Tommi, Svärd, Saana, & Lindén, Krister. (2022). ANEE Lexical Networks v. 2.0 - the dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7124352

What is “ANEE Lexical Networks”?

The approach we adopted has its roots in the classic work of Jost Trier and in the Saussurean distinction commonly made between syntagmatic and paradigmatic relations in the meaning of words. Semantically, there is a paradigmatic connection between words that belong to the same general category. For example, in English, the concept “chair” belongs to the semantic domain “furniture,” together with “table” and “bed.” At the same time, there is a syntagmatic semantic connection between words that frequently co-occur (e.g., “pitch black”). The word “chair,” for instance, appears in many different contexts with differing connotations. In the domain HOME, “chair” could associate syntagmatically with words like “comfort” and “family,” as in the sentence: “A comfortable chair is important for the whole family.” Within the domain COMMERCE, by contrast, “chair” can associate with “money,” “discount,” or “store.” Therefore, in addition to having paradigmatic connections to “table” and “bed,” “chair” belongs to a multitude of syntagmatic semantic categories.

We have used language-technology methods to trace paradigmatic and syntagmatic relationships in a large corpus of Akkadian texts. The method called Pointwise Mutual Information (PMI) is able to capture the nuances of syntagmatic relations: it detects words that co-occur in the dataset more often than would be expected if they occurred independently of each other. To continue with the simple example sentence “A comfortable chair is important for the whole family,” PMI compares how often words close to “chair” (e.g., “comfortable” or “family”) actually occur near it with how often they would be expected to do so by chance. These scores attest to syntagmatic relationships between lexemes. Visualizing lexemes and their relationships as networks has proved to be the most fruitful approach to analyzing the semantic domains produced by our methods.
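As a rough illustration of the idea, the sketch below computes plain pointwise mutual information, PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ), for lemmas that occur within a fixed window of each other. It is a minimal textbook version, not the implementation behind the portal: the networks were produced with Pmizer (listed under Datasets and scripts), whose window size, weighting and filtering may differ. The function name `pmi_scores` and the default window of two positions are assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=2):
    """Return {(word_a, word_b): PMI} for words co-occurring within `window` positions.

    Plain PMI with log base 2. The default window size is an illustrative
    assumption, not the setting used for the ANEE networks.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, left in enumerate(tokens):
        # Look forward only, so each co-occurrence is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if right != left:  # ignore a word co-occurring with itself
                pair_counts[tuple(sorted((left, right)))] += 1

    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), c_ab in pair_counts.items():
        p_ab = c_ab / n_pairs           # observed probability of the pair
        p_a = word_counts[a] / n_words  # probability of each word on its own
        p_b = word_counts[b] / n_words
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy example with content words from the "chair" sentences above.
toy_tokens = ["chair", "comfortable", "chair", "family", "store", "money"]
toy_scores = pmi_scores(toy_tokens, window=1)
```

In this toy sequence, “chair” and “comfortable” stand next to each other twice and therefore score higher than “chair” and “family”, which are adjacent only once. Plain PMI is also known to give very high scores to pairs of rare words, which is one reason practical tools add frequency thresholds or other corrections.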

This short introduction is naturally a much abbreviated and simplified description of our work, but we hope that the portals will provide scholars with tools for reflecting on the semantic domains of Akkadian words. The research is ongoing, and the lexical portals presented here are by no means the final result. We welcome feedback from colleagues on the portals; please don’t hesitate to get in touch with team leader Saana Svärd.

Networks available to you

We have created a suite of 35 Lexical Networks based on PMI calculations. They are divided into three categories: a network only of proper nouns; those where the language displayed in the network is Akkadian; and those where the language displayed is English.

The Akkadian and English groups each contain three subsections: 1) networks built on all texts in Oracc that are tagged as ‘Akkadian’; 2) networks built on data from the 2nd and 1st millennia (as tagged in Oracc); and 3) networks based on Neo-Assyrian texts (as tagged in Oracc).

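Each of these subsections contains up to three versions of the network: one that includes all words, one that excludes proper nouns, and one that shows only proper nouns. For the networks based on all Akkadian texts, the proper-noun-only version is the separate network listed under “Network for Proper nouns”.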

Network for Proper nouns

This section contains a network based on the proper nouns found in all Akkadian texts.

Number of words in all Akkadian texts on Oracc: 2,068,030

All Akkadian texts with only proper nouns (Number of nodes: 3,625)

Networks in Akkadian

This section contains the networks in which the words are presented in Akkadian. The networks also include English translations, and English versions of these networks are available on this page under “Networks in English”.

These networks explore the entire dataset from Oracc with texts tagged as ‘Akkadian’. We provide a network that includes all Akkadian words, and another that excludes proper nouns.

Number of words in all Akkadian texts on Oracc: 2,068,030

All Akkadian texts with all Akkadian words (in Akkadian) (Number of nodes: 12,683)

All Akkadian texts with no proper nouns (in Akkadian) (Number of nodes: 7,484)

The following networks allow some basic diachronic comparison between Akkadian texts of the 2nd and 1st millennia. For each millennium, we provide a network that includes all Akkadian words, one that excludes proper nouns, and one that shows only proper nouns.

2nd Millennium

Number of words in 2nd Millennium texts on Oracc: 260,251

2nd millennium texts with all Akkadian words (in Akkadian) (Number of nodes: 3,871)

2nd millennium texts without proper nouns (in Akkadian) (Number of nodes: 1,868)

2nd millennium texts showing only proper nouns (in Akkadian) (Number of nodes: 724)

1st Millennium

Number of words in 1st Millennium texts on Oracc: 1,789,754

1st millennium texts with all Akkadian words (in Akkadian) (Number of nodes: 14,233)

1st millennium texts without proper nouns (in Akkadian) (Number of nodes: 6,544)

1st millennium texts showing only proper nouns (in Akkadian) (Number of nodes: 3,905)

The following networks allow some comparison between Neo-Assyrian texts written in two Akkadian dialects: Assyrian and Standard Babylonian. We also include networks that cover all Neo-Assyrian texts. For each of these corpora, we provide a network that includes all Akkadian words, one that excludes proper nouns, and one that shows only proper nouns.

Number of words in Neo-Assyrian texts on Oracc: 1,228,320

Neo-Assyrian texts with all Akkadian words (in Akkadian) (Number of nodes: 8,780)

Neo-Assyrian texts without proper nouns (in Akkadian) (Number of nodes: 4,288)

Neo-Assyrian texts showing only proper nouns (in Akkadian) (Number of nodes: 1,665)

Assyrian texts

Number of words in Neo-Assyrian texts written in Assyrian on Oracc: 439,614

Neo-Assyrian texts in Assyrian with all Akkadian words (in Akkadian) (Number of nodes: 6,540)

Neo-Assyrian texts in Assyrian without proper nouns (in Akkadian) (Number of nodes: 2,527)

Neo-Assyrian texts in Assyrian showing only proper nouns (in Akkadian) (Number of nodes: 1,456)

Standard Babylonian texts

Number of words in Neo-Assyrian texts written in Standard Babylonian on Oracc: 250,648

Neo-Assyrian texts in Standard Babylonian with all Akkadian words (in Akkadian) (Number of nodes: 3,884)

Neo-Assyrian texts in Standard Babylonian without proper nouns (in Akkadian) (Number of nodes: 3,046)

Neo-Assyrian texts in Standard Babylonian showing only proper nouns (in Akkadian) (Number of nodes: 366)

Networks in English

This section contains the networks in which the words are presented in English. The graphs are identical to the Akkadian versions, except that each word is shown with its English translation instead of its Akkadian form. You can still search for Akkadian terms in these networks.

These networks explore the entire dataset from Oracc with texts tagged as ‘Akkadian’. We provide a network that includes all Akkadian words, and another that excludes proper nouns.

Number of words in all Akkadian texts on Oracc: 2,068,030

All Akkadian texts with all Akkadian words (in English) (Number of nodes: 12,683)

All Akkadian texts with no proper nouns (in English) (Number of nodes: 7,484)

A short note on data

The data used for the graphs were downloaded as JSON files from the Open Richly Annotated Cuneiform Corpus (Oracc) in June 2021. For the analysis, we used a dataset of 7,346 texts that are tagged in Oracc as written in “Akkadian.” These texts were written primarily in the Neo-Assyrian period (c. 930–612 BCE) in both Assyria and Babylonia, but earlier and later texts are also included. The texts belong to several genres, of which royal inscriptions are the most prominent in terms of word count.

We standardized the spellings of divine names and place names and removed duplicate texts following the procedure explained in Alstola et al. (2019). We used only the dictionary forms of content words (nouns, verbs, and adjectives), as defined in Oracc following the Concise Dictionary of Akkadian; all other words were replaced with an underscore character as a placeholder. Since neither the cuneiform script nor the Oracc metadata indicates sentence endings, the text of each document is treated as one continuous line of text.
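The filtering and flattening steps described above can be sketched as follows, assuming each text arrives as a list of tablet lines whose words are (lemma, POS) pairs. That input format, the function name `flatten_document` and the POS labels `N`, `V` and `AJ` are assumptions for illustration only; the labels would need to be checked against Oracc's own tag set, and proper-noun labels would have to be added for the networks that include names. Name standardization and duplicate removal are not shown.

```python
PLACEHOLDER = "_"

# Assumed POS labels for nouns, verbs and adjectives (hypothetical; verify
# against the Oracc tag set, and add proper-noun labels where names are kept).
CONTENT_POS = frozenset({"N", "V", "AJ"})


def flatten_document(lines, keep_pos=CONTENT_POS):
    """Turn one lemmatized text into a single continuous token sequence.

    `lines` is a list of tablet lines, each a list of (lemma, pos) pairs
    (a hypothetical input format). Line breaks are dropped because they do
    not mark sentence endings, and every non-content word is replaced by
    "_" so that it keeps its position without becoming a network node.
    """
    tokens = []
    for line in lines:
        for lemma, pos in line:
            tokens.append(lemma if pos in keep_pos else PLACEHOLDER)
    return tokens


# Example: two tablet lines become one flat sequence; the preposition "ina"
# is replaced by the placeholder.
example = [[("šarru", "N"), ("ina", "PRP")], [("alāku", "V"), ("dannu", "AJ")]]
flat = flatten_document(example)
```

Keeping the placeholder instead of deleting the word preserves the distances between the remaining lemmas, so a window-based association measure still sees the original word order.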

From all the lexemes in our dataset, we selected those that occur at least five times. We then used PMI to produce, for each of these 4,930 lexemes, a list of the words most strongly associated with it. These lists were then visualized with Gephi.
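The selection and ranking step could look roughly like the sketch below. The threshold of five occurrences comes from the text; everything else is an assumption for illustration: the function names, the number of associates kept per lexeme (`k`), the input `pair_scores` (a mapping from unordered word pairs to a PMI-style score, each pair listed once), and the export as a CSV edge list with `Source`, `Target` and `Weight` columns, a layout that Gephi's spreadsheet import accepts. It does not reproduce the exact procedure used for the portal.

```python
import csv
import io
from collections import Counter

MIN_FREQ = 5  # minimum number of occurrences, as stated in the text


def frequent_lexemes(tokens, min_freq=MIN_FREQ, placeholder="_"):
    """Return the set of lexemes that occur at least `min_freq` times."""
    counts = Counter(t for t in tokens if t != placeholder)
    return {word for word, count in counts.items() if count >= min_freq}


def top_associates(pair_scores, vocabulary, k=10):
    """For each lexeme in `vocabulary`, keep its `k` highest-scoring partners.

    `pair_scores` maps (word_a, word_b) tuples to an association score such
    as PMI; `k=10` is an illustrative assumption.
    """
    neighbours = {word: [] for word in vocabulary}
    for (a, b), score in pair_scores.items():
        if a != b and a in vocabulary and b in vocabulary:
            neighbours[a].append((b, score))
            neighbours[b].append((a, score))
    # Highest score first; ties broken alphabetically so the output is stable.
    return {
        word: sorted(pairs, key=lambda p: (-p[1], p[0]))[:k]
        for word, pairs in neighbours.items()
    }


def edge_list_csv(associates):
    """Write the association lists as an undirected Source/Target/Weight edge list."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    seen = set()
    for word in sorted(associates):
        for other, score in associates[word]:
            edge = frozenset((word, other))
            if edge not in seen:  # write each undirected edge only once
                seen.add(edge)
                writer.writerow([word, other, score])
    return out.getvalue()
```

Because only the top `k` partners of each lexeme become edges, frequent words with many strong associations do not swamp the graph, and the resulting file can be loaded into Gephi for layout and styling.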

To produce the networks for different time periods, we relied on the metadata provided by Oracc. Texts tagged with the names of 2nd-millennium periods were grouped together, as were those tagged with 1st-millennium periods; these two corpora formed the basis for the diachronic analysis. For the Neo-Assyrian data, we collected the texts tagged as ‘Neo-Assyrian’ and then divided this corpus according to the metadata tags ‘Assyrian’ and ‘Standard Babylonian’.
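The grouping of texts by Oracc metadata can be sketched as below. The metadata keys `period` and `dialect`, the function name `split_corpora`, and the mapping of period labels to millennia are hypothetical and deliberately incomplete; a real run would need to cover every period label that occurs in the downloaded Oracc metadata. In this sketch, texts whose period is missing from the mapping end up in neither millennium corpus, and Neo-Assyrian texts count towards both the 1st-millennium corpus and the Neo-Assyrian corpora.

```python
# Hypothetical, incomplete mapping from Oracc period labels to millennia BCE.
PERIOD_TO_MILLENNIUM = {
    "Old Assyrian": 2,
    "Old Babylonian": 2,
    "Middle Assyrian": 2,
    "Middle Babylonian": 2,
    "Neo-Assyrian": 1,
    "Neo-Babylonian": 1,
    "Achaemenid": 1,
    "Hellenistic": 1,
}


def split_corpora(texts):
    """Assign text IDs to the sub-corpora used for the diachronic and dialect networks.

    `texts` maps a text ID to a metadata dict with the hypothetical keys
    "period" and "dialect".
    """
    corpora = {
        "2nd millennium": [],
        "1st millennium": [],
        "Neo-Assyrian": [],
        "Neo-Assyrian, Assyrian": [],
        "Neo-Assyrian, Standard Babylonian": [],
    }
    for text_id, meta in sorted(texts.items()):
        period = meta.get("period")
        millennium = PERIOD_TO_MILLENNIUM.get(period)
        if millennium == 2:
            corpora["2nd millennium"].append(text_id)
        elif millennium == 1:
            corpora["1st millennium"].append(text_id)
        if period == "Neo-Assyrian":
            corpora["Neo-Assyrian"].append(text_id)
            dialect = meta.get("dialect")
            if dialect in ("Assyrian", "Standard Babylonian"):
                corpora[f"Neo-Assyrian, {dialect}"].append(text_id)
    return corpora
```

Each resulting list of text IDs would then be processed separately, so that every corpus yields its own set of PMI scores and its own network.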

Please note that the lexical portal is diachronically flat: even the networks based on the 2nd or 1st millennium, or on the Neo-Assyrian period, do not reflect changes that took place within these periods. In all cases, in order to include as much data as possible, we have relied on the metadata labels provided by Oracc. If you are interested in a particular subset of the data, we recommend using the Korp interface (Jauhiainen et al. 2019) or downloading the full dataset on which these graphs are based and creating more specific networks from it (Jauhiainen et al. 2021).

Annotated bibliography

We have used these approaches in several articles. Selected publications are listed below, most with links to full-text articles.

Tero Alstola, Heidi Jauhiainen, Saana Svärd, Aleksi Sahala, and Krister Lindén. 2023. “Digital Approaches to Analyzing and Translating Emotion: What Is Love?” In Karen Sonik and Ulrike Steinert (eds.), The Routledge Handbook of Emotions in the Ancient Near East. London: Routledge, pp. 88-116.

Saana Svärd, Tero Alstola, Heidi Jauhiainen, Aleksi Sahala, and Krister Lindén. 2021. “Fear in Akkadian Texts: New Digital Perspectives on Lexical Semantics.” In Shih-Wei Hsu and Jaume Llop-Raduà (eds.) The Expression of Emotions in Ancient Egypt and Mesopotamia. Culture and History of the Ancient Near East. Leiden: Brill, pp. 470-502. DOI: https://doi.org/10.1163/9789004430761_019

Contains detailed information on how lexical networks can be created. Freely available via the DOI.

Tero Alstola, Shana Zaia, Aleksi Sahala, Heidi Jauhiainen, Saana Svärd, and Krister Lindén. 2019. “Aššur and his Friends: A Statistical Analysis of Neo-Assyrian Texts.” Journal of Cuneiform Studies 71: 159–80. Downloadable at http://hdl.handle.net/10138/303986.

Saana Svärd, Heidi Jauhiainen, Aleksi Sahala, and Krister Lindén. 2018. “Semantic Domains in Akkadian Texts.” In Vanessa Juloux, Amy Gansell, and Alessandro di Ludovico (eds.), CyberResearch on the Ancient Near East and Neighboring Regions: Case Studies on Archaeological Data, Objects, Texts, and Digital Archiving. Digital Biblical Studies 2. Leiden: Brill, pp. 224-256. DOI: https://doi.org/10.1163/9789004375086_009

Proof-of-concept article: the first article in which our language-technological methods were tested on Akkadian texts. Freely available via the DOI.

Datasets and scripts

This section provides more detailed information on the datasets and the scripts used to create the Lexical Portal.

Heidi Jauhiainen, Aleksi Sahala, Tero Alstola, Saana Svärd, & Krister Lindén. (2021). ANEE Lexical Portal - the dataset [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4646662.

The DOI leads to the Zenodo repository that holds the full dataset for the Lexical Portal. It includes data produced with PMI as well as with another method, the word-embedding method fastText.

Heidi Jauhiainen, Aleksi Sahala & Tero Alstola 2019: Open Richly Annotated Cuneiform Corpus, Korp Version, May 2019 [text corpus]. Kielipankki. Retrieved from http://urn.fi/urn:nbn:fi:lb-2019060601

The Open Richly Annotated Cuneiform Corpus (Oracc) brings together the work of several Assyriological projects to publish online editions of cuneiform texts. The Korp version of Oracc allows extensive searches of the texts and presents the results as a KWIC (keyword in context) concordance. Korp also offers statistics on the search results and comparisons between them, and the query results can be downloaded. For instructions, follow the URN to the metadata record and see the “ORACC in Korp User Guide” (under the heading “Documentation”). The URN above points to the May 2019 version of the Oracc in Korp data; for all versions, see https://www.kielipankki.fi/corpora/oracc/

Aleksi Sahala 2019. Pmizer: A Tool for Calculating Word Association Measures. Github. https://github.com/asahala/Pmizer.

“JavaScript GEXF Viewer for Gephi.” Available at https://github.com/raphv/gexf-js.

This code, which is needed to display the networks on the portal page, has been modified and improved by Sam Hardwick.

Acknowledgements to institutions and projects

The project “ANEE Lexical Networks” was part of the Semantic Domains project, which was funded by the Academy of Finland (decision number 298647) and hosted by the University of Helsinki from 2016 to 2020. Since then, it has been part of Team 1 of the Centre of Excellence in Ancient Near Eastern Empires (ANEE), funded by the Academy of Finland (decision numbers 312051, 312052, 312053, 336673, 336674, and 336675) and hosted by the University of Helsinki.

Data and research infrastructures were made possible by FIN-CLARIN, Language Bank of Finland.

Valuable help and feedback were offered by the other members of ANEE Team 1: Johannes Bach, Ellie Bennett, Céline Debourse, Kaisa Autere, Evelien Vanderstraeten, Julia Giessler, Mikko Luukko, Sebastian Fink, Gina Konstantopoulos, Lena Tambs, Repekka Uotila, Jonathan Valk, and Shana Zaia.

We are indebted to the many scholars who have tirelessly digitized thousands of texts and added them as sub-projects to Oracc. These sub-projects were made possible only by funding generously provided by many different funding bodies. We used data from June 2021, and given the nature of these sub-projects, their credits may change over time. We therefore recommend following the links to the relevant sub-projects to view the current, full credits for all of these vital digitisation projects.

All projects included in Oracc in June 2021, with information correct as of August 2022:


Oracc sub-project	Abbreviation	PI(s)	Funding body
Astronomical Diaries Digital	ADsD	Reinhard Pirngruber	Austrian Science Fund (FWF)
Akkadian Love Literature	AkkLove	Nathan Wasserman
Achaemenid Royal Inscriptions online	ARIo	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation
Bilinguals in Late Mesopotamian Scholarship	blms	Steve Tinney and Mark Geller	National Endowment for the Humanities and the Deutsche Forschungsgemeinschaft
Contributions Amarna	amarna	Shlomo Izre'el
Corpus of Ancient Mesopotamian Scholarship	CAMS	Eleanor Robson
Corpus of Akkadian Shuila-prayers online	CASPo	Alan Lenzi
Corpus of Kassite Sumerian texts	CKST	Niek Veldhuis
Cuneiform texts mentioning Israelites, Judeans, and other related groups	CTIJ		“Ancient Israel” (New Horizons) Research Program, Tel Aviv University and the “Greater Mesopotamia” Research Project, funded by the Belgian Science Policy Office (BELSPO) in the framework of the Interuniversity Attraction Poles (IAP), KU Leuven. Tel Aviv University, the Office of the Research Dean.
Digital Corpus of Cuneiform Lexical Texts	DCCLT	Niek Veldhuis	Hellman Family Fund
Digital Corpus of Cuneiform Mathematical Texts	DCCMT	Eleanor Robson	Early Career Fellowship from the University of Cambridge's Centre for Research in Arts, Social Sciences and Humanities
Electronic Corpus of Urartian Texts	eCUT	Jamie Novotny, Karen Radner	Alexander von Humboldt Foundation (through the establishment of the Alexander von Humboldt Chair for Ancient History of the Near and Middle East) and Ludwig-Maximilians-Universität München (Historisches Seminar - Abteilung Alte Geschichte)
Electronic Text Corpus of Sumerian Royal Inscriptions	ETCRSI	Gábor Zólyomi	Hungarian Scientific Research Fund (OTKA).
Corpus of Glass Technological Texts	Glass	Eduardo A. Escobar
Hellenistic Babylonia: Texts, Iconography, Names	HBTIN	Laurie Pearce	The France-Berkeley Fund (FBF), American Association of University Women, Unit 18 Professional Development Fund Grant, Humanities and Arts Research Technologies (HART)
Idrimi: The Statue of Idrimi	Idrimi	Jacob Lauinger
Law and Order: Cuneiform online sustainable tool	LaOCOST	Ilan Peled
Old Babylonian Model Contracts	OBMC	Gabriella Spada
Old Babylonian Tabular Accounts	OBTA	Eleanor Robson
Royal Inscriptions of Assyria online	RIAo	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation
Royal Inscriptions of Babylonia online	RIBo	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation
Rīm-Anum: The House of Prisoners	rimanum
Royal Inscriptions of the Neo-Assyrian Period	RINAP	Grant Frame	National Endowment for the Humanities
State Archives of Assyria Online	SAAo	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation
Suhu: The Inscriptions of Suhu online	Suhu	Jamie Novotny, Karen Radner	LMU Munich and the Alexander von Humboldt Foundation