Data

BEDLAN is committed to making the data used for our analyses freely available to the academic community, so that our findings can be replicated and extended.

Published

UraLex basic vocabulary dataset (v1.0)

UraLex is a dataset consisting of lexical reflexes of 313 meanings from 26 Uralic languages. Most of the meanings originate from standardized basic vocabulary lists. The lexical reflexes are accompanied by multistate characters that represent their historical relationships.

Cite the dataset as:

Syrjänen, Kaj, Lehtinen, Jyri, Vesakoski, Outi, de Heer, Mervi, Suutari, Toni, Dunn, Michael, Määttä, Urho, Leino, Unni-Päivä. (2018). lexibank/uralex: UraLex basic vocabulary dataset. DOI:10.5281/zenodo.1459402

Upcoming

In the future we intend to release the following datasets:

  • Digital maps of Uralic language speaker areas
  • Uralic language typological dataset

We are collecting a typological dataset of the Uralic languages comprising ca. 300 questions on phonology, morphology, and syntax. Ideally, all 300 features will be collected as binary data for every Uralic language. The project will thus produce comparative data on the Uralic languages that we can use to pursue the project's aims, and that typologists, Uralists, and others can use to advance Uralic studies.
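As a rough illustration of how binary feature data of this kind could be compared across languages, the following Python sketch computes the proportion of matching answers between two languages. It is only a sketch under stated assumptions: the feature IDs (F001 etc.), the language labels, and all values are invented placeholders, not items from the actual dataset, and the use of None for features that have not yet been coded is our assumption rather than the project's coding convention.

```python
from typing import Dict, Optional

# Hypothetical coding: 1 = yes, 0 = no, None = feature not (yet) coded.
Answers = Dict[str, Optional[int]]


def shared_proportion(a: Answers, b: Answers) -> Optional[float]:
    """Return the share of features coded in both languages that have the same value.

    Returns None when the two languages have no coded feature in common.
    """
    common = [
        feature
        for feature in a
        if feature in b and a[feature] is not None and b[feature] is not None
    ]
    if not common:
        return None
    same = sum(1 for feature in common if a[feature] == b[feature])
    return same / len(common)


# Invented placeholder data; not real answers for any Uralic language.
lang_a: Answers = {"F001": 1, "F002": 0, "F003": 1, "F004": None}
lang_b: Answers = {"F001": 1, "F002": 1, "F003": 1, "F004": 0}

print(shared_proportion(lang_a, lang_b))
```

In this toy example, F004 is skipped because it is uncoded for one language, so the comparison rests on the three remaining features. Ignoring uncoded features is one simple choice among several; an actual analysis might treat missing data differently.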

We cooperate with the Grambank team, which has developed questions for collecting data from about half of the world’s languages (see https://glottobank.org/). We receive 150 of the questions, together with answers for the Uralic languages, from them. We developed the remaining 150 questions ourselves, following the principles used in Grambank. Each of these 150 questions is relevant in one way or another to the Uralic languages, which gives us a better picture of the differences and similarities within the Uralic language family.

The collection of the linguistic data is coordinated by Miina Norvik (University of Tartu/University of Turku). In addition to Miina Norvik, the Uralic-specific data is collected by Minerva Piha (University of Turku) and Eva Saar (University of Tartu/University of Turku). Richard Kowalik (University of Stockholm) is responsible for collecting the Grambank data on the Uralic languages, which can later be used together with the Uralic-specific questions.

The Grambank principles (including the coding procedure) were introduced by Harald Hammarström, Michael Dunn, and Rogier Blokland from the University of Uppsala. Gerson Klumpp, Karl Pajusalu, and Helle Metslang from the University of Tartu participated in developing the Uralic-specific questions.

The collection of the typological dataset for the Uralic languages is part of the Kipot ja kielet (Pots and languages) project funded by the University of Turku. We are also thankful for the feedback provided by Jeremy Bradley (University of Vienna) and Ksenia Shagal (University of Helsinki).