Data

BEDLAN has created a data collection of multidisciplinary data from the Uralic language speaker area. It includes:

- basic vocabulary with cognate (correlate) coding and loanword information (UraLex);
- typological data (UraTyp, which is part of Grambank);
- geospatial data on language speaker areas, together with interdisciplinary georeferenced data (environment, archaeology);
- Finnish dialect data from about 100 years ago;
- a travel effort database (spatial data);
- archaeological artefact data;
- cultural and ecological data.

Together these form a collection called the Uralic Trove (UraLaari). The most recently submitted version of the data collection description is available here (with links to the published data).

BEDLAN is committed to making the data used in our analyses freely available to the academic community, so that our findings can be replicated and extended. We hope to update this Data page soon. In the meantime, please check the links in the UraLaari manuscript.

Published

UraTyp, Uralic typological data

UraTyp is a collaboration between the University of Turku (UTU), the University of Tartu (UT) and Uppsala University.

Version 1.0 covers 35 Uralic languages and a total of 360 features, mainly at the levels of morphology, syntax and phonology. The features belong to two datasets. The definitions of 195 features come from the Grambank (GB) database, which was developed for comparing typology across the world's languages (to be released soon). The remaining 165 features (UT) were designed specifically to describe typological variation within the Uralic language family.

UraTyp 2.0, coordinated by Miina Norvik (UT), will add examples and more languages, as well as data from languages neighbouring the Uralic family (with a focus on the North Eurasian languages also present in the Grambank data).

Launching paper and documentation: Norvik et al. 2022

Dataset in Zenodo

User interface Uralic Areal Typology Online

More about the background of UraTyp (also available in Finnish).

UraLex basic vocabulary dataset (v2.0)

UraLex is a dataset consisting of lexical reflexes of 313 meanings from 26 Uralic languages. Most of the meanings originate from standardized basic vocabulary lists. The lexical reflexes are accompanied by multistate characters that represent their historical relationships.

Version 2.0 of UraLex (released 21 May 2021) adds loanword information for the lexical reflexes.

Digital dialect atlas of Finnish

Lauri Kettunen’s Dialect Atlas of Finnish from the 1940s contains 213 pages of dialectal features describing variation within the Finnish language. The atlas was originally digitized from its book format by the Finnish Dialect Atlas project, led by Sheila Embleton and Eric S. Wheeler and funded by the Social Sciences and Humanities Research Council of Canada. The BEDLAN research project checked these data for errors and converted them into their current format. The atlas is available online as part of the Kotus Language Atlas. An easy-to-use version of the data will be published with our next paper (Santaharju et al., revision submitted). Until then, the data are available here:

Dialect atlas maps as pdf

Original dataset by Kotus, York University and BEDLAN

Geographical database of the Uralic languages

The Geographical database of the Uralic languages is a comprehensive spatial database of past and present language distributions. It consists of state-of-the-art maps and geospatial datasets of the Uralic languages.

Launching paper and documentation: Rantanen et al. 2022

Dataset available in Zenodo

Uralic speaker areas in the web app Uralic Historical Atlas (URHIA)