VepKar :: Open corpus of Veps and Karelian languages

About VepKar

Kard’alan kielen lyydin murreh

Welcome to VepKar, the Open corpus of Veps and Karelian languages, which contains dictionaries and corpora of the Baltic-Finnic languages of the peoples of Karelia.

The VepKar project continues the earlier work on the Veps language corpus. The Karelian corpus covers the Karelian Proper, Livvi-Karelian and Ludic Karelian dialects, which have a newly created written tradition (Russian “младописьменный”, mladopis'mennyy, i.e. newly written languages).

The corpus website contains texts in the Karelian and Veps languages, dictionaries and folklore collections. The Speech corpus contains texts with audio recordings. The VepKar User Guide (in Russian) will help you work with the corpus and make full use of its search engine. The VepKar corpus data is the basis for growing resources such as the Baltic-Finnish Audio Map of Karelia, the Karelian Multimedia Dictionary (LiPaS – Livvin paginan sanat) and the Ludic dialect lexicon.

The corpus manager, Dictorpus, is an open-source project. The database, including the dictionaries and texts (see the list of database dumps), is also available under an open license (CC-BY).

The project name "Dictorpus" reflects the union of a dictionary (DICTionary) and a corpus (cORPUS). Dictorpus is designed for teams of linguists working with the languages of the world. At the moment, the program supports the Veps and Karelian languages and takes their specific features into account.

See the publication list.

What is a "language corpus"

A corpus is an information and reference system based on a collection of texts in electronic form. This linguistic corpus includes texts and dictionaries stored in a database, together with a computer program (the corpus manager) for searching and processing the data.

VepKar in numbers

The Open corpus of Veps and Karelian languages was opened on July 24, 2016. At the moment, the corpus contains:
69 339 articles about words
6 751 texts in 53 dialects
2 162 327 words