Contributing or helping
Introduction
Omorfi is a knowledge-driven analyser of the Finnish language. Keeping the roughly 300,000 word entries in the database up to date takes a lot of manual labour. Here are some ways to participate.
Wiktionary
Word data from Finnish Wiktionary is imported into our database every once in a while. If the word(s) you want added to omorfi conform to Finnish Wiktionary’s guidelines, you are best off adding them there first. Please make sure that you specify the inflectional classification.
Send patches directly to databases
The word data in omorfi is currently stored in tsv format databases that you can modify directly with a text editor or a spreadsheet application. The basic tasks have also been automated with shell scripts. To add a word, use add-word.bash, and to modify an existing word entry, use change-class.bash. The obligatory arguments to the scripts are the word’s lemma and its inflectional class. If you edit the data manually, the lemmas are in the first field of the database and the inflectional classes in the second. The word list is in one master database, and optional attributes are collected in files in the attributes directory. If you do not have write permission to git, use git’s format-patch and/or send-email capabilities to provide a patch to the project.
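The manual edit described above (lemma in the first field, inflectional class in the second) can be sketched in Python. This is only an illustration under stated assumptions, not how add-word.bash or change-class.bash work: the function names, the sample lemmas' class names (EXAMPLE_CLASS_A and so on), and the absence of further columns are all hypothetical.

```python
def parse_tsv(text):
    """Split tab-separated text into rows (lists of fields), skipping empty lines."""
    return [line.split("\t") for line in text.splitlines() if line]


def format_tsv(rows):
    """Join rows back into tab-separated text, one row per line."""
    return "".join("\t".join(row) + "\n" for row in rows)


def add_word(rows, lemma, infl_class):
    """Append a lemma/class row unless the same pair is already present."""
    for row in rows:
        if len(row) >= 2 and row[0] == lemma and row[1] == infl_class:
            return rows
    return rows + [[lemma, infl_class]]


def change_class(rows, lemma, new_class):
    """Replace the second field of every row whose first field is the lemma."""
    result = []
    for row in rows:
        if len(row) >= 2 and row[0] == lemma:
            # Keep any further fields untouched; only the class changes.
            row = [row[0], new_class] + row[2:]
        result.append(row)
    return result


# Hypothetical data: the class names below are placeholders, not real omorfi classes.
sample = "talo\tEXAMPLE_CLASS_A\nkissa\tEXAMPLE_CLASS_B\n"
rows = parse_tsv(sample)
rows = add_word(rows, "koira", "EXAMPLE_CLASS_B")
rows = change_class(rows, "talo", "EXAMPLE_CLASS_C")
print(format_tsv(rows), end="")
```

For real edits the provided shell scripts are the safer route, since they are what the project maintains; a sketch like this is mainly useful for understanding the field layout.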
Note that the tsv databases do not enforce data consistency by themselves, so you need to run make && make check to ensure that the results are consistent before sending patches.