Automata sizes
While automata sizes depend on the underlying engine, changes in dictionary structure and implementation can greatly affect the sizes, and thus the efficiency, of the automata. This page shows automated size test results for omorfi releases.
Automated size tests
These are in ascending order of time, since I append them (>>) to the end of the file.
2014-04-01 (manual tests)
../src/dictionary.default.hfst sizes
| Feature | Value |
| --- | --- |
| file size | 52M |
| states | 652713 |
| arcs | 2893005 |
| average arcs per state | 4.432277 |
| average input epsilons per state | 0.417595 |
| average input ambiguity | 1.095717 |
| average output ambiguity | 1.095717 |
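The "average arcs per state" row appears to be simply the arc count divided by the state count. A minimal Python sketch that checks this against the figures above; the function name is hypothetical and not part of omorfi or HFST:

```python
def average_arcs_per_state(states: int, arcs: int) -> float:
    """Return the mean number of outgoing arcs per state."""
    if states <= 0:
        raise ValueError("an automaton needs at least one state")
    return arcs / states


# Values copied from the dictionary.default.hfst table above.
ratio = average_arcs_per_state(states=652713, arcs=2893005)
print(f"average arcs per state: {ratio:.6f}")
```

The same check can be run on any other table on this page, which makes it a cheap way to spot a mistyped states or arcs value.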
../src/generation.ftb3.hfst sizes
| Feature | Value |
| --- | --- |
| file size | 69M |
| states | 652713 |
| arcs | 2893005 |
| average arcs per state | 4.432277 |
| average input epsilons per state | 0.326942 |
| average input ambiguity | 2.024982 |
| average output ambiguity | 1.095717 |
../src/lemmatize.default.hfst sizes
| Feature | Value |
| --- | --- |
| file size | 52M |
| states | 652713 |
| arcs | 2893005 |
| average arcs per state | 4.432277 |
| average input epsilons per state | 0.417595 |
| average input ambiguity | 1.095717 |
| average output ambiguity | 2.024982 |
../src/morphology.ftb3.hfst sizes
| Feature | Value |
| --- | --- |
| file size | 88M |
| states | 652713 |
| arcs | 2893005 |
| average arcs per state | 4.432277 |
| average input epsilons per state | 0.417595 |
| average input ambiguity | 1.095717 |
| average output ambiguity | 2.024982 |
2015-03-26 sizes
../src/generated/omorfi-omor.analyse.hfst
| Feature | Value |
| --- | --- |
| file size | 27M |
| states | 442094 |
| arcs | 955017 |
| average arcs per state | 2.160213 |
| average input epsilons per state | 0.336906 |
| average input ambiguity | 1.125475 |
| average output ambiguity | 1.087680 |
../src/generated/omorfi-omor.generate.hfst
| Feature | Value |
| --- | --- |
| file size | 36M |
| states | 442192 |
| arcs | 937498 |
| average arcs per state | 2.120115 |
| average input epsilons per state | 0.072903 |
| average input ambiguity | 1.069566 |
| average output ambiguity | 1.128215 |
../src/generated/omorfi-omor.lexc.hfst
| Feature | Value |
| --- | --- |
| file size | 20M |
| states | 441660 |
| arcs | 929726 |
| average arcs per state | 2.105072 |
| average input epsilons per state | 0.065209 |
| average input ambiguity | 1.064512 |
| average output ambiguity | 1.121208 |
../src/generated/omorfi-ftb3.analyse.hfst
| Feature | Value |
| --- | --- |
| file size | 50M |
| states | 431164 |
| arcs | 1692833 |
| average arcs per state | 3.926193 |
| average input epsilons per state | 0.541446 |
| average input ambiguity | 1.201040 |
| average output ambiguity | 1.342586 |
../src/generated/omorfi-ftb3.generate.hfst
| Feature | Value |
| --- | --- |
| file size | 36M |
| states | 452122 |
| arcs | 1005093 |
| average arcs per state | 2.223057 |
| average input epsilons per state | 0.046505 |
| average input ambiguity | 1.087878 |
| average output ambiguity | 1.100758 |
../src/generated/omorfi-ftb3.lexc.hfst
| Feature | Value |
| --- | --- |
| file size | 20M |
| states | 451709 |
| arcs | 935372 |
| average arcs per state | 2.070740 |
| average input epsilons per state | 0.038926 |
| average input ambiguity | 1.016209 |
| average output ambiguity | 1.060113 |
../src/generated/omorfi.accept.hfst
| Feature | Value |
| --- | --- |
| file size | 31M |
| states | 431164 |
| arcs | 1692833 |
| average arcs per state | 3.926193 |
| average input epsilons per state | 0.541446 |
| average input ambiguity | 1.201040 |
| average output ambiguity | 1.201040 |
../src/generated/omorfi.lemmatise.hfst
| Feature | Value |
| --- | --- |
| file size | 50M |
| states | 431164 |
| arcs | 1692833 |
| average arcs per state | 3.926193 |
| average input epsilons per state | 0.541446 |
| average input ambiguity | 1.201040 |
| average output ambiguity | 1.372894 |
../src/generated/omorfi.segment.hfst
| Feature | Value |
| --- | --- |
| file size | 28M |
| states | 394347 |
| arcs | 911855 |
| average arcs per state | 2.312316 |
| average input epsilons per state | 0.215173 |
| average input ambiguity | 1.011387 |
| average output ambiguity | 1.027478 |
../src/generated/omorfi.tokenise.hfst
| Feature | Value |
| --- | --- |
| file size | 21K |
| states | 18 |
| arcs | 1078 |
| average arcs per state | 59.888889 |
| average input epsilons per state | 0.888889 |
| average input ambiguity | 1.000000 |
| average output ambiguity | 1.001859 |
2015-09-04 sizes
../src/generated/omorfi-omor.analyse.hfst
| Feature | Value |
| --- | --- |
| file size | 22M |
| states | 442471 |
| arcs | 953519 |
| average arcs per state | 2.154986 |
| average input epsilons per state | 0.337575 |
| average input ambiguity | 1.125778 |
| average output ambiguity | 1.085401 |
../src/generated/omorfi-omor.generate.hfst
| Feature | Value |
| --- | --- |
| file size | 22M |
| states | 442552 |
| arcs | 936005 |
| average arcs per state | 2.115017 |
| average input epsilons per state | 0.068790 |
| average input ambiguity | 1.067308 |
| average output ambiguity | 1.128531 |
../src/generated/omorfi-omor.lexc.hfst
| Feature | Value |
| --- | --- |
| file size | 20M |
| states | 442028 |
| arcs | 930242 |
| average arcs per state | 2.104487 |
| average input epsilons per state | 0.064976 |
| average input ambiguity | 1.064522 |
| average output ambiguity | 1.121211 |
../src/generated/omorfi-ftb3.analyse.hfst
| Feature | Value |
| --- | --- |
| file size | 35M |
| states | 431151 |
| arcs | 1635723 |
| average arcs per state | 3.793852 |
| average input epsilons per state | 0.541439 |
| average input ambiguity | 1.192366 |
| average output ambiguity | 1.297094 |
../src/generated/omorfi-ftb3.generate.hfst
| Feature | Value |
| --- | --- |
| file size | 23M |
| states | 452112 |
| arcs | 972277 |
| average arcs per state | 2.150522 |
| average input epsilons per state | 0.042733 |
| average input ambiguity | 1.052337 |
| average output ambiguity | 1.085213 |
../src/generated/omorfi-ftb3.lexc.hfst
| Feature | Value |
| --- | --- |
| file size | 20M |
| states | 451705 |
| arcs | 935438 |
| average arcs per state | 2.070905 |
| average input epsilons per state | 0.038946 |
| average input ambiguity | 1.016213 |
| average output ambiguity | 1.060144 |
../src/generated/omorfi.accept.hfst
| Feature | Value |
| --- | --- |
| file size | 185M |
| states | 535827 |
| arcs | 8682596 |
| average arcs per state | 16.204103 |
| average input epsilons per state | 0.000000 |
| average input ambiguity | 1.103236 |
| average output ambiguity | 1.103236 |
../src/generated/omorfi.lemmatise.hfst
| Feature | Value |
| --- | --- |
| file size | 35M |
| states | 431151 |
| arcs | 1635723 |
| average arcs per state | 3.793852 |
| average input epsilons per state | 0.541439 |
| average input ambiguity | 1.192366 |
| average output ambiguity | 1.326331 |
../src/generated/omorfi.segment.hfst
| Feature | Value |
| --- | --- |
| file size | 22M |
| states | 394463 |
| arcs | 910218 |
| average arcs per state | 2.307486 |
| average input epsilons per state | 0.215300 |
| average input ambiguity | 1.011443 |
| average output ambiguity | 1.025282 |
../src/generated/omorfi.tokenise.hfst
| Feature | Value |
| --- | --- |
| file size | 22K |
| states | 18 |
| arcs | 1078 |
| average arcs per state | 59.888889 |
| average input epsilons per state | 0.888889 |
| average input ambiguity | 1.000000 |
| average output ambiguity | 1.001859 |
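Since the point of this page is to see how changes affect sizes, it can help to compare the same automaton across two runs. The following is only a sketch: the helper names are hypothetical, and it assumes the K and M suffixes are binary multiples (as printed by tools such as `ls -lh`), which the page itself does not state.

```python
# Assumed meaning of the size suffixes used in the tables (binary multiples).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a human-readable size such as '52M' or '21K' to bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def growth(before: str, after: str) -> float:
    """Return the size ratio after/before; values above 1.0 mean growth."""
    return parse_size(after) / parse_size(before)


# File sizes copied from the 2015-03-26 and 2015-09-04 tables on this page.
sizes = {
    "omorfi.accept.hfst": ("31M", "185M"),
    "omorfi-omor.analyse.hfst": ("27M", "22M"),
    "omorfi-ftb3.analyse.hfst": ("50M", "35M"),
}
for name, (old, new) in sizes.items():
    print(f"{name}: {old} -> {new} (x{growth(old, new):.2f})")
```

Because the sizes are rounded to whole units, the ratios are approximate; the states and arcs rows give a more precise comparison.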