Automata sizes

While automata sizes depend on the underlying engine, some changes in dictionary structure and implementation can greatly affect the sizes, and thus the efficiency, of the automata. This page shows size test results for omorfi releases.

Automated size tests

These are in ascending order of time, since I append (>>) them to the end of the file.

2014-04-01 (manual tests)

../src/dictionary.default.hfst sizes

Feature Value
file size 52M
states 652713
arcs 2893005
average arcs per state 4.432277
average input epsilons per state 0.417595
average input ambiguity 1.095717
average output ambiguity 1.095717
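
The "average arcs per state" row in these tables appears to be simply the arc count divided by the state count. That reading is inferred from the figures on this page, not taken from the tool's documentation. A minimal Python sketch that checks it (the function name `arcs_per_state` is made up for illustration):

```python
def arcs_per_state(states: int, arcs: int) -> float:
    """Average number of arcs per state, assuming it is arcs / states."""
    if states <= 0:
        raise ValueError("states must be positive")
    return arcs / states


# Values copied from the dictionary.default.hfst table above.
ratio = arcs_per_state(states=652713, arcs=2893005)
print(f"{ratio:.6f}")  # prints 4.432277, matching the table row
```

The same division reproduces the 59.888889 reported for the 18-state tokeniser further down, which supports the assumption.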

../src/generation.ftb3.hfst sizes

Feature Value
file size 69M
states 652713
arcs 2893005
average arcs per state 4.432277
average input epsilons per state 0.326942
average input ambiguity 2.024982
average output ambiguity 1.095717

../src/lemmatize.default.hfst sizes

Feature Value
file size 52M
states 652713
arcs 2893005
average arcs per state 4.432277
average input epsilons per state 0.417595
average input ambiguity 1.095717
average output ambiguity 2.024982

../src/morphology.ftb3.hfst sizes

Feature Value
file size 88M
states 652713
arcs 2893005
average arcs per state 4.432277
average input epsilons per state 0.417595
average input ambiguity 1.095717
average output ambiguity 2.024982

2015-03-26 sizes

../src/generated/omorfi-omor.analyse.hfst

Feature Value
file size 27M
states 442094
arcs 955017
average arcs per state 2.160213
average input epsilons per state 0.336906
average input ambiguity 1.125475
average output ambiguity 1.087680

../src/generated/omorfi-omor.generate.hfst

Feature Value
file size 36M
states 442192
arcs 937498
average arcs per state 2.120115
average input epsilons per state 0.072903
average input ambiguity 1.069566
average output ambiguity 1.128215

../src/generated/omorfi-omor.lexc.hfst

Feature Value
file size 20M
states 441660
arcs 929726
average arcs per state 2.105072
average input epsilons per state 0.065209
average input ambiguity 1.064512
average output ambiguity 1.121208

../src/generated/omorfi-ftb3.analyse.hfst

Feature Value
file size 50M
states 431164
arcs 1692833
average arcs per state 3.926193
average input epsilons per state 0.541446
average input ambiguity 1.201040
average output ambiguity 1.342586

../src/generated/omorfi-ftb3.generate.hfst

Feature Value
file size 36M
states 452122
arcs 1005093
average arcs per state 2.223057
average input epsilons per state 0.046505
average input ambiguity 1.087878
average output ambiguity 1.100758

../src/generated/omorfi-ftb3.lexc.hfst

Feature Value
file size 20M
states 451709
arcs 935372
average arcs per state 2.070740
average input epsilons per state 0.038926
average input ambiguity 1.016209
average output ambiguity 1.060113

../src/generated/omorfi.accept.hfst

Feature Value
file size 31M
states 431164
arcs 1692833
average arcs per state 3.926193
average input epsilons per state 0.541446
average input ambiguity 1.201040
average output ambiguity 1.201040

../src/generated/omorfi.lemmatise.hfst

Feature Value
file size 50M
states 431164
arcs 1692833
average arcs per state 3.926193
average input epsilons per state 0.541446
average input ambiguity 1.201040
average output ambiguity 1.372894

../src/generated/omorfi.segment.hfst

Feature Value
file size 28M
states 394347
arcs 911855
average arcs per state 2.312316
average input epsilons per state 0.215173
average input ambiguity 1.011387
average output ambiguity 1.027478

../src/generated/omorfi.tokenise.hfst

Feature Value
file size 21K
states 18
arcs 1078
average arcs per state 59.888889
average input epsilons per state 0.888889
average input ambiguity 1.000000
average output ambiguity 1.001859

2015-09-04 sizes

../src/generated/omorfi-omor.analyse.hfst

Feature Value
file size 22M
states 442471
arcs 953519
average arcs per state 2.154986
average input epsilons per state 0.337575
average input ambiguity 1.125778
average output ambiguity 1.085401

../src/generated/omorfi-omor.generate.hfst

Feature Value
file size 22M
states 442552
arcs 936005
average arcs per state 2.115017
average input epsilons per state 0.068790
average input ambiguity 1.067308
average output ambiguity 1.128531

../src/generated/omorfi-omor.lexc.hfst

Feature Value
file size 20M
states 442028
arcs 930242
average arcs per state 2.104487
average input epsilons per state 0.064976
average input ambiguity 1.064522
average output ambiguity 1.121211

../src/generated/omorfi-ftb3.analyse.hfst

Feature Value
file size 35M
states 431151
arcs 1635723
average arcs per state 3.793852
average input epsilons per state 0.541439
average input ambiguity 1.192366
average output ambiguity 1.297094

../src/generated/omorfi-ftb3.generate.hfst

Feature Value
file size 23M
states 452112
arcs 972277
average arcs per state 2.150522
average input epsilons per state 0.042733
average input ambiguity 1.052337
average output ambiguity 1.085213

../src/generated/omorfi-ftb3.lexc.hfst

Feature Value
file size 20M
states 451705
arcs 935438
average arcs per state 2.070905
average input epsilons per state 0.038946
average input ambiguity 1.016213
average output ambiguity 1.060144

../src/generated/omorfi.accept.hfst

Feature Value
file size 185M
states 535827
arcs 8682596
average arcs per state 16.204103
average input epsilons per state 0.000000
average input ambiguity 1.103236
average output ambiguity 1.103236

../src/generated/omorfi.lemmatise.hfst

Feature Value
file size 35M
states 431151
arcs 1635723
average arcs per state 3.793852
average input epsilons per state 0.541439
average input ambiguity 1.192366
average output ambiguity 1.326331

../src/generated/omorfi.segment.hfst

Feature Value
file size 22M
states 394463
arcs 910218
average arcs per state 2.307486
average input epsilons per state 0.215300
average input ambiguity 1.011443
average output ambiguity 1.025282

../src/generated/omorfi.tokenise.hfst

Feature Value
file size 22K
states 18
arcs 1078
average arcs per state 59.888889
average input epsilons per state 0.888889
average input ambiguity 1.000000
average output ambiguity 1.001859
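
To see how much a single automaton changed between two runs, the counts can be compared directly. The sketch below is only an illustration: the helper `growth` and the dictionary names are hypothetical, and the numbers are copied from the omorfi.accept.hfst tables for 2015-03-26 and 2015-09-04 above.

```python
def growth(old: float, new: float) -> float:
    """Return new / old, i.e. how many times larger the new value is."""
    if old <= 0:
        raise ValueError("old value must be positive")
    return new / old


# omorfi.accept.hfst, figures copied from the two runs on this page.
accept_2015_03 = {"states": 431164, "arcs": 1692833}
accept_2015_09 = {"states": 535827, "arcs": 8682596}

for feature in accept_2015_03:
    factor = growth(accept_2015_03[feature], accept_2015_09[feature])
    print(f"{feature}: x{factor:.2f}")
```

For this automaton the arc count grows far faster than the state count, in line with the jump in file size from 31M to 185M and the rise in average arcs per state from 3.926193 to 16.204103.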