Jukka Suomela, 2016
types2 is a freely available corpus tool for comparing the frequencies of words, types, and hapax legomena across subcorpora. The tool uses accumulation curves and the statistical technique of permutation testing to compare the subcorpora with a “typical” corpus of a similar size, in order to visualize the frequencies and to identify statistically significant findings.
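The idea of comparing a subcorpus with a “typical” corpus of a similar size can be illustrated with a minimal Python sketch. This is not the implementation used by types2: the function names, the truncation of each random corpus to an exact token count, and the plain tail fractions are all assumptions made for illustration. The sketch shuffles the order of the text samples, accumulates tokens until the random corpus is as large as the subcorpus, and records how often the random corpus has at most, or at least, as many types as the subcorpus.

```python
import random


def count_types(tokens):
    """Return the number of distinct word types in a list of tokens."""
    return len(set(tokens))


def permutation_test(samples, in_subcorpus, iterations=10000, seed=0):
    """Compare the type count of a subcorpus with random corpora of equal size.

    samples      -- list of token lists, one per text sample (hypothetical format)
    in_subcorpus -- list of booleans marking the samples in the subcorpus
    Returns (observed_types, p_low, p_high), where p_low is the fraction of
    random corpora with at most as many types as the subcorpus and p_high
    the fraction with at least as many.
    """
    rng = random.Random(seed)

    # Pool the tokens of the subcorpus and measure it.
    target = [tok for s, flag in zip(samples, in_subcorpus) if flag for tok in s]
    observed = count_types(target)
    size = len(target)

    at_most = 0
    at_least = 0
    order = list(range(len(samples)))
    for _ in range(iterations):
        # One permutation: visit the samples in random order and accumulate
        # tokens until the random corpus has the same size as the subcorpus.
        rng.shuffle(order)
        pooled = []
        for i in order:
            if len(pooled) >= size:
                break
            pooled.extend(samples[i])
        simulated = count_types(pooled[:size])
        if simulated <= observed:
            at_most += 1
        if simulated >= observed:
            at_least += 1

    return observed, at_most / iterations, at_least / iterations


if __name__ == "__main__":
    # Toy data: four repetitive samples form the subcorpus,
    # sixteen samples with all-distinct tokens form the rest.
    repetitive = [["a"] * 10 for _ in range(4)]
    varied = [[f"w{i}_{j}" for j in range(10)] for i in range(16)]
    samples = repetitive + varied
    flags = [True] * 4 + [False] * 16
    print(permutation_test(samples, flags, iterations=2000))
```

A small p_low suggests that the subcorpus has unusually few types for its size, and a small p_high that it has unusually many. Shuffling whole samples rather than individual tokens keeps the words of each text together.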
The software was written by Jukka Suomela, and the system was designed and developed in collaboration with Tanja Säily, who also provided the sample data sets. Please see the paper “types2: Exploring word-frequency differences in corpora” for more information on how to use the tool.
A new version of this tool is available: see types3 on GitHub.
Copy template/types.sqlite to db/types.sqlite.
Populate db/types.sqlite with your input data.
Run bin/types-run to perform data analysis.
Run bin/types-web to create the web user interface.
Open web/index.html in a web browser.

The following commands run these steps with the bnc-input sample database from types-examples:

    git clone https://github.com/suomela/types.git
    git clone https://github.com/suomela/types-examples.git
    cd types
    ./config
    make
    mkdir db
    cp ../types-examples/bnc-input/db/types.sqlite db/types.sqlite
    bin/types-run --citer=100000 --piter=100000
    bin/types-web
    open web/index.html
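Before populating db/types.sqlite with input data, it can help to see which tables the database contains. The sketch below uses only Python's standard sqlite3 module and the built-in sqlite_master catalogue. The helper name list_tables is hypothetical, and nothing is assumed here about the actual schema of the types database.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file, sorted by name."""
    # Note: sqlite3.connect creates an empty database if the file does not exist.
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Path taken from the workflow above; run from the top of the types checkout.
    for name in list_tables("db/types.sqlite"):
        print(name)
```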