types2 is a freely available corpus tool for comparing the frequencies of words, types, and hapax legomena across subcorpora. The tool uses accumulation curves and the statistical technique of permutation testing to compare the subcorpora with a “typical” corpus of a similar size, in order to visualize the frequencies and to identify statistically significant findings.
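The idea of comparing a subcorpus with a "typical" corpus of similar size can be illustrated with a small permutation test. The sketch below is not the types2 implementation. It assumes a hypothetical layout in which the corpus is a dict from sample id to a list of tokens, and it simplifies the size matching by adding whole samples in random order until the token count of the subcorpus is reached. The names `count_types` and `permutation_p_value` are made up for this illustration.

```python
import random


def count_types(samples):
    """Number of distinct word types in a list of tokenized samples."""
    return len({token for sample in samples for token in sample})


def permutation_p_value(corpus, subcorpus_ids, iterations=1000, seed=0):
    """Fraction of random corpora of similar size with at most as many
    types as the subcorpus.

    corpus: dict mapping sample id -> list of tokens (assumed layout)
    subcorpus_ids: ids of the samples that make up the subcorpus
    """
    rng = random.Random(seed)
    observed = [corpus[i] for i in subcorpus_ids]
    target_tokens = sum(len(sample) for sample in observed)
    observed_types = count_types(observed)

    ids = list(corpus)
    at_most = 0
    for _ in range(iterations):
        # One random permutation of the samples.
        rng.shuffle(ids)
        picked, tokens = [], 0
        # Accumulate whole samples until the subcorpus size is reached.
        for i in ids:
            if tokens >= target_tokens:
                break
            picked.append(corpus[i])
            tokens += len(corpus[i])
        if count_types(picked) <= observed_types:
            at_most += 1
    return observed_types, at_most / iterations


# Toy data: samples "a" and "b" repeat one word, the others use unique words.
toy_corpus = {
    "a": ["x"] * 4,
    "b": ["x"] * 4,
    "c": ["c1", "c2", "c3", "c4"],
    "d": ["d1", "d2", "d3", "d4"],
    "e": ["e1", "e2", "e3", "e4"],
    "f": ["f1", "f2", "f3", "f4"],
}
types_seen, p_low = permutation_p_value(toy_corpus, ["a", "b"])
print(types_seen, p_low)
```

A small value of the returned fraction suggests that the subcorpus has unusually few types for its size. Recording the type count after each added sample, instead of only at the end, would give one accumulation curve per permutation.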
The software is written by Jukka Suomela, and the system is designed and developed in collaboration with Tanja Säily. The sample data sets are provided by Tanja Säily. Please see the paper “types2: Exploring word-frequency differences in corpora” for more information on how to use the tool.
A newer version of this tool is available: see types3 on GitHub!
Create db/types.sqlite with your input data
Run bin/types-run to perform data analysis
Run bin/types-web to create the web user interface
Open web/index.html in a web browser
    git clone https://github.com/suomela/types.git
    git clone https://github.com/suomela/types-examples.git
    cd types
    ./config
    make
    mkdir db
    cp ../types-examples/bnc-input/db/types.sqlite db/types.sqlite
    bin/types-run --citer=100000 --piter=100000
    bin/types-web
    open web/index.html
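Before running bin/types-run, it can be useful to confirm that db/types.sqlite is in place and is a readable SQLite file. The following is a minimal sketch that uses only Python's standard sqlite3 module. It makes no assumptions about the types2 schema and simply lists whatever tables the file contains. The helper name `list_tables` is made up for this illustration.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    # sqlite3.connect would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("db/types.sqlite")` from the types directory should return a non-empty list if the input database was copied correctly.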