
Jukka Suomela, 2016
types2 is a freely available corpus tool for comparing the frequencies of words, types, and hapax legomena across subcorpora. The tool uses accumulation curves and the statistical technique of permutation testing to compare the subcorpora with a “typical” corpus of a similar size, in order to visualize the frequencies and to identify statistically significant findings.
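To make the idea of accumulation curves and permutation testing more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the algorithm that types2 actually implements. It assumes a corpus is a list of samples (for example, texts), each sample a list of word tokens. The function names `accumulation_curve`, `types_at_size` and `permutation_test` are hypothetical. The sketch shuffles the samples of the whole corpus to build random accumulation curves. At the token count of the subcorpus, it reads off how many types each random curve has, then estimates how often a random "typical" corpus of that size has at most (or at least) as many types as the subcorpus.

```python
import random
from typing import Sequence

# Hypothetical data model: a corpus is a list of samples (e.g. texts),
# and each sample is a list of word tokens.
Sample = Sequence[str]


def accumulation_curve(samples: Sequence[Sample]) -> list[tuple[int, int]]:
    """Return (tokens so far, distinct types so far) after each sample."""
    seen: set[str] = set()
    tokens = 0
    curve = []
    for sample in samples:
        tokens += len(sample)
        seen.update(sample)
        curve.append((tokens, len(seen)))
    return curve


def types_at_size(curve: list[tuple[int, int]], size: int) -> int:
    """Types at the first curve point whose token count reaches `size`
    (or at the last point if the curve never gets that far)."""
    for tokens, types in curve:
        if tokens >= size:
            return types
    return curve[-1][1]


def permutation_test(corpus: Sequence[Sample], subcorpus: Sequence[Sample],
                     iterations: int = 10_000,
                     seed: int = 0) -> tuple[float, float]:
    """Estimate one-sided p-values for the (non-empty) subcorpus having
    unusually few types and unusually many types, compared with random
    orderings of the whole corpus."""
    rng = random.Random(seed)
    size, observed = accumulation_curve(subcorpus)[-1]
    order = list(corpus)
    at_most = at_least = 0
    for _ in range(iterations):
        rng.shuffle(order)  # one random "typical" corpus
        t = types_at_size(accumulation_curve(order), size)
        at_most += int(t <= observed)
        at_least += int(t >= observed)
    # Add-one correction keeps the estimated p-values strictly positive.
    return (at_most + 1) / (iterations + 1), (at_least + 1) / (iterations + 1)
```

For example, with `corpus = [["a", "b"], ["a", "a"], ["c", "d"], ["b", "e"]]` the accumulation curve in the given order is `[(2, 2), (4, 2), (6, 4), (8, 5)]`, and `permutation_test(corpus, [corpus[1]])` reports that a two-token subcorpus with only one type is rarer than average (roughly one random ordering in four does as poorly). Reading the curve at the first point that reaches the target size slightly overshoots it; a more careful comparison might interpolate between samples or measure other quantities, such as tokens of a given word or hapax legomena, in the same way.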
The software is written by Jukka Suomela, and the system is designed and developed in collaboration with Tanja Säily. The sample data sets are provided by Tanja Säily. Please see the paper “types2: Exploring word-frequency differences in corpora” for more information on how to use the tool.
A new version of this tool is available: see types3 on GitHub!
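Basic usage:
- Copy template/types.sqlite to db/types.sqlite
- Fill db/types.sqlite with your input data
- Run bin/types-run to perform data analysis
- Run bin/types-web to create the web user interface
- Open web/index.html in a web browser

Example: compile the tool and run it on the BNC sample data from types-examples: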
git clone https://github.com/suomela/types.git
git clone https://github.com/suomela/types-examples.git
cd types
./config
make
mkdir db
cp ../types-examples/bnc-input/db/types.sqlite db/types.sqlite
bin/types-run --citer=100000 --piter=100000
bin/types-web
open web/index.html