Improving the effectiveness of Lucene's BM25 (and testing it using community QA and ClueWeb* collections). Please see my blog post for details. The code works with early versions of Lucene 6.x, e.g., 6.0. However, later Lucene versions (I think starting from 6.3) changed the internal API, so this similarity class will not work with them without changes. In addition to an input file (which can be gzip- or bzip2-compressed), you have to specify the output directory in which to store the Lucene index. For community QA data, you can also specify the location of an output file for the TREC-style QRELs.
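To make the two file conventions mentioned above concrete, here is a minimal Python sketch. It is not the indexing code from this project (which is written against Lucene's Java API); it only illustrates reading a gzip- or bzip2-compressed input and writing a line in the usual four-column TREC qrels layout (query ID, iteration, document ID, relevance grade). The helper names `open_maybe_compressed` and `format_qrel_line` are hypothetical, and choosing the decompressor by file extension is an assumption made for illustration.

```python
import bz2
import gzip


def open_maybe_compressed(path):
    """Open a text file, decompressing on the fly for .gz and .bz2 inputs.

    Assumption: the compression type is inferred from the file extension.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def format_qrel_line(query_id, doc_id, relevance):
    """Return one line in the four-column TREC qrels layout.

    The second column (the iteration) is conventionally set to 0.
    """
    return f"{query_id} 0 {doc_id} {relevance}"
```

For example, `format_qrel_line("q1", "answer42", 1)` yields `q1 0 answer42 1`, which is the kind of line an evaluation tool such as `trec_eval` expects in a qrels file.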