Sunday 12 June 2011

Nutch 1.3 released + BerlinBuzzwords presentation

Nutch 1.3 has been released and contains quite a few changes, some of which have been retrofitted from Nutch 2.0 in trunk.

The main modification is that Nutch now relies entirely on Solr for indexing and searching: we removed our Lucene-based indexer as well as the search webapps (NUTCH-837). Dependencies are now managed with Apache Ivy (NUTCH-821), and we've upgraded Solr to 3.1 and Tika to 0.9. Another important change is that there are two separate runtime environments, one for local and one for deployed configurations (NUTCH-843). Nutch 1.3 contains many more improvements and bug fixes, so if you use Nutch you should probably migrate to it.

The presentation I gave this week at BerlinBuzzwords is now available online. It gave an overview of Nutch and covered both 1.3 and 2.0. The conference itself was great: I met quite a few Nutch users and people planning to use it, as well as Doug Cutting himself, the creator of Nutch!

There are quite a few things planned for the next release(s), and also a large amount of work to do on the documentation, which is a bit dated and patchy. Luckily, some new committers have recently joined the project and seem keen to help with this.