StormCrawler 1.15 was released yesterday and, as usual, contains loads of improvements and bugfixes.
You can find the full list of changes on https://github.com/DigitalPebble/storm-crawler/milestone/25?closed=1
We recommend that all users upgrade to this version as it contains very important fixes and performance improvements.
Dependency upgrades
Core
- /bugfix/ CharsetIdentification crashes on binary content (#747)
- FetcherBolt skips tuples which have spent too much time in queues (#746)
- Fetcher bolts generate metrics for HTTP status (#745)
- improvements to URLFilterBolt (#740)
- /bugfix/ FetcherBolt doesn't recover when entering maxNumberURLsInQueues (#738)
- /bugfix/ RemoteDriverProtocol does not set user agent correctly (#735)
- Force English Locale for SimpleDateFormat in cookie converter (#732)
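
To illustrate why the locale fix (#732) matters: cookie expiry dates use English day and month names, so a SimpleDateFormat built with a non-English locale can fail to parse them. The snippet below is a minimal standalone sketch using only the JDK. The class name, method name and date pattern are assumptions made for illustration and are not taken from the StormCrawler code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of StormCrawler.
class CookieDateLocaleDemo {

    // A common layout for cookie expiry dates,
    // e.g. "Wed, 21 Oct 2015 07:28:00 GMT" (assumed for this sketch).
    static final String COOKIE_DATE_PATTERN = "EEE, dd MMM yyyy HH:mm:ss zzz";

    // Returns the parsed date, or null if the text does not match
    // the pattern under the given locale.
    static Date parseExpires(String text, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(COOKIE_DATE_PATTERN, locale);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            // Day or month names did not match this locale's symbols.
            return null;
        }
    }

    public static void main(String[] args) {
        String expires = "Wed, 21 Oct 2015 07:28:00 GMT";
        // Parsing with an explicit English locale works regardless of the JVM default.
        System.out.println("ENGLISH parsed: " + (parseExpires(expires, Locale.ENGLISH) != null));
        // A French locale expects French day and month abbreviations instead.
        System.out.println("FRANCE parsed:  " + (parseExpires(expires, Locale.FRANCE) != null));
    }
}
```

Passing the locale explicitly, rather than relying on the JVM default, keeps the parsing behaviour the same on every machine the crawler runs on.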
 
LangID
- LangId normalises and returns value found via extraction (#733)
 
Elasticsearch
- Pluggable URLBuffer and Hybrid Elasticsearch spout (#752)
- ES spouts control how long the search is allowed to take with timeout (#753)
- Improve types used for numeric values for metrics mappings (#744)
- Use sniffer for ES connections (#734)
- ScrollSpout to quit logging when finished (#727)
- ES spouts use nextFetchDate RangeQuery as a filter (#725)
- MetricsConsumer takes an optional date format (#724)
- StatusMetricsBolt returns a max of 10K results per status (#723)
 
Happy crawling and thanks to all contributors!