Thursday 19 September 2019

What's new in StormCrawler 1.15?

StormCrawler 1.15 was released yesterday and as usual, contains loads of improvements and bugfixes.


We recommend that all users upgrade to this version as it contains very important fixes and performance improvements.

Dependency upgrades

Core

  • /bugfix/ CharsetIdentification crashes on binary content (#747)
  • FetcherBolt skips tuples which have spent too much time in queues (#746)
  • Fetcher bolts generate metrics for HTTP status (#745)
  • improvements to URLFilterBolt (#740)
  • /bugfix/ FetcherBolt doesn't recover when entering maxNumberURLsInQueues (#738)
  • /bugfix/ RemoteDriverProtocol does not set user agent correctly (#735)
  • Force English Locale for SimpleDateFormat in cookie converter (#732)

LangID

  • LangId normalises and returns value found via extraction (#733)

Elasticsearch

  • Pluggable URLBuffer and Hybrid Elasticsearch spout (#752)
  • ES spouts control how long the search is allowed to take with timeout (#753)
  • Improve types used for numeric values for metrics mappings (#744)
  • Use sniffer for ES connections (#734)
  • ScrollSpout to quit logging when finished (#727)
  • ES spouts use nextFetchDate RangeQuery as a filter (#725)
  • MetricsConsumer takes an optional date format (#724)
  • StatusMetricsBolt returns a max of 10K results per status (#723)

Happy crawling and thanks to all contributors!