StormCrawler 1.15 was released yesterday and, as usual, contains many improvements and bug fixes.
You can find the full list of changes at https://github.com/DigitalPebble/storm-crawler/milestone/25?closed=1
We recommend that all users upgrade to this version as it contains very important fixes and performance improvements.
Dependency upgrades
Core
- /bugfix/ CharsetIdentification crashes on binary content (#747)
- FetcherBolt skips tuples which have spent too much time in queues (#746)
- Fetcher bolts generate metrics for HTTP status (#745)
- Improvements to URLFilterBolt (#740)
- /bugfix/ FetcherBolt doesn't recover when entering maxNumberURLsInQueues (#738)
- /bugfix/ RemoteDriverProtocol does not set user agent correctly (#735)
- Force English Locale for SimpleDateFormat in cookie converter (#732)
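To illustrate why forcing the English locale (#732) matters: cookie expiry dates use English day and month names, so a SimpleDateFormat built with a non-English locale can fail to parse them. The sketch below is not StormCrawler's actual code; the class name, the helper method and the date pattern are assumptions chosen for illustration only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Optional;
import java.util.TimeZone;

public class CookieDateExample {

    // A common cookie "Expires" layout; assumed here, not taken from StormCrawler.
    static final String PATTERN = "EEE, dd MMM yyyy HH:mm:ss zzz";

    // Hypothetical helper: parses the value with the given locale,
    // returning an empty Optional when the text cannot be parsed.
    static Optional<Date> tryParse(String value, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, locale);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            return Optional.of(format.parse(value));
        } catch (ParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String expires = "Wed, 21 Oct 2015 07:28:00 GMT";

        // Explicit English locale: independent of the JVM's default locale.
        System.out.println("ENGLISH parsed: " + tryParse(expires, Locale.ENGLISH).isPresent());

        // A French locale expects French day and month names, so "Wed" and "Oct" do not match.
        System.out.println("FRANCE parsed:  " + tryParse(expires, Locale.FRANCE).isPresent());
    }
}
```

Building the formatter without a locale argument uses the JVM default, which is why the same crawler could behave differently from one machine to the next.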
LangID
- LangId normalises and returns value found via extraction (#733)
Elasticsearch
- Pluggable URLBuffer and Hybrid Elasticsearch spout (#752)
- ES spouts control how long the search is allowed to take with timeout (#753)
- Improve types used for numeric values for metrics mappings (#744)
- Use sniffer for ES connections (#734)
- ScrollSpout to quit logging when finished (#727)
- ES spouts use nextFetchDate RangeQuery as a filter (#725)
- MetricsConsumer takes an optional date format (#724)
- StatusMetricsBolt returns a max of 10K results per status (#723)
Happy crawling and thanks to all contributors!