I have just released StormCrawler 1.8. As usual, here is a summary of the main changes:
Dependency updates
Core
- Add option to send only N bytes of text to indexers #476
- BasicURLNormalizer to optionally convert IDN host names to ASCII/Punycode #522
- MemorySpout to generate tuples with DISCOVERED status #529
- OKHttp configure type of proxy #530
- http.content.limit inconsistent default to -1 #534
- Track time spent in the FetcherBolt queues #535
- Increase detect.charset.maxlength default value #537
- FeedParserBolt: metadata added by parse filters not passed forward in topology #541
- Use UTF-8 for input encoding of seeds (FileSpout) #542
- Default URL filter: exclude localhost and private address spaces #543
- URLStreamGrouping returns the taskIDs and not their index #547
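As an illustration of the IDN item above (#522), here is a minimal sketch of what converting an internationalised host name to its ASCII/Punycode form looks like with the JDK's `java.net.IDN` class. It is not the BasicURLNormalizer code itself, and the class and method names (`IdnHostExample`, `toAsciiHost`) are made up for this example.

```java
import java.net.IDN;

// Hypothetical helper for illustration only; not part of StormCrawler.
class IdnHostExample {

    // Converts a (possibly internationalised) host name to its ASCII form.
    // Labels that are already plain ASCII are returned unchanged.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // "b\u00fccher" is "bücher", written with an escape so that the
        // source file encoding does not matter.
        System.out.println(toAsciiHost("b\u00fccher.example"));
        System.out.println(toAsciiHost("example.com"));
    }
}
```

Normalising hosts this way means that the Unicode and Punycode spellings of the same domain end up as a single URL in the status index instead of two.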
WARC
- Upgrade WARC module to 1.1.0 version of storm-hdfs, fixes #520
SOLR
- Schema for status index needs date type for nextFetchDate #544
- SOLR indexer: use field type text for content field #545
Elasticsearch
As usual, thanks to all contributors and users. Happy crawling!