Dependency updates
- jsoup 1.10.3
- crawler-commons 0.8
Core
- Use ISO representation of time for modifiedtime in adaptivescheduler #496
- Use ISO representation of time for discoveryDate and lastProcessedDate #477
- Improved Charset Detection #495
- SitemapParserBolt: use of SAX parsing is now configurable
- SitemapParserBolt generates metrics for average processing time
- HTTP protocol based on OKHTTP #484
- Apache Http client can use HEAD method on a per URL basis #485
- ContentFilter to leave trace of the pattern that matched #480
- Metadata has a new public method for getting the first non-empty value from a set of keys
- Added ARTICLE to patterns for content filter
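
To illustrate the idea behind the new Metadata lookup listed above, here is a minimal sketch. It does not use the actual Metadata class: the helper `firstNonEmpty`, the plain `Map<String, String[]>` and the key names are all assumptions made for the example, so check the real method name and signature in the source before relying on it.

```java
import java.util.HashMap;
import java.util.Map;

public class FirstValueSketch {

    // Hypothetical helper, not the actual Metadata method: checks the keys
    // in the order given and returns the first value that is not blank.
    static String firstNonEmpty(Map<String, String[]> md, String... keys) {
        for (String key : keys) {
            String[] values = md.get(key);
            if (values == null) {
                continue; // key absent, try the next one
            }
            for (String v : values) {
                if (v != null && !v.trim().isEmpty()) {
                    return v;
                }
            }
        }
        return null; // none of the keys had a usable value
    }

    public static void main(String[] args) {
        Map<String, String[]> md = new HashMap<>();
        md.put("parse.title", new String[] {""});          // illustrative key, empty value
        md.put("og:title", new String[] {"Release notes"}); // illustrative key
        System.out.println(firstNonEmpty(md, "parse.title", "og:title", "title"));
    }
}
```

The order of the keys acts as a priority list, which is handy when the same piece of information can come from several sources.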
LangID
- Can add more than one language code based on a configurable probability threshold #481
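
As a rough illustration of threshold-based selection, the sketch below keeps every language code whose probability reaches a given threshold. The class, the method `codesAbove` and the sample probabilities are hypothetical and are not taken from the LangID module.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LangThresholdSketch {

    // Hypothetical helper: returns all language codes whose probability is
    // greater than or equal to the threshold, in map iteration order.
    static List<String> codesAbove(Map<String, Double> probs, double threshold) {
        List<String> codes = new ArrayList<>();
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            if (e.getValue() >= threshold) {
                codes.add(e.getKey());
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // Made-up probabilities for a mixed-language page
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("en", 0.55);
        probs.put("fr", 0.40);
        probs.put("de", 0.05);
        System.out.println(codesAbove(probs, 0.3));
    }
}
```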
WARC
- Added rotation policy based on time and filesize
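
The sketch below shows how a rotation policy combining time and file size might decide when to start a new file. It is self-contained and does not use the WARC module's real classes; the class name, the `mark` and `reset` methods and the limits are assumptions for illustration only.

```java
public class TimeSizeRotationSketch {

    private final long maxBytes;      // rotate once this many bytes are written
    private final long maxAgeMillis;  // or once the file has been open this long
    private long bytesWritten = 0;
    private long openedAt;

    TimeSizeRotationSketch(long maxBytes, long maxAgeMillis, long nowMillis) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
        this.openedAt = nowMillis;
    }

    // Records a write and returns true when either limit has been reached.
    boolean mark(long bytes, long nowMillis) {
        bytesWritten += bytes;
        return bytesWritten >= maxBytes || nowMillis - openedAt >= maxAgeMillis;
    }

    // Called after rotating: starts counting again for the new file.
    void reset(long nowMillis) {
        bytesWritten = 0;
        openedAt = nowMillis;
    }

    public static void main(String[] args) {
        // 1 GB or 1 hour, whichever comes first (illustrative limits)
        long start = System.currentTimeMillis();
        TimeSizeRotationSketch policy =
                new TimeSizeRotationSketch(1_000_000_000L, 3_600_000L, start);
        System.out.println(policy.mark(4096, System.currentTimeMillis()));
    }
}
```

Passing the current time in explicitly keeps the sketch easy to test; a real policy would more likely read the clock itself.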
ES
- Added es.status.reset.fetchdate.after #478
- Removed Grafana resources, which can be downloaded from the Grafana portal