Dependency upgrades
- OKHttp 3.10.0 #546
- JSoup 1.11.2 #552
- icu4j 61.1 #556
- Rometools 1.9.0 #556
- HTTPClient 4.5.5 #558
- Tika 1.18 #566
Core
- Crawl-delay in robots.txt should optionally not shrink the configured delay #549
- Optimisation: faster extraction of META tags #553
- CollectionMetric: synchronized access to List #555
- Configurable Robots Caches #557
- JSOUPParserBolt: lazy DOM conversion #563
- Purge internal queues of tuples which have already reached timeout #564
- Added ParseFilter to convert single-valued Metadata to multi-valued ones #571
- Caching of redirected robots.txt may overwrite correct robots.txt rules, fixes #573
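To picture the Crawl-delay change (#549), here is a minimal sketch of how a fetcher might pick its politeness delay when robots.txt is not allowed to shrink the configured value. The class name, method name and the `forceConfiguredMinimum` switch are hypothetical and used only for illustration; they are not taken from the project's code or configuration.

```java
// Illustrative sketch only: names and the boolean switch are assumptions,
// not the project's actual API or configuration keys.
public final class CrawlDelaySketch {

    private CrawlDelaySketch() {}

    /**
     * @param configuredDelayMs      delay set in the crawler configuration
     * @param robotsDelayMs          Crawl-delay found in robots.txt, or a negative value if absent
     * @param forceConfiguredMinimum when true, robots.txt may lengthen but never shorten the delay
     * @return the delay to apply between requests to the same host
     */
    public static long effectiveDelayMs(long configuredDelayMs,
                                        long robotsDelayMs,
                                        boolean forceConfiguredMinimum) {
        if (robotsDelayMs < 0) {
            // No Crawl-delay directive: fall back to the configuration.
            return configuredDelayMs;
        }
        if (forceConfiguredMinimum && robotsDelayMs < configuredDelayMs) {
            // robots.txt asks for less than we are configured for: keep ours.
            return configuredDelayMs;
        }
        // Otherwise honour the value from robots.txt.
        return robotsDelayMs;
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelayMs(1000, 200, true));
        System.out.println(effectiveDelayMs(1000, 200, false));
        System.out.println(effectiveDelayMs(1000, 5000, true));
    }
}
```

With the switch off, a shorter Crawl-delay from robots.txt still wins; with it on, the configured delay acts as a floor.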
WARC
- WARCBolt to handle incorrect URIs gracefully #560
- WARCRecordFormat uses ByteBuffer instead of ByteArrayOutputStream #561
Archetype
- Uses flux-core 1.2.1 #559
- Added FeedParser to archetype topology #551
- Added .kml and .wmv to URL filters
SOLR
- MetricsConsumer handles recursive values #554
Elasticsearch
- MetricsConsumer handles recursive values #554
- ES Indexer and Deletion Bolts to get index name from constructor #572
LanguageID
- Added option to LanguageID to skip if metadata already set #570
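The LanguageID option (#570) amounts to a guard in front of the detection step. The sketch below shows that guard under stated assumptions: the metadata is modelled as a plain `Map<String, String[]>`, and the class, method, key name and `skipIfSet` flag are all hypothetical rather than the module's real API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative sketch only: the map-based metadata, key name and flag
// are assumptions made for this example.
public final class LanguageSkipSketch {

    private LanguageSkipSketch() {}

    /**
     * Runs the detector unless skipping is enabled and a value already exists.
     *
     * @return true if detection ran and the metadata was updated
     */
    public static boolean annotate(Map<String, String[]> metadata,
                                   String langKey,
                                   boolean skipIfSet,
                                   Supplier<String> detector) {
        String[] existing = metadata.get(langKey);
        if (skipIfSet && existing != null && existing.length > 0) {
            // A language is already recorded: leave it untouched.
            return false;
        }
        metadata.put(langKey, new String[] {detector.get()});
        return true;
    }

    public static void main(String[] args) {
        Map<String, String[]> metadata = new HashMap<>();
        metadata.put("lang", new String[] {"fr"});
        boolean ran = annotate(metadata, "lang", true, () -> "en");
        System.out.println(ran + " " + metadata.get("lang")[0]);
    }
}
```

Skipping avoids paying for detection on documents whose language was already supplied upstream, for example by an HTTP header or a feed.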
As usual, we advise all users to move to this version as it fixes several bugs. Thanks to all contributors and users. Happy crawling!