DigitalPebble's Blog: What's new in StormCrawler 2.2

StormCrawler 2.2 has just been released. From now on, releases will only be made for the 2.x branch: 1.18 was the last release of the 1.x branch, which is now discontinued. In case you were wondering why there was no "What's new in StormCrawler 2.1", it is simply that 2.1 contained the same modifications as 1.18 and did not get its own announcement.
As usual, this version contains many bug fixes, and users are advised to upgrade to it.
Happy crawling and thanks to our sponsors, contributors and users!

PS: I am tempted to run a workshop on web crawling with StormCrawler at the BigData conference in Vilnius in November. Anyone interested? If so, please get in touch and let me know what you'd like to learn about. https://bigdataconference.eu/
Dependency upgrades

See individual upgrades in #914

Storm 2.3.0 #911
Log4j 2.17.0 #936

At the time of writing, Apache Storm has not released a version containing a fix for the Log4j vulnerability CVE-2021-44228 (see discussion). It is, however, possible to patch a running version of Storm, as explained by Sebastian.

Core

StackOverflow issue in CharsetIdentification #895
OkHttp protocol: make connection pool configurable #918
Remove selenium.instances.num #933
Changed ProtocolFactory to be a singleton #932
Need to register Status class with Kryo #924
JSoupParserBolt cannot configure more than one JSoupFilters per worker #925
Remove static keyword on JSoupFilters field #927
Support HEAD method in okhttp protocol #923
Allow to set http.content.limit per page in metadata #922
OkHttp protocol: add support for Brotli compression (Content-Encoding) #919
Protocols: Integer.MAX_VALUE not safe as max. content size #854
Protocols: adding support for custom headers #912
Replace Guava caches with Caffeine #903 and #905
DelegatorProtocol #900
Fixed bug with StackOverflowError in fast charset identification #895
Multi proxy support #890
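
To illustrate the idea behind #922 (and the concern raised in #854), here is a minimal Java sketch of how a fetcher might resolve the effective content limit for a given page. It assumes, for illustration only, that the page metadata is a plain Map<String, String> (StormCrawler's own Metadata class is multi-valued), that a per-page http.content.limit value overrides the configured default, and that a negative value means "no limit". The class ContentLimitSketch and its cap of Integer.MAX_VALUE - 8 are hypothetical: the cap only reflects the fact that many JVMs cannot allocate an array of exactly Integer.MAX_VALUE elements, and whether this matches the actual change made in #854 is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not StormCrawler's actual protocol code.
public class ContentLimitSketch {

    // Assumed safe upper bound: some JVMs reserve header words in arrays and
    // refuse allocations of exactly Integer.MAX_VALUE elements.
    static final int SAFE_MAX = Integer.MAX_VALUE - 8;

    // Returns the maximum number of bytes to fetch for a page. A per-page value
    // stored in the metadata wins over the configured default; a negative value
    // (assumed to mean "unlimited") or anything above SAFE_MAX is capped.
    static int effectiveLimit(Map<String, String> metadata, int configDefault) {
        int limit = configDefault;
        String perPage = metadata.get("http.content.limit");
        if (perPage != null) {
            try {
                limit = Integer.parseInt(perPage.trim());
            } catch (NumberFormatException e) {
                // Malformed per-page value: keep the configured default.
            }
        }
        if (limit < 0 || limit > SAFE_MAX) {
            return SAFE_MAX;
        }
        return limit;
    }

    public static void main(String[] args) {
        Map<String, String> md = new HashMap<>();
        System.out.println(effectiveLimit(md, 65536)); // 65536 (config default)
        md.put("http.content.limit", "-1");
        System.out.println(effectiveLimit(md, 65536)); // 2147483639 (unlimited, capped)
        md.put("http.content.limit", "1048576");
        System.out.println(effectiveLimit(md, 65536)); // 1048576 (per-page override)
    }
}
```

Falling back to the default on a malformed value is a choice made for this sketch; a real implementation might prefer to log the problem or reject the URL.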

Elasticsearch

ES Spout to connect to local shards when available #852
Issue with ConcurrentModificationException for Metadata in StatusMetricsBolt #909
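
The ConcurrentModificationException mentioned in #909 comes from Java's fail-fast check on collections. The generic, single-threaded Java sketch below (class and key names are hypothetical; it does not reproduce StormCrawler's Metadata class or the change made for #909) shows how structurally modifying a map while iterating over it triggers the exception, and how iterating over a defensive copy avoids it. When several threads share the same object, the copy has to be taken by the thread that owns the data, or under a lock, because copying is itself an iteration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Generic illustration only: not StormCrawler's Metadata nor the fix for #909.
public class DefensiveCopySketch {

    // Adds a derived key for every existing key while iterating the live map.
    // HashMap iterators are fail-fast, so the structural change is detected on
    // the next call to next() and a ConcurrentModificationException is thrown.
    static boolean mutateWhileIterating(Map<String, String> metadata) {
        try {
            for (String key : metadata.keySet()) {
                metadata.put(key + ".copy", metadata.get(key));
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Same logic, but iterating over a snapshot: the live map can change freely.
    static Map<String, String> mutateCopy(Map<String, String> metadata) {
        Map<String, String> snapshot = new HashMap<>(metadata);
        for (Map.Entry<String, String> entry : snapshot.entrySet()) {
            metadata.put(entry.getKey() + ".copy", entry.getValue());
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> md = new HashMap<>(Map.of("url", "https://example.com/", "depth", "1"));
        System.out.println(mutateWhileIterating(new HashMap<>(md))); // true
        System.out.println(mutateCopy(new HashMap<>(md)).size());    // 4
    }
}
```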

Tuesday, 11 January 2022
