Tuesday 11 January 2022

What's new in StormCrawler 2.2

StormCrawler 2.2 has just been released. From now on, releases will be made only for 2.x: 1.18 was the last release of the 1.x branch, which is now discontinued. In case you were wondering why there was no "What's new in StormCrawler 2.1", it is simply that 2.1 contained the same modifications as 1.18 and did not get its own announcement.

This version contains many bugfixes and, as usual, users are advised to upgrade to it.

Happy crawling and thanks to our sponsors, contributors and users! PS: I am tempted to run a workshop on webcrawling with StormCrawler at the BigData conference in Vilnius in November. Anyone interested? If so please get in touch and let me know what you'd like to learn about. https://bigdataconference.eu/

Dependency upgrades

See individual upgrades in #914

At the time of writing, Apache Storm has not released a version containing a fix for the Log4J vulnerability CVE-2021-44228 (see discussion). It is, however, possible to patch a running version of Storm, as explained by Sebastian.
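If you want to check what is on your Storm classpath, here is a minimal sketch in plain Java. It assumes the mitigation documented by the Apache Logging project, which is to remove the JndiLookup class from the log4j-core jar; this is not necessarily the approach described in the post linked above. The class name JndiLookupCheck and its method are made up for this example, and the jar paths are whatever you pass on the command line.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipFile;

public class JndiLookupCheck {

    // Entry name of the class that the Log4j mitigation removes from log4j-core
    static final String JNDI_LOOKUP_ENTRY =
            "org/apache/logging/log4j/core/lookup/JndiLookup.class";

    /** Returns true if the given jar still contains the JndiLookup class. */
    static boolean containsJndiLookup(Path jar) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return zip.getEntry(JNDI_LOOKUP_ENTRY) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path to a jar, e.g. a log4j-core jar in Storm's lib directory
        for (String arg : args) {
            Path jar = Paths.get(arg);
            String state = containsJndiLookup(jar) ? "JndiLookup present" : "JndiLookup absent";
            System.out.println(jar + " -> " + state);
        }
    }
}
```

A jar reported as "JndiLookup absent" has had the class removed. The check only inspects the jars it is given, so it does not replace upgrading once a fixed Storm release is available.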

Core

  • StackOverFlow issue in CharsetIdentification #895
  • OkHttp protocol: make connection pool configurable #918
  • Remove selenium.instances.num #933
  • Changed ProtocolFactory to be a singleton #932
  • Need to register Status class with Kryo #924
  • JSoupParserBolt cannot configure more than one JSoupFilters per worker #925
  • Remove static keyword on JSoupFilters field #927
  • Support HEAD method in okhttp protocol #923
  • Allow to set http.content.limit per page in metadata #922
  • OkHttp protocol: add support for Brotli compression (Content-Encoding) #919
  • Protocols: Integer.MAX_VALUE not safe as max. content size #854
  • Protocols: adding support for custom headers #912
  • Replace Guava caches with Caffeine #903 and #905
  • DelegatorProtocol #900
  • Fixed bug with StackOverflowError in fast charset identification #895
  • Multi proxy support  #890
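
Two of the items above (#925 and #927) mention JSoupFilters, and the second one removes a static keyword. As a general reminder of why a static field can get in the way: it is shared by every instance in the same JVM, and a Storm worker is a single JVM. The sketch below is plain Java, not StormCrawler code; the class names and file names are hypothetical and only illustrate the effect.

```java
public class StaticFieldDemo {

    /** Hypothetical component that keeps its configuration in a static field. */
    static class StaticConfigBolt {
        static String filtersFile; // one slot shared by all instances in the JVM

        void prepare(String file) {
            filtersFile = file; // overwrites whatever another instance set
        }

        String activeFilters() {
            return filtersFile;
        }
    }

    /** Same component with an instance field: each instance keeps its own value. */
    static class InstanceConfigBolt {
        private String filtersFile;

        void prepare(String file) {
            this.filtersFile = file;
        }

        String activeFilters() {
            return filtersFile;
        }
    }

    public static void main(String[] args) {
        StaticConfigBolt s1 = new StaticConfigBolt();
        StaticConfigBolt s2 = new StaticConfigBolt();
        s1.prepare("filters-a.json");
        s2.prepare("filters-b.json");
        // s1 now sees the file set by s2, because the field is static
        System.out.println("static:   " + s1.activeFilters() + " / " + s2.activeFilters());

        InstanceConfigBolt i1 = new InstanceConfigBolt();
        InstanceConfigBolt i2 = new InstanceConfigBolt();
        i1.prepare("filters-a.json");
        i2.prepare("filters-b.json");
        // Each instance keeps the file it was prepared with
        System.out.println("instance: " + i1.activeFilters() + " / " + i2.activeFilters());
    }
}
```

With the static field, the last call to prepare wins for every instance in the worker. With the instance field, two differently configured instances can coexist.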

Elasticsearch

  • ES Spout to connect to local shards when available #852
  • Issue with ConcurrentModificationException for Metadata in StatusMetricsBolt #909



