I will be running a full-day workshop on crawling with StormCrawler on the 24th April in Berlin. See full details on https://endoctus.com/course/web-crawling-with-stormcrawler.
Please find the program below:
Please find the program below:
In this workshop, we will explore StormCrawler a collection of resources for building low-latency, large scale web crawlers on Apache Storm. After a short introduction to Apache Storm and an overview of what Storm-Crawler provides, we'll put it to use straight away for a simple crawl before moving on to the deployed mode of Storm.
In the second part of the session, we will then introduce metrics and index documents with Elasticsearch and Kibana and dive into data extraction. Finally, we'll cover recursive crawls and scalability. This course will be hands-on: attendees will run the code on their own machines.
This course will suit Java developers with an interest in big data, stream processing, web crawling and search. It will provide a practical introduction to both Apache Storm and Elasticsearch as well of course as StormCrawler and should not require advanced programming skills.
Duration : 2x3 hours
PS: Do you follow DigitalPebble or StormCrawler on Twitter? Announcements and updates are made there (as well as all sorts of interesting news of course!)