I will be running full-day workshops on crawling with StormCrawler. Please find the program below:
In this workshop, we will explore StormCrawler, a collection of resources for building low-latency, large-scale web crawlers on Apache Storm. After a short introduction to Apache Storm and an overview of what StormCrawler provides, we'll put it to use straight away for a simple crawl before moving on to Storm's deployed mode.
In the second part of the session, we will introduce metrics and indexing with Elasticsearch and Kibana, then dive into data extraction. Finally, we'll cover recursive crawls and scalability. This course will be hands-on: attendees will run the code on their own machines.
This course will suit Java developers with an interest in big data, stream processing, web crawling and search. It will provide a practical introduction to both Apache Storm and Elasticsearch as well as, of course, StormCrawler, and should not require advanced programming skills.
Duration: 2 x 3 hours
The first workshop should take place on 2 February in Berlin. I am planning to run a similar event in Bristol, UK, in February or March.
The cost depends on the number of attendees. Booking for the Berlin workshop will open shortly. Please let me know (email@example.com) if you are interested, and I will keep you updated.