Monday 29 July 2013

Nutch training course

We are planning to run a 2-day training course on Apache Nutch on 24 and 25 October 2013. It will take place in Bristol, UK (the exact venue will be announced later).

The course has been put on hold for now. Please do get in touch if you are interested, and I will let you know as soon as we reach a sufficient number of attendees.

The course will cover pretty much everything about Nutch, from installation and configuration to writing custom resources, for both Nutch 1.x and 2.x. Students will learn best practices for running and managing a Nutch crawl.

Attendees should have some knowledge of Java and be comfortable running basic commands with command-line tools. Some understanding of Hadoop is a plus but not a strict requirement. The course will include hands-on exercises, so bring your laptop! Note that the demonstrations and exercises will be based on a Linux OS.

The program given here is indicative only and might change slightly. Feel free to suggest things that you'd like to learn during the course.

Day 1: NUTCH BASICS

  • Basic setup
  • Compilation and dependencies
  • Main concepts and operational steps
  • Nutch data structures
  • Parsing
  • Indexing
  • Scoring
  • Best practices in development and in production

Day 2: ADVANCED NUTCH

  • Plugin architecture
  • Politeness and performance
  • Metadata in Nutch
  • Advanced use cases
  • Introduction to Nutch 2.x

Please contact us at course@digitalpebble.com if you have a question or want to be kept informed of the next date for this course.