The course will cover pretty much everything about Nutch, from installation and configuration to writing custom resources, for both Nutch 1.x and 2.x. Students will learn best practices for running and managing a Nutch crawl.
Attendees should have some knowledge of Java and be comfortable using command-line tools to execute basic commands. Some understanding of Hadoop is a plus but not a strict requirement. The course will include hands-on exercises: bring your laptop! Note that the demonstrations and exercises will be based on Linux.
The program below is indicative only and may change slightly. Feel free to suggest things you'd like to learn during the course.
Day 1: NUTCH BASICS
- Basic setup
- Compilation and dependencies
- Main concepts and operational steps
- Nutch data structures
- Parsing
- Indexing
- Scoring
- Best practices for development and production
Day 2: ADVANCED NUTCH
- Plugin architecture
- Politeness and performance
- Metadata in Nutch
- Advanced use cases
- Introduction to Nutch 2.x
Is it free? Would it be available as a webinar?
The price will be announced soon, but it definitely won't be free. As for a webinar, the answer is no (can you imagine a two-day webinar?). Attendees will be given a copy of all the material (slides, code, etc.) at the end of the course.
Hi Julien,
Great to see you pushing this on.
It may also be worth getting in touch with some of the IR teams at the Universities of Glasgow and Strathclyde. They are IR daft there, and I'm sure web search is very much their cup of tea.
Thanks for the heads-up on user@nutch.
Best
Lewis
It's great that you're holding such an event. It would be even better if you could make it available to the wider community of users outside the UK, say, by recording the class and selling it online. Unfortunately, there aren't many resources on Nutch on the web, and the documentation isn't very helpful for non-experts.
Hi Arian, I've been thinking about writing a book on Nutch, but so far I am not convinced it would be worth the time financially, as the audience for it is quite small. What I might do, however, is reuse the course material for a book later, or perhaps a video. We'll first see whether there is enough of an audience for the course.
Are there any updates on the course?
Please contact us at course@digitalpebble.com if you have a question or want to be kept informed of the next date for this course.