DigitalPebble's Blog
Friday, 8 March 2013

Free your Nutch crawls with pluggable indexers

I have just committed what should be a very important new feature of the next 1.x release of Apache Nutch, namely the possibility to imple...
3 comments:
Wednesday, 5 September 2012

Using Behemoth on the CommonCrawl dataset

Behemoth is an open-source platform for document processing based on Hadoop which provides an excellent way to process document collection...
4 comments:
Monday, 9 July 2012

Nutch 2.0 is out (at last!)

Like pretty much any 2.0 release, Nutch 2.0 marks a radical change from the 1.x branch. I've mentioned 2.0 in previous posts but let...
3 comments:
Wednesday, 13 June 2012

What's new in Nutch 1.5

Apache Nutch 1.5 was released last week. As with each release, this one contains a lot of changes and I will just comment on a few of ...
1 comment:
Friday, 21 October 2011

Nutch hosting and monitoring

We now provide hosting and monitoring services for Apache Nutch. For a fixed price, we will set up, run and monitor your Nutch crawler an...
Monday, 26 September 2011

Visualising Nutch mailing-lists traffic

The graph below shows the traffic on the Nutch dev and user mailing lists (http://mail-archives.apache.org/mod_mbox/nutch-user/ and http://...
Wednesday, 6 July 2011

Crawler-Commons 0.1 released

As announced on various mailing-lists: The initial release of crawler-commons is available from: http://code.google.com/p/crawler-comm...