Apache Mahout

Open-source project to provide scalable machine learning (Chu et al., 2006; Owen et al., 2012)
Written in Java for the Apache Hadoop MapReduce platform
Some supported ML methods:
Supervised: Naive Bayes (NB), hidden Markov models (HMM), support vector machines (SVM), logistic regression, …
Unsupervised: k-means, hierarchical clustering, …

From Dr. Jochen L. Leidner, Data Science: A Compact Introduction
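
To make one of the unsupervised methods listed above concrete, here is a minimal in-memory sketch of k-means (Lloyd's algorithm) in plain Java. It is not Mahout's implementation and does not use Hadoop or any Mahout API; the class name `KMeansSketch`, its method names, and the choice to pass the starting centroids explicitly are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Apache Mahout.
final class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] p, double[] q) {
        double sum = 0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to point p.
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Lloyd's algorithm: alternate between assigning points to the nearest
    // centroid and moving each centroid to the mean of its assigned points.
    // Stops when no assignment changes or after maxIterations rounds.
    static double[][] fit(double[][] points, double[][] initialCentroids, int maxIterations) {
        int k = initialCentroids.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initialCentroids[c].clone(); // do not mutate the caller's data
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its previous centroid
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {1.2, 0.8}, {0.8, 1.2}, {9.0, 9.0}, {9.5, 10.5}, {10.5, 9.5}};
        double[][] start = {points[0], points[3]};
        System.out.println(Arrays.deepToString(fit(points, start, 100)));
    }
}
```

The assignment step is independent per point and the update step is a sum followed by a division, which is why the method maps naturally onto a MapReduce-style platform such as the one named above.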

The Apache Mahout™ project’s goal is to build an environment for quickly creating scalable, performant machine learning applications.

The latest release, version 0.12.2, includes the following.

Apache Mahout Samsara Environment includes:
  • Distributed Algebraic optimizer
  • R-Like DSL Scala API
  • Linear algebra operations
  • Ops are extensions to Scala
  • IScala REPL based interactive shell
  • Integrates with compatible libraries like MLLib
  • Runs on distributed Spark, H2O, and Flink
  • fastutil to speed up sparse matrix and vector computations
  • Matrix to tsv conversions for integration with Apache Zeppelin
Apache Mahout Samsara Algorithms included:
  • Stochastic Singular Value Decomposition (ssvd, dssvd)
  • Stochastic Principal Component Analysis (spca, dspca)
  • Distributed Cholesky QR (thinQR)
  • Distributed regularized Alternating Least Squares (dals)
  • Collaborative Filtering: Item and Row Similarity
  • Naive Bayes Classification
  • Distributed and in-core
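
As an illustration of the idea behind the Cholesky QR entry in the list above, the following is a minimal in-core sketch of the general technique in plain Java: form the small Gram matrix AᵀA, take its Cholesky factor L, set R = Lᵀ, and recover Q by solving Q R = A. It is a sketch under stated assumptions, not Mahout's code or API: the class `ThinQrSketch` and its methods are hypothetical names, the input is assumed to be a tall matrix with full column rank, and nothing here is distributed.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Apache Mahout.
final class ThinQrSketch {

    // Gram matrix G = A^T A (n x n) for a tall m x n matrix A.
    static double[][] gram(double[][] a) {
        int m = a.length, n = a[0].length;
        double[][] g = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int row = 0; row < m; row++) {
                    sum += a[row][i] * a[row][j];
                }
                g[i][j] = sum;
            }
        }
        return g;
    }

    // Lower-triangular Cholesky factor L with G = L L^T.
    // Requires G to be symmetric positive definite.
    static double[][] cholesky(double[][] g) {
        int n = g.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = g[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (sum <= 0) {
                        throw new ArithmeticException("matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }

    // Returns {Q, R} with A = Q R, where Q (m x n) has orthonormal columns
    // and R (n x n) is upper triangular.
    static double[][][] thinQr(double[][] a) {
        int m = a.length, n = a[0].length;
        double[][] l = cholesky(gram(a));
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                r[i][j] = l[j][i]; // R = L^T
            }
        }
        // Q = A R^{-1}: each row q of Q solves q R = a_row by substitution.
        double[][] q = new double[m][n];
        for (int row = 0; row < m; row++) {
            for (int j = 0; j < n; j++) {
                double sum = a[row][j];
                for (int k = 0; k < j; k++) {
                    sum -= q[row][k] * r[k][j];
                }
                q[row][j] = sum / r[j][j];
            }
        }
        return new double[][][] {q, r};
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[][][] qr = thinQr(a);
        System.out.println("R = " + Arrays.deepToString(qr[1]));
    }
}
```

The appeal for large data is that only the n x n Gram matrix has to be brought to one place; the products over the many rows of A are row-wise sums that can be computed in parallel. The trade-off is numerical: forming AᵀA squares the condition number, so this approach suits well-conditioned, tall-and-thin inputs.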

Apache Mahout software provides three major features:

  • A simple and extensible programming environment and framework for building scalable algorithms
  • A wide variety of premade algorithms for Scala + Apache Spark, H2O, Apache Flink
  • Samsara, a vector math experimentation environment with R-like syntax which works at scale

13 May 2017 – Apache Mahout website beta release


