Apache Mahout

Open-source project to provide scalable machine learning (Chu et al., 2006; Owen et al., 2012)
Written in Java for the Apache Hadoop MapReduce platform
Some supported ML methods:
Supervised: Naive Bayes (NB), hidden Markov models (HMM), support vector machines (SVM), logistic regression, …
Unsupervised: k-means, hierarchical clustering, …

From Dr. Jochen L. Leidner, Data Science: A Compact Introduction


The Apache Mahout™ project’s goal is to build an environment for quickly creating scalable, performant machine learning applications.

The latest release, version 0.12.2, has the following:

Apache Mahout Samsara Environment includes
  • Distributed Algebraic optimizer
  • R-Like DSL Scala API
  • Linear algebra operations
  • Ops are extensions to Scala
  • IScala REPL based interactive shell
  • Integrates with compatible libraries like MLLib
  • Runs on distributed Spark, H2O, and Flink
  • fastutil to speed up sparse matrix and vector computations
  • Matrix to tsv conversions for integration with Apache Zeppelin
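
To give a feel for what a distributed algebraic optimizer can do, here is a minimal sketch in plain Python (not Mahout code and not its API). It assumes a matrix stored as row partitions and computes AᵀA as a sum of per-row outer products, so the transpose is never materialized and each partition can be processed independently. The function name `gram_from_partitions` and the sample data are hypothetical, chosen only for illustration.

```python
# Illustrative only: plain Python lists stand in for a distributed row matrix.

def gram_from_partitions(partitions, n_cols):
    """Compute A^T A by summing partial results from each row partition.

    A^T A equals the sum over all rows r of outer(r, r), so every partition
    can build its own partial sum and the transpose is never formed.
    """
    total = [[0.0] * n_cols for _ in range(n_cols)]
    for rows in partitions:  # on a cluster this would be one task per partition
        partial = [[0.0] * n_cols for _ in range(n_cols)]
        for r in rows:
            for i in range(n_cols):
                if r[i] == 0:  # zero entries contribute nothing (cheap for sparse rows)
                    continue
                for j in range(n_cols):
                    partial[i][j] += r[i] * r[j]
        # Reduce step: fold this partition's partial sum into the total.
        for i in range(n_cols):
            for j in range(n_cols):
                total[i][j] += partial[i][j]
    return total


# Hypothetical 4x2 matrix split into two row partitions.
partitions = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [0.0, 1.0]],
]
ata = gram_from_partitions(partitions, 2)
print(ata)
```

The result is a small n_cols × n_cols matrix regardless of how many rows the input has, which is why this form suits tall, thin matrices spread across many workers.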
Apache Mahout Samsara Algorithms included
  • Stochastic Singular Value Decomposition (ssvd, dssvd)
  • Stochastic Principal Component Analysis (spca, dspca)
  • Distributed Cholesky QR (thinQR)
  • Distributed regularized Alternating Least Squares (dals)
  • Collaborative Filtering: Item and Row Similarity
  • Naive Bayes Classification
  • Distributed and in-core
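
As an illustration of the idea behind Cholesky QR from the list above, the following is a minimal in-memory sketch in plain Python. It is not Mahout's implementation; the function names and the sample matrix are hypothetical. It assumes A is tall with full column rank: form the small matrix AᵀA, take its Cholesky factor L, set R = Lᵀ, and recover Q by solving Q R = A row by row.

```python
import math


def gram(a):
    """Return A^T A for a matrix given as a list of rows."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def cholesky_lower(s):
    """Return lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(s)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(acc) if i == j else acc / l[j][j]
    return l


def thin_qr_cholesky(a):
    """Return (Q, R) with A = Q R, Q with orthonormal columns, R upper triangular."""
    l = cholesky_lower(gram(a))
    n = len(l)
    r = [[l[j][i] for j in range(n)] for i in range(n)]  # R = L^T
    q = []
    for row in a:
        # Solve q_row * R = row by substitution, left to right.
        q_row = [0.0] * n
        for j in range(n):
            acc = row[j] - sum(q_row[k] * r[k][j] for k in range(j))
            q_row[j] = acc / r[j][j]
        q.append(q_row)
    return q, r


# Hypothetical 3x2 example matrix.
a = [[3.0, 1.0], [4.0, 2.0], [0.0, 2.0]]
q, r = thin_qr_cholesky(a)
```

Only the small matrix AᵀA has to be gathered in one place, and each row of Q depends only on the matching row of A, which is what makes the approach attractive for distributed data. The trade-off is numerical: forming AᵀA squares the condition number, so the sketch is only reliable for reasonably well-conditioned inputs.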

Apache Mahout software provides three major features:

  • A simple and extensible programming environment and framework for building scalable algorithms
  • A wide variety of premade algorithms for Scala + Apache Spark, H2O, Apache Flink
  • Samsara, a vector math experimentation environment with R-like syntax which works at scale

Read an overview of programming a Mahout Samsara application, learn how to contribute to Mahout, report an issue, bug, or suggestion in our JIRA, see the Samsara bindings for Scala and Spark, and contact us on our mailing lists.

13 May 2017 – Apache Mahout website beta release

Docs available here
