5-part series on introductory machine learning (Non-technical)
This is an overview (with links) of a 5-part series on introductory machine learning. The set of tutorials is comprehensive, yet succinct, covering many important topics in the field (and beyond).
By Alex Castrounis | Product Leader and Technologist.
Machine learning is a very hot topic for many key reasons, chief among them that it provides the ability to automatically obtain deep insights, recognize unknown patterns, and create high-performing predictive models from data, all without requiring explicit programming instructions.
This is a summary (with links) of an article series that’s intended to be a comprehensive, in-depth guide to machine learning, and should be useful to everyone from business executives to machine learning practitioners. It covers virtually all aspects of machine learning (and many related fields) at a high level, and should serve as a sufficient introduction or reference to the terminology, concepts, tools, considerations, and techniques in the field.
The first chapter of the series starts with both a formal and informal definition of machine learning. This is followed by a discussion of the machine learning process end-to-end, the different types of machine learning, potential goals and outputs, and a categorized overview of the most widely used machine learning algorithms.
Chapter two starts with an introduction to the concept of model performance. The discussion then shifts to data selection, preprocessing, splitting, and the very interesting and critical topics of feature selection and feature engineering. This is followed by a discussion of model selection and the associated tradeoffs, which is a key step since different models can be applied to solve the same problem, although some perform better than others.
Chapter three introduces the critical concepts of model variance, bias, and overfitting. This is followed by the related topic of model complexity and how to control it, which can have a large impact on whether a model overfits. After that, you’ll find a brief introduction to dimensionality reduction, and then a final discussion of model evaluation, performance, tuning, validation, ensemble learning, and resampling methods.
Chapter four is heavily focused on a deeper dive into model performance and error analysis. Being able to determine the performance and errors associated with the model you’re using is paramount, as it helps determine if you’ve found a viable solution with acceptable tradeoffs, or instead indicates that you need to make some changes. Possible changes include selecting different features and/or models, gathering more data, feature engineering, complexity reduction, leveraging ensemble methods, and so on.
Chapter five is the final chapter in the series, and gives an in-depth overview of unsupervised learning. It then discusses other fields that are highly related to machine learning, such as predictive analytics, artificial intelligence, statistical learning, and data mining. The post ends with a brief overview of machine learning as used in real-world applications.
After reading the five posts in the series, you will have been thoroughly exposed to most key concepts and aspects of machine learning. In addition, you should be able to determine which areas interest you most, and thus guide further research.
Cheers, and I hope you enjoy your machine learning journey!
Bio: Alex Castrounis is a product leader, technologist, blogger, and former IndyCar/Indianapolis 500 vehicle dynamicist and race strategist. His interests and expertise include product design and management, enterprise/SaaS cloud-based software and data solutions, machine learning, artificial intelligence, and data science. Follow Alex on Twitter.