Text Mining and Visualization: Case Studies Using Open-Source Tools
Markus Hofmann, Andrew Chisholm
ISBN 9781482237573 – CAT# K23176
Series: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
- Presents programming and visual workflow tools for state-of-the-art text mining
- Describes preprocessing techniques often used during text mining, such as tokenizing, stemming, and generation of n-grams
- Illustrates numerous data mining methods, including supervised learning, cross-validation, and unsupervised clustering
- Explains techniques for sentiment analysis and topic modeling
- Discusses temporal awareness in the analysis of news stories
- Explores novel ways of applying network methods to text data gathered from message websites and to the analysis of tag relationships within a popular user forum
- Provides all the examples for download on a supplementary website
Text Mining and Visualization: Case Studies Using Open-Source Tools provides an introduction to text mining using some of the most popular and powerful open-source tools: KNIME, RapidMiner, Weka, R, and Python.
The contributors, all highly experienced with text mining and open-source software, explain how text data are gathered and processed from a wide variety of sources, including books, server access logs, websites, social media sites, and message boards. Each chapter presents a case study that you can follow as a step-by-step, reproducible example. You can also easily apply and extend the techniques to other problems. All the examples are available on a supplementary website.
The book shows you how to exploit your text data, offering successful application examples and blueprints for tackling your own text mining tasks with open, freely available tools. It brings you up to date on the latest and most powerful tools, the data mining process, and specific text mining activities.