April 21, 2015 by Chapman and Hall/CRC
Reference – 539 Pages – 79 B/W Illustrations
ISBN 9781482234817 – CAT# K23014
Series: Chapman & Hall/CRC The R Series
Explores how computing is done for a broad range of data science problems
Includes authentic real-world data analysis projects that tie concepts into a data science workflow and illustrate the everyday activities of data scientists across a spectrum of fields
Shows how to read and transform raw data, manipulate and visualize the resulting data, and use statistical techniques to solve a problem or understand relationships between variables
Describes the use of simulation to understand stochastic processes and model interesting situations
Covers various data technologies, including databases, visualization with KML, and scraping data from Web pages with HTTP requests and text processing
Effectively Access, Transform, Manipulate, Visualize, and Reason about Data and Computation
Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving illustrates the details involved in solving real computational problems encountered in data analysis. It reveals the dynamic and iterative process by which data analysts approach a problem and reason about different ways of implementing solutions.
The book’s collection of projects, comprehensive sample solutions, and follow-up exercises encompasses practical topics pertaining to data processing, including:
Non-standard, complex data formats, such as robot logs and email messages
Text processing and regular expressions
Newer technologies, such as Web scraping, Web services, Keyhole Markup Language (KML), and Google Earth
Statistical methods, such as classification trees, k-nearest neighbors, and naïve Bayes
Visualization and exploratory data analysis
Relational databases and Structured Query Language (SQL)
Large data and efficiency
Suitable for self-study or as supplementary reading in a statistical computing course, the book enables instructors to incorporate interesting problems into their courses so that students gain valuable experience and data science skills. Students learn how to acquire and work with unstructured or semistructured data as well as how to narrow down and carefully frame the questions of interest about the data.
Blending computational details with statistical and data analysis concepts, this book provides readers with an understanding of how professional data scientists think about daily computational tasks. It will improve readers’ computational reasoning about real-world data analyses.
Deborah Nolan holds the Zaffaroni Family Chair in Undergraduate Education at the University of California, Berkeley. She is a fellow of the American Statistical Association and the Institute of Mathematical Statistics. Her research has involved the empirical process, high-dimensional modeling, and, more recently, technology in education and reproducible research.
Duncan Temple Lang is the director of the Data Science Initiative at the University of California, Davis. He has been involved in the development of R and S for 20 years and has developed over 100 R packages. His research focuses on statistical computing, data technologies, meta-computing, reproducibility, and visualization.
Sample chapter: Chapter 5, Strategies for Analyzing a 12-Gigabyte Data Set: Airline Flight Delays