In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming.
Analyze company data quickly and easily using Microsoft’s powerful data tools. Learn to build scalable and robust data models, clean and combine different data sources effectively, and create compelling and professional visuals.
This collection represents the full spectrum of data-related content we’ve published on O’Reilly Radar over the last year. Mike Loukides kicked things off in June 2010 with “What is data science?” and from there we’ve pursued the various threads and themes that naturally emerged. Now, roughly a year later, we can look back over all we’ve covered and identify a number of core data areas:
A practical guide to data mining using SQL and Excel
Data Analysis Using SQL and Excel, 2nd Edition shows you how to leverage the two most popular tools for data query and analysis, SQL and Excel, to perform sophisticated data analysis without the need for complex and expensive data mining tools. Written by a leading expert on business data mining, this book shows you how to extract useful business information from relational databases. You'll learn the fundamental techniques before moving into the "where" and "why" of each analysis, and then learn how to design and perform these analyses using SQL and Excel.
Build a custom BimlExpress framework that generates dozens of SQL Server Integration Services (SSIS) packages in minutes. Use this framework to execute related SSIS packages in a single command. You will learn to configure SSIS catalog projects, manage catalog deployments, and monitor SSIS catalog execution and history.
Gain expertise in processing and storing data by using advanced techniques with Apache Spark
About This Book
Explore the integration of Apache Spark with third-party applications such as H2O, Databricks, and Titan
Evaluate how Cassandra and HBase can be used for storage
An advanced guide that combines instructions and practical examples to extend the most up-to-date Spark functionalities