In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming.
Cloud computing is changing the way IT is delivered in enterprises around the world. The world’s leading open source cloud computing platform, CloudStack, helps you implement a cloud computing service in your enterprise or set up an infrastructure as a service (IaaS) offering for your customers. With "Apache CloudStack Cloud Computing", learn the leading open source cloud computing platform in an easy step-by-step approach, from understanding the basics of setting up an infrastructure as a service cloud to actual deployment scenarios and extensibility features of CloudStack.
If you're an R developer looking to harness the power of big data analytics with Hadoop, then this book tells you everything you need to integrate the two. You'll end up capable of building a data analytics engine with huge potential.
Overview
* Write Hadoop MapReduce within R
* Learn data analytics with R and the Hadoop platform
* Handle HDFS data within R
* Understand Hadoop streaming with R
* Encode and enrich datasets into R
This collection represents the full spectrum of data-related content we’ve published on O’Reilly Radar over the last year. Mike Loukides kicked things off in June 2010 with “What is data science?” and from there we’ve pursued the various threads and themes that naturally emerged. Now, roughly a year later, we can look back over all we’ve covered and identify a number of core data areas:
Hadoop offers distributed processing of large datasets across clusters and is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. It enables computing solutions that are scalable, cost-effective, flexible, and fault tolerant, protecting very large data sets against hardware failures.
Gain expertise in processing and storing data by using advanced techniques with Apache Spark
About This Book
Explore the integration of Apache Spark with third-party applications such as H2O, Databricks and Titan
Evaluate how Cassandra and HBase can be used for storage
An advanced guide with a combination of instructions and practical examples to extend the most up-to-date Spark functionalities