Book Name: PySpark Cookbook
Author: Denny Lee, Tomasz Drabas
Publisher: Packt Publishing
File format: PDF
PySpark Cookbook PDF Book Description:
Apache Spark is an open source framework for efficient cluster computing, with a powerful interface for data parallelism and fault tolerance. You’ll begin by learning the Apache Spark architecture and how to set up a Python environment for Spark. You will then become familiar with the modules available in PySpark and start using them with ease.
In addition, you will learn how to abstract data with RDDs and DataFrames, and come to understand the streaming capabilities of PySpark. You will then move on to using ML and MLlib to solve problems related to the machine learning capabilities of PySpark, and use GraphFrames to solve graph-processing problems. Finally, you will explore how to deploy your applications to the cloud with the spark-submit command.