Learn how to optimize an Apache Spark cluster configuration for your particular workload. The most common challenge is memory pressure, caused by improper configurations (particularly wrong-sized executors), long-running operations, and tasks that result in Cartesian operations. You can speed up jobs with appropriate caching and by allowing for data skew. For the best performance, monitor and review long-running and resource-consuming Spark job executions. The following sections describe common Spark job optimizations and recommendations.

Choose the data abstraction

Earlier Spark versions used RDDs to abstract data; Spark 1.3 and 1.6 introduced DataFrames and DataSets, respectively. DataFrames provide query optimization through Catalyst, but they are not as developer-friendly as DataSets, because they offer no compile-time checks or domain-object programming.
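Executor sizing is the configuration knob most directly tied to the memory pressure described above. As an illustrative sketch only (the job class, jar name, and all sizing values here are placeholder assumptions, not recommendations from this article), a spark-submit invocation might set executor memory, cores, instance count, and shuffle parallelism explicitly rather than relying on defaults:

```shell
# Illustrative spark-submit sizing sketch; every number below is a
# placeholder assumption to be tuned against your own workload.
spark-submit \
  --class com.example.MyJob \                 # hypothetical job class
  --master yarn \
  --conf spark.executor.memory=8g \           # heap per executor
  --conf spark.executor.cores=4 \             # concurrent tasks per executor
  --conf spark.executor.instances=10 \        # total executors
  --conf spark.sql.shuffle.partitions=200 \   # shuffle parallelism
  my-job.jar                                  # hypothetical application jar
```

Oversized executors waste memory and can lengthen garbage-collection pauses, while undersized ones spill to disk, so values like these are typically tuned iteratively while monitoring job executions in the Spark UI.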