Webinar 2019 Introduction to scalable computing with Dask in Python

From SHARCNETHelp
Jump to navigationJump to search

When programming in Python, some common libraries, such as Numpy, Pandas, Scikit-Learn, etc. usually work well if the dataset fits into the existing RAM on a single machine. However, when dealing with large datasets, it could be a challenge to work around the memory constraints. This is where Dask can help. Dask is a Python task-graph based framework to parallelize operations. Dask provides Python APIs or libraries that can handle large datasets on a single multi-core machine or on a cluster. This webinar provides an introduction to Dask.