Colloquium 2023 Accelerated DataFrame with Dask-cuDF on multiple GPUs

From SHARCNETHelp

cuDF is a Python GPU DataFrame library that provides a pandas-like API and accelerates DataFrame operations on a single GPU. The size of the dataset it can handle, however, is limited by the memory available on that GPU. Dask provides a framework for scalable computing, and Dask-cuDF integrates cuDF with Dask so that a large DataFrame workload can be scaled across multiple GPUs. This webinar introduces Dask-cuDF with demo examples run on a multi-GPU node of the national clusters.
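As a rough illustration of the workflow described above, the sketch below partitions a small cuDF DataFrame across the GPUs of one node and runs a group-by mean with Dask-cuDF. It is not taken from the webinar: the helper names (`make_columns`, `cpu_group_means`, `gpu_group_means`), the toy data, and the partition count are all hypothetical, and the GPU function assumes a working RAPIDS installation (`cudf`, `dask_cudf`, `dask_cuda`) with at least one NVIDIA GPU. The plain-Python function is included only as a CPU reference for checking the GPU result.

```python
"""Sketch: group-by mean on a DataFrame partitioned across the GPUs of one node."""


def make_columns(n_rows, n_keys=4):
    """Build toy input columns in plain Python (hypothetical example data)."""
    return {
        "key": [i % n_keys for i in range(n_rows)],
        "value": [float(i) for i in range(n_rows)],
    }


def cpu_group_means(columns):
    """Plain-Python reference: mean of `value` for each `key`."""
    sums, counts = {}, {}
    for k, v in zip(columns["key"], columns["value"]):
        sums[k] = sums.get(k, 0.0) + v
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


def gpu_group_means(columns, npartitions=2):
    """Same computation with Dask-cuDF (assumes RAPIDS and an NVIDIA GPU).

    The imports live inside the function so that this module can still be
    imported on a CPU-only machine.
    """
    import cudf
    import dask_cudf
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # LocalCUDACluster starts one Dask worker per visible GPU on this node.
    with LocalCUDACluster() as cluster, Client(cluster):
        gdf = cudf.DataFrame(columns)  # DataFrame held in one GPU's memory
        # Split the DataFrame into partitions that Dask schedules on the workers.
        ddf = dask_cudf.from_cudf(gdf, npartitions=npartitions)
        # Operations are lazy until .compute(), which returns a cudf.Series.
        means = ddf.groupby("key")["value"].mean().compute()
        return means.sort_index().to_pandas().to_dict()
```

On a GPU node, calling `gpu_group_means(make_columns(8))` from inside an `if __name__ == "__main__":` guard (the cluster spawns worker processes) should agree with `cpu_group_means(make_columns(8))`. For data that does not fit in a single GPU's memory, one would normally read it directly into partitions, for example with `dask_cudf.read_csv` or `dask_cudf.read_parquet`, rather than building a single cuDF DataFrame first.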