In this article, you will learn what cuML is, and how it can significantly speed up the training of machine learning models through GPU acceleration.

Topics we will cover include:

  • The aim and distinctive features of cuML.
  • How to prepare datasets and train a machine learning model for classification with cuML in a scikit-learn-like fashion.
  • How to easily compare results with an equivalent conventional scikit-learn model, in terms of classification accuracy and training time.

Let’s not waste any more time.

A Hands-On Introduction to cuML for GPU-Accelerated Machine Learning Workflows


Introduction

This article offers a hands-on Python introduction to cuML, a library from RAPIDS AI (NVIDIA's open-source suite of GPU-accelerated data science libraries) for GPU-accelerated machine learning workflows across widely used models. In conjunction with its DataFrame-oriented sibling, cuDF, cuML has gained popularity among practitioners who need scalable, production-ready machine learning solutions.

The hands-on tutorial below uses cuML together with cuDF for GPU-accelerated dataset management in a DataFrame format. For an introduction to cuDF, check out this related article.

About cuML: An “Accelerated Scikit-Learn”

RAPIDS cuML (short for CUDA Machine Learning) is an open-source library that accelerates scikit-learn–style machine learning on NVIDIA GPUs. It provides drop-in replacements for many popular algorithms, often reducing training and inference times on large datasets — without major code changes or a steep learning curve for those familiar with scikit-learn.

Among its three most distinctive features:

  • cuML follows a scikit-learn-like API, easing the transition from CPU to GPU for machine learning with minimal code changes.
  • It covers a broad set of techniques — all GPU-accelerated — including regression, classification, ensemble methods, clustering, and dimensionality reduction.
  • Through tight integration with the RAPIDS ecosystem, cuML works hand-in-hand with cuDF for data preprocessing, as well as with related libraries to facilitate end-to-end, GPU-native pipelines.

Hands-On Introductory Example

To illustrate the basics of cuML for building GPU-accelerated machine learning models, we will use a fairly large yet easily accessible dataset, available via a public URL in Jason Brownlee’s Datasets repository: the adult income dataset. This is a large, slightly class-imbalanced dataset intended for binary classification, namely predicting whether an adult’s income is high (above $50K) or low ($50K or less) based on a set of demographic and socio-economic features. Accordingly, we will build a binary classification model.

IMPORTANT: To run the code below on Google Colab or a similar notebook environment, make sure you change the runtime type to GPU; otherwise, a warning will be raised indicating cuDF cannot find the specific CUDA driver library it utilizes.

We start by importing the necessary libraries for our scenario:
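A minimal sketch of those imports is shown below; the aliases for the scikit-learn components are an organizational choice for this example so that both versions can coexist in the same script:

```python
from time import time

import cudf

# GPU-accelerated components from cuML
from cuml.model_selection import train_test_split
from cuml.linear_model import LogisticRegression
from cuml.metrics import accuracy_score

# Equivalent CPU-based components from scikit-learn, aliased for comparison
from sklearn.model_selection import train_test_split as sk_train_test_split
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from sklearn.metrics import accuracy_score as sk_accuracy_score
```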

Note that, in addition to the cuML modules and functions used to split the dataset and train a logistic regression classifier, we have also imported their classical scikit-learn counterparts. This is not mandatory for using cuML (it works independently of plain scikit-learn), but we import the equivalent scikit-learn components for the sake of comparison in the rest of the example.

Next, we load the dataset into a cuDF dataframe optimized for GPU usage:
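The sketch below assumes the adult-all.csv file in Jason Brownlee’s Datasets repository on GitHub and the standard column names of the UCI adult dataset (the file has no header row); if cudf.read_csv cannot fetch the remote file directly in your environment, download it first and point to the local path:

```python
# Public mirror of the adult income dataset (no header row in the file)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/adult-all.csv"

column_names = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]

# Read the CSV straight into a GPU-resident cuDF DataFrame
df = cudf.read_csv(url, header=None, names=column_names)
print(df.shape)
```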

Once the data is loaded, we identify the target variable and convert it into binary (1 for high income, 0 for low income):
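A minimal sketch of this step, assuming the label lives in the income column as the strings ">50K" / "<=50K":

```python
# Strip stray whitespace around the string labels, then map them to 0/1
df["income"] = df["income"].str.strip()
df["target"] = (df["income"] == ">50K").astype("int32")

# Inspect the (slightly imbalanced) class distribution
print(df["target"].value_counts())
```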

This dataset combines numeric features with a slight predominance of categorical ones. Most scikit-learn models — including decision trees and logistic regression — do not natively handle string-valued categorical features, so they require encoding. A similar pattern applies to cuML; hence, we will select a small number of features to train our classifier and one-hot encode the categorical ones.
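The feature subset below is an illustrative choice rather than one prescribed by the dataset; cudf.get_dummies performs the one-hot encoding directly on the GPU:

```python
# A small, illustrative subset of numeric and categorical features
numeric_features = ["age", "education-num", "hours-per-week"]
categorical_features = ["workclass", "marital-status", "occupation"]

X = df[numeric_features + categorical_features]
y = df["target"]

# One-hot encode the categorical columns and cast everything to float32,
# which cuML estimators handle natively
X = cudf.get_dummies(X, columns=categorical_features).astype("float32")
```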

So far, we have used cuML (and cuDF) much as we would use classical scikit-learn alongside pandas.

Now comes the interesting part. We will split the dataset into training and test sets and train a logistic regression classifier twice, using both CUDA GPU (cuML) and standalone scikit-learn. We will then compare both the classification accuracy and the time taken to train each model. Here’s the complete code for the model training and comparison:
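The listing below is a sketch of that comparison under the assumptions made so far (the test_size, random_state, and max_iter values are illustrative choices): the data is split once on the GPU, cuML trains on the cuDF splits, and the splits are copied to host memory so the scikit-learn model trains on identical data:

```python
# Split once on the GPU; cuML's train_test_split works directly on cuDF objects
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# --- GPU-accelerated logistic regression with cuML ---
start = time()
cu_model = LogisticRegression(max_iter=1000)
cu_model.fit(X_train, y_train)
cu_time = time() - start

cu_acc = float(accuracy_score(y_test, cu_model.predict(X_test)))

# --- CPU-based logistic regression with scikit-learn ---
# Copy the same splits to host memory so both models see identical data
X_train_cpu, X_test_cpu = X_train.to_pandas(), X_test.to_pandas()
y_train_cpu, y_test_cpu = y_train.to_pandas(), y_test.to_pandas()

start = time()
sk_model = SkLogisticRegression(max_iter=1000)
sk_model.fit(X_train_cpu, y_train_cpu)
sk_time = time() - start

sk_acc = sk_accuracy_score(y_test_cpu, sk_model.predict(X_test_cpu))

print(f"cuML         -> accuracy: {cu_acc:.4f} | training time: {cu_time:.2f} s")
print(f"scikit-learn -> accuracy: {sk_acc:.4f} | training time: {sk_time:.2f} s")
```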

The results are quite interesting.

In our run, the model trained with cuML achieved classification performance very similar to its classical scikit-learn counterpart, but it trained over an order of magnitude faster: about 0.5 seconds versus roughly 15 seconds for the scikit-learn classifier. Your exact numbers will vary with hardware, drivers, and library versions.

Wrapping Up

This article provided a gentle, hands-on introduction to the cuML library for GPU-accelerated construction of machine learning models for classification, regression, clustering, and more. Through a simple comparison, we showed how cuML can help build effective models with significantly faster training.