This page provides an overview of the workflow for training and using your own machine learning (ML) models on Vertex AI. Vertex AI offers a spectrum of training methods designed to meet your needs, from fully automated to fully custom.
- AutoML: Build high-quality models with minimal technical effort by leveraging Google's automated ML capabilities.
- Vertex AI serverless training: Run your custom training code in a fully managed, on-demand environment without worrying about infrastructure.
- Vertex AI training clusters: Run large-scale, high-performance training jobs on a dedicated cluster of accelerators reserved for your exclusive use.
- Ray on Vertex AI: Scale Python applications and ML workloads using the open-source Ray framework on a managed service.
For help deciding which of these methods to use, see Choose a training method.
AutoML
AutoML on Vertex AI lets you build a code-free ML model based on the training data that you provide. AutoML can automate tasks like data preparation, model selection, hyperparameter tuning, and deployment for various data types and prediction tasks, which can make ML more accessible for a wide range of users.
Types of models you can build using AutoML
The types of models you can build depend on the type of data that you have. Vertex AI offers AutoML solutions for the following data types and model objectives:
| Data type | Supported objectives |
|---|---|
| Image data | Classification, object detection. |
| Tabular data | Classification/regression, forecasting. |
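As a sketch of how these data types and objectives map onto the Vertex AI Python SDK (`google-cloud-aiplatform`), each combination corresponds to an AutoML training job class such as `AutoMLImageTrainingJob` or `AutoMLTabularTrainingJob`. The display name, dataset, and budget below are placeholder values, and the exact parameters should be checked against the SDK reference:

```python
# Supported AutoML objectives by data type, per the table above.
automl_objectives = {
    "image": ["classification", "object_detection"],
    "tabular": ["classification", "regression", "forecasting"],
}

# With the Vertex AI Python SDK, each combination maps to a training job
# class. For example (requires google-cloud-aiplatform and credentials):
#
#   from google.cloud import aiplatform
#   aiplatform.init(project="my-project", location="us-central1")
#   job = aiplatform.AutoMLImageTrainingJob(
#       display_name="flowers-classifier",        # placeholder name
#       prediction_type="classification",         # or "object_detection"
#   )
#   model = job.run(dataset=my_image_dataset, budget_milli_node_hours=8000)
```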
To learn more about AutoML, see AutoML training overview.
Run custom training code on Vertex AI
If AutoML doesn't address your needs, you can provide your own training code and run it on Vertex AI's managed infrastructure. This gives you full control and flexibility over your model's architecture and training logic, letting you use any ML framework you choose.
Vertex AI provides two primary modes for running your custom training code: a serverless, on-demand environment, or a dedicated, reserved cluster.
Vertex AI serverless training
Serverless training is a fully managed service that lets you run your custom
training application without provisioning or managing any infrastructure.
You package your code in a container, define your machine specifications
(including CPUs and GPUs), and submit it as a CustomJob.
Vertex AI handles the rest:
- Provisioning the compute resources for the duration of your job.
- Executing your training code.
- Deleting the resources after the job completes.
This pay-per-use, on-demand model is ideal for experimentation, rapid prototyping, and for production jobs that don't require assured, instantaneous capacity.
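The steps above can be sketched in Python. The `worker_pool_specs` structure follows the Vertex AI `CustomJob` API; the project, region, and container image URI are placeholders to replace with your own values:

```python
# Machine specification for a single-replica training job: one n1-standard-8
# VM with an attached NVIDIA T4 GPU, running your training container.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-8",        # CPUs and RAM per replica
            "accelerator_type": "NVIDIA_TESLA_T4",  # optional GPU
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            # Your training code, packaged as a container image (placeholder URI)
            "image_uri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
        },
    }
]

# Submitting the job requires google-cloud-aiplatform and credentials:
#
#   from google.cloud import aiplatform
#   aiplatform.init(project="my-project", location="us-central1")
#   job = aiplatform.CustomJob(
#       display_name="my-training-job",
#       worker_pool_specs=worker_pool_specs,
#   )
#   job.run()  # provisions resources, runs the container, then tears down
```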
To learn more, see Create a serverless training job.
Vertex AI training clusters
For large-scale, high-performance, and mission-critical training, you can reserve a dedicated cluster of accelerators. This provides assured capacity and eliminates queuing, so your jobs start immediately.
While you have exclusive use of these resources, Vertex AI still handles the operational overhead of managing the cluster, including hardware maintenance and OS patching. This "managed serverful" approach gives you the power of a dedicated cluster without the management complexity.
Ray on Vertex AI
Ray on Vertex AI is a service that lets you use the open-source Ray framework to scale AI and Python applications directly on the Vertex AI platform. Ray provides the infrastructure for distributed computing and parallel processing in your ML workflow, and the managed service adds scalability and integration with other Google Cloud services.
To learn more about Ray on Vertex AI, see Ray on Vertex AI overview.