Document how to do SLSA for ML and highlight gaps #978

@MarkLodato

There have been some questions as to what "SLSA for ML" looks like. This issue attempts to give a short synopsis so that we can hopefully agree and turn that into durable documentation.

First, Machine Learning (ML) models fit the SLSA Model at a high level:

  • Any transformation process is a "build", such as data cleaning or model training.
  • Training data and input models are "dependencies".

This is not obvious to most readers, so we should document it.
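As a rough illustration of that mapping, here is a minimal sketch of what a provenance statement for one training run might look like, assuming the in-toto Statement v1 envelope and the SLSA v1 provenance predicate. The `buildType` URI, builder ID, file names, byte contents, and the `epochs` parameter are all hypothetical placeholders, not anything SLSA defines for ML.

```python
import hashlib
import json


def resource(name: str, content: bytes) -> dict:
    """Describe an artifact by name and SHA-256 digest (ResourceDescriptor-style)."""
    return {"name": name, "digest": {"sha256": hashlib.sha256(content).hexdigest()}}


def training_provenance(model: dict, dependencies: list, params: dict) -> dict:
    """Treat one training run as a SLSA "build".

    The trained model is the subject (the build output); the training data
    and any input models are listed as resolved dependencies.
    """
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [model],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                # Hypothetical build type; no ML build type is assumed to exist.
                "buildType": "https://example.com/ml-training/v1",
                "externalParameters": params,
                "resolvedDependencies": dependencies,
            },
            # Hypothetical builder identity for the training platform.
            "runDetails": {"builder": {"id": "https://example.com/trainer"}},
        },
    }


# Placeholder bytes stand in for real dataset and model files.
dataset = resource("dataset-clean.parquet", b"cleaned training data")
base_model = resource("base-model.bin", b"pretrained weights")
trained = resource("fine-tuned-model.bin", b"fine-tuned weights")

statement = training_provenance(trained, [dataset, base_model], {"epochs": 3})
print(json.dumps(statement, indent=2))
```

A data-cleaning step would produce a statement of the same shape, with the cleaned dataset as the subject and the raw dataset as its dependency.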

Second, ML highlights some gaps or challenges in SLSA that are not really specific to ML but may be a higher priority or more painful for ML. They include:

  • ML training processes often use specialized ML hardware, highly distributed training jobs, and/or highly iterative notebooks like Colab. These may be more work to adapt to a verifiable build architecture where there is a trusted control plane that the tenant cannot influence. Alternatively, a reproducible build architecture may be challenging due to non-determinism.
  • Training data (considered "dependencies") is critical to the ML training process, yet:
    • Standards for identifying and labeling datasets are less mature than they are for conventional software.
    • The SLSA Build track does not yet have a level that sets a minimum completeness of dependencies (Workstream: SLSA Build L4 #977).
    • SLSA does not yet have a transitive concept to describe properties of the entire ML supply chain, beyond a single build step. This is important for conventional software but critical for ML, since ML supply chains are very deep.

All of these are surmountable, but they are worth documenting.
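To make the transitive gap concrete, here is a hypothetical sketch of one way a whole-chain summary could be computed: walk the dependency graph from the final model, record the lowest Build level seen, and list any artifact that has no provenance at all. Nothing here is defined by SLSA; the `Step` record, the `summarize_chain` function, and the artifact names and levels are assumptions made purely for illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    """Hypothetical record of the build step that produced an artifact."""
    build_level: int          # Build level claimed for this single step
    dependencies: List[str]   # identifiers of the artifacts it consumed


def summarize_chain(
    artifact: str, steps: Dict[str, Step]
) -> Tuple[Optional[int], List[str]]:
    """Return (lowest level in the chain, artifacts with no provenance).

    The level is None when the starting artifact itself has no provenance.
    """
    lowest: Optional[int] = None
    missing: List[str] = []
    seen = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared dependencies are visited only once
        seen.add(current)
        step = steps.get(current)
        if step is None:
            missing.append(current)  # e.g. a raw dataset with no record
            continue
        if lowest is None or step.build_level < lowest:
            lowest = step.build_level
        stack.extend(step.dependencies)
    return lowest, sorted(missing)


# Illustrative three-step chain; names and levels are made up.
steps = {
    "model-final": Step(3, ["model-base", "dataset-clean"]),
    "dataset-clean": Step(2, ["dataset-raw"]),
    "model-base": Step(1, []),
}
print(summarize_chain("model-final", steps))
```

In this made-up chain the final training step claims level 3, but the chain as a whole is only as strong as the level 1 base model, and the raw dataset has no provenance at all. That is the kind of property a single-step attestation cannot express.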

Any thoughts in agreement or disagreement? I'll try to update this top post with the consensus. If you have other challenges, I can add them as well.
