
Conversation

@Innixma Innixma commented Jul 18, 2024

Issue #, if available:

Description of changes:

  • Add initial callbacks support to TabularPredictor
  • Callbacks allow users to inject custom logic into the training process, and in theory let the user completely replace the training loop with their own implementation.

Example Code:

import pandas as pd
from autogluon.tabular import TabularPredictor

from autogluon.core.callbacks import EarlyStoppingCallback


if __name__ == '__main__':
    label = 'class'
    train_data = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')

    callbacks = [EarlyStoppingCallback(patience=3)]
    predictor = TabularPredictor(label=label).fit(train_data=train_data, callbacks=callbacks)
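
Beyond the built-in EarlyStoppingCallback, the point of the feature is user-defined callbacks. As a rough illustration of the contract, a minimal custom callback might look like the sketch below; the class and exact signature are hypothetical stand-ins mirroring the after_fit hook shown in the review snippets, not the real AbstractCallback API:

```python
# Minimal sketch of a custom callback (hypothetical stand-in; the real base
# class is autogluon.core.callbacks.AbstractCallback, and exact signatures
# may differ from this illustration).
class SimpleLoggingCallback:
    def after_fit(self, trainer, model_names, logger=None, stack_name="core", level=1) -> bool:
        # Called after each model fit; returning True requests early stopping.
        print(f"Fitted {model_names} (stack={stack_name}, level={level})")
        return False


cb = SimpleLoggingCallback()
early_stop = cb.after_fit(trainer=None, model_names=["LightGBM"])
```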

Example Output:

...
User-specified callbacks (1): ['EarlyStoppingCallback']
Fitting 13 L1 models ...
Fitting model: KNeighborsUnif ...
	0.7736	 = Validation score   (accuracy)
	1.63s	 = Training   runtime
	0.02s	 = Validation runtime
EarlyStoppingCallback: Best Score: 0.7736 | Patience: 0/3 | Best Model: KNeighborsUnif (New Best)
Fitting model: KNeighborsDist ...
	0.7652	 = Validation score   (accuracy)
	0.2s	 = Training   runtime
	0.01s	 = Validation runtime
EarlyStoppingCallback: Best Score: 0.7736 | Patience: 1/3 | Best Model: KNeighborsUnif
Fitting model: LightGBMXT ...
	0.8792	 = Validation score   (accuracy)
	1.74s	 = Training   runtime
	0.0s	 = Validation runtime
EarlyStoppingCallback: Best Score: 0.8792 | Patience: 0/3 | Best Model: LightGBMXT (New Best)
Fitting model: LightGBM ...
	0.8824	 = Validation score   (accuracy)
	1.37s	 = Training   runtime
	0.0s	 = Validation runtime
EarlyStoppingCallback: Best Score: 0.8824 | Patience: 0/3 | Best Model: LightGBM (New Best)
Fitting model: RandomForestGini ...
	0.8612	 = Validation score   (accuracy)
	0.91s	 = Training   runtime
	0.08s	 = Validation runtime
EarlyStoppingCallback: Best Score: 0.8824 | Patience: 1/3 | Best Model: LightGBM
Fitting model: RandomForestEntr ...
	0.8584	 = Validation score   (accuracy)
	1.0s	 = Training   runtime
	0.09s	 = Validation runtime
EarlyStoppingCallback: Best Score: 0.8824 | Patience: 2/3 | Best Model: LightGBM
Fitting model: CatBoost ...
	0.8824	 = Validation score   (accuracy)
	6.89s	 = Training   runtime
	0.01s	 = Validation runtime
EarlyStoppingCallback: Best Score: 0.8824 | Patience: 3/3 | Best Model: LightGBM
EarlyStoppingCallback: Early stopping trainer fit for level=1. Reason: No score_val improvement in the past 3 models.
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'CatBoost': 0.5, 'LightGBM': 0.45, 'KNeighborsUnif': 0.05}
	0.886	 = Validation score   (accuracy)
	0.06s	 = Training   runtime
	0.0s	 = Validation runtime
EarlyStoppingCallback: Best Score: 0.8860 | Patience: 0/3 | Best Model: WeightedEnsemble_L2 (New Best)
AutoGluon training complete, total runtime = 15.16s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 69200.6 rows/s (2500 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240730_021147")

TODO:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added API & Doc Improvements or additions to documentation enhancement New feature or request module: tabular labels Jul 18, 2024
@Innixma Innixma added this to the 1.2 Release milestone Jul 18, 2024
@github-actions

Job PR-4327-aeb82c1 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4327/aeb82c1/index.html

@github-actions

Job PR-4327-7c407de is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4327/7c407de/index.html

@eddiebergman eddiebergman left a comment

Sorry for the unsolicited PR review; I was just curious to see how callbacks get implemented!

time_limit: float | None = None,
stack_name: str = "core",
level: int = 1,
) -> Tuple[bool, bool]:


If you want, you can put from __future__ import annotations at the top of the file and then use tuple[bool, bool]

Contributor Author

Nice! Updated

return early_stop

def _calc_new_best(self, trainer: AbstractTrainer):
leaderboard = trainer.leaderboard()


Probably not critical, but this call to trainer.leaderboard() is not the cheapest, given that it does a lot of dict parsing, DAG processing, and sorting.

https://github.com/Innixma/autogluon/blob/7c407de63254ff7f8a09c63be1adb495bd392229/core/src/autogluon/core/trainer/abstract_trainer.py#L3151

Alternatives:

model, score = max(trainer.get_model_attributes("val_score").items(), key=lambda t: t[1]) # t = (model_name, score) 

Contributor Author

Good call! I actually needed to revamp this logic to make it work in cases where the user specified infer_limit to ensure that the model returned satisfies the infer_limit.


# self._exceptions_list = [] # TODO: Keep exceptions list for debugging during benchmarking.

self.callbacks: List[callable] = []


Lowercase callable is actually not a type, it's a function. You need typing.Callable.

(...oh how I wish it was though, like list and dict are)
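
A small sketch of the distinction: the builtin callable is a predicate function, while typing.Callable is the annotation that type checkers understand:

```python
from typing import Callable, List

def run_callbacks(callbacks: List[Callable[[], bool]]) -> bool:
    # builtin callable() tests whether an object is callable; it is not a type
    assert all(callable(cb) for cb in callbacks)
    return any(cb() for cb in callbacks)

stopped = run_callbacks([lambda: False, lambda: True])
```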

Contributor Author

Good catch! I updated the PR and I created a GitHub issue to track replacement throughout the project: #4349

level_time_modifier=0.333,
infer_limit=None,
infer_limit_batch_size=None,
callbacks: List[callable] = None,


Likewise callable

Contributor Author

ditto

if self.model_best is None and len(model_names_fit) != 0:
self.model_best = self.get_model_best(can_infer=True, infer_limit=infer_limit, infer_limit_as_child=True)
self._time_limit = None
self._fit_cleanup()

@eddiebergman eddiebergman Jul 25, 2024

Probably not worth the complexity if this is the only usage, but _fit_setup() and _fit_cleanup() can be removed in favor of a @contextmanager, i.e. with self._fitting_context():. It can make the intent a bit clearer, and there's one less completion your editor gives you.

from contextlib import contextmanager
from typing import Iterator
import time

class WhateverThisClassIs:

    @contextmanager
    def _fitting_context(self, time_limit=None, callbacks=None) -> Iterator[None]:
        # Previously `_fit_setup()`
        self._time_train_start = time.time()
        self._time_train_start_last = self._time_train_start
        self._time_limit = time_limit

        # Might be able to just lift logic here and remove one
        # more method
        self.reset_callbacks()

        if callbacks is not None:
            assert isinstance(callbacks, list), f"`callbacks` must be a list. Found invalid type: `{type(callbacks)}`."
        else:
            callbacks = []
        self.callbacks = callbacks

        try:
            yield
        finally:
            # Previously `_fit_cleanup()`; `finally` ensures cleanup runs even if fitting raises
            self._time_limit = None
            self._time_train_start = None

            # Likewise with lifting logic
            self.reset_callbacks()

Contributor Author

This is an interesting idea. I think for now I will keep it as is, but this could be something worth adopting in future. I've made essentially zero use of yield in AutoGluon so far, mostly due to my unfamiliarity.

) -> Tuple[bool, bool]:
time_limit_trainer = trainer._time_limit
if time_limit_trainer is not None and trainer._time_train_start is not None:
time_left_total = time_limit_trainer - (time.time() - trainer._time_train_start)


Recently learnt that time.time() is not the best for measuring durations, since it follows the wall clock and can jump when the system clock is adjusted. Most systems are fine, and it rarely leads to a bug until it does.

https://docs.python.org/3/library/time.html#time.monotonic
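
A minimal sketch of the difference: time.monotonic() can never go backwards, which makes it the safe choice for duration measurement:

```python
import time

start = time.monotonic()  # immune to system clock adjustments
time.sleep(0.05)
elapsed = time.monotonic() - start  # safe for measuring durations
```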

Contributor Author

Nice find! #4342

Comment on lines +696 to +748
if self._callback_early_stop:
return []


Worth considering if you want some overwrite ability for this, i.e. for testing.

Contributor Author

Interesting idea, will probably keep as is for now and add overwrite logic if it becomes useful.

Comment on lines 2427 to 2528
def _callbacks_after_fit(
self,
*,
model_names: List[str],
stack_name: str,
level: int,
):
for callback in self.callbacks:
callback_early_stop = callback.after_fit(
self,
model_names=model_names,
logger=logger,
stack_name=stack_name,
level=level,
)
if callback_early_stop:
self._callback_early_stop = True

@eddiebergman eddiebergman Jul 25, 2024

If there's no side effect to care about with the .after_fit(), i.e. you don't care about callbacks state after this call, then you could break early. Might not be important if callbacks are super cheap.

    def _callbacks_after_fit(
        self,
        *,
        model_names: List[str],
        stack_name: str,
        level: int,
    ):
        for callback in self.callbacks:
            callback_early_stop = callback.after_fit(
                self,
                model_names=model_names,
                logger=logger,
                stack_name=stack_name,
                level=level,
            )
            if callback_early_stop:
                self._callback_early_stop = True
                break

Or if you're feeling functional ;)

    def _callbacks_after_fit(
        self,
        *,
        model_names: List[str],
        stack_name: str,
        level: int,
    ):
        should_stop_itr = (
            callback.after_fit(
                self,
                model_names=model_names,
                logger=logger,
                stack_name=stack_name,
                level=level,
            )
            for callback in self.callbacks
        )
        self._callback_early_stop = any(should_stop_itr)
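
Note that any() over a generator also short-circuits, so this variant, like the explicit break, stops invoking callbacks at the first True. A standalone sketch:

```python
calls = []

def make_cb(name, result):
    def cb():
        calls.append(name)
        return result
    return cb

callbacks = [make_cb("log", False), make_cb("early_stop", True), make_cb("never_runs", False)]
# any() consumes the generator lazily and stops at the first True,
# so the third callback is never invoked
stopped = any(cb() for cb in callbacks)
```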

Contributor Author

My thought process is that some callbacks could simply be logging related, and I wouldn't want a callback that early stops to prevent a logging callback from doing its logging.

The other thing is that, technically, later callbacks can check whether trainer._callback_early_stop == True and skip their logic.

This is a good idea though, and I thought about it too while implementing. I've added a self.skip_if_trainer_stopped parameter to callbacks. When true, the callback logic is skipped when trainer._callback_early_stop == True. This should give the best of both worlds.


@eddiebergman eddiebergman Jul 25, 2024

Sounds good! If you want to make it clearer to users, you could make a shallow class LoggingCallback, or conversely EarlyStoppingCallback, which just internally sets this flag, i.e. the user never sets it. I use this pattern a good bit, where the flag is actually set as a class variable, e.g. LoggingCallback sets skip = False and EarlyStoppingCallback sets skip = True.

It gives you the freedom to later change this to a dict or whatever, as long as LoggingCallback is never skipped and EarlyStoppingCallback is. Also, it's one less parameter for someone to think about.
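
A sketch of the class-variable pattern being described (attribute name borrowed from the discussion above; the actual implementation may differ):

```python
class AbstractCallback:
    # Subclasses override this class variable instead of exposing a constructor arg
    skip_if_trainer_stopped: bool = False

class LoggingCallback(AbstractCallback):
    skip_if_trainer_stopped = False  # always runs, even after an early stop

class EarlyStoppingCallback(AbstractCallback):
    skip_if_trainer_stopped = True  # no point evaluating stopping logic after a stop
```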

Contributor Author

That is a great idea; updated the code. I think this is a good practice I should adopt more often: separating user-level args from developer-level args (i.e., those that are only relevant when subclassing).

@Innixma Innixma commented Jul 25, 2024

@eddiebergman unsolicited PR reviews are among my favorite kinds of code reviews :)

@github-actions

Job PR-4327-7f9a6e6 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4327/7f9a6e6/index.html

@Innixma Innixma changed the title [WIP] [tabular] Add initial callbacks support [tabular] Add initial callbacks support Jul 30, 2024
@Innixma Innixma requested review from prateekdesai04 and shchur July 30, 2024 02:19
@github-actions

Job PR-4327-c3b7fbf is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4327/c3b7fbf/index.html

trainer: AbstractTrainer,
model: AbstractModel,
time_limit: float | None = None,
stack_name: str = "core",

Collaborator

Should this be a typing.Literal?

Contributor Author

In this case, no, because stack_name can be any valid string.
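
For context, a sketch of the tradeoff: typing.Literal pins a parameter to an enumerated set of values, whereas the plain str kept here accepts arbitrary stack names:

```python
from typing import Literal

def fit_literal(stack_name: Literal["core", "aux"] = "core") -> str:
    # a type checker would flag fit_literal("my_custom_stack")
    return stack_name

def fit_str(stack_name: str = "core") -> str:
    return stack_name  # any stack name is valid

name = fit_str("my_custom_stack")
```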

cgroups). Otherwise, AutoGluon might wrongly assume more resources are available for fitting a model than the operating system allows,
which can result in model training failing or being very inefficient.
callbacks : List[AbstractCallback], default = None
[Experimental] Callback support is preliminary, targeted towards developers, and is subject to change.

Collaborator

Nit: Should we use admonitions to highlight experimental features?

:::{warning}
This is an experimental feature and may change in the future releases without warning.
:::

Contributor Author

Nice! Added

self._time_limit = time_limit
self.reset_callbacks()
if callbacks is not None:
assert isinstance(callbacks, list), f"`callbacks` must be a list. Found invalid type: `{type(callbacks)}`."

Collaborator

Does it make sense to verify that callbacks are of type AbstractCallback?

Contributor Author

Good idea! Added

# self._exceptions_list = [] # TODO: Keep exceptions list for debugging during benchmarking.

self.callbacks: List[AbstractCallback] = []
self._callback_early_stop = False

Collaborator

I'm not very familiar with the rest of the AbstractTrainer code, so this might be a bad suggestion, but would it be feasible to communicate the early stopping / interruption without altering the state of the trainer? I can imagine that in some scenarios, such as distributed training, it would be really tricky to reason about the state of this variable.

Contributor Author

Tough to say, but I think I would prefer to edit the trainer state, as the intention is to not pass the trainer object to worker threads, and therefore it remains a singular source of truth. We can change it later if we find this has limitations.

@github-actions

Job PR-4327-c34b62a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4327/c34b62a/index.html

@github-actions

Job PR-4327-78d82af is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4327/78d82af/index.html

)
model_names_fit += base_model_names + aux_models
if self.model_best is None and len(model_names_fit) != 0:
if (self.model_best is None or infer_limit is not None) and len(model_names_fit) != 0:

Contributor

qq: could you explain this check?

@Innixma Innixma Aug 28, 2024

Pseudocode breakdown of the check:


if (user specified infer limit) and (any model exists):
    select the best valid model based on validation score that satisfies the infer limit
elif (trainer has not specified which model is best) and (any model exists):
    select the best valid model based on validation score

The added infer_limit check ensures that we always return a model that satisfies the infer_limit constraints. Sometimes a non-ensemble model can achieve a better score while satisfying infer_limit than any ensemble model, and by default only ensemble models are set to self.model_best prior to this call. This call ensures that the best model is picked even if it isn't an ensemble.
