Error in merge_accumulators when using keras metrics on dataflow #158

@zywind

Description

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow Model Analysis): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): GCP Dataflow Apache Beam Python 3.7 SDK 2.39.0
  • TensorFlow Model Analysis installed from (source or binary): binary
  • TensorFlow Model Analysis version (use command below): 0.33
  • Python version: 3.7
  • Jupyter Notebook version: Jupyter lab 3.2.8
  • Exact command to reproduce:

I am using TFX's evaluator

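import tensorflow as tf
import tensorflow_model_analysis as tfma
from tfx.components import Evaluator

# model_specs, slicing_specs, model, transform_examples and context are defined earlier in the notebook.
eval_config = tfma.EvalConfig(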
  model_specs=model_specs,
  metrics_specs=tfma.metrics.specs_from_metrics([
      tf.keras.metrics.AUC(curve='ROC', name='ROCAUC'),
      tf.keras.metrics.AUC(curve='PR', name='PRAUC'),
      tf.keras.metrics.Precision(),
      tf.keras.metrics.Recall(),
      tf.keras.metrics.BinaryAccuracy(),
    ]),
  slicing_specs=slicing_specs
)

evaluator = Evaluator(
  eval_config=eval_config,
  model=model,
  examples=transform_examples,
)

context.run(evaluator)

Describe the problem

Running the same evaluation locally with Beam's DirectRunner causes no error, but whenever I run it on Dataflow and Dataflow spawns more than one worker, I get an error like this:

    output.with_value(self.phased_combine_fn.apply(output.value)):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/combiners.py", line 882, in merge_only
    return self.combine_fn.merge_accumulators(accumulators)
  File "/home/sandbox/.pex/install/apache_beam-2.39.0-cp37-cp37m-linux_x86_64.whl.06f7ceb62380d1c704d774a5096a04f953de60c9/apache_beam-2.39.0-cp37-cp37m-linux_x86_64.whl/apache_beam/transforms/combiners.py", line 665, in merge_accumulators
    a in zip(self._combiners, zip(*accumulators_batch))
  File "/home/sandbox/.pex/install/apache_beam-2.39.0-cp37-cp37m-linux_x86_64.whl.06f7ceb62380d1c704d774a5096a04f953de60c9/apache_beam-2.39.0-cp37-cp37m-linux_x86_64.whl/apache_beam/transforms/combiners.py", line 665, in <listcomp>
    a in zip(self._combiners, zip(*accumulators_batch))
  File "/usr/local/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/tf_metric_wrapper.py", line 560, in merge_accumulators
    for metric_index in range(len(self._metrics[output_name])):
TypeError: 'NoneType' object is not subscriptable

According to the Dataflow log, the failing steps were:

  • ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/CombineMetricsPerSlice/CombinePerKey(PreCombineFn)/Combine
  • ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/CombineMetricsPerSlice/CombinePerKey(PreCombineFn)/GroupByKey
  • ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/CombineMetricsPerSlice/CombinePerKey(PostCombineFn)/GroupByKey
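The traceback shows that self._metrics is None at the moment merge_accumulators runs. To illustrate one way this could happen, here is a minimal, standard-library-only sketch. It is not the TFMA implementation: LazyMetricCombiner, its init_on_merge flag, and the two helper functions are hypothetical names made up for this illustration. It assumes the combiner builds its metrics lazily on the first add_input call, and that a second worker receives a fresh copy of the combiner and is only asked to merge accumulators.

```python
import pickle


class LazyMetricCombiner:
    """Toy stand-in for a combiner with lazily built state (hypothetical)."""

    def __init__(self, init_on_merge=False):
        self._metrics = None  # built lazily, so a fresh copy starts with None
        self._init_on_merge = init_on_merge

    def _setup_if_needed(self):
        if self._metrics is None:
            # Placeholder for expensive metric construction.
            self._metrics = {"": ["auc", "precision"]}

    def create_accumulator(self):
        return [0.0, 0.0]

    def add_input(self, accumulator, value):
        self._setup_if_needed()
        for i in range(len(self._metrics[""])):
            accumulator[i] += value
        return accumulator

    def merge_accumulators(self, accumulators):
        if self._init_on_merge:
            self._setup_if_needed()
        merged = self.create_accumulator()
        for acc in accumulators:
            # Fails with TypeError if self._metrics is still None.
            for i in range(len(self._metrics[""])):
                merged[i] += acc[i]
        return merged


def run_on_one_worker(combiner):
    # The same instance adds inputs and merges, so lazy setup has already run.
    acc_a = combiner.add_input(combiner.create_accumulator(), 1.0)
    acc_b = combiner.add_input(combiner.create_accumulator(), 2.0)
    return combiner.merge_accumulators([acc_a, acc_b])


def run_on_two_workers(combiner):
    # Each worker gets its own deserialized copy of the combiner.
    worker_a = pickle.loads(pickle.dumps(combiner))
    acc_a = worker_a.add_input(worker_a.create_accumulator(), 1.0)
    acc_b = worker_a.add_input(worker_a.create_accumulator(), 2.0)
    # Worker B never sees add_input; it only merges.
    worker_b = pickle.loads(pickle.dumps(combiner))
    return worker_b.merge_accumulators([acc_a, acc_b])
```

In this toy, run_on_one_worker(LazyMetricCombiner()) succeeds, while run_on_two_workers(LazyMetricCombiner()) raises the same kind of TypeError as above, and passing init_on_merge=True avoids it. Whether the real cause in tf_metric_wrapper.py is the same is only my guess.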

I see that you have this commit, which appears to address this problem, but it was immediately rolled back. Have you run into similar issues, and what would you recommend to fix the error?
