This repository was archived by the owner on Nov 11, 2022. It is now read-only.
Version 1.9.0
- Added the `ValueProvider` interface for use in pipeline options. Making an option of type `ValueProvider<T>` instead of `T` allows its value to be supplied at runtime (rather than pipeline construction time) and enables Dataflow templates. Support for `ValueProvider` has been added to `TextIO`, `PubSubIO`, and `BigQueryIO` and can be added to arbitrary PTransforms as well.
- Added the ability to automatically save profiling information to Google Cloud Storage using the `--saveProfilesToGcs` pipeline option. For more information on profiling pipelines executed by the `DataflowPipelineRunner`, see issue #72.
- Deprecated the `--enableProfilingAgent` pipeline option that saved profiles to the individual worker disks. For more information on profiling pipelines executed by the `DataflowPipelineRunner`, see issue #72.
- Changed `FileBasedSource` to throw an exception when reading from a file pattern that has no matches. Pipelines will now fail at runtime rather than silently reading no data in this case. This change affects `TextIO.Read` or `AvroIO.Read` when configured `withoutValidation`.
- Enhanced `Coder` validation in the `DirectPipelineRunner` to catch coders that cannot properly encode and decode their input.
- Improved display data throughout core transforms, including properly handling arrays in `PipelineOptions`.
- Improved performance for pipelines using the `DataflowPipelineRunner` in streaming mode.
- Improved scalability of the `InProcessRunner`, enabling testing with larger datasets.
- Improved the cleanup of temporary files created by `TextIO`, `AvroIO`, and other `FileBasedSource` implementations.
- Modified the default version range in the archetypes to exclude beta releases of Dataflow SDK for Java, version 2.0.0 and later.
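
The difference between supplying an option at pipeline construction time and at runtime, as described in the `ValueProvider` entry above, can be illustrated with a small sketch. This is not the SDK's code: it uses only the Java standard library, and `DeferredValue`, `fixed`, `runtime`, `describeRead`, and the `gs://example-bucket/input.txt` path are hypothetical names chosen for illustration. The assumption is that a transform holds on to the provider and calls `get()` only once the pipeline actually runs.

```java
import java.util.HashMap;
import java.util.Map;

public class ValueProviderSketch {

    /** Hypothetical stand-in for a deferred option value; not the SDK's interface. */
    interface DeferredValue<T> {
        T get();
        boolean isAccessible();
    }

    /** A value that is already known when the pipeline is constructed. */
    static <T> DeferredValue<T> fixed(final T value) {
        return new DeferredValue<T>() {
            @Override public T get() { return value; }
            @Override public boolean isAccessible() { return true; }
        };
    }

    /** A value that is looked up in the runtime options only when get() is called. */
    static DeferredValue<String> runtime(final String name,
                                         final Map<String, String> runtimeOptions) {
        return new DeferredValue<String>() {
            @Override public String get() {
                if (!isAccessible()) {
                    throw new IllegalStateException(
                        "Option '" + name + "' is not available yet");
                }
                return runtimeOptions.get(name);
            }
            @Override public boolean isAccessible() {
                return runtimeOptions.containsKey(name);
            }
        };
    }

    /** Stand-in for a transform: it reads the provider only when it executes. */
    static String describeRead(DeferredValue<String> path) {
        return "reading from " + path.get();
    }

    public static void main(String[] args) {
        // Construction time: the option is declared, but no value exists yet.
        Map<String, String> runtimeOptions = new HashMap<>();
        DeferredValue<String> input = runtime("inputFile", runtimeOptions);
        System.out.println("accessible at construction: " + input.isAccessible());

        // Runtime: the value is supplied and the transform can now read it.
        runtimeOptions.put("inputFile", "gs://example-bucket/input.txt");
        System.out.println(describeRead(input));
    }
}
```

Because the transform stores the provider rather than the raw value, the same constructed pipeline can be reused with different inputs, which is the property that templates rely on.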