Skip to content

Commit ca70eaa

Browse files
authored
Add DiskProvisionedIops/ThroughputMibps pipeline options for the Java and GO SDKs and update Go client libraries (#38349)
* Restore Java and Go changes for disk provisioned IOPS and throughput * Add CHANGES.md entry for disk provisioned IOPS and throughput * restore go changes * initialize options map in dataflow job to prevent nil pointer exceptions * go fmt * add testDiskProvisionedOptionsConfig unit test * Update pr id in changes.md
1 parent 8fc2b6b commit ca70eaa

16 files changed

Lines changed: 319 additions & 1021 deletions

File tree

CHANGES.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -69,6 +69,7 @@
6969

7070
## New Features / Improvements
7171

72+
* Added support for setting disk provisioned IOPS and throughput in Dataflow runner via `--diskProvisionedIops` and `--diskProvisionedThroughputMibps` pipeline options (Java/Go) ([#38349](https://github.com/apache/beam/issues/38349)).
7273
* X feature added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
7374
* TriggerStateMachineRunner changes from BitSetCoder to SentinelBitSetCoder to
7475
encode finished bitset. SentinelBitSetCoder and BitSetCoder are state
@@ -104,6 +105,7 @@
104105

105106
## Highlights
106107

108+
107109
## I/Os
108110

109111
* DebeziumIO (Java): added `OffsetRetainer` interface and `FileSystemOffsetRetainer` implementation to persist and restore CDC offsets across pipeline restarts, and exposed `withStartOffset` / `withOffsetRetainer` on `DebeziumIO.Read` and the cross-language `ReadBuilder` ([#28248](https://github.com/apache/beam/issues/28248)).
@@ -2429,4 +2431,4 @@ Schema Options, it will be removed in version `2.23.0`. ([BEAM-9704](https://iss
24292431

24302432
## Highlights
24312433

2432-
- For versions 2.19.0 and older release notes are available on [Apache Beam Blog](https://beam.apache.org/blog/).
2434+
- For versions 2.19.0 and older release notes are available on [Apache Beam Blog](https://beam.apache.org/blog/).

buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -742,7 +742,7 @@ class BeamModulePlugin implements Plugin<Project> {
742742
google_api_common : "com.google.api:api-common", // google_cloud_platform_libraries_bom sets version
743743
google_api_services_bigquery : "com.google.apis:google-api-services-bigquery:v2-rev20251012-2.0.0", // [bomupgrader] sets version
744744
google_api_services_cloudresourcemanager : "com.google.apis:google-api-services-cloudresourcemanager:v1-rev20250606-2.0.0", // [bomupgrader] sets version
745-
google_api_services_dataflow : "com.google.apis:google-api-services-dataflow:v1b3-rev20260118-$google_clients_version",
745+
google_api_services_dataflow : "com.google.apis:google-api-services-dataflow:v1b3-rev20260405-$google_clients_version",
746746
google_api_services_healthcare : "com.google.apis:google-api-services-healthcare:v1-rev20240130-$google_clients_version",
747747
google_api_services_pubsub : "com.google.apis:google-api-services-pubsub:v1-rev20220904-$google_clients_version",
748748
google_api_services_storage : "com.google.apis:google-api-services-storage:v1-rev20260204-2.0.0", // [bomupgrader] sets version

runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -489,6 +489,13 @@ public Job translate(List<DataflowPackage> packages) {
489489
if (options.getDiskSizeGb() > 0) {
490490
workerPool.setDiskSizeGb(options.getDiskSizeGb());
491491
}
492+
if (options.getDiskProvisionedIops() != null && options.getDiskProvisionedIops() > 0) {
493+
workerPool.setDiskProvisionedIops(options.getDiskProvisionedIops());
494+
}
495+
if (options.getDiskProvisionedThroughputMibps() != null
496+
&& options.getDiskProvisionedThroughputMibps() > 0) {
497+
workerPool.setDiskProvisionedThroughputMibps(options.getDiskProvisionedThroughputMibps());
498+
}
492499
AutoscalingSettings settings = new AutoscalingSettings();
493500
if (options.getAutoscalingAlgorithm() != null) {
494501
settings.setAlgorithm(options.getAutoscalingAlgorithm().getAlgorithm());

runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -193,6 +193,20 @@ public String getAlgorithm() {
193193

194194
void setWorkerDiskType(String value);
195195

196+
/** Provisioned IOPS for the worker disk. */
197+
@Description("Provisioned IOPS for the worker disk.")
198+
@Nullable
199+
Long getDiskProvisionedIops();
200+
201+
void setDiskProvisionedIops(Long value);
202+
203+
/** Provisioned throughput in MiB/s for the worker disk. */
204+
@Description("Provisioned throughput in MiB/s for the worker disk.")
205+
@Nullable
206+
Long getDiskProvisionedThroughputMibps();
207+
208+
void setDiskProvisionedThroughputMibps(Long value);
209+
196210
/**
197211
* Specifies whether worker pools should be started with public IP addresses.
198212
*

runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -757,6 +757,37 @@ public void testDiskSizeGbConfig() throws IOException {
757757
assertEquals(diskSizeGb, job.getEnvironment().getWorkerPools().get(0).getDiskSizeGb());
758758
}
759759

760+
@Test
761+
public void testDiskProvisionedOptionsConfig() throws IOException {
762+
final Long diskProvisionedIops = 1000L;
763+
final Long diskProvisionedThroughputMibps = 100L;
764+
765+
DataflowPipelineOptions options = buildPipelineOptions();
766+
options.setDiskProvisionedIops(diskProvisionedIops);
767+
options.setDiskProvisionedThroughputMibps(diskProvisionedThroughputMibps);
768+
769+
Pipeline p = buildPipeline(options);
770+
p.traverseTopologically(new RecordingPipelineVisitor());
771+
SdkComponents sdkComponents = createSdkComponents(options);
772+
RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p, sdkComponents, true);
773+
Job job =
774+
DataflowPipelineTranslator.fromOptions(options)
775+
.translate(
776+
p,
777+
pipelineProto,
778+
sdkComponents,
779+
DataflowRunner.fromOptions(options),
780+
Collections.emptyList())
781+
.getJob();
782+
783+
assertEquals(1, job.getEnvironment().getWorkerPools().size());
784+
assertEquals(
785+
diskProvisionedIops, job.getEnvironment().getWorkerPools().get(0).getDiskProvisionedIops());
786+
assertEquals(
787+
diskProvisionedThroughputMibps,
788+
job.getEnvironment().getWorkerPools().get(0).getDiskProvisionedThroughputMibps());
789+
}
790+
760791
/** A composite transform that returns an output that is unrelated to the input. */
761792
private static class UnrelatedOutputCreator
762793
extends PTransform<PCollection<Integer>, PCollection<Integer>> {

runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptionsTest.java

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -322,4 +322,13 @@ public void destroy() {
322322
TimeoutException.class, () -> DefaultGcpRegionFactory.getRegionFromGcloudCli(1L));
323323
}
324324
}
325+
326+
@Test
327+
public void testDiskProvisionedIopsAndThroughput() {
328+
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
329+
options.setDiskProvisionedIops(1000L);
330+
options.setDiskProvisionedThroughputMibps(100L);
331+
assertEquals(Long.valueOf(1000), options.getDiskProvisionedIops());
332+
assertEquals(Long.valueOf(100), options.getDiskProvisionedThroughputMibps());
333+
}
325334
}

sdks/go.mod

Lines changed: 14 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -25,12 +25,12 @@ go 1.26.0
2525
toolchain go1.26.2
2626

2727
require (
28-
cloud.google.com/go/bigquery v1.72.0
29-
cloud.google.com/go/bigtable v1.41.0
30-
cloud.google.com/go/datastore v1.21.0
28+
cloud.google.com/go/bigquery v1.74.0
29+
cloud.google.com/go/bigtable v1.42.0
30+
cloud.google.com/go/datastore v1.22.0
3131
cloud.google.com/go/profiler v0.4.3
3232
cloud.google.com/go/pubsub v1.50.1
33-
cloud.google.com/go/spanner v1.87.0
33+
cloud.google.com/go/spanner v1.88.0
3434
cloud.google.com/go/storage v1.59.2
3535
github.com/aws/aws-sdk-go-v2 v1.41.5
3636
github.com/aws/aws-sdk-go-v2/config v1.32.7
@@ -56,12 +56,12 @@ require (
5656
github.com/xitongsys/parquet-go-source v0.0.0-20241021075129-b732d2ac9c9b
5757
go.mongodb.org/mongo-driver v1.17.9
5858
golang.org/x/net v0.52.0
59-
golang.org/x/oauth2 v0.35.0
59+
golang.org/x/oauth2 v0.36.0
6060
golang.org/x/sync v0.20.0
6161
golang.org/x/sys v0.42.0
6262
golang.org/x/text v0.35.0
63-
google.golang.org/api v0.257.0
64-
google.golang.org/genproto v0.0.0-20250922171735-9219d122eba9
63+
google.golang.org/api v0.276.0
64+
google.golang.org/genproto v0.0.0-20260319201613-d00831a3d3e7
6565
google.golang.org/grpc v1.80.0
6666
google.golang.org/protobuf v1.36.11
6767
gopkg.in/yaml.v2 v2.4.0
@@ -77,13 +77,13 @@ require (
7777

7878
require (
7979
cel.dev/expr v0.25.1 // indirect
80-
cloud.google.com/go/auth v0.17.0 // indirect
80+
cloud.google.com/go/auth v0.20.0 // indirect
8181
cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
8282
cloud.google.com/go/monitoring v1.24.3 // indirect
8383
cloud.google.com/go/pubsub/v2 v2.0.0 // indirect
8484
dario.cat/mergo v1.0.2 // indirect
8585
filippo.io/edwards25519 v1.1.1 // indirect
86-
github.com/GoogleCloudPlatform/grpc-gcp-go/grpcgcp v1.5.3 // indirect
86+
github.com/GoogleCloudPlatform/grpc-gcp-go/grpcgcp v1.6.0 // indirect
8787
github.com/GoogleCloudPlatform/opentelemetry-operations-go/detectors/gcp v1.31.0 // indirect
8888
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.54.0 // indirect
8989
github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.54.0 // indirect
@@ -124,8 +124,8 @@ require (
124124
go.einride.tech/aip v0.73.0 // indirect
125125
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
126126
go.opentelemetry.io/contrib/detectors/gcp v1.39.0 // indirect
127-
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.63.0 // indirect
128-
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.61.0 // indirect
127+
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.67.0 // indirect
128+
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0 // indirect
129129
go.opentelemetry.io/otel v1.43.0 // indirect
130130
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.43.0 // indirect
131131
go.opentelemetry.io/otel/metric v1.43.0 // indirect
@@ -141,7 +141,7 @@ require (
141141
cloud.google.com/go v0.123.0 // indirect
142142
cloud.google.com/go/compute/metadata v0.9.0 // indirect
143143
cloud.google.com/go/iam v1.5.3 // indirect
144-
cloud.google.com/go/longrunning v0.7.0 // indirect
144+
cloud.google.com/go/longrunning v0.8.0 // indirect
145145
github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c // indirect
146146
github.com/Microsoft/go-winio v0.6.2 // indirect
147147
github.com/apache/arrow/go/arrow v0.0.0-20211112161151-bc219186db40 // indirect
@@ -175,8 +175,8 @@ require (
175175
github.com/google/pprof v0.0.0-20250602020802-c6617b811d0e // indirect
176176
github.com/google/renameio/v2 v2.0.0 // indirect
177177
github.com/google/s2a-go v0.1.9 // indirect
178-
github.com/googleapis/enterprise-certificate-proxy v0.3.7 // indirect
179-
github.com/googleapis/gax-go/v2 v2.15.0 // indirect
178+
github.com/googleapis/enterprise-certificate-proxy v0.3.14 // indirect
179+
github.com/googleapis/gax-go/v2 v2.21.0 // indirect
180180
github.com/gorilla/handlers v1.5.2 // indirect
181181
github.com/gorilla/mux v1.8.1 // indirect
182182
github.com/inconshreveable/mousetrap v1.1.0 // indirect

0 commit comments

Comments
 (0)