Skip to content

otel: add tcp metrics#12652

Open
AgraVator wants to merge 13 commits intogrpc:masterfrom
AgraVator:tcp-metrics
Open

otel: add tcp metrics#12652
AgraVator wants to merge 13 commits intogrpc:masterfrom
AgraVator:tcp-metrics

Conversation

@AgraVator
Copy link
Copy Markdown
Contributor

@AgraVator AgraVator commented Feb 11, 2026

Implements A80

@AgraVator AgraVator closed this Feb 11, 2026
@AgraVator AgraVator reopened this Feb 11, 2026
@AgraVator AgraVator marked this pull request as ready for review February 16, 2026 16:14
java.util.List<String> labelValues = getLabelValues(channel);
try {
if (channel.getClass().getName().equals(epollSocketChannelClassName)) {
Class<?> tcpInfoClass = Class.forName(epollTcpInfoClassName);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a lot of reflection in hot exporting path. So, a couple of questions.

  1. Why do we choose reflectin over typecasting and calling actual methods? Reducing dependency surface?
  2. Is there a way to optimise this like caching value on per channel level? Something like a creating a runnable per channel. Not sure how it'd be possible to share this data for the final collection on channel inactive.

Method rttMethod = tcpInfoClass.getMethod("rtt");

long totalRetrans = (Long) totalRetransMethod.invoke(info);
int retransmits = (Integer) retransmitsMethod.invoke(info);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these cumulative metrics or are these total retransmits since the connection start? If latter, the current logic is incorrect.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AAh I see, then I think we should not be using a counter for this... Will clarify in the gRFC

@AgraVator AgraVator requested a review from ejona86 February 26, 2026 09:54
Copy link
Copy Markdown
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"otel" is a very surprising prefix to use for the PR/commit, as opentelemetry isn't actually changed at all.

AgraVator and others added 6 commits March 9, 2026 14:21
The backoff timer is only used when serializeRetries=true, and that
exists to match the old/current pick_first's behavior as closely as
possible. InternalSubchannel.updateAddresses() would take no action when
in TRANSIENT_FAILURE; it would update the addresses and just wait for
the backoff timer to expire.

Note that this only impacts serializeRetries=true; in the other cases we
do want to start trying to the new addresses immediately, because the
backoff timers are in the subchannels.

Note that this change was also important because requestConnection() can
be directly triggered by the user with channel.getState(true), and that
shouldn't defeat the backoff timer.
Since we're only supporting API levels 23+, all the supported Android
versions handle multidex natively, and without any bugs to workaround.

Also bump some minSdkVersion that didn't get updated in fa7b52b so
that multiDex is actually enabled by default.

See also b/476359563
… and 'core: Wait for backoff timer on address update in pick_first'
@AgraVator AgraVator requested a review from ejona86 March 11, 2026 09:52
Copy link
Copy Markdown
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description's "Implements A80" doesn't link to the gRFC; it just links to this PR.

@AgraVator AgraVator requested a review from ejona86 March 18, 2026 08:52
@AgraVator AgraVator requested a review from ejona86 March 25, 2026 12:56
}
}
});
this.tcpMetrics = new TcpMetrics(metricRecorder);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this next to the other assignments. Once we register listeners, we can get callbacks on those listeners. That's not that high of a risk with Netty, but seems needless. Also, this line is not at all obvious after that big Http2ConnectionAdapter implementation, without even an empty line.

List<? extends ServerStreamTracer.Factory> streamTracerFactories) {
return buildTransportServers(streamTracerFactories);
List<? extends ServerStreamTracer.Factory> streamTracerFactories,
io.grpc.MetricRecorder metricRecorder) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an import?


static final class EpollInfo {
final Class<?> channelClass;
final Class<?> infoClass;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field looks unused.


@Before
public void setUp() throws Exception {
TcpMetrics.epollInfo = TcpMetrics.loadEpollInfo();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be in an @After. The last test that runs will leave the test value around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants