Somewhat related, I'm running into a gRPC latency issue in https://github.com/grpc/grpc-go/issues/8436

If the request payload exceeds a certain size, the response latency goes from network RTT to double or triple that.

Definitely something wrong with either TCP or HTTP/2 windowing, as it doesn't send the full request without getting an ACK from the server first. But neither the gRPC windowing config options nor the Linux tcp_wmem/rmem settings help. Sending a one-byte request every few hundred milliseconds fixes it by keeping the gRPC channel / TCP connection active. Nagle / slow start is disabled.
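For anyone wanting to automate that workaround: below is a minimal sketch of the warm-up ping, assuming grpc-go and a server that registers the standard gRPC health service (the 200 ms interval is just a guess from the observed idle window, and the target address is made up). As I understand it, grpc-go's built-in HTTP/2 keepalive (keepalive.ClientParameters) enforces a 10 s minimum ping interval, which is likely too coarse for a few-hundred-ms idle window, hence the tiny RPC instead.

    // Sketch: fire a tiny RPC periodically so the TCP connection
    // underneath the gRPC channel never goes idle. Assumes the server
    // registers the standard gRPC health service.
    package main

    import (
        "context"
        "time"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
        healthpb "google.golang.org/grpc/health/grpc_health_v1"
    )

    func keepWarm(conn *grpc.ClientConn, every time.Duration, stop <-chan struct{}) {
        client := healthpb.NewHealthClient(conn)
        ticker := time.NewTicker(every)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                ctx, cancel := context.WithTimeout(context.Background(), time.Second)
                // Tiny request, tiny response; enough traffic to keep the
                // connection "hot". Errors are ignored: this is best-effort.
                _, _ = client.Check(ctx, &healthpb.HealthCheckRequest{})
                cancel()
            case <-stop:
                return
            }
        }
    }

    func main() {
        // Hypothetical target; in the real setup this would be the existing channel.
        conn, err := grpc.Dial("server.example.com:443",
            grpc.WithTransportCredentials(insecure.NewCredentials()))
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        stop := make(chan struct{})
        go keepWarm(conn, 200*time.Millisecond, stop) // guessed interval
        // ... normal RPC traffic here ...
        close(stop)
    }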



Sounds like classic TCP congestion window scaling delay; your payload probably exceeds 10x initcwnd.


Doesn't initcwnd only apply as the initial value? I don't care that the first request on the gRPC channel is slow, but subsequent requests on the same channel reuse the TCP connection and should have a larger window size. This works as long as the channel is actively being used, but after a short period of inactivity (a few hundred ms, unsure exactly) something appears to revert.


Yes, in the case of hot TCP connections, congestion control should not be the issue.


Yeah, that was my understanding too, hence I filed the bug (actually a duplicate of an older bug that was closed because the poster didn't provide a reproduction).

Still not sure if this is a Linux network configuration issue or a gRPC issue, but something is definitely broken if I can't send a ~1 MB request and get a response within roughly network RTT + server processing time.


Could you check the value of your kernel's net.ipv4.tcp_slow_start_after_idle sysctl, and if it's non-zero, set it to 0?
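That is, on a typical Linux host with root access:

    $ sysctl net.ipv4.tcp_slow_start_after_idle
    net.ipv4.tcp_slow_start_after_idle = 1
    $ sudo sysctl -w net.ipv4.tcp_slow_start_after_idle=0

(sysctl -w only lasts until reboot; a drop-in file under /etc/sysctl.d/ makes it persistent.)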


That seems to work, thank you!

Now latency is just RTT + server time + payload size / bandwidth, not a multiple of RTT: https://github.com/grpc/grpc-go/issues/8436#issuecomment-311...
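(For a rough sense of scale, with assumed numbers: at a 30 ms RTT on a 100 Mbit/s path, a ~1 MB request works out to about 30 ms + server time + 8 Mbit / 100 Mbit/s = 30 ms + server time + 80 ms, instead of paying several extra RTTs while the congestion window re-opens.)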

I was not aware of this setting. It's pretty unfortunate that it's a system-level setting that can't be overridden at the application layer, and that the idle timeout can't be changed either. I'll have to figure out how to safely make this change on the k8s service this is affecting...
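In case it's useful: Kubernetes can set namespaced sysctls per pod via the pod's securityContext, though net.ipv4.tcp_slow_start_after_idle isn't on the default safe list, so the kubelet has to be configured to allow it (--allowed-unsafe-sysctls). A sketch, assuming a kernel recent enough that this sysctl is per network namespace (pod and image names are made up):

    apiVersion: v1
    kind: Pod
    metadata:
      name: grpc-client            # hypothetical
    spec:
      securityContext:
        sysctls:
        - name: net.ipv4.tcp_slow_start_after_idle
          value: "0"
      containers:
      - name: app
        image: example/app:latest  # hypothetical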


As you can imagine, when a TCP connection is first established, it has no knowledge of the conditions on the network. Thus we have slow start. Likewise, when a TCP connection goes idle, its information about the conditions on the network becomes increasingly stale. Thus we have slow start after idle. In the Linux stack at least, being idle longer than the RTT (perhaps the computed RTO) is interpreted as meaning the connection's idea of network conditions is no longer valid.
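One way to observe this from userspace, assuming iproute2 is available: check the connection's congestion window with ss around an idle gap. With slow start after idle enabled, the cwnd should fall back toward initcwnd once traffic resumes after the idle period.

    $ ss -ti dst <server-ip>    # <server-ip> is a placeholder; look at the cwnd: field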

An application won't know anything about the specifics of the network its host system is attached to. A system administrator might. In that sense at least, it is reasonable that this is a system tunable rather than a per-connection setsockopt().


This sounds exactly like the culprit. I didn't know there was a slow start after idle, and it is set to 1 (enabled) by default.

I wonder if I should change this to 0 on my default desktop machines for all connections.


That's indeed interesting, thank you for sharing.



