refactor(l1): unify discovery servers into single DiscoveryServer actor #6449
ElFantasma wants to merge 11 commits into main from
Conversation
🤖 Codex Code Review
I did not find an EVM/consensus-facing issue in this PR; the clear problems are in discovery correctness and discovery-path robustness. I couldn't run
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
Greptile Summary
This PR consolidates DiscoveryMultiplexer, discv4::DiscoveryServer, and discv5::DiscoveryServer into a single DiscoveryServer actor, reducing inter-actor messaging overhead and removing ~696 lines while keeping the same wire-level semantics. The refactor is well-structured with good test coverage.

Confidence Score: 5/5 — Safe to merge; all findings are P2 style and quality issues with no protocol behavior regressions. No P0 or P1 regressions identified. The block_in_place concern is a design quality issue on multi-threaded runtimes. The IP voting non-update is likely a pre-existing incomplete feature, since the PR claims no behavioral changes. Both findings are P2.

Files flagged: discv5/server.rs (IP vote updater wiring), discovery/discv5_handlers.rs (deprecated RNG), discovery/server.rs (block_in_place pattern)
| Filename | Overview |
|---|---|
| crates/networking/p2p/discovery/server.rs | Core unified actor; clean startup and routing logic; sync route_packet dispatch requires block_in_place workaround |
| crates/networking/p2p/discovery/discv4_handlers.rs | Discv4 handlers ported faithfully; block_in_place+block_on bridging is the only concern |
| crates/networking/p2p/discovery/discv5_handlers.rs | Discv5 handlers; one deprecated rand::thread_rng() call; IP-vote path collects votes but never applies them |
| crates/networking/p2p/discv4/server.rs | Correctly reduced to Discv4State data struct and Discv4Message; no behavioral issues |
| crates/networking/p2p/discv5/server.rs | Discv5State data struct; finalize_ip_vote_round always receives None updater so IP update never fires |
| crates/networking/p2p/discovery/mod.rs | Shared constants and DiscoveryConfig; initial_lookup_interval field always overridden in network.rs |
| crates/networking/p2p/network.rs | Spawns single DiscoveryServer; correctly overrides initial_lookup_interval from context |
| test/tests/p2p/discovery/discv5_server_tests.rs | Tests migrated to unified DiscoveryServer; good coverage of rate limiting and PONG-triggered ENR updates |
Sequence Diagram
```mermaid
sequenceDiagram
    participant UDP as UDP Socket
    participant DS as DiscoveryServer
    participant V4 as Discv4Handlers
    participant V5 as Discv5Handlers
    participant PT as PeerTable
    UDP->>DS: RawPacket via UdpFramed stream
    DS->>DS: route_packet(data, from)
    alt discv4 packet (keccak hash match)
        DS->>DS: Discv4Packet::decode(data)
        DS->>V4: discv4_process_message(msg)
        V4->>PT: insert_if_new / record_ping / get_contact
        V4->>UDP: send_to Pong/Neighbors/ENRResponse
    else discv5 packet
        DS->>DS: Discv5Packet::decode(data, node_id)
        DS->>V5: discv5_handle_packet(msg)
        V5->>PT: get_session_info / set_session_info
        V5->>UDP: send_to WhoAreYou/Handshake/Ordinary
    end
    Note over DS: Shared periodic timers
    DS->>V4: discv4_revalidate every 1s
    DS->>V5: discv5_revalidate every 1s
    DS->>V4: discv4_lookup adaptive interval
    DS->>V5: discv5_lookup adaptive interval
    DS->>DS: prune every 5s
```
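The routing branch in the diagram above hinges on the discv4 hash-prefix check. A dependency-free sketch of that discrimination shape (the real code uses keccak256 over the packet tail; the 8-byte std hash here is a stand-in, and all names are illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
enum Route {
    Discv4,
    Discv5,
}

// Stand-in for keccak256: real discv4 prefixes each datagram with the
// 32-byte keccak hash of the remainder. An 8-byte std hash keeps this
// sketch dependency-free; only the routing shape matters.
fn checksum(data: &[u8]) -> [u8; 8] {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish().to_be_bytes()
}

// If the hash prefix matches the rest of the packet, treat it as discv4;
// otherwise fall through to discv5 decoding.
fn route_packet(data: &[u8]) -> Route {
    if data.len() > 8 && data[..8] == checksum(&data[8..]) {
        Route::Discv4
    } else {
        Route::Discv5
    }
}

fn main() {
    let payload = b"ping";
    let mut v4_packet = checksum(payload).to_vec();
    v4_packet.extend_from_slice(payload);
    println!("{:?}", route_packet(&v4_packet)); // Discv4 (prefix matches by construction)
    println!("{:?}", route_packet(b"discv5")); // Discv5 (too short for a hash prefix)
}
```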
```rust
pub(crate) fn discv4_process_message(&mut self, msg: Discv4Message) {
    let _ = tokio::task::block_in_place(|| {
        tokio::runtime::Handle::current().block_on(self.discv4_process_message_inner(msg))
    })
    .inspect_err(|e| error!(protocol = "discv4", err=?e, "Error Handling Discovery message"));
}
```
block_in_place + block_on inside an already-async handler
This sync wrapper calls block_in_place to enter async, but handle_raw_packet (the actor handler) is already async. block_in_place blocks the OS thread for the entire handler duration and panics on a current_thread Tokio runtime. Since #[send_handler] supports async, making route_packet and its callees async fn and awaiting them would eliminate block_in_place entirely. The same pattern appears in discv5_handle_packet (discv5_handlers.rs lines 41-46).
Fixed in 506ec7f — made route_packet, route_to_discv4, and route_to_discv5 async, removing the block_in_place + block_on bridge entirely. The #[send_handler] on handle_raw_packet is already async, so the handlers now just .await directly.
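The shape of that fix can be sketched with std only. Names and signatures here are illustrative, and a toy no-op-waker executor stands in for the actor runtime that drives the real `handle_raw_packet`:

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// After the fix: the whole call chain is `async fn` and simply awaits —
// no block_in_place/block_on bridge in the middle.
async fn discv4_process_message(msg: &str) -> Result<String, String> {
    Ok(format!("discv4 handled: {msg}"))
}

async fn route_packet(msg: &str) -> Result<String, String> {
    // Previously a sync fn wrapping
    // tokio::task::block_in_place(|| Handle::current().block_on(...)).
    discv4_process_message(msg).await
}

// Minimal no-op-waker executor, standing in for the actor runtime
// (the futures in this sketch never return Pending).
fn block_on<F: Future>(fut: F) -> F::Output {
    fn noop_raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { noop_raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(noop_raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let out = block_on(route_packet("PING"));
    println!("{out:?}"); // Ok("discv4 handled: PING")
}
```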
```diff
 fn finalize_ip_vote_round(&mut self, updater: Option<&mut IpUpdater<'_>>) {
     let winner = self
         .ip_votes
         .iter()
         .map(|(ip, voters)| (*ip, voters.len()))
         .max_by_key(|(_, count)| *count);

-    if let Some((winning_ip, vote_count)) = winner {
-        // Only update if we have minimum votes and IP differs
-        if vote_count >= IP_VOTE_THRESHOLD && winning_ip != self.local_node.ip {
-            info!(
-                protocol = "discv5",
-                old_ip = %self.local_node.ip,
-                new_ip = %winning_ip,
-                votes = vote_count,
-                "External IP detected via PONG voting, updating local ENR"
-            );
-            self.update_local_ip(winning_ip);
-        }
-    }
+    if let Some((winning_ip, vote_count)) = winner
+        && let Some(updater) = updater
+        && vote_count >= IP_VOTE_THRESHOLD
+        && winning_ip != updater.local_node.ip
+    {
+        info!(
+            protocol = "discv5",
+            old_ip = %updater.local_node.ip,
+            new_ip = %winning_ip,
+            votes = vote_count,
+            "External IP detected via PONG voting, updating local ENR"
+        );
+        update_local_ip(
+            updater.local_node,
+            updater.local_node_record,
+            updater.signer,
+            winning_ip,
+        );
+    }

     // Reset for next round
     self.ip_votes.clear();
     self.ip_vote_period_start = Some(Instant::now());
     self.first_ip_vote_round_completed = true;
```
IP auto-detection votes are collected but never applied
finalize_ip_vote_round is designed to take an IpUpdater to write the winning IP back, but every call site passes None: record_ip_vote (line 134) and cleanup_stale_entries (line 104). The function docstring says 'When called from the unified server, local_node is passed so the IP can be updated', and the test comment says 'the server's prune/periodic handler would do that' — but no such periodic call with an IpUpdater exists in discovery/server.rs. The PONG-based IP self-detection feature silently collects votes but never updates the node's external IP.
Fixed in 506ec7f — removed IpUpdater entirely. record_ip_vote and cleanup_stale_entries now return Option<IpAddr> (the winning IP to apply). The caller in discv5_handle_pong and the prune() handler apply the update directly where they have access to self.local_node / self.local_node_record / self.signer.
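The return-the-winner shape described in that fix can be sketched with std types only. The threshold value, voter keys, and function signature here are illustrative, not the real ones:

```rust
use std::collections::{HashMap, HashSet};
use std::net::{IpAddr, Ipv4Addr};

const IP_VOTE_THRESHOLD: usize = 2; // illustrative value, not the real constant

// Sketch of the post-fix shape: instead of taking an updater, the vote
// round returns the winning IP (if any) for the caller to apply where it
// has access to local_node / local_node_record / signer.
fn finalize_ip_vote_round(
    ip_votes: &mut HashMap<IpAddr, HashSet<String>>,
    current_ip: IpAddr,
) -> Option<IpAddr> {
    let winner = ip_votes
        .iter()
        .map(|(ip, voters)| (*ip, voters.len()))
        .max_by_key(|&(_, count)| count);
    ip_votes.clear(); // reset for the next round
    match winner {
        Some((ip, count)) if count >= IP_VOTE_THRESHOLD && ip != current_ip => Some(ip),
        _ => None,
    }
}

fn main() {
    let mut votes: HashMap<IpAddr, HashSet<String>> = HashMap::new();
    let external = IpAddr::V4(Ipv4Addr::new(203, 0, 113, 7));
    votes.entry(external).or_default().insert("peer-a".into());
    votes.entry(external).or_default().insert("peer-b".into());
    let local = IpAddr::V4(Ipv4Addr::new(192, 168, 0, 2));
    println!("{:?}", finalize_ip_vote_round(&mut votes, local)); // Some(203.0.113.7)
}
```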
```rust
let ephemeral_key = SecretKey::new(&mut rand::thread_rng());
let ephemeral_pubkey = ephemeral_key.public_key(secp256k1::SECP256K1).serialize();
```
rand::thread_rng() is deprecated
All other RNG usages in this file use OsRng, but this site still calls the deprecated rand::thread_rng(). Suggested fix:
```suggestion
let ephemeral_key = SecretKey::new(&mut OsRng);
```
```rust
pub struct DiscoveryConfig {
    pub discv4_enabled: bool,
    pub discv5_enabled: bool,
    pub initial_lookup_interval: f64,
```
initial_lookup_interval is always overridden in network::start_network
start_network unconditionally overwrites config.initial_lookup_interval with context.initial_lookup_interval, making the default set in DiscoveryConfig::default() dead code for every production caller. Consider removing the field from DiscoveryConfig or adding a doc-comment that start_network always overrides it.
Acknowledged — the default in DiscoveryConfig::default() is mainly useful for tests and standalone usage. In production, start_network always overrides it from P2PContext::initial_lookup_interval. Changed the override pattern to use struct update syntax (DiscoveryConfig { initial_lookup_interval: context.initial_lookup_interval, ..config }) to make the intent clearer.
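The struct update syntax mentioned in that reply can be sketched in isolation. The field set and default values below are illustrative, mirroring the `DiscoveryConfig` fragment shown above:

```rust
#[derive(Debug, Clone)]
struct DiscoveryConfig {
    discv4_enabled: bool,
    discv5_enabled: bool,
    initial_lookup_interval: f64,
}

impl Default for DiscoveryConfig {
    fn default() -> Self {
        DiscoveryConfig {
            discv4_enabled: true,
            discv5_enabled: true,
            initial_lookup_interval: 5.0, // illustrative default
        }
    }
}

fn main() {
    let config = DiscoveryConfig::default();
    let context_interval = 0.5; // would come from P2PContext in the real code
    // Struct update syntax: override one field, keep the rest of `config`.
    let config = DiscoveryConfig {
        initial_lookup_interval: context_interval,
        ..config
    };
    println!("{config:?}");
}
```

The `..config` spread makes the single intentional override visually distinct from the fields that pass through unchanged.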
🤖 Claude Code Review
Here is my review of PR #6449:
| Severity | Item |
|---|---|
| Bug | IP update via PONG voting dead code — finalize_ip_vote_round never called with Some(updater) |
| Medium | block_in_place in sync packet handler blocks Tokio worker thread |
| Medium | Invariant panics on discv5 state must exist — single point of failure |
| Low | pub visibility on live cryptographic session state |
| Nit | get_lookup_interval unnecessarily takes &mut self |
| Nit | Inconsistent ? vs % in tracing log fields |
Automated review by Claude (Anthropic) · sonnet · custom prompt
Addressed all review feedback in 506ec7f: Fixed:
Acknowledged but not changing:
…tbound_key, consistent log format
```rust
    Ok(())
}

pub fn new_for_test(
```
This probably shouldn't be here.
Fixed in 672fa03 — moved the test helper behind #[cfg(any(test, feature = "test-utils"))]. The ethrex-test crate now enables the test-utils feature on its ethrex-p2p dependency. Not in the production build path anymore.
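The gating described in that fix can be sketched minimally. The struct and constructor names here are illustrative stand-ins, not the real API:

```rust
#[derive(Debug)]
struct DiscoveryServer {
    discv4_enabled: bool,
    discv5_enabled: bool,
}

impl DiscoveryServer {
    fn new(discv4_enabled: bool, discv5_enabled: bool) -> Self {
        DiscoveryServer { discv4_enabled, discv5_enabled }
    }

    // Compiled only for unit tests, or when a consumer (like the ethrex-test
    // crate) enables the `test-utils` cargo feature; absent from release builds.
    #[cfg(any(test, feature = "test-utils"))]
    fn new_for_discv5_test() -> Self {
        DiscoveryServer { discv4_enabled: false, discv5_enabled: true }
    }
}

fn main() {
    // Production code path only ever sees the ungated constructor.
    let server = DiscoveryServer::new(true, true);
    println!("{server:?}");
}
```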
```rust
discv4: None,
discv5: Some(Discv5State::default()),
```
Only discv5 state is set up here, but neither the name nor the (nonexistent) documentation mentions that.
Fixed in 672fa03 — renamed to new_for_discv5_test and added a docstring making it explicit that only discv5 state is initialized. The config now also explicitly sets discv4_enabled: false, discv5_enabled: true.
```rust
pub(crate) initial_lookup_interval: f64,
pub(crate) config: DiscoveryConfig,
```
Isn't initial_lookup_interval already in the DiscoveryConfig?
Fixed in 672fa03 — removed the standalone initial_lookup_interval field and now reads it from self.config.initial_lookup_interval directly.
Motivation
Resolves #5990. The discovery protocol uses four GenServers (DiscoveryMultiplexer + Discv4Server + Discv5Server + PeerTable) with significant inter-GenServer communication overhead and ~30-40% code duplication between v4 and v5 servers. Since PeerTable is already shared and the multiplexer already discriminates packets, merging the three discovery GenServers into a single DiscoveryServer simplifies the architecture.
Description
Merges `DiscoveryMultiplexer`, `discv4::DiscoveryServer`, and `discv5::DiscoveryServer` into a single unified `discovery::DiscoveryServer` actor.

Architecture: 4 actors → 2 actors (DiscoveryServer + PeerTable)
Key changes:
- `discovery/server.rs` — unified actor struct, protocol, startup, periodic task handlers, packet discrimination
- `discovery/discv4_handlers.rs` — all discv4 handler logic as `impl DiscoveryServer` methods
- `discovery/discv5_handlers.rs` — all discv5 handler logic as `impl DiscoveryServer` methods
- `discv4/server.rs` — replaced old actor with `Discv4State` data struct + `Discv4Message` type
- `discv5/server.rs` — replaced old actor with `Discv5State` data struct + `Discv5Message` type
- `network.rs` — spawns single `DiscoveryServer` instead of 3 actors
- `discovery/multiplexer.rs` — no longer needed
- moved `INITIAL_LOOKUP_INTERVAL_MS`, `LOOKUP_INTERVAL_MS`, `lookup_interval_function` to `discovery/mod.rs` (shared by discv4, discv5, and RLPx initiator)
- `DiscoveryConfig` now includes `initial_lookup_interval` field

No behavioral changes — same protocols, same packet formats, same peer discovery semantics. Net -696 lines.
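As a rough illustration of the consolidation, the unified actor can hold each protocol's state as an `Option` so either side can be disabled independently (inferred from the `discv4: None, discv5: Some(Discv5State::default())` fragment in the review threads; struct internals here are hypothetical):

```rust
// Hypothetical skeleton of the unified actor's state; the real structs
// carry buckets, sessions, vote maps, etc.
#[derive(Debug, Default)]
struct Discv4State { /* kademlia buckets, pending pings, ... */ }

#[derive(Debug, Default)]
struct Discv5State { /* sessions, ip_votes, ... */ }

#[derive(Debug)]
struct DiscoveryServer {
    discv4: Option<Discv4State>,
    discv5: Option<Discv5State>,
}

impl DiscoveryServer {
    fn new(discv4_enabled: bool, discv5_enabled: bool) -> Self {
        DiscoveryServer {
            // `bool::then` builds the state only when the protocol is enabled.
            discv4: discv4_enabled.then(Discv4State::default),
            discv5: discv5_enabled.then(Discv5State::default),
        }
    }
}

fn main() {
    let s = DiscoveryServer::new(false, true);
    println!("discv4: {}, discv5: {}", s.discv4.is_some(), s.discv5.is_some());
}
```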
Closes #5990
Checklist
All items addressed — no Store schema changes.