Skip to content

Conversation

@codemaestro64
Copy link

What this PR does

Adds Python (py-libp2p v0.2.9) to the existing hole-punch-interop test suite.

Implemented

  • /libp2p/dcutr/0.1.0 protocol
  • CONNECTSYNC message exchange (JSON)
  • Observed-address exchange (host.get_network().get_addrs())
  • Simultaneous direct dial (100 ms window)
  • Verification – ping RTT < 100 ms → direct connection, otherwise fallback to relay
  • Bidirectional – Python initiator ↔ Go receiver and Go initiator ↔ Python receiver
  • NAT simulation via Docker + iptables masquerade (entrypoint)

Output

"python-v0.2.9 x go-v0.42 (dcutr,tcp,noise)",success
"go-v0.42 x python-v0.2.9 (dcutr,tcp,noise)",success

@dhuseby dhuseby added the hole-punching Related to hole punching interop label Nov 6, 2025
@seetadev seetadev marked this pull request as ready for review November 9, 2025 21:00
@sumanjeet0012
Copy link
Contributor

@codemaestro64 The CI/CD pipeline is failing due to a minor issue.

From the logs, we can see the following:

Dockerfile:6

4 | COPY pyproject.toml .
5 | RUN pip install --no-cache-dir -e .
6 | >>> COPY hole_punch_test.py .
7 | VOLUME /results
8 | CMD ["python", "hole_punch_test.py"]

ERROR: failed to build: failed to solve: failed to compute cache key: failed to calculate checksum of ref 06a078c6-7ffc-447f-b88c-6fd49120bdc5::7hh376udkbb2bncz241saf56a: "/hole_punch_test.py": not found

The issue is that the actual file name is 'hole_punch.py', but the Dockerfile references 'hole_punch_test.py'. This mismatch is causing the build error.

To resolve this, you can either:

  1. Rename the file to hole_punch_test.py, or
  2. Update the Dockerfile to use hole_punch.py.

The incorrect references in the Dockerfile are:

  • COPY hole_punch_test.py .
  • CMD ["python", "hole_punch_test.py"]

@acul71
Copy link
Contributor

acul71 commented Nov 24, 2025

@sumanjeet0012 @codemaestro64

Python Implementation Missing Features - Hole Punch Interop

Overview

This document analyzes what the Python implementation in hole-punch-interop/impl/python/v0.4.0 is missing compared to the requirements specified in hole-punch-interop/README.md and the expected test flow.

Critical Missing Features

1. Binary Name Requirement

Status: ❌ MISSING

  • Requirement: Docker containers MUST have a binary called hole-punch-client in their $PATH
  • Current State: The Python implementation runs python hole_punch.py directly via start.sh
  • Fix Needed: Create a wrapper script or symlink named hole-punch-client that executes the Python script

2. Redis Integration

Status: ❌ MISSING

  • Requirement: Clients must connect to Redis at redis:6379 and:
    • Block until RELAY_TCP_ADDRESS or RELAY_QUIC_ADDRESS is available
    • Listener must push its PeerId to LISTEN_CLIENT_PEER_ID after making a reservation
    • Dialer must block on LISTEN_CLIENT_PEER_ID availability
  • Current State: Python code expects RELAY_MULTIADDR and TARGET_ID environment variables directly, completely bypassing Redis
  • Fix Needed: Implement Redis client using redis Python library with blocking operations (blpop with timeout 0)

3. Test Flow Implementation

Status: ❌ INCORRECT

The Python implementation doesn't follow the required test flow:

Missing Steps:

  1. Redis Connection: No Redis client initialization
  2. Relay Address Discovery: Should wait for RELAY_TCP_ADDRESS/RELAY_QUIC_ADDRESS from Redis, not env vars
  3. Identify Protocol: Should use identify to discover external address (mentioned in README as SHOULD)
  4. Relay Reservation: Listener should listen on relay and make a reservation (not just connect)
  5. PeerId Publishing: Listener should push PeerId to Redis key LISTEN_CLIENT_PEER_ID after reservation
  6. PeerId Waiting: Dialer should block on LISTEN_CLIENT_PEER_ID from Redis, not use TARGET_ID env var

4. Output Format

Status: ❌ INCORRECT

  • Requirement: RTT MUST be printed to stdout in JSON format:
    { "rtt_to_holepunched_peer_millis": 12 }
  • Current State: Writes to /results/results.csv in CSV format
  • Fix Needed: Print JSON to stdout instead of writing CSV file

5. Mode Values

Status: ❌ INCORRECT

  • Requirement: MODE env variable should be listen or dial
  • Current State: Python code checks for initiator or receiver
  • Fix Needed: Update mode checking to use listen/dial

6. Transport Support

Status: ❌ MISSING

  • Requirement: Must support both tcp and quic transports
  • Current State: Only TCP is implemented (hardcoded)
  • Fix Needed: Add QUIC transport support and select based on TRANSPORT env variable

7. Required System Tools

Status: ❌ MISSING

  • Requirement: MUST have dig, curl, jq and tcpdump installed
  • Current State: Dockerfile only installs iptables and libgmp-dev
  • Fix Needed: Add these packages to Dockerfile:
    RUN apt-get update && apt-get install -y \
        git build-essential iptables libgmp-dev \
        dnsutils curl jq tcpdump \
        && rm -rf /var/lib/apt/lists/*

8. TCP_NODELAY

Status: ❌ MISSING

  • Requirement: Implementations MUST set TCP_NODELAY for the TCP transport
  • Current State: Not implemented in py-libp2p. The TCP transport (libp2p/transport/tcp/tcp.py) uses trio.open_tcp_stream() and trio.serve_tcp() which don't configure TCP_NODELAY. No socket option configuration exists in the codebase.
  • Fix Needed:
    • Option 1: Modify py-libp2p's TCP transport to set TCP_NODELAY on sockets. The TrioTCPStream class has access to stream.socket, so it's possible to add setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1) after socket creation.
    • Option 2: If modifying py-libp2p is not possible, create a wrapper that sets TCP_NODELAY on connections after they're established (though this is less ideal).
  • Note: Checked /home/luca/PNL_Launchpad_Curriculum/Libp2p/py-libp2p - no TCP_NODELAY support found.

9. Connection Keep-Alive

Status: ❌ UNCLEAR

  • Requirement: Implementations MUST make sure connections are being kept alive
  • Current State: Not explicitly configured
  • Fix Needed: Ensure keep-alive is enabled on TCP connections

10. 0RTT Negotiation

Status: ❌ UNCLEAR

  • Requirement: Dialer and listener both MUST use 0RTT negotiation for protocols
  • Current State: Not clear if implemented
  • Fix Needed: Verify and ensure 0RTT is used for protocol negotiation

11. Protocol Version

Status: ⚠️ POTENTIALLY INCORRECT

  • Current State: Uses /libp2p/dcutr/1.0.0
  • Note: README mentions /libp2p/dcutr/0.1.0 but Rust implementation uses /libp2p/dcutr/1.0.0 - need to verify correct version

12. Muxer Selection

Status: ❌ INCORRECT

  • Requirement: For TCP, MUST use noise + yamux to upgrade the connection
  • Current State: Uses muxer_preference="MPLEX" (mplex instead of yamux)
  • Fix Needed: Change to yamux for TCP transport

13. Logging Requirements

Status: ⚠️ PARTIALLY CORRECT

  • Requirement: Logs MUST go to stderr, RTT json MUST go to stdout
  • Current State: Uses Python logging which may go to stderr by default, but RTT goes to CSV file instead of stdout
  • Fix Needed: Ensure logging goes to stderr, RTT JSON goes to stdout

14. Listener Behavior

Status: ⚠️ PARTIALLY CORRECT

  • Requirement: Listener MUST NOT early-exit but wait to be killed by test runner
  • Current State: Uses await asyncio.Event().wait() which should work, but needs verification
  • Fix Needed: Ensure listener waits indefinitely until killed

15. Error Handling

Status: ⚠️ PARTIALLY CORRECT

  • Requirement: Implementations SHOULD exit early with a non-zero exit code if anything goes wrong
  • Current State: Some error handling exists but may not exit with proper codes
  • Fix Needed: Ensure all error paths exit with non-zero codes

Summary

The Python implementation is missing the core Redis-based orchestration mechanism and doesn't follow the required test flow. It appears to be using a simplified approach that bypasses the standard test infrastructure. To be compliant, it needs:

  1. Redis integration - Most critical missing piece
  2. Correct test flow - Following the step-by-step process in README
  3. Binary wrapper - hole-punch-client in PATH
  4. Output format - JSON to stdout instead of CSV
  5. Transport support - Add QUIC
  6. System dependencies - Add required tools
  7. Protocol compliance - yamux instead of mplex, TCP_NODELAY, etc.

Reference Implementation

The Rust implementation in hole-punch-interop/impl/rust/v0.53/rust-libp2p-.../hole-punching-tests/src/main.rs provides a complete reference for how the test flow should be implemented, including:

  • Redis client with blocking operations
  • Proper test flow following all steps
  • Correct output format
  • Transport selection
  • Error handling

@dhuseby
Copy link
Contributor

dhuseby commented Dec 16, 2025

This PR needs to be updated to the new hole punch test framework.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hole-punching Related to hole punching interop

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants