Coordinated lifecycle ends immediately despite all required participants connecting (C API, Windows/MinGW) #351

Description

@KBARMAN11

Summary

A participant using SilKit_OperationMode_Coordinated (OperationMode::Coordinated in the C++ API) with a TimeSyncService simulation step handler completes its lifecycle almost immediately after StartLifecycle(), even when a sil-kit-system-controller is running and has confirmed that all required participants are connected. WaitForLifecycleToComplete() returns right away instead of blocking for the duration of the simulation, and the simulation step handler is invoked at most a handful of times, often not at all.

This reproduces with the C API on Windows, built with MinGW-w64/GCC, using two minimal, otherwise-independent participants.

Environment

  • SIL Kit version: 5.0.5 (precompiled Windows package from GitHub releases)
  • OS: Windows 10/11
  • Compiler: GCC (MinGW-w64, UCRT runtime)
  • API used: C API (silkit/capi/*.h)
  • Registry: sil-kit-registry.exe, started fresh, default config
  • System Controller: sil-kit-system-controller.exe <ParticipantA> <ParticipantB>

Steps to reproduce

  1. Start a fresh registry:
    sil-kit-registry.exe --listen-uri silkit://localhost:8500
    
  2. Start the system controller, declaring two required participants:
    sil-kit-system-controller.exe ParticipantA ParticipantB
    
  3. Build and run two minimal C participants (source below), each:
    • Creates a participant connected to the registry
    • Creates a SilKit_LifecycleService with operationMode = SilKit_OperationMode_Coordinated
    • Creates a SilKit_TimeSyncService and registers a simulation step handler via SilKit_TimeSyncService_SetSimulationStepHandler (initial step size 1ms)
    • Calls SilKit_LifecycleService_StartLifecycle()
    • Calls SilKit_LifecycleService_WaitForLifecycleToComplete()

Expected behavior

  • The system controller log shows both participants connecting, then: All required participants connected to initiate the coordinated simulation start.
  • Each participant's simulation step handler is invoked repeatedly, with now advancing in 1ms increments.
  • WaitForLifecycleToComplete() blocks until the simulation is explicitly stopped (e.g. via Ctrl+C on the system controller).
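
To make the expected output concrete, here is a rough sketch of the arithmetic behind it. It assumes the first step is delivered at now = 0 and that the handler's 50 ms Sleep dominates wall-clock time per step; the helper names are hypothetical and not part of SIL Kit.

```c
#include <stdint.h>

/* Assumption: step 0 is delivered at now = 0 and every later step
   advances simulation time by exactly one step size. */
static uint64_t expected_now_ns(uint64_t step_index, uint64_t step_size_ns)
{
    return step_index * step_size_ns;
}

/* Same conversion the repro's printf uses for sim_time. */
static uint64_t ns_to_ms(uint64_t ns)
{
    return ns / 1000000ULL;
}

/* Rough upper bound on steps per wall-clock second when the handler
   blocks for handler_ms; synchronization overhead is ignored. */
static uint64_t approx_steps_per_second(uint64_t handler_ms)
{
    return handler_ms ? 1000ULL / handler_ms : 0;
}
```

With the 1 ms step size (1000000 ns) and the 50 ms Sleep in the handler, this suggests roughly 20 [STEP] lines per wall-clock second, with sim_time counting up 0ms, 1ms, 2ms, and so on.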

Actual behavior

  • System controller log confirms both participants connect and that the coordinated simulation start was initiated.
  • Almost immediately afterward, the registry/system controller logs show the participant(s) disconnecting (sometimes repeatedly reconnecting/disconnecting in a loop).
  • The simulation step handler is called zero or very few times.
  • WaitForLifecycleToComplete() returns almost instantly, and the process prints "simulation ended" within a fraction of a second of starting, instead of remaining in the running state.

This was confirmed with a fully minimal reproduction (~40 lines per participant, no FreeRTOS, no CAN controllers, no custom RTE layer — just lifecycle + time sync) after first observing the same behavior in a larger application (vECU with a CAN controller). Removing all application logic did not change the outcome.

Minimal repro source (per participant, name changed between A/B)

#include "silkit/capi/SilKit.h"
#include "silkit/capi/Participant.h"
#include "silkit/capi/Orchestration.h"
#include <stdio.h>
#include <string.h>
#include <windows.h>

void SimStep(void* context, SilKit_TimeSyncService* ts,
             SilKit_NanosecondsTime now, SilKit_NanosecondsTime duration)
{
    (void)context; (void)ts; (void)duration;
    printf("[STEP] sim_time=%llums\n", (unsigned long long)(now/1000000ULL));
    fflush(stdout);
    Sleep(50);
}

int main(void) {
    SilKit_ParticipantConfiguration* cfg = NULL;
    SilKit_ParticipantConfiguration_FromString(&cfg, "{}");

    SilKit_Participant* p = NULL;
    SilKit_Participant_Create(&p, cfg, "ParticipantA",
                               "silkit://localhost:8500");
    printf("[OK] Connected\n"); fflush(stdout);

    SilKit_LifecycleConfiguration lc;
    memset(&lc, 0, sizeof(lc));
    lc.operationMode = SilKit_OperationMode_Coordinated;

    SilKit_LifecycleService* lifecycle = NULL;
    SilKit_LifecycleService_Create(&lifecycle, p, &lc);

    SilKit_TimeSyncService* ts = NULL;
    SilKit_TimeSyncService_Create(&ts, lifecycle);
    SilKit_TimeSyncService_SetSimulationStepHandler(
        ts, NULL, &SimStep, 1000000ULL);

    printf("[OK] Starting lifecycle...\n"); fflush(stdout);
    SilKit_LifecycleService_StartLifecycle(lifecycle);

    printf("[OK] StartLifecycle returned, waiting for completion...\n");
    fflush(stdout);

    SilKit_LifecycleService_WaitForLifecycleToComplete(lifecycle, NULL);

    printf("[DONE] Simulation ended.\n");
    return 0;
}

Build flags used:

gcc -O0 -g3 -Wall -IC:/SilKit/SilKit/include src/test.c -o test.exe -LC:/SilKit/SilKit/lib -lSilKit -lwinmm -lws2_32

Logs

System controller:

Coordinated simulation start requires 2 participants: 'ParticipantA', 'ParticipantB'
Participant 'ParticipantA' connected.
Waiting for participant: 'ParticipantB'
Participant 'ParticipantB' connected.
All required participants connected to initiate the coordinated simulation start.
Participant 'ParticipantB' has disconnected.
Participant 'ParticipantB' connected.
Participant 'ParticipantB' has disconnected.

Participant stdout:

[OK] Connected
[OK] Starting lifecycle...
[OK] StartLifecycle returned, waiting for completion...
[DONE] Simulation ended.

(No [STEP] lines printed, or only one before exit.)

Additional notes

  • The participant from the larger application, switched to SilKit_OperationMode_Autonomous (no system controller, no time sync), connects and runs indefinitely without issue, sending and receiving CAN frames correctly within its own process.
  • This was tested across multiple full clean restarts of the registry and system controller (all SIL Kit processes killed via taskkill between attempts) to rule out stale participant-name state, with the same result each time.
  • I searched existing issues for "coordinated", "lifecycle", "WaitForLifecycleToComplete", and related terms and did not find an existing report matching this behavior — apologies in advance if this turns out to be a duplicate or known limitation of the C API on Windows/MinGW specifically.
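
One diagnostic that may help narrow this down: the repro above discards every return code, so a failing call (for example a rejected configuration struct or out-parameter) would go unnoticed and later calls would operate on a NULL handle. The following is a minimal sketch of a checking helper, not part of SIL Kit. ReturnCode, RC_SUCCESS, check_rc, and CHECK are hypothetical stand-ins so the snippet compiles on its own; in the real repro the type would be SilKit_ReturnCode and the success value SilKit_ReturnCode_SUCCESS.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for SilKit_ReturnCode. Assumption: 0 means success,
   matching SilKit_ReturnCode_SUCCESS. */
typedef int32_t ReturnCode;
#define RC_SUCCESS ((ReturnCode)0)

/* Hypothetical helper: logs the outcome of one API call.
   Returns 1 on success and 0 on any other return code. */
static int check_rc(const char* call, ReturnCode rc)
{
    if (rc == RC_SUCCESS) {
        printf("[OK]   %s\n", call);
        fflush(stdout);
        return 1;
    }
    fprintf(stderr, "[FAIL] %s returned %ld\n", call, (long)rc);
    fflush(stderr);
    return 0;
}

/* Wraps a call expression so its source text appears in the log. */
#define CHECK(expr) check_rc(#expr, (expr))
```

In the repro, each call could be wrapped, for example `if (!CHECK(SilKit_LifecycleService_Create(&lifecycle, p, &lc))) return 1;`, so that the first non-success code shows which call fails. Whether any call actually fails here is untested. The C API also appears to provide SilKit_GetLastErrorString() for a textual reason; treat that as an assumption to verify against the installed headers.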

Question

Is SetSimulationStepHandler (synchronous variant) plus WaitForLifecycleToComplete expected to work as described above when used purely from the C API on a MinGW-built Windows binary? Or is there an additional required call (e.g. around CompleteSimulationStep, or a different handler registration order relative to StartLifecycle) that isn't reflected in the current Developer Guide examples, which are C++-only?

Happy to provide full --log trace output, a zipped minimal repro project, or test against a different SIL Kit version if useful.
