Open
18 commits
c72964b
feat(opcua): event subscription primitive with generation counter
mfaferek93 Apr 25, 2026
bb582d7
feat(opcua): AlarmConditionType state machine, poller wiring, ack/con…
mfaferek93 Apr 25, 2026
c57feef
test(opcua): test_alarm_server fixture, docker integration, CI workfl…
mfaferek93 Apr 25, 2026
d00a033
test(opcua): CTest smoke wrapper for test_alarm_server fixture
mfaferek93 Apr 25, 2026
b86aade
test(opcua): exercise SOVD ack/confirm + cover shelve/disable/reconne…
mfaferek93 Apr 25, 2026
ba8e956
fix(opcua,test): unblock alarm test FIFO + pre-write discovery manifest
mfaferek93 Apr 25, 2026
9aa26ac
fix(opcua,test): dump container logs in cleanup trap before removing …
mfaferek93 Apr 25, 2026
1a0273c
fix(opcua,test): keep docker stdin alive + stage gateway_params for b…
mfaferek93 Apr 25, 2026
9233289
debug(opcua): trace NodeId + status in add_event_monitored_item
mfaferek93 Apr 25, 2026
424e2bc
debug(opcua): use deep-copy NodeId + AlarmConditionType filter type +…
mfaferek93 Apr 25, 2026
a7b09e9
fix(opcua,#386): wire SOVD ack/confirm E2E + cover shelve/disable/rec…
mfaferek93 Apr 25, 2026
6fead31
style(opcua): apply clang-format-18 to diagnostic stderr logs
mfaferek93 Apr 25, 2026
2dae7c7
chore(opcua,#386): post-review quick wins (idempotence, scenario name…
mfaferek93 Apr 26, 2026
12b3406
fix(opcua,#386): operator-visible warn when ConditionRefresh is rejected
mfaferek93 Apr 26, 2026
74b8460
test(opcua,#386): unit cover the call_method per-arg result classifier
mfaferek93 Apr 26, 2026
b490c2a
test(opcua,#386): cover the missing AlarmStateMachine transition cells
mfaferek93 Apr 26, 2026
f66bb5b
fix(opcua,#386): per-MI active flag for event MI removal (Copilot rev…
mfaferek93 Apr 26, 2026
9e7d2c8
fix(opcua,#386): Copilot review feedback batch (observability + hygiene)
mfaferek93 Apr 26, 2026
29 changes: 29 additions & 0 deletions .github/workflows/opcua-plugin.yml
@@ -223,3 +223,32 @@ jobs:
        run: |
          docker rm -f gateway openplc 2>/dev/null || true
          docker network rm plc-demo 2>/dev/null || true

  integration-alarms:
    name: Integration (AlarmConditionType)
    # Issue #386: tests the native OPC-UA AlarmCondition subscription bridge
    # against the test_alarm_server fixture (open62541 with FULL ns0 + alarms
    # ON). Independent of the OpenPLC threshold-mode integration above; runs
    # in parallel.
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install jq + asyncua (smoke test prerequisite)
        run: |
          sudo apt-get update
          sudo apt-get install -y jq python3-pip
          pip3 install --break-system-packages asyncua

      - name: Run alarm integration suite
        run: bash src/ros2_medkit_plugins/ros2_medkit_opcua/docker/scripts/run_alarm_tests.sh

      - name: Dump container logs on failure
        if: failure()
        run: |
          for c in alarm-test-server alarm-test-gateway; do
            echo "=== ${c} logs ==="
            docker logs "${c}" 2>&1 | tail -80 || true
          done
10 changes: 10 additions & 0 deletions src/ros2_medkit_plugins/ros2_medkit_opcua/CHANGELOG.rst
@@ -2,6 +2,16 @@
Changelog for package ros2_medkit_opcua
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Forthcoming
-----------
Review comment (Collaborator):

Plugin not in the top-level docs aggregator.

The substantive content added in this PR (this CHANGELOG entry + 162 lines of design/index.rst) does not render on https://selfpatch.github.io/ros2_medkit/ because:

  1. docs/changelog.rst aggregates 13 packages via .. include:: - ros2_medkit_opcua is not among them.
  2. docs/design/index.rst toctree lists 14 packages including ros2_medkit_graph_provider/index (precedent for plugins) but not the OPCUA plugin.

Quick fix:

# docs/changelog.rst
.. include:: ../src/ros2_medkit_plugins/ros2_medkit_opcua/CHANGELOG.rst

# docs/design/ros2_medkit_opcua/index.rst -> symlink to ../../src/ros2_medkit_plugins/ros2_medkit_opcua/design/index.rst
# docs/design/index.rst toctree: add `ros2_medkit_opcua/index`

* Native OPC-UA Part 9 ``AlarmConditionType`` event subscription. The plugin now subscribes to vendor-defined alarms (Siemens S7-1500 ``Program_Alarm`` / ProDiag, Beckhoff TF6100, CodeSys 3.5+, Rockwell via FactoryTalk Linx) and bridges each event into the SOVD fault lifecycle. Configured via a new top-level ``event_alarms:`` block in the node map YAML; mutually exclusive per entry with the existing threshold-based ``alarm`` form. (issue #386)
* New SOVD operations on entities that host alarm sources: ``acknowledge_fault`` invokes the inherited ``Acknowledge`` method on the live ``ConditionId`` (i=9111, EventId tracked per Part 9 §5.7.3); ``confirm_fault`` invokes ``Confirm`` (i=9113). Both accept an optional ``comment`` rendered as ``LocalizedText`` on the server.
* ``OpcuaClient`` gains ``add_event_monitored_item`` / ``remove_event_monitored_item`` / ``call_method`` and a generation counter that filters callbacks fired from defunct subscriptions after a reconnect. Heap-owned ``EventCallbackContext`` resolves the open62541pp / raw-C lifetime hazard.
* Header-only ``AlarmStateMachine`` mapping ``EnabledState x ShelvingState x ActiveState x AckedState x ConfirmedState x BranchId`` to SOVD ``CONFIRMED / HEALED / CLEARED / Suppressed``. Full transition table documented in ``design/index.rst``.
* ``ConditionRefresh`` (Server method i=3875) is invoked on subscribe and on every reconnect, with ``RefreshStartEvent`` / ``RefreshEndEvent`` bracketing tracked for diagnostics.
* New ``test_alarm_server`` fixture (open62541-based, full namespace 0 + alarms enabled) emits AlarmConditionType events on stdin commands; integration test ``run_alarm_tests.sh`` runs in CI alongside the existing OpenPLC threshold suite. The fixture builds by default via the workspace ``colcon build`` (gated on ``MEDKIT_OPCUA_BUILD_ALARM_SERVER`` which defaults to ON; ``ExternalProject_Add`` rebuilds open62541 with ``UA_NAMESPACE_ZERO=FULL`` and alarms ON, with a serial sub-build to dodge the upstream ``-j`` race on ``namespace0_generated.c``).
* New CTest wrapper ``test_alarm_server_smoke`` boots the fixture on an ephemeral port and runs the asyncua smoke test against it; skips with CTest exit 77 (treated as pass) when ``asyncua`` is not importable, so iterating on plugin code without the Python dependency does not fail the suite.

0.4.0 (2026-04-11)
------------------
* Initial release
97 changes: 97 additions & 0 deletions src/ros2_medkit_plugins/ros2_medkit_opcua/CMakeLists.txt
@@ -203,6 +203,103 @@ if(BUILD_TESTING)
)
medkit_set_test_domain(test_opcua_client)

# Issue #386: pure-function state machine tests. Header-only target -
# no opcua dependency at link time so it runs fast and is sanitizer
# clean independent of the open62541pp build flavour.
ament_add_gtest(test_alarm_state_machine
test/test_alarm_state_machine.cpp
)
target_include_directories(test_alarm_state_machine PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/include
)
medkit_set_test_domain(test_alarm_state_machine)

# ---- test_alarm_server fixture ------------------------------------------
# Standalone OPC-UA server emitting AlarmConditionType events for
# integration testing of native alarm subscriptions (issue #386).
#
# The plugin's main open62541pp build is configured with
# UA_NAMESPACE_ZERO=REDUCED and UA_ENABLE_SUBSCRIPTIONS_ALARMS_CONDITIONS=OFF
# because the runtime client path uses neither. Re-enabling alarms there
# would force every consumer of open62541pp to pull in the full namespace 0
# (~5MB of generated source) and the EXPERIMENTAL A&C subsystem.
#
# Instead we build a second open62541 statically via ExternalProject_Add
# using the source already on disk from FetchContent, pinned to FULL ns0
# and alarms ON. The fixture binary links only against this private copy.
# Defaults ON because the fixture is part of the standard test suite for
# issue #386. The earlier `-j` race on open62541 namespace0_generated.c is
# fixed by forcing a serial sub-build (BUILD_COMMAND below).
option(MEDKIT_OPCUA_BUILD_ALARM_SERVER
"Build the OPC-UA AlarmCondition test fixture server (issue #386)" ON)
if(MEDKIT_OPCUA_BUILD_ALARM_SERVER)
find_package(Threads REQUIRED)
include(ExternalProject)
set(_alarm_o62_src "${open62541pp_SOURCE_DIR}/3rdparty/open62541")
set(_alarm_o62_install "${CMAKE_BINARY_DIR}/_alarm_open62541")
externalproject_add(alarm_open62541_ep
SOURCE_DIR "${_alarm_o62_src}"
PREFIX "${CMAKE_BINARY_DIR}/_alarm_open62541_ep"
INSTALL_DIR "${_alarm_o62_install}"
CMAKE_ARGS
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_INSTALL_PREFIX=${_alarm_o62_install}
-DBUILD_SHARED_LIBS=OFF
-DUA_ENABLE_SUBSCRIPTIONS_EVENTS=ON
-DUA_ENABLE_SUBSCRIPTIONS_ALARMS_CONDITIONS=ON
-DUA_NAMESPACE_ZERO=FULL
-DUA_BUILD_EXAMPLES=OFF
-DUA_BUILD_TOOLS=OFF
-DUA_FORCE_WERROR=OFF
-DCMAKE_POSITION_INDEPENDENT_CODE=ON
-DCMAKE_C_FLAGS=-w
# Serial build avoids a -j race in open62541's Ninja-style codegen
# where a parallel writer can clobber namespace0_generated.c.o.d
# before the .d file is materialized. The build is one-shot, the
# 30-60s overhead is well worth predictability.
BUILD_COMMAND ${CMAKE_COMMAND} --build <BINARY_DIR>
BUILD_BYPRODUCTS "${_alarm_o62_install}/lib/libopen62541.a"
UPDATE_DISCONNECTED 1
)
set(_alarm_o62_lib "${_alarm_o62_install}/lib/libopen62541.a")
set(_alarm_o62_include "${_alarm_o62_install}/include")
file(MAKE_DIRECTORY "${_alarm_o62_include}")

add_executable(test_alarm_server
test/fixtures/test_alarm_server/test_alarm_server.cpp)
add_dependencies(test_alarm_server alarm_open62541_ep)
target_include_directories(test_alarm_server SYSTEM PRIVATE
"${_alarm_o62_include}")
target_link_libraries(test_alarm_server PRIVATE
"${_alarm_o62_lib}" Threads::Threads)
set_target_properties(test_alarm_server PROPERTIES
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}")
target_compile_options(test_alarm_server PRIVATE -w)

install(TARGETS test_alarm_server
RUNTIME DESTINATION lib/${PROJECT_NAME})
install(PROGRAMS
test/fixtures/test_alarm_server/smoke_test.py
test/fixtures/test_alarm_server/run_ctest.py
DESTINATION lib/${PROJECT_NAME})

# CTest wrapper that boots test_alarm_server on an ephemeral port and runs
# the asyncua-based smoke test against it. Skips with CTest exit 77 when
# asyncua is not installed, so iterating on plugin code without the python
# dep does not fail the suite. CI installs asyncua and observes a real
# pass / fail.
find_package(Python3 REQUIRED COMPONENTS Interpreter)
add_test(NAME test_alarm_server_smoke
COMMAND "${Python3_EXECUTABLE}"
"${CMAKE_CURRENT_SOURCE_DIR}/test/fixtures/test_alarm_server/run_ctest.py"
"${CMAKE_BINARY_DIR}/test_alarm_server"
"${CMAKE_CURRENT_SOURCE_DIR}/test/fixtures/test_alarm_server/smoke_test.py")
set_tests_properties(test_alarm_server_smoke PROPERTIES
LABELS "integration"
SKIP_RETURN_CODE 77
TIMEOUT 60)
endif()

ros2_medkit_relax_vendor_warnings()
endif()

20 changes: 20 additions & 0 deletions src/ros2_medkit_plugins/ros2_medkit_opcua/README.md
@@ -175,8 +175,28 @@ nodes:
message: Tank level below minimum
threshold: 100.0
above_threshold: false # Alarm when value < threshold

# Native OPC-UA AlarmConditionType events (issue #386). Subscribes to alarms
# defined inside the PLC (Siemens Program_Alarm / ProDiag, Beckhoff TF6100,
# CodeSys 3.5+, Rockwell via FactoryTalk Linx). Mutually exclusive per entry
# with the threshold-based alarm form above.
event_alarms:
  - alarm_source: "ns=4;s=Alarms.Overpressure"
    entity_id: tank_process
    fault_code: PLC_OVERPRESSURE
```

The plugin auto-registers `acknowledge_fault` and `confirm_fault` operations
on every entity that has at least one `event_alarms` entry. Invoke them with:

```bash
curl -X POST http://localhost:8080/api/v1/apps/tank_process/operations/acknowledge_fault/executions \
-H 'Content-Type: application/json' \
-d '{"fault_code":"PLC_OVERPRESSURE","comment":"operator on radio"}'
```
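
The same call can be scripted. The following is a minimal sketch using only the Python standard library; the helper name `build_fault_operation_request` is hypothetical, and the base URL is simply the one from the curl example above. It builds the request without sending it, so it runs without a gateway.

```python
import json
import urllib.request
from typing import Optional


def build_fault_operation_request(
    base_url: str,
    entity_id: str,
    fault_code: str,
    comment: Optional[str] = None,
    operation: str = "acknowledge_fault",
) -> urllib.request.Request:
    """Build (but do not send) the POST for acknowledge_fault / confirm_fault.

    Hypothetical helper: the URL layout and body keys mirror the curl
    example, nothing else about the gateway API is assumed.
    """
    url = f"{base_url}/apps/{entity_id}/operations/{operation}/executions"
    body = {"fault_code": fault_code}
    if comment is not None:
        body["comment"] = comment  # optional operator comment
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_fault_operation_request(
    "http://localhost:8080/api/v1",
    "tank_process",
    "PLC_OVERPRESSURE",
    comment="operator on radio",
)
# urllib.request.urlopen(req) would send it to a running gateway.
```

Passing `operation="confirm_fault"` targets the second operation with the same body shape.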

See `design/index.rst` for the full state machine table and vendor matrix.

### Gateway Parameters

```yaml
166 changes: 164 additions & 2 deletions src/ros2_medkit_plugins/ros2_medkit_opcua/design/index.rst
@@ -175,6 +175,168 @@ Untracked (open the issue if you hit the pain):

- Hot-reload of the node map without restarting the plugin
- Complex OPC-UA type support (structures, arrays, enums)
- Native ``AlarmCondition`` event subscription as a complement to
threshold polling
- Vendor information model bindings (Euromap 77, Siemens DI, PA-DIM)


Native ``AlarmConditionType`` event subscription (issue #386)
=============================================================

The plugin subscribes to native OPC-UA Part 9 ``AlarmConditionType`` events
emitted by vendor PLCs, in addition to the threshold-based polling path. Both
modes coexist in a single ``node_map.yaml`` (different YAML keys) and feed the
same ``fault_manager`` service.

Configuration
-------------

Two YAML forms describe alarms; an entry can use one but never both:

.. code-block:: yaml

   nodes:
     # Threshold-based (existing): polls the scalar value and raises a fault
     # when it crosses the configured threshold.
     - node_id: 'ns=2;i=2'
       entity_id: tank_process
       data_name: tank_temperature
       data_type: float
       alarm:
         fault_code: TANK_OVERHEAT
         severity: ERROR
         threshold: 80.0
         above_threshold: true

   event_alarms:
     # Native AlarmConditionType (new): subscribes to events emitted from
     # the source NodeId and bridges them through the state machine below.
     - alarm_source: 'ns=4;s=Alarms.Overpressure'
       entity_id: tank_process
       fault_code: PLC_OVERPRESSURE
       severity_override: ERROR       # optional - else derived from event Severity
       message: 'Tank overpressure'   # optional - else event Message field

State machine
-------------

Inputs from each event payload (positional ``EventFilter`` select clauses):

- ``EnabledState.Id`` - bool
- ``ShelvingState.CurrentState.Id`` - NodeId; non-Unshelved => suppressed
- ``ActiveState.Id`` - bool
- ``AckedState.Id`` - bool
- ``ConfirmedState.Id`` - bool
- ``BranchId`` - NodeId; non-null means historical branch (Part 9 §5.5.2.12)

Decision order, first match wins:

+-----+--------------------------------------+---------------------------------------+
| # | Condition | Outcome |
+=====+======================================+=======================================+
| 1 | ``BranchId != null`` | history-only (no SOVD update) |
+-----+--------------------------------------+---------------------------------------+
| 2 | ``EnabledState == false`` | clear if was active, else no-op |
+-----+--------------------------------------+---------------------------------------+
| 3 | ``ShelvingState != Unshelved`` | clear if was active, else no-op |
Review comment (Collaborator):

Doc-code mismatch on rule 3.

Doc says ShelvingState != Unshelved -> Suppressed. Implementation in opcua_poller.cpp:382-389 only treats specific NodeIds (i=2930 TimedShelved, i=2932 OneShotShelved) as suppressing - a null/missing/unknown Id is treated as Unshelved (with a deliberate code comment explaining why). Reader of the doc would expect the opposite for unset Ids.

Update to: ShelvingState in {TimedShelved (i=2930), OneShotShelved (i=2932)}. Note explicitly that null/unset Id is treated as Unshelved.

+-----+--------------------------------------+---------------------------------------+
| 4 | ``ActiveState == true`` | ``CONFIRMED`` (idempotent) |
+-----+--------------------------------------+---------------------------------------+
| 5a | ``ActiveState == false`` and | ``HEALED`` (latched, awaiting ack) |
| | not (``Acked`` and ``Confirmed``) | |
+-----+--------------------------------------+---------------------------------------+
| 5b | ``ActiveState == false``, | ``CLEARED`` |
| | ``Acked == true``, | |
| | ``Confirmed == true`` | |
+-----+--------------------------------------+---------------------------------------+

``Retain`` is intentionally NOT used for state determination. Per Part 9
§5.5.2.10 it controls visibility during ``ConditionRefresh`` bursts only;
lifecycle is driven entirely by Active / Acked / Confirmed. The SOVD
``PREFAILED`` state has no native equivalent and is reserved for the
threshold-polling pre-trigger path.
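
The decision order in the table can be sketched as a pure function. This is an illustrative Python rendering, not the plugin's header-only C++ ``AlarmStateMachine``; the names ``AlarmEvent``, ``Outcome`` and ``decide`` are hypothetical, and ``shelved`` is assumed to be precomputed by the caller from ``ShelvingState.CurrentState.Id``.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    HISTORY_ONLY = "history-only"  # rule 1: no SOVD update
    CLEAR = "clear"                # rules 2/3 when the fault was active
    NO_OP = "no-op"                # rules 2/3 when it was not
    CONFIRMED = "CONFIRMED"        # rule 4
    HEALED = "HEALED"              # rule 5a
    CLEARED = "CLEARED"            # rule 5b


@dataclass(frozen=True)
class AlarmEvent:
    enabled: bool
    shelved: bool                  # assumption: caller decides what counts as shelved
    active: bool
    acked: bool
    confirmed: bool
    branch_id_is_null: bool = True


def decide(event: AlarmEvent, was_active: bool) -> Outcome:
    """Apply the table rows top to bottom; the first match wins."""
    if not event.branch_id_is_null:
        return Outcome.HISTORY_ONLY
    if not event.enabled or event.shelved:
        return Outcome.CLEAR if was_active else Outcome.NO_OP
    if event.active:
        return Outcome.CONFIRMED
    if event.acked and event.confirmed:
        return Outcome.CLEARED
    return Outcome.HEALED
```

Because the function takes only the event fields plus the previous active flag, every cell of the table can be unit-tested without an OPC-UA connection.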

Severity mapping
----------------

OPC-UA severity is a 1-1000 scalar. The plugin maps it to selfpatch's SOVD
severity buckets:

- 1-200 -> ``INFO``
- 201-500 -> ``WARNING``
- 501-800 -> ``ERROR``
- 801-1000 -> ``CRITICAL``

This is the selfpatch convention, **not** IEC 62682 - that spec defines a
1-1000 priority scale but no normative band names. ``severity_override`` on
an ``event_alarms`` entry takes precedence when set.
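
A minimal sketch of the bucket mapping, assuming a hypothetical helper ``map_severity``. Rejecting values outside 1-1000 is an assumption of this sketch; the text above does not say how the plugin treats them.

```python
from typing import Optional


def map_severity(opcua_severity: int, override: Optional[str] = None) -> str:
    """Map an OPC-UA severity (1-1000) to a SOVD severity bucket."""
    if override is not None:
        # severity_override on the event_alarms entry takes precedence.
        return override
    if not 1 <= opcua_severity <= 1000:
        # Assumption: out-of-range input is an error in this sketch.
        raise ValueError(f"OPC-UA severity out of range: {opcua_severity}")
    if opcua_severity <= 200:
        return "INFO"
    if opcua_severity <= 500:
        return "WARNING"
    if opcua_severity <= 800:
        return "ERROR"
    return "CRITICAL"
```

For example, an event with severity 650 and no override lands in ``ERROR``, while the same event on an entry with ``severity_override: CRITICAL`` is reported as ``CRITICAL``.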

ConditionRefresh
----------------

After creating event monitored items the plugin invokes ``ConditionRefresh``
(Server object ``i=2253``, method ``i=3875``) so the server pushes any
condition that fired before the subscription started. The same call fires
on every successful reconnect.

The bracketing ``RefreshStartEventType`` (i=2787) and ``RefreshEndEventType``
(i=2788) are recognized and used to set a diagnostic flag; live notifications
arriving during the burst are applied normally because the state machine is
driven by per-condition ``ConditionId`` and runs idempotently.
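
The bracketing logic can be sketched as a small tracker. This is an illustration in Python rather than the plugin's C++; the class name ``RefreshTracker`` and its fields are hypothetical, and event types are represented as plain NodeId strings for brevity.

```python
REFRESH_START_EVENT_TYPE = "i=2787"  # RefreshStartEventType
REFRESH_END_EVENT_TYPE = "i=2788"    # RefreshEndEventType


class RefreshTracker:
    """Tracks ConditionRefresh bursts for diagnostics only."""

    def __init__(self) -> None:
        self.in_progress = False
        self.completed_bursts = 0

    def observe(self, event_type: str) -> bool:
        """Return True if the event was a refresh bracket marker.

        Any other event returns False and should still be fed to the
        state machine, whether or not a burst is in progress.
        """
        if event_type == REFRESH_START_EVENT_TYPE:
            self.in_progress = True
            return True
        if event_type == REFRESH_END_EVENT_TYPE:
            if self.in_progress:
                self.completed_bursts += 1
            self.in_progress = False
            return True
        return False
```

The tracker never drops or defers condition events; it only records whether a burst is open, which matches the statement that live notifications during the burst are applied normally.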

Acknowledge / Confirm round-trip
--------------------------------

Two SOVD operations appear on every entity that has at least one event-mode
alarm declared:

- ``POST /apps/{entity}/operations/acknowledge_fault/executions``
- ``POST /apps/{entity}/operations/confirm_fault/executions``

Body:

.. code-block:: json

   { "fault_code": "PLC_OVERPRESSURE", "comment": "operator on radio" }

The plugin resolves ``(entity_id, fault_code)`` to the live ``ConditionId``
maintained by the poller, then calls the inherited
``AcknowledgeableConditionType`` method (``Acknowledge`` ``i=9111`` or
``Confirm`` ``i=9113``) on that NodeId. The latest ``EventId`` ``ByteString``
captured from the most recent notification is passed as the first argument;
without it servers return ``BadEventIdUnknown`` (Part 9 §5.7.3).
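
The lookup described above can be sketched as follows. The names ``ConditionRegistry``, ``LiveCondition`` and ``build_call`` are hypothetical, NodeIds are plain strings, and the returned dict only stands in for whatever arguments the real ``call_method`` takes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

METHOD_IDS = {
    "acknowledge_fault": "i=9111",  # Acknowledge
    "confirm_fault": "i=9113",      # Confirm
}


@dataclass
class LiveCondition:
    condition_id: str  # NodeId of the condition instance
    event_id: bytes    # EventId from the most recent notification


class ConditionRegistry:
    """Maps (entity_id, fault_code) to the live condition and its EventId."""

    def __init__(self) -> None:
        self._live: Dict[Tuple[str, str], LiveCondition] = {}

    def on_event(self, entity_id: str, fault_code: str,
                 condition_id: str, event_id: bytes) -> None:
        # Every notification overwrites the stored EventId so the next
        # Acknowledge / Confirm call uses the latest one.
        self._live[(entity_id, fault_code)] = LiveCondition(condition_id, event_id)

    def build_call(self, entity_id: str, fault_code: str,
                   operation: str, comment: str = "") -> Optional[dict]:
        live = self._live.get((entity_id, fault_code))
        if live is None:
            return None  # no live condition seen yet for this fault
        return {
            "object_id": live.condition_id,
            "method_id": METHOD_IDS[operation],
            "args": [live.event_id, comment],  # EventId first, then comment
        }
```

Keeping only the most recent ``EventId`` per fault is the point of the sketch: a call built from a stale one is what leads to ``BadEventIdUnknown``.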

Vendor matrix
-------------

+----------------------+---------------------+-------------------------------+
| Vendor / runtime | AlarmConditionType | Notes |
+======================+=====================+===============================+
| Siemens S7-1500 | yes (FW V2.9+) | ProDiag, Program_Alarm, |
| | | system diagnostics |
+----------------------+---------------------+-------------------------------+
| Beckhoff TwinCAT 3 | yes (TF6100) | ``Confirm`` propagates to |
| | | PLC code; ``Ack`` does not |
+----------------------+---------------------+-------------------------------+
| Rockwell ControlLogix| yes (via FactoryTalk| Tag-based alarms bridged by |
| | Linx FW 16.20+) | the gateway |
+----------------------+---------------------+-------------------------------+
| CodeSys 3.5+ | yes (alarm manager | Custom severity mapping |
| | provider library) | |
+----------------------+---------------------+-------------------------------+
| OpenPLC v3 | no | Scalar variables only; |
| | | use threshold mode |
+----------------------+---------------------+-------------------------------+

Out of scope
------------

- ``ShelvingState`` write operations (``TimedShelve`` / ``OneShotShelve`` /
``Unshelve``). The plugin reads the state to suppress active alarms but
does not yet expose operator UI to set it.
- OPC-UA branch reasoning beyond ``BranchId``-based suppression.
Re-fires are tracked via ``fault_manager`` ``occurrence_count`` plus the
``/faults/stream`` SSE history.
- Auto-discovery of alarm sources via ``Server.GeneratedEvents`` browse
(tracked in #368 alongside scalar auto-discovery).
- ``Quality`` (StatusCode) propagation to a SOVD ``status_quality`` field.
Requires an additive field on the ``ReportFault.srv`` schema; tracked
separately.