📄 Paper: https://arxiv.org/abs/2512.12078

Authors:

- Ágney Lopes Roth Ferraz
- Sidnei Barbieri
- Murray Evangelista de Souza
- Lourenço A. Pereira Júnior

This repository accompanies the paper:
The Procedural Semantics Gap in Structured CTI: A Measurement-Driven STIX Analysis for APT Emulation
It provides an experimental environment for evaluating the procedural completeness of STIX-based threat intelligence, that is, how well it supports reproducing APT campaign behaviors in a controlled laboratory setting.
The repository includes:
- A Docker-based adversary emulation environment
- Automated generation of attacker commands
- Integration with MITRE Caldera
- Tools for evaluating the procedural gaps in structured CTI
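The environment consists of four Docker containers (caldera, kali, nginx, db) attached to three Docker networks with statically assigned IP addresses. Because kali and nginx are each attached to two networks, the hosts form a chain: caldera ↔ kali ↔ nginx ↔ db.
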
| Network | Subnet | Host | IP |
|---|---|---|---|
| caldera-kali-network | 172.20.0.0/24 | caldera | 172.20.0.10 |
| caldera-kali-network | 172.20.0.0/24 | kali | 172.20.0.20 |
| kali-nginx-network | 172.21.0.0/24 | kali | 172.21.0.10 |
| kali-nginx-network | 172.21.0.0/24 | nginx | 172.21.0.20 |
| nginx-db-network | 172.22.0.0/24 | nginx | 172.22.0.10 |
| nginx-db-network | 172.22.0.0/24 | db | 172.22.0.20 |
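
As a quick sanity check of the table, the following Python sketch (not part of the repository) transcribes its rows, verifies that every static IP lies inside its subnet, and lists which hosts share at least one network. The names `TOPOLOGY`, `validate`, and `direct_neighbors` are illustrative only.

```python
import ipaddress
from collections import defaultdict

# (network, subnet, host, ip) rows transcribed from the table above.
TOPOLOGY = [
    ("caldera-kali-network", "172.20.0.0/24", "caldera", "172.20.0.10"),
    ("caldera-kali-network", "172.20.0.0/24", "kali", "172.20.0.20"),
    ("kali-nginx-network", "172.21.0.0/24", "kali", "172.21.0.10"),
    ("kali-nginx-network", "172.21.0.0/24", "nginx", "172.21.0.20"),
    ("nginx-db-network", "172.22.0.0/24", "nginx", "172.22.0.10"),
    ("nginx-db-network", "172.22.0.0/24", "db", "172.22.0.20"),
]


def validate(topology):
    """Raise ValueError if a static IP lies outside its network's subnet."""
    for network, subnet, host, ip in topology:
        if ipaddress.ip_address(ip) not in ipaddress.ip_network(subnet):
            raise ValueError(f"{host} ({ip}) is outside {network} ({subnet})")


def direct_neighbors(topology):
    """Map each host to the set of hosts it shares at least one network with."""
    members = defaultdict(set)
    for network, _subnet, host, _ip in topology:
        members[network].add(host)
    neighbors = defaultdict(set)
    for hosts in members.values():
        for host in hosts:
            neighbors[host] |= hosts - {host}
    return dict(neighbors)


validate(TOPOLOGY)
for host, peers in sorted(direct_neighbors(TOPOLOGY).items()):
    print(f"{host}: {sorted(peers)}")
```

Running it prints `caldera: ['kali']`, `db: ['nginx']`, `kali: ['caldera', 'nginx']` and `nginx: ['db', 'kali']`. At the Docker network level, kali therefore has no direct link to db, so reaching the database requires pivoting through nginx (assuming no additional routing is configured in `docker-compose.yml`).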
The repository is organized as follows:

    .
    ├── architecture.png          # Network architecture diagram
    │
    ├── docker/                   # Docker environment and network configuration
    │   ├── docker-compose.yml
    │   ├── kali-data/            # Persistent data for Kali container
    │   ├── .docker/              # Container build contexts
    │   ├── caldera/              # MITRE Caldera container
    │   ├── db/                   # Database container
    │   ├── kali/                 # Kali attacker container
    │   └── nginx/                # Target web server container
    │
    └── sticks/                   # STIX analysis and campaign generation framework
        ├── config/               # Configuration files
        ├── data/                 # Dataset and intermediate artifacts
        ├── lib/                  # Core libraries
        ├── tools/                # Utility scripts
        ├── main.py               # Main execution entry point
        ├── requirements.txt      # Python dependencies
        └── README.md             # STIX framework documentation

Before running the system, install the required Python dependencies from the `sticks/` directory:

    pip install -r requirements.txt

Then open `config/config.py` (also inside `sticks/`) and configure the following variables.
Insert your GitHub token:

    GITHUB_TOKEN = "your_token_here"

You can generate one at https://github.com/settings/tokens.
If you want to generate attacker commands using Azure, also configure:

    AZURE_SECRET_KEY
    AZURE_ENDPOINT
    AZURE_DEPLOYMENT

Otherwise, these fields can remain empty.
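
Because a missing token or a half-filled Azure block is easy to overlook, a small pre-flight check can help. The sketch below is hypothetical and not part of the repository: only the variable names come from this README, while `check_config`, the placeholder comparison, and reading values from environment variables (instead of importing `config/config.py`) are assumptions made to keep it self-contained.

```python
import os

AZURE_KEYS = ("AZURE_SECRET_KEY", "AZURE_ENDPOINT", "AZURE_DEPLOYMENT")
PLACEHOLDER = "your_token_here"


def check_config(cfg):
    """Return a list of problems found in a mapping of variable name -> value."""
    problems = []
    token = cfg.get("GITHUB_TOKEN", "")
    if not token or token == PLACEHOLDER:
        problems.append("GITHUB_TOKEN is missing or still the placeholder")
    azure_values = [cfg.get(key, "") for key in AZURE_KEYS]
    # Azure is optional, but a partially filled block is almost certainly a mistake.
    if any(azure_values) and not all(azure_values):
        problems.append("Azure settings are partially filled: set all three or none")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: read the values from environment variables.
    cfg = {key: os.environ.get(key, "") for key in ("GITHUB_TOKEN", *AZURE_KEYS)}
    problems = check_config(cfg)
    print("\n".join(problems) if problems else "configuration looks complete")
```

Treating the three Azure variables as all-or-nothing mirrors the README's rule that they are either configured together or left empty.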
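Next, from the repository root, build and start the containers:

    cd docker
    docker-compose up --build

With Docker Compose V2, the equivalent command is `docker compose up --build`. Wait until all containers are built and running.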
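
The first build can take several minutes. Instead of checking by hand, one option is to poll the Caldera web interface (http://localhost:8888, the address used in this guide) until it answers. The helper below is a hypothetical sketch using only the standard library; `wait_for_http` and its default timeout and interval are illustrative choices, not part of the repository.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=300.0, interval=5.0):
    """Poll `url` until it answers with a non-5xx status or `timeout` seconds pass.

    Returns True once the server responds, False if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # urlopen follows redirects (e.g. to a login page) automatically.
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            # HTTPError subclasses URLError, so it must be caught first.
            # A 4xx answer still means the web server is up.
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            pass  # Connection refused or reset: the container is not ready yet.
        time.sleep(interval)
    return False


# Example, with the lab starting up:
#   wait_for_http("http://localhost:8888")
```

Only the HTTP endpoint is checked; this says nothing about the kali, nginx, or db containers, which `docker-compose ps` can confirm.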
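Then, in a separate terminal (since `docker-compose up` keeps the current one attached to the container logs), move to the `sticks/` directory and run:

    python tools/empty_caldera.py
    python main.py init

Open your browser and navigate to http://localhost:8888.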
Log in with the following credentials:

- user: `red`
- password: `admin`

Then open the Operations page to observe the running campaigns:

- APT41 – DUST
- ATT&CK Campaign C0010
- ATT&CK Campaign C0026
- CostaRicto
- Operation MidnightEclipse
- Operation Outer Space
- Salesforce Data Exfiltration
- ShadowRay

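The same information can also be pulled programmatically. The sketch below assumes Caldera's v2 REST API, where `GET /api/v2/operations` returns a JSON list of operations authenticated with a `KEY` header, and that each operation carries `name` and `state` fields. The API key is deployment-specific (typically the red API key in Caldera's server configuration) and is left as a placeholder; `fetch_operations` and `summarize` are hypothetical helpers, not part of this repository.

```python
import json
import urllib.request
from collections import Counter


def fetch_operations(base_url, api_key):
    """GET the operation list from Caldera's v2 REST API (assumed endpoint)."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/operations", headers={"KEY": api_key}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize(operations):
    """Return (name, state) pairs sorted by name, plus a count per state."""
    rows = sorted((op.get("name", "?"), op.get("state", "?")) for op in operations)
    return rows, Counter(state for _name, state in rows)


# Example (requires the running lab; the key value is deployment-specific):
#   rows, counts = summarize(fetch_operations("http://localhost:8888", "<red API key>"))
#   for name, state in rows:
#       print(f"{state:>10}  {name}")
```

Keeping the network call separate from `summarize` lets the summary logic be reused on saved API responses.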
The purpose of this framework is to:
- Evaluate structural completeness of STIX CTI datasets
- Measure the procedural requirements needed for campaign execution
- Identify missing operational knowledge required to reproduce APT campaigns
- Support automated adversary emulation experiments
The framework implements a pipeline that converts structured CTI (STIX campaigns) into executable adversary behaviors in a controlled emulation environment.
The workflow is divided into three stages:
- Stage 1 performs automated structural modeling of CTI data.
- Stage 2 introduces a human-in-the-loop translation from abstract behavioral descriptions to minimal executable steps.
- Stage 3 integrates these steps into an adversary emulation environment.
The diagram below is conceptual: it illustrates the workflow rather than the implementation and does not assume any specific serialization format.

    flowchart TD
        A[STIX Campaign Dataset]
        B[Stage 1<br>Structural Modeling<br><br>Automated parsing of CTI data<br>Technique extraction<br>Relationship modeling]
        C[Stage 2<br>Human-in-the-Loop Translation<br><br>Abstract behavior → executable steps<br>Command identification<br>Dependency resolution]
        D[Stage 3<br>Adversary Emulation<br><br>Integration into emulation environment<br>Campaign execution<br>Experimental evaluation]
        A --> B
        B --> C
        C --> D

In the first stage, the framework performs automated structural modeling of cyber threat intelligence data.
The system parses STIX campaign datasets and extracts relevant structural elements, including:
- Techniques
- Relationships
- Indicators
- Infrastructure
- Malware
- Campaign metadata
This process constructs an intermediate representation of the campaign by identifying:
- attack techniques
- dependencies between actions
- structural relationships within the campaign
The result is a machine-readable model of adversary activity derived directly from CTI data.
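
A minimal sketch of this kind of structural modeling follows, assuming the STIX 2.x JSON layout used by MITRE's ATT&CK bundles: campaigns link to techniques (`attack-pattern`), malware, and tools through `relationship` objects whose `relationship_type` is `uses`, and technique IDs are stored in `external_references` entries whose `source_name` is `mitre-attack`. It is not the repository's parser; `campaign_models` and its output shape are hypothetical, and only `uses` relationships are followed (indicators, which attach via `indicates`, would need an extra pass).

```python
from collections import defaultdict


def attack_id(obj):
    """Return the ATT&CK ID (e.g. 'T1059') from an object's external references."""
    for ref in obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack" and "external_id" in ref:
            return ref["external_id"]
    return None


def campaign_models(bundle):
    """Map each campaign name to the techniques and other objects it 'uses'."""
    objects = {obj["id"]: obj for obj in bundle.get("objects", [])}

    # Index 'uses' relationships by their source (campaign) id.
    uses = defaultdict(list)
    for obj in objects.values():
        if obj.get("type") == "relationship" and obj.get("relationship_type") == "uses":
            uses[obj["source_ref"]].append(obj["target_ref"])

    models = {}
    for obj in objects.values():
        if obj.get("type") != "campaign":
            continue
        techniques, related = [], defaultdict(list)
        for target_id in uses.get(obj["id"], []):
            target = objects.get(target_id)
            # Skip dangling references and revoked/deprecated ATT&CK objects.
            if target is None or target.get("revoked") or target.get("x_mitre_deprecated"):
                continue
            if target["type"] == "attack-pattern":
                techniques.append(attack_id(target) or target.get("name", target_id))
            else:
                related[target["type"]].append(target.get("name", target_id))
        models[obj.get("name", obj["id"])] = {
            "techniques": sorted(techniques),
            "related": dict(related),
        }
    return models


# Example with a hypothetical file name:
#   import json
#   with open("bundle.json", encoding="utf-8") as fh:
#       print(campaign_models(json.load(fh)))
```

Indexing every object by `id` first keeps the traversal linear in the bundle size, which matters for full ATT&CK exports with tens of thousands of objects.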
The second stage introduces a human-in-the-loop translation process.
Structured CTI often describes adversary behavior at an abstract level (e.g., techniques or tactics) rather than providing precise execution procedures. In this stage, an analyst translates these abstract behavioral descriptions into minimal executable steps.
This step includes:
- identifying the concrete commands required to reproduce a technique
- resolving implicit environmental assumptions
- defining execution order and dependencies
- specifying required privileges or infrastructure
The outcome is a set of minimal operational procedures that can be executed by an adversary emulation platform.
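
One way to make the outcome of this stage concrete is a small record per step plus a check that the analyst's dependencies admit a valid execution order. The sketch below is hypothetical: the `Step` fields mirror the list above (concrete command, target host, privileges, dependencies) rather than the repository's actual data model, the example commands are placeholders, and ordering relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


@dataclass
class Step:
    """One minimal executable step (hypothetical schema, not the repository's)."""
    step_id: str
    technique_id: str          # ATT&CK technique the step reproduces
    command: str               # concrete command chosen by the analyst
    host: str                  # container that runs it, e.g. "kali"
    privileges: str = "user"   # privilege level the command needs
    depends_on: list[str] = field(default_factory=list)


def execution_order(steps):
    """Return step ids ordered so that every step follows its dependencies.

    Raises ValueError for unknown dependencies and CycleError for circular ones.
    """
    graph = {step.step_id: set(step.depends_on) for step in steps}
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    return list(TopologicalSorter(graph).static_order())


# Illustrative steps; the commands are placeholders, not validated procedures.
STEPS = [
    Step("collect", "T1005", "<analyst-supplied command>", "nginx", depends_on=["exploit"]),
    Step("recon", "T1046", "nmap -sV 172.21.0.20", "kali"),
    Step("exploit", "T1190", "<analyst-supplied exploit>", "kali", depends_on=["recon"]),
]

print(execution_order(STEPS))  # a simple chain, so the order is unique
```

Failing loudly on unknown or circular dependencies surfaces gaps in the translated procedure before anything is sent to the emulation platform.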
In the final stage, the translated procedures are integrated into an adversary emulation environment.
The generated steps are converted into executable components and deployed within a controlled laboratory infrastructure built with Docker containers.
The environment includes multiple hosts representing a simplified enterprise architecture, such as:
- attacker node (kali)
- command and control infrastructure (caldera)
- intermediary services (nginx)
- backend systems (db)
Campaign execution is then orchestrated with MITRE Caldera, the adversary emulation platform integrated into the environment, which makes it possible to observe whether the CTI-derived behaviors can be successfully reproduced.
This stage enables empirical evaluation of procedural completeness in structured CTI datasets.
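
To illustrate what "converted into executable components" can look like for Caldera, the sketch below maps ordered steps to ability and adversary dictionaries. The field names (`tactic`, `technique.attack_id`, `platforms.linux.sh.command`, `atomic_ordering`) follow our reading of Caldera's ability and adversary YAML files and should be checked against the installed Caldera version. The step dictionary layout, the `stable_id` namespace, and the use of the Linux `sh` executor are assumptions, not the repository's implementation.

```python
import json
import uuid


def stable_id(*parts):
    """Deterministic UUID so that regenerating a campaign keeps the same ids."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "sticks/" + "/".join(parts)))


def to_ability(campaign, step):
    """Map one translated step (a plain dict) to a Caldera-style ability."""
    return {
        "id": stable_id(campaign, step["step_id"]),
        "name": f"{campaign}: {step['step_id']}",
        "description": f"Reproduces {step['technique_id']} for {campaign}",
        "tactic": step["tactic"],
        "technique": {"attack_id": step["technique_id"], "name": step["technique_name"]},
        # Assumes Linux agents running commands through the 'sh' executor.
        "platforms": {"linux": {"sh": {"command": step["command"]}}},
    }


def to_adversary(campaign, ordered_steps):
    """Build abilities plus an adversary profile that runs them in the given order."""
    abilities = [to_ability(campaign, step) for step in ordered_steps]
    adversary = {
        "id": stable_id(campaign),
        "name": campaign,
        "description": f"Emulation profile derived from the STIX campaign {campaign}",
        "atomic_ordering": [ability["id"] for ability in abilities],
    }
    return adversary, abilities


if __name__ == "__main__":
    steps = [{
        "step_id": "recon",
        "technique_id": "T1046",
        "technique_name": "Network Service Discovery",
        "tactic": "discovery",
        "command": "nmap -sV 172.21.0.20",  # illustrative placeholder command
    }]
    adversary, abilities = to_adversary("ShadowRay", steps)
    print(json.dumps({"adversary": adversary, "abilities": abilities}, indent=2))
```

Deriving ids with `uuid5` instead of random UUIDs means re-running the generation updates existing Caldera objects rather than piling up duplicates.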
If you use this repository in your research, please cite:

    @article{ferraz2025procedural,
      title={The Procedural Semantics Gap in Structured CTI: A Measurement-Driven STIX Analysis for APT Emulation},
      author={Ferraz, Ágney Lopes Roth and Barbieri, Sidnei and de Souza, Murray Evangelista and Pereira Júnior, Lourenço A.},
      year={2025},
      journal={arXiv preprint arXiv:2512.12078}
    }

License: GNU GPLv3.
