
Commit c4089da

Merge branch 'develop' into isovalent_batch_1
2 parents b50fe13 + 984e650 commit c4089da

9 files changed: +321 −5 lines changed


.github/workflows/appinspect.yml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ jobs:
     steps:
       - name: Check out repository (PR)
         if: ${{ github.event_name == 'pull_request' }}
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
          ref: refs/pull/${{ github.event.pull_request.number }}/merge

.github/workflows/build.yml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository code
-       uses: actions/checkout@v5
+       uses: actions/checkout@v6

      - uses: actions/setup-python@v6
        with:

.github/workflows/datasource-dependabot.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ jobs:

    steps:
      - name: Checkout repository
-       uses: actions/checkout@v5
+       uses: actions/checkout@v6
        with:
          ref: 'develop'
          token: ${{ secrets.DATA_SOURCES_DEPENDABOT }}

.github/workflows/labeler.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ jobs:
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
-     - uses: actions/checkout@v5
+     - uses: actions/checkout@v6
        with:
          repository: "splunk/security_content"
      - uses: actions/labeler@v6

.github/workflows/unit-testing.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ jobs:
    steps:
      #For fork PRs, always check out security_content and the PR target in security content!
      - name: Check out the repository code
-       uses: actions/checkout@v5
+       uses: actions/checkout@v6
        with:
          repository: 'splunk/security_content' #this should be the TARGET repo of the PR. we hardcode it for now
          ref: ${{ github.base_ref }}
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
name: LLM Model File Creation
id: 23e5b797-378d-45d6-ab3e-d034ca12a99b
version: 1
date: '2025-11-12'
author: Rod Soto
status: production
type: Hunting
description: |
  Detects the creation of Large Language Model (LLM) files on Windows endpoints by monitoring file creation events for specific model file formats and extensions commonly used by local AI frameworks.
  This detection identifies potential shadow AI deployments, unauthorized model downloads, and rogue LLM infrastructure by detecting file creation patterns associated with quantized models (.gguf, .ggml), safetensors model format files, and Ollama Modelfiles.
  These file types are characteristic of local inference frameworks such as Ollama, llama.cpp, GPT4All, LM Studio, and similar tools that enable running LLMs locally without cloud dependencies.
  Organizations can use this detection to identify potential data exfiltration risks, policy violations related to unapproved AI usage, and security blind spots created by decentralized AI deployments that bypass enterprise governance and monitoring.
data_source:
- Sysmon EventID 11
search: |
  | tstats `security_content_summariesonly` count
      min(_time) as firstTime
      max(_time) as lastTime
      from datamodel=Endpoint.Filesystem
      where Filesystem.file_name IN (
        "*.gguf*",
        "*ggml*",
        "*Modelfile*",
        "*safetensors*"
      )
      by Filesystem.action Filesystem.dest Filesystem.file_access_time Filesystem.file_create_time
         Filesystem.file_hash Filesystem.file_modify_time Filesystem.file_name Filesystem.file_path
         Filesystem.file_acl Filesystem.file_size Filesystem.process_guid Filesystem.process_id
         Filesystem.user Filesystem.vendor_product
  | `drop_dm_object_name(Filesystem)`
  | `security_content_ctime(firstTime)`
  | `security_content_ctime(lastTime)`
  | `llm_model_file_creation_filter`
how_to_implement: |
  To successfully implement this search, you need to be ingesting logs with file creation events from your endpoints.
  Ensure that the Endpoint data model is properly populated with filesystem events from EDR agents or Sysmon Event ID 11.
  The logs must be processed using the appropriate Splunk Technology Add-ons that are specific to the EDR product.
  The logs must also be mapped to the `Filesystem` node of the `Endpoint` data model.
  Use the Splunk Common Information Model (CIM) to normalize the field names and speed up the data modeling process.
known_false_positives: |
  Legitimate creation of LLM model files by authorized developers, ML engineers, and researchers during model training, fine-tuning, or experimentation. Approved AI/ML sandboxes and lab environments where model file creation is expected. Automated ML pipelines and workflows that generate or update model files as part of their normal operation. Third-party applications and services that manage or cache LLM model files for legitimate purposes.
references:
- https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon
- https://www.ibm.com/think/topics/shadow-ai
- https://www.splunk.com/en_us/blog/artificial-intelligence/splunk-technology-add-on-for-ollama.html
- https://blogs.cisco.com/security/detecting-exposed-llm-servers-shodan-case-study-on-ollama
tags:
  analytic_story:
  - Suspicious Local LLM Frameworks
  asset_type: Endpoint
  mitre_attack_id:
  - T1543
  product:
  - Splunk Enterprise
  - Splunk Enterprise Security
  - Splunk Cloud
  security_domain: endpoint
tests:
- name: True Positive Test
  attack_data:
  - data: https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/suspicious_behaviour/local_llms/sysmon_local_llms.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: XmlWinEventLog
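
Before enabling the hunt above, it is worth confirming that Sysmon Event ID 11 file-creation events are actually reaching the `Filesystem` node of the `Endpoint` data model with the fields the detection groups by. A minimal sanity-check sketch, assuming the data model is accelerated and the CIM mappings described in how_to_implement are in place, could look like this:

  | tstats `security_content_summariesonly` count
      min(_time) as firstTime
      max(_time) as lastTime
      from datamodel=Endpoint.Filesystem
      by Filesystem.dest Filesystem.file_name Filesystem.file_path Filesystem.vendor_product
  | `drop_dm_object_name(Filesystem)`
  | `security_content_ctime(firstTime)`
  | `security_content_ctime(lastTime)`
  | sort - count
  | head 20

If this returns nothing, check the Technology Add-on field extractions and data model acceleration before tuning the detection itself.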
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
name: Local LLM Framework DNS Query
id: d7ceffc5-a45e-412b-b9fa-2ba27c284503
version: 1
date: '2025-11-12'
author: Rod Soto
status: production
type: Hunting
description: |
  Detects DNS queries related to local LLM models on endpoints by monitoring Sysmon DNS query events (Event ID 22) for known LLM model domains and services.
  Local LLM frameworks like Ollama, LM Studio, and GPT4All make DNS calls to repositories such as huggingface.co and ollama.ai for model downloads, updates, and telemetry.
  These queries can reveal unauthorized AI tool usage or data exfiltration risks on corporate networks.
data_source:
- Sysmon EventID 22
search: |
  `sysmon`
  EventCode=22
  QueryName IN (
    "*huggingface*",
    "*ollama*",
    "*jan.ai*",
    "*gpt4all*",
    "*nomic*",
    "*koboldai*",
    "*lmstudio*",
    "*modelscope*",
    "*civitai*",
    "*oobabooga*",
    "*replicate*",
    "*anthropic*",
    "*openai*",
    "*openrouter*",
    "*api.openrouter*",
    "*aliyun*",
    "*alibabacloud*",
    "*dashscope.aliyuncs*"
  )
  NOT Image IN (
    "*\\MsMpEng.exe",
    "C:\\ProgramData\\*",
    "C:\\Windows\\System32\\*",
    "C:\\Windows\\SysWOW64\\*"
  )
  | stats count
      min(_time) as firstTime
      max(_time) as lastTime
      by src Image process_name QueryName query_count answer answer_count reply_code_id vendor_product
  | `security_content_ctime(firstTime)`
  | `security_content_ctime(lastTime)`
  | `local_llm_framework_dns_query_filter`
how_to_implement: |
  Ensure Sysmon is deployed across Windows endpoints and configured to capture DNS query events (Event ID 22).
  Configure Sysmon's XML configuration file to log detailed command-line arguments, parent process information, and full process image paths.
  Ingest Sysmon event logs into Splunk via the Splunk Universal Forwarder or Windows Event Log Input, ensuring they are tagged with `sourcetype=XmlWinEventLog:Microsoft-Windows-Sysmon/Operational`.
  Verify the `sysmon` macro exists in your Splunk environment and correctly references the Sysmon event logs.
  Create or update the `local_llm_framework_dns_query_filter` macro to exclude approved systems, authorized developers, sanctioned ML/AI workstations, or known development/lab environments as needed.
  Deploy this hunting search to your Splunk Enterprise Security or Splunk Enterprise instance and schedule it to run on a regular cadence to detect unauthorized LLM model DNS queries and shadow AI activities.
  Correlate findings with endpoint asset inventory and user identity data to prioritize investigation.
known_false_positives: |
  Legitimate DNS queries to LLM model hosting platforms by authorized developers, ML engineers, and researchers during model training, fine-tuning, or experimentation. Approved AI/ML sandboxes and lab environments where LLM model downloads are expected. Automated ML pipelines and workflows that interact with LLM model hosting services as part of their normal operation. Third-party applications and services that access LLM model platforms for legitimate purposes.
references:
- https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon
- https://www.splunk.com/en_us/blog/artificial-intelligence/splunk-technology-add-on-for-ollama.html
- https://blogs.cisco.com/security/detecting-exposed-llm-servers-shodan-case-study-on-ollama
tags:
  analytic_story:
  - Suspicious Local LLM Frameworks
  asset_type: Endpoint
  mitre_attack_id:
  - T1590
  product:
  - Splunk Enterprise
  - Splunk Enterprise Security
  - Splunk Cloud
  security_domain: endpoint
tests:
- name: True Positive Test
  attack_data:
  - data: https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/suspicious_behaviour/local_llms/sysmon_dns.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: XmlWinEventLog
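
The `local_llm_framework_dns_query_filter` macro at the end of the search is where environment-specific tuning belongs. As a hypothetical sketch in the style of security_content macro YAML (the host pattern and image path below are placeholders, not part of this commit), an exclusion for an approved ML lab could look like this:

  definition: search NOT (src IN ("mllab-ws-*") OR Image IN ("*\\Approved AI Tools\\*"))
  description: Filter macro for Local LLM Framework DNS Query; update to exclude approved
    AI/ML hosts and tooling in your environment.
  name: local_llm_framework_dns_query_filter

Because the filter runs after the `stats` command, it can only reference fields that survive the aggregation, such as `src`, `Image`, and `QueryName`.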
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
name: Windows Local LLM Framework Execution
id: a3f8e2c9-7d4b-4e1f-9c6a-2b5d8f3e1a7c
version: 1
date: '2025-11-20'
author: Rod Soto, Splunk
status: production
type: Hunting
description: |
  The following analytic detects execution of unauthorized local LLM frameworks (Ollama, LM Studio, GPT4All, Jan, llama.cpp, KoboldCPP, Oobabooga, NutStudio) and Python-based AI/ML libraries (HuggingFace Transformers, LangChain) on Windows endpoints by leveraging process creation events.
  It identifies cases where known LLM framework executables are launched or command-line arguments reference AI/ML libraries.
  This activity is significant as it may indicate shadow AI deployments, unauthorized model inference operations, or potential data exfiltration through local AI systems.
  If confirmed malicious, this could lead to unauthorized access to sensitive data, intellectual property theft, or circumvention of organizational AI governance policies.
data_source:
- Sysmon EventID 1
- Windows Event Log Security 4688
- CrowdStrike ProcessRollup2
search: |
  | tstats `security_content_summariesonly` count
      min(_time) as firstTime
      max(_time) as lastTime
      from datamodel=Endpoint.Processes
      where
      (
        Processes.process_name IN (
          "gpt4all.exe",
          "jan.exe",
          "kobold.exe",
          "koboldcpp.exe",
          "llama-run.exe",
          "llama.cpp.exe",
          "lmstudio.exe",
          "nutstudio.exe",
          "ollama.exe",
          "oobabooga.exe",
          "text-generation-webui.exe"
        )
        OR
        Processes.original_file_name IN (
          "ollama.exe",
          "lmstudio.exe",
          "gpt4all.exe",
          "jan.exe",
          "llama-run.exe",
          "koboldcpp.exe",
          "nutstudio.exe"
        )
        OR
        Processes.process IN (
          "*\\gpt4all\\*",
          "*\\jan\\*",
          "*\\koboldcpp\\*",
          "*\\llama.cpp\\*",
          "*\\lmstudio\\*",
          "*\\nutstudio\\*",
          "*\\ollama\\*",
          "*\\oobabooga\\*",
          "*huggingface*",
          "*langchain*",
          "*llama-run*",
          "*transformers*"
        )
        OR
        Processes.parent_process_name IN (
          "gpt4all.exe",
          "jan.exe",
          "kobold.exe",
          "koboldcpp.exe",
          "llama-run.exe",
          "llama.cpp.exe",
          "lmstudio.exe",
          "nutstudio.exe",
          "ollama.exe",
          "oobabooga.exe",
          "text-generation-webui.exe"
        )
      )
      by Processes.action Processes.dest Processes.original_file_name Processes.parent_process
         Processes.parent_process_exec Processes.parent_process_guid Processes.parent_process_id
         Processes.parent_process_name Processes.parent_process_path Processes.process
         Processes.process_exec Processes.process_guid Processes.process_hash Processes.process_id
         Processes.process_integrity_level Processes.process_name Processes.process_path Processes.user
         Processes.user_id Processes.vendor_product
  | `drop_dm_object_name(Processes)`
  | eval Framework=case(
      match(process_name, "(?i)ollama") OR match(process, "(?i)ollama"), "Ollama",
      match(process_name, "(?i)lmstudio") OR match(process, "(?i)lmstudio") OR match(process, "(?i)lm-studio"), "LM Studio",
      match(process_name, "(?i)gpt4all") OR match(process, "(?i)gpt4all"), "GPT4All",
      match(process_name, "(?i)kobold") OR match(process, "(?i)kobold"), "KoboldCPP",
      match(process_name, "(?i)jan") OR match(process, "(?i)jan"), "Jan AI",
      match(process_name, "(?i)nutstudio") OR match(process, "(?i)nutstudio"), "NutStudio",
      match(process_name, "(?i)llama") OR match(process, "(?i)llama"), "llama.cpp",
      match(process_name, "(?i)oobabooga") OR match(process, "(?i)oobabooga") OR match(process, "(?i)text-generation-webui"), "Oobabooga",
      match(process, "(?i)transformers") OR match(process, "(?i)huggingface"), "HuggingFace/Transformers",
      match(process, "(?i)langchain"), "LangChain",
      1=1, "Other"
    )
  | `security_content_ctime(firstTime)`
  | `security_content_ctime(lastTime)`
  | table action dest Framework original_file_name parent_process parent_process_exec
      parent_process_guid parent_process_id parent_process_name parent_process_path
      process process_exec process_guid process_hash process_id process_integrity_level
      process_name process_path user user_id vendor_product
  | `windows_local_llm_framework_execution_filter`
how_to_implement: |
  The detection is based on data that originates from Endpoint Detection and Response (EDR) agents.
  These agents are designed to provide security-related telemetry from the endpoints where the agent is installed.
  To implement this search, you must ingest logs that contain the process GUID, process name, and parent process.
  Additionally, you must ingest complete command-line executions.
  These logs must be processed using the appropriate Splunk Technology Add-ons that are specific to the EDR product.
  The logs must also be mapped to the `Processes` node of the `Endpoint` data model.
  Use the Splunk Common Information Model (CIM) to normalize the field names and speed up the data modeling process.
known_false_positives: Legitimate development, data science, and AI/ML workflows where
  authorized developers, researchers, or engineers intentionally execute local LLM
  frameworks (Ollama, LM Studio, GPT4All, Jan, NutStudio) for model experimentation,
  fine-tuning, or prototyping. Python developers using HuggingFace Transformers or
  LangChain for legitimate AI/ML projects. Approved sandbox and lab environments where
  framework testing is authorized. Open-source contributors and hobbyists running
  frameworks for educational purposes. Third-party applications that bundle or invoke
  LLM frameworks as dependencies (e.g., IDE plugins, analytics tools, chatbot integrations).
  System administrators deploying frameworks as part of containerized services or
  orchestrated ML workloads. Process name keyword overlap with unrelated utilities
  (e.g., "llama-backup", "janimation"). Recommended tuning: baseline approved frameworks
  and users by role/department, exclude sanctioned dev/lab systems via the filter
  macro, correlate with user identity and peer group anomalies before escalating to
  incident response.
references:
- https://splunkbase.splunk.com/app/8024
- https://www.ibm.com/think/topics/shadow-ai
- https://www.splunk.com/en_us/blog/artificial-intelligence/splunk-technology-add-on-for-ollama.html
- https://blogs.cisco.com/security/detecting-exposed-llm-servers-shodan-case-study-on-ollama
- https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon
tags:
  analytic_story:
  - Suspicious Local LLM Frameworks
  asset_type: Endpoint
  mitre_attack_id:
  - T1543
  product:
  - Splunk Enterprise
  - Splunk Enterprise Security
  - Splunk Cloud
  security_domain: endpoint
tests:
- name: True Positive Test - Sysmon
  attack_data:
  - data: https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/suspicious_behaviour/local_llms/sysmon_local_llms.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: XmlWinEventLog
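
Since this is a Hunting analytic, the known_false_positives guidance above recommends baselining approved frameworks and users before escalating anything. A simple baselining sketch over the same `Processes` data model, run over a longer window such as 30 days set in the time range picker (the shortened executable list is illustrative, not part of the detection), could be:

  | tstats `security_content_summariesonly` count
      min(_time) as firstTime
      max(_time) as lastTime
      from datamodel=Endpoint.Processes
      where Processes.process_name IN ("ollama.exe", "lmstudio.exe", "gpt4all.exe", "jan.exe", "koboldcpp.exe")
      by Processes.dest Processes.user Processes.process_name
  | `drop_dm_object_name(Processes)`
  | `security_content_ctime(firstTime)`
  | `security_content_ctime(lastTime)`
  | sort - count

Hosts and users that show up consistently are candidates for the `windows_local_llm_framework_execution_filter` macro rather than for investigation.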
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
name: Suspicious Local LLM Frameworks
id: 0b4396a1-aeff-412e-b39e-4e26457c780d
version: 1
date: '2025-11-12'
author: Rod Soto, Splunk
status: production
description: |
  Leverage advanced Splunk searches to detect and investigate suspicious activity involving possibly unauthorized local LLM frameworks. This analytic story addresses discovery and detection of unauthorized local LLM frameworks and related shadow AI artifacts.
narrative: |
  This analytic story addresses the growing security challenge of Shadow AI: the deployment and use of unauthorized Large Language Model (LLM) frameworks and AI tools within enterprise environments without proper governance, oversight, or security controls.

  Shadow AI deployments pose significant risks including data exfiltration through local model inference (where sensitive corporate data is processed by unmonitored AI systems), intellectual property leakage, policy violations, and creation of security blind spots that bypass enterprise data loss prevention and monitoring solutions.

  Local LLM frameworks such as Ollama, LM Studio, GPT4All, Jan, llama.cpp, and KoboldCPP enable users to download and run powerful language models entirely on their endpoints, processing sensitive information without cloud-based safeguards or enterprise visibility. These detections monitor process execution patterns, file creation activities (model files with .gguf, .ggml, or safetensors extensions), DNS queries to model repositories, and network connections to identify unauthorized AI infrastructure.

  By correlating Windows Security Event Logs (Event ID 4688), Sysmon telemetry (Events 1, 11, 22), and behavioral indicators, security teams can detect shadow AI deployments early, investigate the scope of unauthorized model usage, assess data exposure risks, and enforce AI governance policies to prevent covert model manipulation, persistent endpoint compromise, and uncontrolled AI experimentation that bypasses established security frameworks.
references:
- https://splunkbase.splunk.com/app/8024
- https://www.ibm.com/think/topics/shadow-ai
- https://www.splunk.com/en_us/blog/artificial-intelligence/splunk-technology-add-on-for-ollama.html
- https://blogs.cisco.com/security/detecting-exposed-llm-servers-shodan-case-study-on-ollama
tags:
  category:
  - Adversary Tactics
  product:
  - Splunk Enterprise
  - Splunk Enterprise Security
  - Splunk Cloud
  usecase: Advanced Threat Detection
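
As a rough illustration of the correlation the narrative describes (this is not one of the shipped detections, and the shortened match lists are placeholders), hosts that exhibit both LLM framework process execution and model file creation can be surfaced by combining the Processes and Filesystem data model queries:

  | tstats `security_content_summariesonly` count from datamodel=Endpoint.Processes
      where Processes.process_name IN ("ollama.exe", "lmstudio.exe", "gpt4all.exe", "koboldcpp.exe")
      by Processes.dest
  | `drop_dm_object_name(Processes)`
  | eval signal="llm_process_execution"
  | append [
      | tstats `security_content_summariesonly` count from datamodel=Endpoint.Filesystem
          where Filesystem.file_name IN ("*.gguf*", "*ggml*", "*safetensors*", "*Modelfile*")
          by Filesystem.dest
      | `drop_dm_object_name(Filesystem)`
      | eval signal="llm_model_file_creation" ]
  | stats sum(count) as events values(signal) as signals dc(signal) as signal_count by dest
  | where signal_count > 1

Hosts returned here warrant a closer look with the individual hunting searches before any response action.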
