refactor: Overhaul Ansible scripts to provide separate environments #222
martyngigg merged 47 commits into main from
Conversation
This used to be done by the provisioning step
Simplify discovering variables
Simplified:
- Moved all containers to host networking
- Removed docker_configure role
- Split out Keycloak bootstrapping
Create clients and scopes in Keycloak
Implement user federation in Keycloak
Our deployment is not big enough to have multiple projects. It also simplifies the catalog connections.
Important: Review skipped. Too many files! This PR contains 168 files, which is 18 over the limit of 150. You can disable this status message by setting the
Actionable comments posted: 19
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
infra/ansible/roles/lakekeeper/files/bootstrap-warehouse.py (1)
378-390: ⚠️ Potential issue | 🟡 Minor
Redundant duplicate permission assignment.
The server_admin users are assigned {"server": ["admin"]} permissions twice - once on line 382 and again within the second call on lines 386-389. The second call should only include the project permissions.
♻️ Proposed fix to remove redundant assignment
```diff
 if server_admin is not None and identity_provider is not None:
     # Server admins are human users and will be provisioned the first time they log in
     server.assign_permissions(
         identity_provider.oidc_ids(server_admin, server.access_token),
-        entities={"server": ["admin"]},
-    )
-    server.assign_permissions(
-        identity_provider.oidc_ids(server_admin, server.access_token),
         entities={
             "server": ["admin"],
             "project": ["project_admin"],
         },
     )
```
🤖 Fix all issues with AI agents
In @.gitattributes:
- Line 1: The .gitattributes entry has a typo in the path string: replace
"infra/ansibler/group_vars/all/vault.yml" with the correct path
"infra/ansible/group_vars/all/vault.yml" so the diff=ansible-vault attribute
applies to the actual vault file; update the string in the .gitattributes line
(the path literal in the diff entry) and commit the change.
In `@elt-common/src/elt_common/iceberg/trino.py`:
- Around line 34-37: The code currently coerces env values using field.type(val)
which mis-parses boolean strings (e.g., bool("false") is True); update the
coercion in the try/except block in elt_common.iceberg.trino (the code using
field.type) to special-case booleans: if field.type is bool, parse val.lower()
against a canonical true set like {"true","1","yes","y","t"} (return False for
other values), otherwise call field.type(val) as before, and keep the existing
except TypeError fallback to return val.
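A minimal sketch of the suggested boolean-aware coercion. The helper and set names below are illustrative, not the module's actual API; only the logic (special-case `bool`, keep the `TypeError` fallback) follows the comment:

```python
# Hypothetical helper illustrating the fix; TRUE_SET and coerce() are
# illustrative names, not identifiers from elt_common.iceberg.trino.
TRUE_SET = {"true", "1", "yes", "y", "t"}

def coerce(field_type, val):
    """Coerce an environment-variable string to the target field type."""
    try:
        if field_type is bool:
            # bool("false") is True, so compare against a canonical true
            # set instead of calling bool() directly.
            return val.lower() in TRUE_SET
        return field_type(val)
    except TypeError:
        # Keep the existing fallback: return the raw string untouched.
        return val
```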
In `@infra/ansible/inventories/dev/group_vars/lakekeeper.yml`:
- Line 7: The dev inventory sets lakekeeper_bootstrap_credentials using the
wrong variable name; replace the singular vault_keycloak_client_secret with the
plural vault_keycloak_client_secrets in the assignment to
lakekeeper_bootstrap_credentials so it matches other roles (Keycloak, ELT,
Trino) and the actual variable defined elsewhere; ensure the rest of the string
interpolation remains intact: lakekeeper_bootstrap_credentials: "ansible:{{
vault_keycloak_client_secrets['ansible'] }}".
In `@infra/ansible/roles/elt/tasks/main.yml`:
- Around line 46-56: The task "Ensure AWS configuration is present" uses
ansible.builtin.copy to write AWS credentials with mode "u=rw,o=r,g=r" which
leaves the file world-readable; change the file permissions to be owner-only
readable/writable (e.g., set mode to 0600 or "u=rw,go=") in the
ansible.builtin.copy task so the credentials at "{{ ansible_env['HOME']
}}/.aws/credentials" are not readable by group or others.
- Around line 137-149: The cron job name is static ("iceberg-maintenance")
causing each loop iteration over elt_warehouses to overwrite the previous entry;
update the ansible.builtin.cron task so the cron "name" is unique per warehouse
(e.g., include the loop_var warehouse.name or warehouse.id) in the Ensure
Iceberg table maintenance jobs exist task, keeping the same job, state, schedule
and loop, so each warehouse gets its own cron entry.
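A sketch of the uniquely-named cron task the comment describes. The job command and schedule below are placeholders, not the role's actual values:

```yaml
- name: Ensure Iceberg table maintenance jobs exist
  ansible.builtin.cron:
    name: "iceberg-maintenance-{{ warehouse.name }}"  # unique per warehouse
    job: "/opt/elt/iceberg_maintenance.sh {{ warehouse.name }}"  # placeholder
    special_time: daily                               # placeholder schedule
    state: present
  loop: "{{ elt_warehouses }}"
  loop_control:
    loop_var: warehouse
```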
- Around line 73-95: The tasks "Ensure secrets directories exist" and "Generate
secrets" set modes that allow group/other read; change the directory mode in the
ansible.builtin.file task for "{{ elt_secrets_root_dir }}/{{ warehouse.name }}"
to owner-only (e.g. u=rwx,g=,o=) and change the template dest file mode in the
ansible.builtin.template task for "{{ elt_secrets_root_dir }}/{{ warehouse.name
}}/envvars" to owner-only (e.g. u=rw,g=,o=) so secrets are accessible only by
the owner; update the mode strings in those two tasks (referenced by their task
names) accordingly.
In `@infra/ansible/roles/keycloak/defaults/main.yml`:
- Around line 1-12: Update the pinned Keycloak image tag to a patched release
(26.3.5 or later) to mitigate the reported CVEs; modify the keycloak_image
default value (variable name: keycloak_image) in
infra/ansible/roles/keycloak/defaults/main.yml to reference
quay.io/keycloak/keycloak:26.3.5 (or a newer safe patch) and ensure any
deployment manifests or image references that use
keycloak_image_optimized_name/keycloak_image are updated consistently so the
runtime uses the upgraded image.
In `@infra/ansible/roles/keycloak/tasks/main.yml`:
- Around line 47-72: This task "Bootstrap Keycloak admin" (container name
keycloak_bootstrap, command "bootstrap-admin user" using
keycloak_bootstrap_admin_user and KC_BOOTSTRAP_ADMIN_PASSWORD) must be made
idempotent: add a pre-check that queries Keycloak for the admin user (e.g.,
curl/kcadm inside a short container run) and skip this task if the user exists,
or add a targeted failed_when to ignore the specific "user with username exists"
error (KC-SERVICES0010) so re-runs do not fail; implement the check/conditional
using the same community.docker.docker_container task context (or an immediately
preceding command task) and gate the bootstrap-admin task with when: or a
failed_when that matches the KC-SERVICES0010 message.
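A sketch of the `failed_when` variant; the register variable name and the shape of the error output are assumptions:

```yaml
- name: Bootstrap Keycloak admin
  community.docker.docker_container:
    name: keycloak_bootstrap
    # ... existing image, env and "bootstrap-admin user" command ...
  register: bootstrap_result
  failed_when:
    - bootstrap_result is failed
    # Tolerate only the "user with username exists" error on re-runs
    - "'KC-SERVICES0010' not in (bootstrap_result.msg | default(''))"
```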
In `@infra/ansible/roles/keycloak/tasks/setup-ldap.yml`:
- Around line 35-43: In the LDAP mapper config block in setup-ldap.yml remove
the malformed key `read.only"` (extra quote) and the duplicate `read.only: true`
entry so only a single valid `read.only: true` remains; edit the `config:`
mapping (the lines around `ldap.attribute`, `user.model.attribute`,
`is.mandatory.in.ldap`, `attribute.force.default`, `is.binary.attribute`,
`read.only`, `write.only`) to delete the bad line and ensure keys are correctly
spelled and not duplicated.
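A sketch of the corrected mapper `config:` block with the malformed key removed; the attribute values shown are illustrative, not the role's actual settings:

```yaml
config:
  ldap.attribute: mail              # illustrative value
  user.model.attribute: email       # illustrative value
  is.mandatory.in.ldap: false
  attribute.force.default: false
  is.binary.attribute: false
  read.only: true                   # single, correctly spelled entry
  write.only: false
```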
In `@infra/ansible/roles/lakekeeper/tasks/main.yml`:
- Around line 82-92: The command is using a literal block scalar (|) which
embeds newlines into the command string; change the docker/container task's
command parameter to use a list form (an array of command and args) rather than
the multiline scalar so each argument is a separate list item (e.g., split the
current lines for "uv run", "/opt/work/bootstrap-warehouse.py",
"--project-name", "{{ lakekeeper_project_name }}", etc.), matching the other
docker_container tasks and avoiding embedded \n; alternatively replace the
scalar with a folded scalar (>- ) if you must keep a single string, but prefer
converting the command to a YAML list in this task to stay consistent with the
codebase conventions.
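A sketch of the list-form command, using the flags quoted in the comment; the remaining arguments are elided:

```yaml
command:
  - uv
  - run
  - /opt/work/bootstrap-warehouse.py
  - --project-name
  - "{{ lakekeeper_project_name }}"
  # ... remaining flags from the original block scalar, one item each ...
```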
In `@infra/ansible/roles/minio/tasks/main.yml`:
- Around line 27-31: The env block has invalid YAML using KEY=VALUE; update the
env mapping to proper key: value pairs and quote interpolations: replace the
current env entries with keys MINIO_ROOT_USER, MINIO_ROOT_PASSWORD and
MINIO_BROWSER_REDIRECT_URL as YAML mappings and set their values to "{{
minio_root_user }}", "{{ minio_root_password }}" and "https://{{
top_level_domain }}/{{ minio_console_base_path }}" respectively so the env
mapping parses correctly.
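The corrected env mapping as described above, with proper `key: value` pairs and quoted interpolations:

```yaml
env:
  MINIO_ROOT_USER: "{{ minio_root_user }}"
  MINIO_ROOT_PASSWORD: "{{ minio_root_password }}"
  MINIO_BROWSER_REDIRECT_URL: "https://{{ top_level_domain }}/{{ minio_console_base_path }}"
```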
In `@infra/ansible/roles/postgres/tasks/main.yml`:
- Around line 29-36: The task "Create postgres secrets" writes the DB password
to /secrets/postgres-passwd with mode u=r,g=r,o=r making it world-readable;
change the ansible.builtin.copy invocation to set mode to "0600" (or "u=rw,go=")
so only root can read/write, keep owner: root and group: root, and add no_log:
true to the task to prevent the secret from appearing in logs; ensure you
reference the same task name and the variable postgres_db_passwd when updating
the task.
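A sketch of the hardened task; the `content` source is an assumption based on the variable named in the comment:

```yaml
- name: Create postgres secrets
  ansible.builtin.copy:
    content: "{{ postgres_db_passwd }}"   # assumed password source
    dest: /secrets/postgres-passwd
    owner: root
    group: root
    mode: "0600"   # owner-only instead of world-readable
  no_log: true     # keep the secret out of task output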
In `@infra/ansible/roles/redis/tasks/main.yml`:
- Around line 14-16: Fix the IPv6 bind typo and avoid waiting on a non-existent
container HEALTHCHECK: update the redis-server command invocation (the command:
array in the task) to use "--bind 127.0.0.1 ::1" (remove the erroneous leading
dash before ::1) so Redis receives a correct IPv6 loopback address, and change
the task's state from "healthy" to "started" (or add a proper HEALTHCHECK to the
container image) so the playbook does not block waiting for a health status;
locate this in the ansible task that defines the "command" and "state" for
redis-server.
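A sketch of the corrected task fragment; other container options are omitted:

```yaml
- name: Ensure redis container is running
  community.docker.docker_container:
    # ... existing image/network options ...
    command: ["redis-server", "--bind", "127.0.0.1", "::1"]  # no stray dash
    state: started   # image ships no HEALTHCHECK, so don't wait on "healthy"
```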
In `@infra/ansible/roles/traefik/tasks/static-config.yml`:
- Around line 2-21: The task "Ensure Traefik directories exist" ignores per-item
modes because a top-level mode is hardcoded; change the task to use the loop
variable's mode (traefik_dir.mode) instead of the hardcoded string so each loop
item (e.g., the certs entry with mode "u=rwx,g=,o=") is applied; keep
loop_control loop_var: traefik_dir and ensure the file module uses path: "{{
traefik_dir.path }}" and mode: "{{ traefik_dir.mode }}" (and leave owner/group
as needed).
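A sketch of the per-item mode usage; the loop source variable name is an assumption:

```yaml
- name: Ensure Traefik directories exist
  ansible.builtin.file:
    path: "{{ traefik_dir.path }}"
    state: directory
    mode: "{{ traefik_dir.mode }}"  # per-item mode, not a hardcoded string
  loop: "{{ traefik_dirs }}"        # assumed loop source
  loop_control:
    loop_var: traefik_dir
```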
In `@infra/ansible/roles/traefik/templates/etc/traefik/dynamic/lakekeeper.yml.j2`:
- Around line 1-22: Fix the router name typo and guard against an empty
lakekeeper group: rename the router from "lakeeeper" to "lakekeeper" so it
matches the service name "lakekeeper", and wrap or conditionalize the service
server URL generation that uses groups['lakekeeper'][0] (or check
groups['lakekeeper'] | length > 0 / groups.get('lakekeeper')) so the template
does not attempt to index into an empty list; ensure the
router/service/middleware blocks are only rendered when a non-empty
groups['lakekeeper'] exists.
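A sketch of the guarded template; `lakekeeper_http_port` and the router body are assumptions:

```yaml
{% if groups['lakekeeper'] | default([]) | length > 0 %}
http:
  routers:
    lakekeeper:            # renamed from "lakeeeper"
      service: lakekeeper
      # ... existing rule and middlewares ...
  services:
    lakekeeper:
      loadBalancer:
        servers:
          - url: "http://{{ groups['lakekeeper'][0] }}:{{ lakekeeper_http_port }}"
{% endif %}
```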
In `@infra/ansible/roles/traefik/templates/etc/traefik/dynamic/trino.yml.j2`:
- Around line 14-15: The Traefik upstream currently uses the inventory hostname
groups['trino'][0], which may not resolve inside the container; update the
dynamic/trino.yml.j2 template to build the backend URL using hostvars and
ansible_host as a fallback — e.g., derive a host variable from
hostvars[groups['trino'][0]]['ansible_host'] (fall back to groups['trino'][0] if
ansible_host is unset) and then use that host with trino_http_port to form the
url; ensure references to groups['trino'][0], hostvars and ansible_host are used
so routing falls back correctly when DNS/hosts are unavailable.
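A sketch of the fallback derivation in the template:

```yaml
{% set trino_node = groups['trino'][0] %}
{% set trino_host = hostvars[trino_node]['ansible_host'] | default(trino_node) %}
        servers:
          - url: "http://{{ trino_host }}:{{ trino_http_port }}"
```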
In `@infra/ansible/roles/trino/tasks/main.yml`:
- Around line 64-78: The inline trino_rules definition in the copy task
generates an overly broad catalogue regex "(.*)" for users "elt" and "superset";
update this by extracting the catalog pattern into a configurable variable
(e.g., trino_catalog_patterns or trino_rules) and provide a sane default in role
defaults (defaults/main.yml), then reference that variable in the trino_rules
used by the copy task; also add a short comment above the variable explaining
the intent and security implications so operators can narrow the regex or
override per-environment.
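A sketch of the extracted default; the variable name is an assumption and the rule would reference `"{{ trino_catalog_pattern }}"` instead of the literal:

```yaml
# defaults/main.yml
# Catalog access pattern for the elt and superset users. "(.*)" grants
# access to every catalog; narrow this per environment where possible.
trino_catalog_pattern: "(.*)"
```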
In `@infra/local/warehouses/trino/dev_isis_cleaned.properties`:
- Line 3: The property iceberg.rest-catalog.warehouse was changed to the
hard-coded value "dev_isis_cleaned", which can cause cross-environment
collisions; revert this to an environment-scoped value or templated placeholder
(e.g., use an env var like ${ENV_WAREHOUSE} or a build-time token) and update
the code that reads this property to fall back to the intended per-environment
default if unset; specifically change the iceberg.rest-catalog.warehouse
assignment back to an env-driven value and verify any consumers that construct
Lakekeeper project/catalog or S3 prefixes use the environment identifier to keep
prefixes unique across environments.
In `@infra/local/warehouses/trino/dev_isis_raw.properties`:
- Line 3: The property iceberg.rest-catalog.warehouse is hard-coded to
"dev_isis_raw", which can cause catalog collisions across environments; change
this to use an environment-scoped value or template (e.g., interpolate an ENV
var) or add an environment prefix/suffix so each environment has a unique
warehouse name, or else explicitly document and enforce isolation by configuring
separate Lakekeeper projects or distinct MinIO buckets for each environment to
guarantee no collisions.
🟡 Minor comments (15)
infra/ansible/roles/postgres/tasks/main.yml-22-22 (1)
22-22: ⚠️ Potential issue | 🟡 Minor
Fix grammatical error in task name.
Same issue as above - "Create secrets directory exists" should be "Ensure secrets directory exists".
Proposed fix
```diff
-- name: Create secrets directory exists
+- name: Ensure secrets directory exists
```
infra/ansible/roles/postgres/tasks/main.yml-13-13 (1)
13-13: ⚠️ Potential issue | 🟡 Minor
Fix grammatical error in task name.
The task name "Create postgres data directory exists" is grammatically incorrect.
Proposed fix
```diff
-- name: Create postgres data directory exists
+- name: Ensure postgres data directory exists
```
infra/ansible/roles/trino/defaults/main.yml-6-6 (1)
6-6: ⚠️ Potential issue | 🟡 Minor
Set trino_shared_secret from inventory or vault rather than defaulting to empty string.
The internal-communication.shared-secret configuration is used for Trino's internal node-to-node communication security. Defaulting to an empty string disables this authentication layer. Rather than relying on developers to remember to override this value, require it to be explicitly set via inventory/vault or use an Ansible assertion to fail on empty values in production deployments.
infra/ansible/roles/trino/defaults/main.yml-4-4 (1)
4-4: ⚠️ Potential issue | 🟡 Minor
Consider updating Trino image to a more recent version.
trinodb/trino:477 is a valid, officially released image (released 24 September 2025), but it is now 5 months outdated as of February 2026. Verify whether using this older release is intentional or if a more recent version should be used.
infra/ansible/roles/traefik/templates/etc/traefik/dynamic/traefik.yml.j2-16-16 (1)
16-16: ⚠️ Potential issue | 🟡 Minor
Bcrypt hash regenerates on each playbook run, causing unnecessary configuration reloads.
The password_hash('bcrypt') filter generates a new random salt on each invocation when one isn't supplied. This means the file will change on every Ansible run even if the password hasn't changed. Since Traefik watches dynamic config files for changes, this triggers reloads despite no actual configuration change. Use a fixed salt (22 characters) to ensure idempotent hashes:
```yaml
users: admin:{{ vault_traefik_admin_passwd | password_hash('bcrypt', rounds=10, ident='2y', salt='abcdefghijklmnopqrstuv') }}
```
Alternatively, pre-compute the hash in the vault if the password is static.
infra/ansible/roles/elt/templates/cron/iceberg_maintenance.sh.j2-18-18 (1)
18-18: ⚠️ Potential issue | 🟡 Minor
Quote variable expansion to handle paths with spaces.
The variable $ELT_SECRETS should be quoted to prevent word splitting if the path contains spaces.
🛡️ Proposed fix
```diff
-set -o allexport; source $ELT_SECRETS; set +o allexport
+set -o allexport; source "$ELT_SECRETS"; set +o allexport
```
infra/ansible/roles/elt/templates/cron/iceberg_maintenance.sh.j2-24-24 (1)
24-24: ⚠️ Potential issue | 🟡 Minor
Use "$@" instead of $* for argument forwarding.
Using $* without quotes causes all arguments to be concatenated and re-split on whitespace. Use "$@" to preserve argument boundaries correctly.
🛡️ Proposed fix
```diff
 uvx \
   --with "${uv_git_ref}[iceberg-maintenance]" \
-  python -m elt_common.iceberg.maintenance $*
+  python -m elt_common.iceberg.maintenance "$@"
```
infra/ansible/roles/base/tasks/main.yml-13-13 (1)
13-13: ⚠️ Potential issue | 🟡 Minor
Quote the file mode to ensure correct octal interpretation.
Ansible interprets unquoted numeric values as decimal integers. mode: 644 is interpreted as decimal 644, not octal 0644, which would result in incorrect permissions (decimal 644 = octal 1204, yielding -w----xr--). Use a quoted string or explicit octal notation.
🔧 Proposed fix
```diff
-    mode: 644
+    mode: "0644"
```
infra/ansible/roles/elt/tasks/main.yml-80-84 (1)
80-84: ⚠️ Potential issue | 🟡 Minor
Bug: Missing Jinja2 templating in loop label.
Line 83 uses label: warehouse.name which will display the literal string "warehouse.name" instead of the actual warehouse name. This should use Jinja2 templating like line 95 does.
🐛 Proposed fix
```diff
 loop_control:
   loop_var: warehouse
-  label: warehouse.name
+  label: "{{ warehouse.name }}"
```
docs-devel/deployment/prerequisites.md-18-20 (1)
18-20: ⚠️ Potential issue | 🟡 Minor
Use British English "licence" here.
Minor wording polish for en‑GB spelling.
✏️ Suggested wording update
```diff
-*Note: Terraform no longer has an open-source license. [OpenTofu](https://opentofu.org/) is a*
+*Note: Terraform no longer has an open‑source licence. [OpenTofu](https://opentofu.org/) is a*
```
infra/ansible/group_vars/keycloak.yml-2-2 (1)
2-2: ⚠️ Potential issue | 🟡 Minor
Fix minor typo in comment.
Line 2: "Eveything" → "Everything".
🛠️ Proposed fix
```diff
-# Eveything is assigned to the realm defined by keycloak_realm.name in all/all.yml
+# Everything is assigned to the realm defined by keycloak_realm.name in all/all.yml
```
docs-devel/deployment/index.md-34-39 (1)
34-39: ⚠️ Potential issue | 🟡 Minor
Fix malformed endpoint placeholders.
Lines 36–39 include backslashes inside the angle-bracket placeholders, which will render incorrectly.
🛠️ Proposed fix
```diff
-- Keycloak: <https://\domain\>/iceberg>
-- Lakekeeper: <https://\<domain\>/authn>
-- Superset instances:
-- - <https://\<domain\>/workspace/accelerator>
+- Keycloak: <https://<domain>/iceberg>
+- Lakekeeper: <https://<domain>/authn>
+- Superset instances:
+  - <https://<domain>/workspace/accelerator>
```
infra/ansible/roles/superset_farm/templates/superset-cli.sh.j2-3-14 (1)
3-14: ⚠️ Potential issue | 🟡 Minor
Quote variables and use "$@" for argument forwarding.
Unquoted variables risk breaking when values contain spaces or special characters. Using $* instead of "$@" fails to preserve argument boundaries.
Suggested fix
```diff
-CONTAINER_IMAGE={{ superset_farm_image }}
-SUPERSET_CONFIG_MOUNT={{ instance_work_dir }}/docker/pythonpath
-ENV_FILE={{ instance_work_dir }}/docker/.env
+CONTAINER_IMAGE="{{ superset_farm_image }}"
+SUPERSET_CONFIG_MOUNT="{{ instance_work_dir }}/docker/pythonpath"
+ENV_FILE="{{ instance_work_dir }}/docker/.env"
@@
-  --env-file $ENV_FILE \
-  --volume ${SUPERSET_CONFIG_MOUNT}:/app/docker/pythonpath:ro \
+  --env-file "$ENV_FILE" \
+  --volume "${SUPERSET_CONFIG_MOUNT}:/app/docker/pythonpath:ro" \
@@
-  ${CONTAINER_IMAGE} \
-  superset $*
+  "${CONTAINER_IMAGE}" \
+  superset "$@"
```
infra/ansible/roles/trino/tasks/main.yml-88-101 (1)
88-101: ⚠️ Potential issue | 🟡 Minor
Escape usernames in the password file regexp.
Usernames containing regex metacharacters (e.g., . or +) can cause unintended matches in the lineinfile regexp pattern. The regex_escape filter is the correct Ansible approach for this.
Proposed fix
```diff
-      regexp: "^{{ user.key }}"
+      regexp: "^{{ user.key | regex_escape }}"
```
infra/ansible/terraform/main.tf-47-68 (1)
47-68: ⚠️ Potential issue | 🟡 Minor
Guard merge() against empty security_groups.
If var.security_groups is passed as an empty map, merge([...]...) expands to zero arguments and Terraform errors during plan. Seeding with an empty map avoids this edge case.
Proposed fix
```diff
 locals {
-  sec_groups_map = merge([
-    for sg_name, sg_config in var.security_groups : {
-      for port in sg_config.ports_ingress : "${sg_name}-${port}" => {
-        sec_group_id = openstack_networking_secgroup_v2.sec_group[sg_name].id
-        port_ingress = port
-      }
-    }
-  ]...)
+  sec_groups_map = merge(
+    {},
+    [for sg_name, sg_config in var.security_groups : {
+      for port in sg_config.ports_ingress : "${sg_name}-${port}" => {
+        sec_group_id = openstack_networking_secgroup_v2.sec_group[sg_name].id
+        port_ingress = port
+      }
+    }]...
+  )
 }
```
🧹 Nitpick comments (21)
infra/ansible/roles/postgres/tasks/main.yml (1)
8-11: Consider documenting or isolating the system package override.
The --break-system-packages flag bypasses PEP 668 protections on externally managed Python environments. Whilst this is a common workaround for Ansible on modern Debian/Ubuntu systems, it can lead to conflicts with system packages. Consider using a virtual environment or documenting why this approach is acceptable for this deployment.
infra/ansible/roles/superset_farm/files/docker/pythonpath/superset_config.py (1)
150-165: Guard REDIS_DB parsing against empty/invalid values.
int(os.getenv("REDIS_DB", "0")) will throw if the env var is set to an empty string. A small guard keeps boot deterministic while still allowing overrides.
Proposed change
```diff
-REDIS_DB = int(os.getenv("REDIS_DB", "0"))
+REDIS_DB = int(os.getenv("REDIS_DB") or "0")
```
infra/local/superset/docker/pythonpath/superset_config.py (1)
103-120: Guard REDIS_DB parsing against empty/invalid values.
Same as the ansible config, an empty REDIS_DB env var would cause a startup failure.
Proposed change
```diff
-REDIS_DB = int(os.getenv("REDIS_DB", "0"))
+REDIS_DB = int(os.getenv("REDIS_DB") or "0")
```
infra/ansible/roles/superset_farm/templates/docker/docker-init.sh.j2 (1)
54-58: Consider quoting the database name argument.
The --database_name argument value is unquoted, which could cause issues if analytic_db.name contains spaces or special shell characters. The --uri value is already correctly quoted.
💡 Suggested fix
```diff
 superset set-database-uri \
-  --database_name {{ analytic_db.name }} --uri "{{ analytic_db.db_url }}"
+  --database_name "{{ analytic_db.name }}" --uri "{{ analytic_db.db_url }}"
```
infra/ansible/roles/redis/tasks/main.yml (1)
2-7: Working directory created but not mounted for persistence.
The task creates redis_work_dir on the host (lines 2-7), but the container does not mount this directory. Redis data will not persist across container restarts. If persistence is intended, consider adding a volume mount.
♻️ Proposed fix to add volume mount
```diff
     network_mode: host
     restart_policy: unless-stopped
+    volumes:
+      - "{{ redis_work_dir }}:/data"
```
Also applies to: 9-20
infra/ansible/roles/elt/templates/cron/iceberg_maintenance.sh.j2 (1)
9-12: Usage message does not reflect additional arguments.
The usage message only mentions <git_url>, but the script forwards additional arguments via $* to the Python module. Consider updating the usage to clarify this.
📝 Proposed fix
```diff
 if [ $# -eq 0 ]; then
-  echo "Usage $0 <git_url>"
+  echo "Usage: $0 <git_url> [maintenance_args...]"
   exit 1
 fi
```
infra/ansible/roles/superset_farm/templates/docker/dotenv.j2 (1)
23-24: Misleading comment about SECRET_KEY.
The comment suggests manually setting the secret key, but it's already templated from instance.value.secret_key. Consider updating the comment to reflect that the value is managed via Ansible variables.
📝 Proposed fix
```diff
-# Make sure you set this to a unique secure random value on production
+# Secret key managed via Ansible variables - ensure a unique secure random value in production
 SUPERSET_SECRET_KEY={{ instance.value.secret_key }}
```
infra/ansible/roles/superset_farm/tasks/setup-instance.yml (1)
11-12: Add a name to the set_fact task for consistency and clarity.
All other tasks in this file have descriptive names. Adding a name improves readability and makes Ansible output easier to follow during execution.
♻️ Proposed fix
```diff
-- ansible.builtin.set_fact:
+- name: Set instance working directory
+  ansible.builtin.set_fact:
     instance_work_dir: "{{ superset_farm_work_dir_prefix }}/{{ instance.key }}"
```
infra/ansible/terraform/README.md (1)
1-1: Consider expanding the README with essential documentation.This README is a placeholder. Given that Terraform provisioning is a significant addition to the project, consider documenting:
- Prerequisites (Terraform/OpenTofu version, cloud provider credentials)
- Available environments and their configuration files
- Basic usage commands (terraform init, plan, apply)
- How the generated inventory integrates with Ansible
infra/ansible/roles/traefik/templates/etc/traefik/dynamic/tls.yml.j2 (1)
1-4: Consider parameterising the certificate file paths.
The certificate paths are hardcoded. Using Jinja2 variables would provide greater flexibility across different environments and domains.
Proposed refactor using variables
```diff
 tls:
   certificates:
-    - certFile: /certs/cert__domain.pem
-      keyFile: /certs/key__domain.pem
+    - certFile: /certs/{{ traefik_tls_cert_file | default('cert__domain.pem') }}
+      keyFile: /certs/{{ traefik_tls_key_file | default('key__domain.pem') }}
```
infra/ansible/roles/traefik/tasks/main.yml (1)
2-8: Consider using include_tasks instead of import_tasks with conditional logic for clearer intent.
When using ansible.builtin.import_tasks, the when condition is applied to each task within the imported file at parse time. Since the intent is to conditionally include or exclude entire configuration files based on traefik_container_only, include_tasks is more semantically appropriate: it evaluates the condition at runtime and either includes or excludes the entire file, rather than conditionally skipping individual tasks.
Proposed refactor
```diff
-- ansible.builtin.import_tasks: static-config.yml
+- ansible.builtin.include_tasks: static-config.yml
   when: not (traefik_container_only | default(False))
-- ansible.builtin.import_tasks: container.yml
+- ansible.builtin.include_tasks: container.yml
-- ansible.builtin.import_tasks: dynamic-config.yml
+- ansible.builtin.include_tasks: dynamic-config.yml
   when: not (traefik_container_only | default(False))
```
infra/ansible/terraform/.gitignore (1)
5-5: Remove .terraform.lock.hcl from .gitignore to ensure reproducible provider versions.
The .terraform.lock.hcl file records exact provider versions and checksums. HashiCorp's official documentation recommends committing this file to version control so that all team members and CI/CD pipelines use identical provider versions, enabling consistent and reproducible deployments.
Proposed change
```diff
 .terraform/
-.terraform.lock.hcl
```
infra/ansible/roles/superset_farm/defaults/main.yml (1)
2-4: Consider pinning the Superset image by digest.
Tags can drift; digest pinning improves reproducibility and security.
🔒 Example pinning by digest
```diff
-superset_farm_image: ghcr.io/isisneutronmuon/analytics-data-platform-superset:a4f0e8011f34785d759269dceedfa2693db81c19
+superset_farm_image: ghcr.io/isisneutronmuon/analytics-data-platform-superset@sha256:<digest>
```
infra/ansible/roles/traefik/templates/etc/traefik/dynamic/superset_farm.yml.j2 (1)
13-17: Consider load-balancing across all superset_farm hosts.
Only the first host is used, so additional instances never receive traffic.
♻️ Suggested server list expansion
```diff
-      servers:
-        - url: "http://{{ groups['superset_farm'][0] }}:{{ props.http_port }}"
+      servers:
+        {% for host in groups['superset_farm'] %}
+        - url: "http://{{ host }}:{{ props.http_port }}"
+        {% endfor %}
```
infra/ansible/roles/traefik/templates/etc/traefik/dynamic/datastore.yml.j2 (1)
24-32: Include all datastore hosts in the load balancer.
Only the first datastore host is used, so additional hosts never receive traffic.
♻️ Suggested server list expansion
```diff
   minioapi:
     loadBalancer:
       servers:
-        - url: "http://{{ groups['datastore'][0] }}:{{ minio_api_port }}"
+        {% for host in groups['datastore'] %}
+        - url: "http://{{ host }}:{{ minio_api_port }}"
+        {% endfor %}
   minioconsole:
     loadBalancer:
       servers:
-        - url: "http://{{ groups['datastore'][0] }}:{{ minio_console_port }}"
+        {% for host in groups['datastore'] %}
+        - url: "http://{{ host }}:{{ minio_console_port }}"
+        {% endfor %}
```
infra/ansible/terraform/ansible-inventory.tmpl (1)
6-9: Consider restructuring to avoid index-based group/IP pairing for clarity.
Currently, ansible_groups and ansible_ips are created from the same source collection in parallel, so the index alignment is safe. However, this pattern relies on list ordering and could introduce subtle bugs if the code is refactored separately in future. A map-based approach would make the group-to-IP relationship explicit and eliminate index alignment entirely.
Suggested refactoring (requires map output)
```diff
-%{ for index, group in ansible_groups ~}
-
-[${ansible_groups[index]}]
-${ansible_ips[index]}
-%{ endfor ~}
+%{ for group, ips in ansible_group_ips ~}
+[${group}]
+%{ for ip in ips ~}
+${ip}
+%{ endfor ~}
+%{ endfor ~}
```
infra/ansible/roles/traefik/templates/etc/traefik/dynamic/lakekeeper.yml.j2 (1)
4-4: Fix the router name typo for clarity.
The router name looks like an accidental misspelling, which makes dashboards/metrics harder to interpret.
✏️ Rename the router
```diff
-    lakeeeper:
+    lakekeeper:
```
infra/ansible/roles/minio/defaults/main.yml (1)
2-2: Pin the MinIO image digest for reproducibility and supply-chain safety.
The tag quay.io/minio/minio:RELEASE.2025-09-07T16-13-09Z is available. Consider pinning the digest instead of the tag to prevent tag drift and ensure all deployments use the exact same image content.
🔒 Digest pinning
```diff
-minio_image: quay.io/minio/minio:RELEASE.2025-09-07T16-13-09Z
+minio_image: quay.io/minio/minio@sha256:14cea493d9a34af32f524e538b8346cf79f3321eff8e708c1e2960462bd8936e
```
infra/ansible/roles/keycloak/tasks/setup-realm.yml (1)
31-48: Consider adding defaults for future-proofing, though all current clients define these fields.
All four currently defined clients (ansible, trino, elt, lakekeeper-ui) include both optional_client_scopes_additional and attributes. However, if future clients omit these fields, the task will fail. Adding defaults makes the template more defensive:
Proposed improvement
```diff
-    optional_client_scopes: "{{ keycloak_client_optional_scopes_minimum + item.optional_client_scopes_additional }}"
-    attributes: "{{ item.attributes }}"
+    optional_client_scopes: "{{ keycloak_client_optional_scopes_minimum + (item.optional_client_scopes_additional | default([])) }}"
+    attributes: "{{ item.attributes | default({}) }}"
```
infra/ansible/terraform/main.tf (1)
100-110: Consider using vm.network[0].fixed_ip_v4 for explicit IP reference.
While access_ip_v4 is auto-populated by the provider, it may be unreliable when networks are attached via Neutron ports. This code uses direct network attachment (not port-based), so access_ip_v4 should populate reliably; however, using network[0].fixed_ip_v4 provides more explicit and defensive control over the inventory IP source.
Suggested adjustment (single-NIC case)
```diff
-  ansible_ips = [for vm in openstack_compute_instance_v2.vm : vm.access_ip_v4]
+  ansible_ips = [for vm in openstack_compute_instance_v2.vm : vm.network[0].fixed_ip_v4]
```
infra/ansible/terraform/variables.tf (1)
36-45: Add validation for CIDR and port ranges.
Input variable validation blocks in Terraform (stable since v0.13.0) enable early detection of misconfiguration, preventing late failures during apply. The suggested validation blocks use cidrnetmask() to validate CIDR syntax and alltrue() with port range checks (1–65535) to validate port entries, following established Terraform patterns.
Suggested validation blocks
```diff
 variable "network_subnet_cidr" {
   description = "CIDR of network subnet"
   type        = string
+  validation {
+    condition     = can(cidrnetmask(var.network_subnet_cidr))
+    error_message = "network_subnet_cidr must be a valid CIDR."
+  }
 }

 variable "security_groups" {
   description = "Details of the ports to open on each compute instance"
   type = map(object({
     ports_ingress = list(number)
   }))
+  validation {
+    condition = alltrue([
+      for sg in var.security_groups : alltrue([
+        for p in sg.ports_ingress : p >= 1 && p <= 65535
+      ])
+    ])
+    error_message = "ports_ingress entries must be valid TCP ports (1-65535)."
+  }
 }
```
Summary
In order to provide separate QA/Dev environments, the Ansible scripts have been refactored. Provisioning has been removed from Ansible and a new Terraform project takes care of this.