Parallelize support bundle collection and add Neo4j diagnostics #246

Open: josho-sysdig wants to merge 1 commit into master from support_bundle_parallel

Refactor get_support_bundle.sh to dispatch independent collection tasks as concurrent background jobs, reducing wall time on representative clusters (~40 pods, ~120 containers) from ~10m 36s to ~3m 32s with the default MAX_JOBS=6 setting.
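
For reviewers, here is a minimal sketch of how the MAX_JOBS setting could be resolved from the environment and the --max-jobs flag. The function name `parse_max_jobs`, the accepted spellings (`--max-jobs N` and `--max-jobs=N`), and the validation rule are assumptions for illustration; the script in this PR may parse its options differently.

```shell
#!/usr/bin/env bash
# Sketch only: parse_max_jobs is a hypothetical helper, not taken from the PR.

# The environment variable wins over the built-in default of 6.
MAX_JOBS="${MAX_JOBS:-6}"

# Scan the argument list for --max-jobs and validate the final value.
parse_max_jobs() {
  while (( $# > 0 )); do
    case "$1" in
      --max-jobs)   MAX_JOBS="${2:-}"; shift ;;   # value in the next argument
      --max-jobs=*) MAX_JOBS="${1#*=}" ;;         # value after the equals sign
    esac
    shift
  done
  # Reject anything that is not a positive integer.
  if ! [[ "$MAX_JOBS" =~ ^[1-9][0-9]*$ ]]; then
    echo "invalid --max-jobs value: '$MAX_JOBS'" >&2
    return 2
  fi
}
```

Under these assumptions, a command-line flag overrides an exported MAX_JOBS, and a value such as 0 is rejected before any job is dispatched.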

Parallelization changes:

  • Introduce three job-control helpers: run_bg (fork+register), throttle (cap concurrency via wait -n / sleep 0.1 fallback for Bash <4.3), and wait_all (harvest PIDs, report failures, clean up temp output files)
  • Add MAX_JOBS variable (default 6, overridable via --max-jobs flag or environment variable) to prevent API server overload
  • Refactor container log + support-file collection into collect_container_logs(), launched one background job per container
  • Refactor per-node JSON manifest collection into collect_node_info(), launched one background job per node
  • Refactor per-resource-type manifest collection into collect_resource_manifests(), launched one background job per type
  • Refactor Cassandra stats/storage, Elasticsearch stats, PostgreSQL, MySQL, Kafka, and Zookeeper storage into dedicated functions, each dispatched as a background job
  • Move kubectl cluster-info dump to a background job so it runs concurrently with log collection
  • Directory creation and pod/node/container discovery remain serial to avoid race conditions
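
The three job-control helpers listed above could look roughly like the sketch below. The helper names and the `wait -n` / `sleep 0.1` fallback come from this description; the bodies are assumptions. In particular, the sketch records each job's exit code in a scratch file (the hypothetical `JOB_DIR`) instead of harvesting PIDs, so that a status reaped early by `wait -n` is not lost.

```shell
#!/usr/bin/env bash
# Sketch only: helper names come from the PR; the bodies are assumptions.

MAX_JOBS="${MAX_JOBS:-6}"   # default stated in the PR description
JOB_DIR="$(mktemp -d)"      # assumed scratch dir for per-job output and status
JOB_COUNT=0                 # number of jobs registered since the last wait_all

# throttle: block while MAX_JOBS or more background jobs are still running.
throttle() {
  while (( $(jobs -pr | wc -l) >= MAX_JOBS )); do
    # wait -n needs Bash 4.3+; on older shells it errors and we poll instead.
    wait -n 2>/dev/null || sleep 0.1
  done
}

# run_bg: fork a command in the background and register it under a job id.
run_bg() {
  throttle
  JOB_COUNT=$((JOB_COUNT + 1))
  local id="$JOB_COUNT"
  printf '%s\n' "$*" > "$JOB_DIR/$id.cmd"
  (
    "$@" > "$JOB_DIR/$id.out" 2>&1
    echo "$?" > "$JOB_DIR/$id.rc"   # persist the exit code for wait_all
  ) &
}

# wait_all: wait for every job, report failures, clean up temp files.
wait_all() {
  local failed=0 id rc
  wait   # block until all remaining background jobs have exited
  for (( id = 1; id <= JOB_COUNT; id++ )); do
    rc="$(cat "$JOB_DIR/$id.rc" 2>/dev/null || echo 1)"
    if [[ "$rc" != 0 ]]; then
      echo "FAILED (rc=$rc): $(cat "$JOB_DIR/$id.cmd")" >&2
      sed 's/^/  /' "$JOB_DIR/$id.out" >&2
      failed=$((failed + 1))
    fi
  done
  rm -f "$JOB_DIR"/*.cmd "$JOB_DIR"/*.out "$JOB_DIR"/*.rc
  JOB_COUNT=0
  return "$(( failed > 0 ))"
}
```

A caller would invoke `run_bg collect_node_info "$node"` once per node and finish with a single `wait_all`. Note that `wait_all` must run in the main shell rather than inside a command substitution, because a subshell cannot wait on its parent's children.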

New collection: Neo4j diagnostics

  • Add collect_neo4j_stats() to retrieve cluster server and database status via cypher-shell, reading the ingestion_admin password from the neo4jdb-user-secrets or neo4j-user-secrets Kubernetes secret
  • Outputs neo4j/<pod>/cypher_show_servers.txt and neo4j/<pod>/cypher_show_databases.txt; writes a skip log if the secret is unavailable
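
As a rough illustration of the Neo4j collection flow described above, the sketch below looks up the password, runs the two queries through cypher-shell, and falls back to a skip log. The secret names, the ingestion_admin user, and the output file names come from this description. The function arguments, the secret key name (overridable through the hypothetical `NEO4J_SECRET_KEY` variable), the `skipped.log` file name, and the exact Cypher statements (`SHOW SERVERS`, `SHOW DATABASES`, inferred from the output file names) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: secret names, user, and output paths come from the PR;
# the secret key name and skip-log file name are assumptions.

collect_neo4j_stats() {
  local ns="$1" pod="$2" out_root="$3"
  local out_dir="$out_root/neo4j/$pod"
  local key="${NEO4J_SECRET_KEY:-ingestion_admin_password}"  # assumed key name
  local secret password=""
  mkdir -p "$out_dir"

  # Try both secret names mentioned in the PR; the first non-empty value wins.
  for secret in neo4jdb-user-secrets neo4j-user-secrets; do
    password="$(kubectl -n "$ns" get secret "$secret" \
      -o "jsonpath={.data.$key}" 2>/dev/null | base64 --decode 2>/dev/null)"
    [[ -n "$password" ]] && break
  done

  # Without a password, leave a skip log instead of failing the whole bundle.
  if [[ -z "$password" ]]; then
    echo "Skipping Neo4j stats: no usable secret found in namespace $ns" \
      > "$out_dir/skipped.log"
    return 0
  fi

  # Run each query inside the pod and capture stdout and stderr together.
  kubectl -n "$ns" exec "$pod" -- cypher-shell -u ingestion_admin \
    -p "$password" "SHOW SERVERS" > "$out_dir/cypher_show_servers.txt" 2>&1
  kubectl -n "$ns" exec "$pod" -- cypher-shell -u ingestion_admin \
    -p "$password" "SHOW DATABASES" > "$out_dir/cypher_show_databases.txt" 2>&1
}
```

Passing the password with `-p` keeps the sketch short, but it exposes the value in the pod's process list for the duration of the query, so a real implementation may prefer another mechanism.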

README.md rewrite:

  • Replace minimal usage stub with full man-page-style documentation covering NAME, SYNOPSIS, DESCRIPTION, OPTIONS, ENVIRONMENT, OUTPUT (annotated directory tree), EXAMPLES, and PARALLEL PROCESSING sections
  • Document all flags including the new --max-jobs option
  • Add neo4j/ output directory to the archive structure reference
  • Include concurrency control design notes and performance benchmark table
