Parallelize support bundle collection and add Neo4j diagnostics #246
Open
josho-sysdig wants to merge 1 commit into master from
Refactor `get_support_bundle.sh` to dispatch independent collection tasks as concurrent background jobs. On representative clusters (~40 pods, ~120 containers), this reduces wall time from ~10m 36s to ~3m 32s with the default `MAX_JOBS=6` setting.

Parallelization changes:
- Introduce three job-control helpers: `run_bg` (fork a job and register its PID), `throttle` (cap concurrency via `wait -n`, falling back to `sleep 0.1` polling on Bash < 4.3), and `wait_all` (harvest PIDs, report failures, and clean up temporary output files)
- Add a `MAX_JOBS` variable (default 6, overridable via the `--max-jobs` flag or the environment variable) to avoid overloading the API server
- Refactor container log and support-file collection into `collect_container_logs()`, launched as one background job per container
- Refactor per-node JSON manifest collection into `collect_node_info()`, launched as one background job per node
- Refactor per-resource-type manifest collection into `collect_resource_manifests()`, launched as one background job per type
- Refactor Cassandra stats/storage, Elasticsearch stats, PostgreSQL, MySQL, Kafka, and Zookeeper storage collection into dedicated functions, each dispatched as a background job
- Move `kubectl cluster-info dump` to a background job so it runs concurrently with log collection
- Keep directory creation and pod/node/container discovery serial to avoid race conditions

New collection: Neo4j diagnostics
- Add `collect_neo4j_stats()` to retrieve cluster server and database status via `cypher-shell`, reading the `ingestion_admin` password from the `neo4jdb-user-secrets` or `neo4j-user-secrets` Kubernetes secret
- Write output to `neo4j/<pod>/cypher_show_servers.txt` and `neo4j/<pod>/cypher_show_databases.txt`; write a skip log if the secret is unavailable

README.md rewrite:
- Replace the minimal usage stub with full man-page-style documentation covering NAME, SYNOPSIS, DESCRIPTION, OPTIONS, ENVIRONMENT, OUTPUT (annotated directory tree), EXAMPLES, and PARALLEL PROCESSING sections
- Document all flags, including the new `--max-jobs` option
- Add the `neo4j/` output directory to the archive structure reference
- Include concurrency-control design notes and a performance benchmark table
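For reference, the reported timings (~10m 36s before, ~3m 32s after) work out to roughly a 3x speedup, i.e. about two-thirds less wall time:

```math
\frac{10\,\text{m}\;36\,\text{s}}{3\,\text{m}\;32\,\text{s}} = \frac{636\,\text{s}}{212\,\text{s}} = 3.0,
\qquad
1 - \frac{212\,\text{s}}{636\,\text{s}} \approx 0.667
```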
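A minimal sketch of how the three job-control helpers could fit together. Only the names `run_bg`, `throttle`, `wait_all`, and `MAX_JOBS` come from this PR; everything else (`JOB_DIR`, `JOB_SEQ`, `FAILED_JOBS`, the `LABEL` argument, and the per-job status-file scheme) is a hypothetical choice for illustration. The status files are there so that a job reaped early by `wait -n` inside `throttle` still has its failure reported by `wait_all`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of run_bg / throttle / wait_all; only the helper names
# come from the PR. JOB_DIR, JOB_SEQ and FAILED_JOBS are illustrative.

MAX_JOBS="${MAX_JOBS:-6}"
JOB_DIR="$(mktemp -d)"   # per-job label/output/status files live here
JOB_SEQ=0
FAILED_JOBS=0

# run_bg LABEL CMD [ARGS...]: fork CMD into the background and register it.
# The subshell writes CMD's exit status to a file, so the status survives
# even if throttle's `wait -n` reaps the job before wait_all runs.
run_bg() {
  local label="$1"; shift
  JOB_SEQ=$((JOB_SEQ + 1))
  local base="$JOB_DIR/$JOB_SEQ"
  printf '%s\n' "$label" >"$base.label"
  ( "$@" >"$base.out" 2>&1; echo "$?" >"$base.status" ) &
}

# throttle: block while MAX_JOBS or more background jobs are still running.
throttle() {
  while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
    if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 3) )); then
      wait -n     # Bash >= 4.3: sleep until any one job exits
    else
      sleep 0.1   # older Bash has no `wait -n`, so poll instead
    fi
  done
}

# wait_all: wait for every job, report failures, then remove temp files.
wait_all() {
  wait
  local label_file base rc
  for label_file in "$JOB_DIR"/*.label; do
    [ -e "$label_file" ] || continue
    base="${label_file%.label}"
    # A missing status file means the job died before recording one.
    rc="$(cat "$base.status" 2>/dev/null || echo missing)"
    if [ "$rc" != 0 ]; then
      FAILED_JOBS=$((FAILED_JOBS + 1))
      echo "WARN: job '$(cat "$label_file")' failed (status: $rc)" >&2
      sed 's/^/    /' "$base.out" >&2   # indent the job's captured output
    fi
  done
  rm -rf "$JOB_DIR"
}

# Demo: eight short jobs under the cap, plus one job that always fails.
for i in 1 2 3 4 5 6 7 8; do
  throttle
  run_bg "sleep-$i" sleep 0.2
done
throttle
run_bg "always-fails" false
wait_all
echo "failed jobs: $FAILED_JOBS"   # prints "failed jobs: 1" (warning goes to stderr)
```

Recording the exit status inside the job itself sidesteps any question of whether `wait "$pid"` can still report a job that `wait -n` has already reaped, at the cost of one small file per job.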
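One way the `MAX_JOBS` precedence could be wired up, as a sketch: the `--max-jobs` flag wins over a pre-set `MAX_JOBS` environment variable, which in turn wins over the default of 6. The `parse_args` function name and the positive-integer validation are assumptions for illustration, not necessarily what the script does; other options are simply skipped here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: precedence is --max-jobs N > $MAX_JOBS > default 6.
# parse_args and its validation rules are illustrative, not the script's code.

parse_args() {
  MAX_JOBS="${MAX_JOBS:-6}"   # environment value, else the default
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --max-jobs)
        if [ "$#" -lt 2 ]; then
          echo "--max-jobs requires a value" >&2
          return 1
        fi
        MAX_JOBS="$2"
        shift 2
        ;;
      *)
        shift   # other options would be handled elsewhere in the real script
        ;;
    esac
  done
  # Reject empty or non-numeric values before any jobs are forked.
  case "$MAX_JOBS" in
    ''|*[!0-9]*) echo "invalid MAX_JOBS: '$MAX_JOBS'" >&2; return 1 ;;
  esac
  if [ "$MAX_JOBS" -lt 1 ]; then
    echo "MAX_JOBS must be at least 1" >&2
    return 1
  fi
}
```

Under this sketch, `MAX_JOBS=4 ./get_support_bundle.sh --max-jobs 10` would run with 10 concurrent jobs, while `MAX_JOBS=4 ./get_support_bundle.sh` would run with 4.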
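A sketch of the serial-discovery / parallel-collection split for container logs. It assumes the collector only fetches current and previous-instance logs with `kubectl logs` (the PR's `collect_container_logs()` also gathers support files), and it uses a simplified inline throttle loop and a plain `wait` in place of the PR's `throttle` and `wait_all` helpers. `dispatch_container_logs`, the argument order, and the `logs.txt` / `logs_previous.txt` file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: discovery and mkdir stay serial, per-container log
# collection runs in parallel. Function and file names are illustrative.

MAX_JOBS="${MAX_JOBS:-6}"

# Runs as one background job per container: current logs, plus logs from the
# previous container instance (dropped if the container never restarted).
collect_container_logs() {
  local ns="$1" pod="$2" container="$3" dir="$4"
  kubectl logs -n "$ns" "$pod" -c "$container" >"$dir/logs.txt" 2>&1
  kubectl logs -n "$ns" "$pod" -c "$container" --previous \
    >"$dir/logs_previous.txt" 2>/dev/null || rm -f "$dir/logs_previous.txt"
}

# dispatch_container_logs NAMESPACE OUT_DIR
dispatch_container_logs() {
  local ns="$1" out="$2" pod container dir
  # Serial discovery: list pods, then the containers of each pod.
  for pod in $(kubectl get pods -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
    for container in $(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].name}'); do
      dir="$out/$pod/$container"
      mkdir -p "$dir"   # created before forking, so each job's directory already exists
      # Simplified stand-in for the PR's throttle helper.
      while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do sleep 0.1; done
      collect_container_logs "$ns" "$pod" "$container" "$dir" &
    done
  done
  wait   # simplified stand-in for wait_all (no failure reporting)
}
```

For example, `dispatch_container_logs sysdig ./bundle/logs` (namespace and path are placeholders) would produce one `<pod>/<container>/logs.txt` per container, with at most `MAX_JOBS` `kubectl logs` jobs in flight at once.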
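A hedged sketch of what `collect_neo4j_stats()` could look like. It assumes the password is stored under a secret data key named `ingestion_admin` (the PR names the user, not the key), that the cluster runs Neo4j 5.x (where `SHOW SERVERS` is available), and that `cypher-shell` is on the pod's `PATH`. The argument order and the `skipped.log` file name are hypothetical; the two output file names come from the PR.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of collect_neo4j_stats. Assumed: the password lives
# under the data key "ingestion_admin"; "skipped.log" is an invented name.

# collect_neo4j_stats NAMESPACE POD OUT_DIR
collect_neo4j_stats() {
  local ns="$1" pod="$2" out="$3/neo4j/$2" secret password=""
  mkdir -p "$out"

  # Try each candidate secret name in the order the PR lists them.
  for secret in neo4jdb-user-secrets neo4j-user-secrets; do
    password="$(kubectl get secret "$secret" -n "$ns" \
      -o jsonpath='{.data.ingestion_admin}' 2>/dev/null | base64 -d 2>/dev/null)"
    [ -n "$password" ] && break
  done

  if [ -z "$password" ]; then
    echo "neo4j: no readable ingestion_admin password in either secret; skipping $pod" \
      >"$out/skipped.log"
    return 0
  fi

  # With `kubectl exec -i`, cypher-shell reads the query from stdin.
  # Note: -p exposes the password in the pod's process list while it runs.
  kubectl exec -i -n "$ns" "$pod" -- cypher-shell -u ingestion_admin -p "$password" \
    <<<'SHOW SERVERS;' >"$out/cypher_show_servers.txt" 2>&1
  kubectl exec -i -n "$ns" "$pod" -- cypher-shell -u ingestion_admin -p "$password" \
    <<<'SHOW DATABASES;' >"$out/cypher_show_databases.txt" 2>&1
}
```

Returning 0 on the skip path keeps a missing secret from being counted as a job failure, which matches the PR's choice of writing a skip log rather than erroring out.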