Hi Team,
first release: v34.0.0rc4 (or earlier)
current release: 38.1.0
Our database dates back 2+years and has grown a lot. I checked the sizes of some of the tables using this command:
SELECT
schemaname AS schema,
relname AS table_name,
n_live_tup AS row_count
FROM
pg_stat_user_tables
ORDER BY
n_live_tup DESC;
Results were staggering:
"vulnerabilities_impactedpackageaffecting" 735403239 =735 mln (!!!)
"vulnerabilities_advisory" 54630236 = 54 mln
"vulnerabilities_advisoryrelatedalias" 54622255 = 54 mln
"vulnerabilities_packagechangelog" 34332358 = 34 mln
"vulnerabilities_vulnerability_severities" 32511124 = 32 mln
"vulnerabilities_vulnerabilityseverity" 32221132 = 32 mln
"vulnerabilities_impactedpackage" 25634100 = 25 mln
"vulnerabilities_impactedpackagefixedby" 8558740 = 8mln
We are running VulnerableCode on OpenShift and decided that we wanted a regular updates of the importers and improvers, so we created CronJobs that would import/improve just one thing each (for example python manage.py import nvd_importer_v2). With the new version we have 137 pods running running at different times of day. While this was a lot but sort of manageable (at least we thought) we have starting noticing problems our database was under enormous load especially when improvers run. While I know that this is caused by parallel runs there is one specific improver that causes our nodes to crash and push the database beyond CPU limits - unfurl_version_range_v2advisor it was running usually for 4 days straight and using north of 44 GiB of RAM on OpenShift and 10-12 CPUs on PostgreSQL side.
With this we tried running importers and improvers as it says in documentation import --all and improve --all but those takes days or weeks to do.
Most recent importer --all run:
CommandError: 8 failed!: apache_kafka_importer_v2,mattermost_importer_v2,gitlab_importer,pypa_importer,npm_importer,nginx_importer,pysec_importer,vulnerabilities.importers.istio.IstioImporter
and it still took 26 hrs. and that is just importers on their own.
We haven't been able to do full run of the improve -all as of yet as I have seen some of those improvers can run for 3-4 days each.
So the question here arises, what to do with this as with this version it is unusable for our production instances.
what to do?
- Delete the whole thing and start over with new db
- Run importers once every 26 hrs, and then start improve --all which will take most likely weeks
- Skip most resource intensive improvers and just do the rest?
I would love to hear your suggestions on possible solutions, or how you run it?
Hi Team,
first release: v34.0.0rc4 (or earlier)
current release: 38.1.0
Our database dates back 2+years and has grown a lot. I checked the sizes of some of the tables using this command:
Results were staggering:
We are running VulnerableCode on OpenShift and decided that we wanted a regular updates of the importers and improvers, so we created CronJobs that would import/improve just one thing each (for example python manage.py import nvd_importer_v2). With the new version we have 137 pods running running at different times of day. While this was a lot but sort of manageable (at least we thought) we have starting noticing problems our database was under enormous load especially when improvers run. While I know that this is caused by parallel runs there is one specific improver that causes our nodes to crash and push the database beyond CPU limits - unfurl_version_range_v2advisor it was running usually for 4 days straight and using north of 44 GiB of RAM on OpenShift and 10-12 CPUs on PostgreSQL side.
With this we tried running importers and improvers as it says in documentation import --all and improve --all but those takes days or weeks to do.
Most recent importer --all run:
CommandError: 8 failed!: apache_kafka_importer_v2,mattermost_importer_v2,gitlab_importer,pypa_importer,npm_importer,nginx_importer,pysec_importer,vulnerabilities.importers.istio.IstioImporterand it still took 26 hrs. and that is just importers on their own.
We haven't been able to do full run of the improve -all as of yet as I have seen some of those improvers can run for 3-4 days each.
So the question here arises, what to do with this as with this version it is unusable for our production instances.
what to do?
I would love to hear your suggestions on possible solutions, or how you run it?