Performance issues when running unfurl_version_range_v2advisor (and others)

Hi Team,

first release: v34.0.0rc4 (or earlier)
current release: 38.1.0

Our database dates back 2+years and has grown a lot. I checked the sizes of some of the tables using this command:

```console
SELECT
    schemaname AS schema,
    relname AS table_name,
    n_live_tup AS row_count
FROM
    pg_stat_user_tables
ORDER BY
    n_live_tup DESC;
```
Results were staggering:
```
"vulnerabilities_impactedpackageaffecting"	735403239 =735 mln (!!!)
"vulnerabilities_advisory"	54630236  = 54 mln
"vulnerabilities_advisoryrelatedalias"	54622255 = 54 mln
"vulnerabilities_packagechangelog"	34332358 =  34 mln
"vulnerabilities_vulnerability_severities"	32511124 =  32 mln
"vulnerabilities_vulnerabilityseverity"	32221132 =  32 mln
"vulnerabilities_impactedpackage"	        25634100 =  25 mln
"vulnerabilities_impactedpackagefixedby"	8558740 =  8mln
```
We are running VulnerableCode on OpenShift and decided that we wanted a regular updates of the importers and improvers, so we created  CronJobs that would import/improve just one thing each (for example python manage.py import nvd_importer_v2). With the new version we have 137 pods running running at different times of day. While this was a lot but sort of manageable (at least we thought) we have starting noticing problems our database was under enormous load especially when improvers run. While I know that this is caused by parallel runs there is one specific improver that causes our nodes to crash and push the database beyond CPU limits -  unfurl_version_range_v2advisor it was running usually for 4 days straight and using north of 44 GiB of RAM on OpenShift and 10-12 CPUs on PostgreSQL side.

With this we tried running importers and improvers as it says in documentation import --all and improve --all but those takes days or weeks to do.

Most recent importer --all run:
```console
CommandError: 8 failed!: apache_kafka_importer_v2,mattermost_importer_v2,gitlab_importer,pypa_importer,npm_importer,nginx_importer,pysec_importer,vulnerabilities.importers.istio.IstioImporter
```
and it still took 26 hrs. and that is just importers on their own.

We haven't been able to do full run of the improve -all as of yet as I have seen some of those improvers can run for 3-4 days each.

So the question here arises, what to do with this as with this version it is unusable for our production instances.

what to do?
1. Delete the whole thing and start over with new db
2. Run importers once every 26 hrs, and then start improve --all which will take most likely weeks
3. Skip most resource intensive improvers and just do the rest?

I would love to hear your suggestions on possible solutions, or how you run it?



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance issues when running unfurl_version_range_v2advisor (and others) #2333

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Performance issues when running unfurl_version_range_v2advisor (and others) #2333

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions