Skip to content

Add batching parameter to salesforce_sync:* tasks#797

Draft
fspeirs wants to merge 1 commit intomainfrom
fs-batch-salesforce-backfill
Draft

Add batching parameter to salesforce_sync:* tasks#797
fspeirs wants to merge 1 commit intomainfrom
fs-batch-salesforce-backfill

Conversation

@fspeirs
Copy link
Copy Markdown
Contributor

@fspeirs fspeirs commented Apr 29, 2026

In preparation for completing the back-fill of School data into Salesforce, this commit adds a LIMIT env var to the salesforce_sync:* rake tasks to avoid overwhelming Salesforce when we sync the back-fill.

The LIMIT env var can be set to any value, or 0 for no limit. Absent, it defaults to enqueueing 25 jobs.

What's changed?

  • Wrapped each task in a simple counter that checks for a Salesforce:: record and, if not found, enqueues a task.
  • Aborts when count reaches LIMIT.

Steps to perform after deploying to production

To complete the back-fill:

LIMIT=10 bundle exec rake salesforce_sync:{school, role, contact}

This commit adds a LIMIT env var to the salesforce_sync:* rake tasks to
avoid overwhelming Salesforce when we sync the back-fill.
Copilot AI review requested due to automatic review settings April 29, 2026 08:32
@cla-bot cla-bot Bot added the cla-signed label Apr 29, 2026
@github-actions
Copy link
Copy Markdown

Test coverage

89.44% line coverage reported by SimpleCov.
Run: https://github.com/RaspberryPiFoundation/editor-api/actions/runs/25098859876

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a LIMIT batching mechanism to the salesforce_sync:* rake tasks to support controlled Salesforce backfills without enqueueing an unbounded number of jobs at once.

Changes:

  • Introduces LIMIT (default 25, 0 = unlimited) to cap the number of jobs enqueued per task run.
  • Adds existence checks against Salesforce mirror tables to enqueue only missing Schools/Roles (and only Contacts that exist).
  • Prints basic “checked vs enqueued” stats after each task run.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# rake salesforce_sync:school # Default: sync up to 25 records
# LIMIT=100 rake salesforce_sync:school # Sync up to 100 records
# LIMIT=0 rake salesforce_sync:school # Sync all missing records (unlimited)
desc 'Sync all Schools to Salesforce'
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage comments indicate this task syncs only missing schools and is LIMITed, but the desc still reads “Sync all Schools to Salesforce”. Consider updating the desc/wording to reflect the new behavior so operators don’t assume it will update existing Salesforce records.

Suggested change
desc 'Sync all Schools to Salesforce'
desc 'Sync missing Schools to Salesforce'

Copilot uses AI. Check for mistakes.
Comment on lines +28 to 33
# Sync non-student Roles to Salesforce that don't already exist there.
# Usage:
# rake salesforce_sync:role # Default: sync up to 25 records
# LIMIT=100 rake salesforce_sync:role # Sync up to 100 records
# LIMIT=0 rake salesforce_sync:role # Sync all missing records (unlimited)
desc 'Sync all non-student Roles to Salesforce'
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs/desc here still read like the task syncs “all” non-student roles, but the implementation only enqueues roles missing in Salesforce and stops after LIMIT. Consider updating the wording so it’s clear this is a “missing records backfill” task rather than a full resync.

Suggested change
# Sync non-student Roles to Salesforce that don't already exist there.
# Usage:
# rake salesforce_sync:role # Default: sync up to 25 records
# LIMIT=100 rake salesforce_sync:role # Sync up to 100 records
# LIMIT=0 rake salesforce_sync:role # Sync all missing records (unlimited)
desc 'Sync all non-student Roles to Salesforce'
# Backfill non-student Roles to Salesforce when they don't already exist there.
# Usage:
# rake salesforce_sync:role # Default: sync up to 25 missing records
# LIMIT=100 rake salesforce_sync:role # Sync up to 100 missing records
# LIMIT=0 rake salesforce_sync:role # Sync all missing records (unlimited)
desc 'Backfill missing non-student Roles to Salesforce'

Copilot uses AI. Check for mistakes.
Comment on lines +59 to +61
limit = ENV['LIMIT'] ? ENV['LIMIT'].to_i : 25
limit = nil if limit.zero?
enqueued = 0
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENV['LIMIT'].to_i again turns invalid values into 0 (unlimited). Since this task can enqueue many jobs, it’s safer to validate LIMIT (numeric, >= 0) and abort with a clear message when it’s invalid.

Copilot uses AI. Check for mistakes.
Comment on lines +66 to +70
if Salesforce::Contact.exists?(pi_accounts_unique_id__c: school.creator_id)
Salesforce::ContactSyncJob.perform_later(school_id: school.id)
enqueued += 1
break if limit && enqueued >= limit
end
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does Salesforce::Contact.exists? for every School, which is an N+1 query pattern. Consider batching by collecting creator_ids for each find_each batch and querying existing Contacts in one call, then enqueue only for those creator_ids that exist.

Copilot uses AI. Check for mistakes.
Comment on lines 10 to +14
task school: :environment do
limit = ENV['LIMIT'] ? ENV['LIMIT'].to_i : 25
limit = nil if limit.zero?
enqueued = 0
checked = 0
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tasks now have non-trivial behavior (LIMIT parsing, existence gating, early break) but there are no rake task specs covering them. Since the repo already tests tasks under spec/lib/tasks, consider adding specs for default LIMIT, LIMIT=0, and that existing Salesforce records are skipped.

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +13
limit = ENV['LIMIT'] ? ENV['LIMIT'].to_i : 25
limit = nil if limit.zero?
enqueued = 0
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENV['LIMIT'].to_i treats non-numeric values as 0, which then becomes nil (unlimited) here. That makes typos like LIMIT=abc potentially enqueue an unbounded number of jobs; consider validating LIMIT explicitly (e.g., parse with Integer and abort unless it’s >= 0) before proceeding.

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +37
limit = ENV['LIMIT'] ? ENV['LIMIT'].to_i : 25
limit = nil if limit.zero?
enqueued = 0
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENV['LIMIT'].to_i has the same “invalid string becomes 0 => unlimited” issue here as in the school task. Please validate that LIMIT is numeric and >= 0, and fail fast on invalid values to avoid accidentally enqueueing an unbounded backfill.

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +22
unless Salesforce::School.exists?(editoruuid__c: school.id)
Salesforce::SchoolSyncJob.perform_later(school_id: school.id)
enqueued += 1
break if limit && enqueued >= limit
end
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses Salesforce::School.exists? inside the find_each, which results in one Salesforce DB query per School (N+1). For large backfills, consider batching existence checks per find_each batch (load existing editoruuid__c values for the batch into a Set) to reduce query volume.

Copilot uses AI. Check for mistakes.
Comment on lines +42 to +46
unless Salesforce::Role.exists?(affiliation_id__c: role.id)
Salesforce::RoleSyncJob.perform_later(role_id: role.id)
enqueued += 1
break if limit && enqueued >= limit
end
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop performs Salesforce::Role.exists? per Role, creating an N+1 query pattern against the Salesforce DB. Consider batching existence checks for each find_each batch (query existing affiliation_id__c for the batch’s role IDs) before enqueueing jobs.

Copilot uses AI. Check for mistakes.
Comment on lines +52 to 57
# Sync creator_agree_to_ux_contact for Schools where the Contact exists in Salesforce.
# Usage:
# rake salesforce_sync:contact # Default: sync up to 25 records
# LIMIT=100 rake salesforce_sync:contact # Sync up to 100 records
# LIMIT=0 rake salesforce_sync:contact # Sync all records (unlimited)
desc 'Sync creator_agree_to_ux_contact for all Schools to Salesforce Contact'
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage comment says LIMIT=0 will “Sync all records (unlimited)”, but the code only enqueues jobs for Schools whose Contact already exists in Salesforce. Consider updating this wording (and potentially the desc) so the scope/behavior is clear.

Suggested change
# Sync creator_agree_to_ux_contact for Schools where the Contact exists in Salesforce.
# Usage:
# rake salesforce_sync:contact # Default: sync up to 25 records
# LIMIT=100 rake salesforce_sync:contact # Sync up to 100 records
# LIMIT=0 rake salesforce_sync:contact # Sync all records (unlimited)
desc 'Sync creator_agree_to_ux_contact for all Schools to Salesforce Contact'
# Sync creator_agree_to_ux_contact for Schools whose Contact already exists in Salesforce.
# Usage:
# rake salesforce_sync:contact # Default: enqueue up to 25 eligible schools
# LIMIT=100 rake salesforce_sync:contact # Enqueue up to 100 eligible schools
# LIMIT=0 rake salesforce_sync:contact # Enqueue all eligible schools (unlimited)
desc 'Sync creator_agree_to_ux_contact to Salesforce Contact for Schools whose Contact already exists'

Copilot uses AI. Check for mistakes.
@zetter-rpf
Copy link
Copy Markdown
Contributor

A bit more context in the PR description or a linked issue would help me review and would be useful for people looking at the code in the future. My main questions are:

  • Why are we worried about overwhelming salesforce? Is there a rate limit or error we're trying to avoid?
  • How are you intending to run this in production- are you going to run the rake tasks repeatedly until everything is synced?

@fspeirs
Copy link
Copy Markdown
Contributor Author

fspeirs commented Apr 29, 2026

@zetter-rpf It was a request from the Salesforce team. I believe there are a number of post-ingest steps that may be run after import which may exhaust account credits capacity if too many arrive at once.

As for running in production, the intention was to probably run two or three times with LIMIT=200 until all are back-filled into Salesforce.

@zetter-rpf
Copy link
Copy Markdown
Contributor

zetter-rpf commented Apr 29, 2026

As for running in production, the intention was to probably run two or three times with LIMIT=200 until all are back-filled into Salesforce.

How do you know how big a batch to run or how frequently to run them?

Without knowing that I can't judge wether this solves the problem. And I imagine this would be very useful to know for someone running this in the future (even if the answer is just ask the Salesforce team).

@fspeirs fspeirs marked this pull request as draft April 29, 2026 13:28
@fspeirs
Copy link
Copy Markdown
Contributor Author

fspeirs commented Apr 29, 2026

I'm putting this into draft until I can get a bit more clarity from the Salesroce team.

@fspeirs fspeirs self-assigned this Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants