Clarify sampling logic for small vs large control type groups #61

Copilot · 2026-01-26T23:03:41Z

Addressed review question about why the proposed code doesn't sample small groups (≤250 samples) when handling control type sampling in UMAP visualization.

Clarification Provided

The sampling logic correctly handles two cases:

Large groups (>250 samples): Sampled down to 250
Small groups (≤250 samples): Kept as-is without sampling

Small groups don't need sampling because they're already at or below the target size. Sampling them would be unnecessary computation with no benefit.

Context

The reviewer suggested refactoring to avoid pandas GroupBy.apply deprecation warnings:

# Sample up to 250 rows per control type
group_sizes = scdf["Metadata_control_type"].value_counts()
large_groups = group_sizes[group_sizes > 250].index
small_groups = group_sizes[group_sizes <= 250].index

sampled_large = (
    scdf[scdf["Metadata_control_type"].isin(large_groups)]
    .groupby("Metadata_control_type", group_keys=False)
    .sample(n=250, random_state=0)
)
small = scdf[scdf["Metadata_control_type"].isin(small_groups)]

scdf = pd.concat([sampled_large, small], axis=0)

This approach is correct and more efficient than the current apply(lambda grp: grp.sample(n=min(250, len(grp)))) implementation.

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Initial plan

4d4c29c

Copilot AI assigned Copilot and MattsonCam Jan 26, 2026

Copilot AI mentioned this pull request Jan 26, 2026

Added UMAP Figures #60

Open

Copilot started work on behalf of MattsonCam January 26, 2026 23:04 View session

Copilot AI changed the title ~~[WIP] Address feedback on UMAP figures regarding groupby and sampling logic~~ Clarify sampling logic for small vs large control type groups Jan 26, 2026

Copilot finished work on behalf of MattsonCam January 26, 2026 23:05

Copilot AI requested a review from MattsonCam January 26, 2026 23:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify sampling logic for small vs large control type groups #61

Clarify sampling logic for small vs large control type groups #61

Uh oh!

Copilot AI commented Jan 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Clarify sampling logic for small vs large control type groups #61

Are you sure you want to change the base?

Clarify sampling logic for small vs large control type groups #61

Uh oh!

Conversation

Copilot AI commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Clarification Provided

Context

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Jan 26, 2026 •

edited

Loading