Add pytest test suite and CI for data_format module #65

Open
francose wants to merge 2 commits into Comcast:main from francose:feat/add-test-infrastructure

francose commented May 10, 2026

Noticed the repo doesn't have a test suite yet, so I put one together for the core detection module.

Added 18 tests covering keys_extractor(), credential_extractor(), remove_url_from_keys(), and remove_url_from_creds() — the core detection pipeline. Tests cover true positives (AWS keys, private key headers, Slack webhooks), true negatives (plain text, empty strings), and the preprocessing helpers.
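The true-positive / true-negative split described above can be sketched roughly as follows. The signatures of the real functions in data_format.py are not shown in this thread, so this sketch does not call them: `find_aws_access_key_ids` and its regex are hypothetical stand-ins, used only to keep the example self-contained. The sample key is the placeholder value from AWS's own documentation, not a live credential.

```python
import re

# Hypothetical stand-in detector, NOT the project's keys_extractor():
# it only matches the well-known "AKIA" + 16 uppercase/digit shape.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_access_key_ids(text: str) -> list:
    """Return every substring of `text` that looks like an AWS access key ID."""
    return AWS_KEY_ID.findall(text)


def test_true_positive_aws_key():
    # A line that contains a key-shaped token should be flagged.
    line = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE"
    assert find_aws_access_key_ids(line) == ["AKIAIOSFODNN7EXAMPLE"]


def test_true_negative_plain_text():
    # Ordinary prose should produce no findings.
    assert find_aws_access_key_ids("just a regular sentence") == []


def test_true_negative_empty_string():
    # Empty input should be handled without raising.
    assert find_aws_access_key_ids("") == []
```

Saved as a file whose name starts with `test_`, this would be collected by pytest as three tests. In the actual suite the stand-in would be replaced by imports from the data_format module.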

Also added a GitHub Actions workflow that runs pytest on Python 3.9/3.10/3.11 on every push and PR to main. Nothing fancy, just the basics so regressions get caught before they ship.

Tests pass against current main. Kept the scope tight — just data_format.py for now. Happy to expand to the other modules if this lands.

Copilot AI review requested due to automatic review settings May 10, 2026 18:03

Copilot AI left a comment

Pull request overview

Adds a first pytest-based test suite for the core xgitguard.common.data_format extraction/preprocessing functions and introduces a GitHub Actions workflow to run the tests on pushes/PRs.

Changes:

  • Added pytest coverage for keys_extractor(), credential_extractor(), remove_url_from_keys(), and remove_url_from_creds().
  • Added a CI workflow to run the test suite across a Python version matrix.
  • Added tests package initializer.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_data_format.py | New unit tests covering the key/credential extraction pipeline and URL/special-char stripping helpers. |
| tests/__init__.py | Marks tests as a package (empty initializer). |
| .github/workflows/tests.yml | New GitHub Actions workflow to install deps and run pytest on pushes/PRs. |

Comment thread .github/workflows/tests.yml Outdated
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
Comment thread .github/workflows/tests.yml Outdated
Comment on lines +27 to +28
pip install -r requirements.txt
pip install pytest
Comment thread tests/test_data_format.py Outdated
prove the detection pipeline works end to end.
"""

import pytest
Comment thread tests/test_data_format.py Outdated
Comment on lines +10 to +14
import sys
import os

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

- Remove unused pytest import and sys.path hack from test module
- Install only urlextract+pytest in CI instead of full requirements.txt
- Drop Python 3.11 from matrix (pinned numpy/pandas don't support it)
@francose
Author

Addressed all four Copilot suggestions in 5767bdc:

  • Dropped Python 3.11 from CI matrix — numpy==1.22.0 and pandas==1.3.5 don't ship wheels for it
  • Slimmed CI deps to just urlextract + pytest — tests don't need the ML stack
  • Removed unused pytest import
  • Removed sys.path hack — python -m pytest handles it from project root
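The last point relies on how Python builds sys.path: with `python -m`, the current working directory is placed at the front of sys.path, whereas running a script by path puts the script's own directory there instead. A minimal sketch of that difference, using a throwaway layout (the names `demo_pkg` and `checks` are made up for this illustration and are not part of the repository):

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_project(root: Path) -> None:
    """Create a throwaway layout: demo_pkg/ at the root, checks/probe.py beside it."""
    (root / "demo_pkg").mkdir()
    (root / "demo_pkg" / "__init__.py").write_text("VALUE = 42\n")
    (root / "checks").mkdir()
    (root / "checks" / "__init__.py").write_text("")
    # probe.py imports a package that lives at the project root.
    (root / "checks" / "probe.py").write_text("import demo_pkg\nprint(demo_pkg.VALUE)\n")


def run_from_root(root: Path, args: list) -> subprocess.CompletedProcess:
    """Run the current interpreter with `args`, using `root` as the working directory."""
    return subprocess.run(
        [sys.executable, *args], cwd=root, capture_output=True, text=True
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_project(root)

        # `-m` puts the working directory (the project root) on sys.path,
        # so the root-level package is importable.
        as_module = run_from_root(root, ["-m", "checks.probe"])
        print("python -m checks.probe ->", as_module.returncode)

        # Running by path puts checks/ on sys.path instead, so the import fails.
        as_script = run_from_root(root, [str(Path("checks") / "probe.py")])
        print("python checks/probe.py ->", as_script.returncode)
```

`python -m pytest` benefits from the same mechanism when invoked from the project root, which is why the manual `sys.path.insert` in the test module is no longer needed.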
