27 changes: 27 additions & 0 deletions README.md
@@ -170,6 +170,33 @@ OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=1.0
```

### Cron Monitoring

If you run this tool on a schedule, you'll often want to be alerted when a backup
run fails to start or complete. To support this, GitHub Backup can report the
state of each scheduled run to an HTTP-based cron monitoring service such as
[Sentry Cron Monitors](https://docs.sentry.io/product/crons/) or
[healthchecks.io](https://healthchecks.io/).

Monitoring is configured under the top-level `ping` key, which accepts a separate
URL for each of the `start`, `success`, and `failure` states. Each URL is fetched
with an HTTP `GET` request when the corresponding state is reached; any state you
omit is not reported.

```yaml
ping:
  # Fetched when a backup run starts.
  start: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=in_progress
  # Fetched when a backup run completes successfully.
  success: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=ok
  # Fetched when a backup run completes with one or more errors.
  failure: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=error
```
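
As a rough illustration of how these optional keys behave, the sketch below models the `ping` section as a struct of optional URLs. The `PingState` enum and the `url_for` method are hypothetical names introduced for this example and are not taken from the project's `ping` module; the real `PingConfig` is deserialized with serde, which is left out here to keep the example dependency-free.

```rust
/// The lifecycle states a run can report (hypothetical enum, for illustration only).
#[derive(Debug, Clone, Copy)]
enum PingState {
    Start,
    Success,
    Failure,
}

/// Sketch of the `ping` section: every URL is optional.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct PingConfig {
    start: Option<String>,
    success: Option<String>,
    failure: Option<String>,
}

impl PingConfig {
    /// Returns the URL configured for `state`, or `None` if that key was omitted.
    fn url_for(&self, state: PingState) -> Option<&str> {
        match state {
            PingState::Start => self.start.as_deref(),
            PingState::Success => self.success.as_deref(),
            PingState::Failure => self.failure.as_deref(),
        }
    }
}

fn main() {
    // Only `failure` is configured, so the other two states are not reported.
    let config = PingConfig {
        failure: Some("https://example.com/monitor/failure".to_string()),
        ..PingConfig::default()
    };

    for state in [PingState::Start, PingState::Success, PingState::Failure] {
        match config.url_for(state) {
            Some(url) => println!("{state:?}: GET {url}"),
            None => println!("{state:?}: not reported"),
        }
    }
}
```

Modelling each URL as an `Option` means an omitted key needs no special handling: there is nothing to fetch for that state.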

A run is reported as a `failure` if any policy reports one or more errors, and as
a `success` otherwise. Reporting is best-effort: if the monitoring service can't
be reached, a warning is logged but the backup run itself is unaffected.

## Filters

This tool allows you to configure filters to control which GitHub repositories are backed up and
3 changes: 2 additions & 1 deletion docs/.vuepress/config.ts
@@ -79,7 +79,8 @@ export default defineUserConfig({
                 children: [
                     '/guide/README.md',
                     '/guide/enterprise.md',
-                    '/guide/telemetry.md'
+                    '/guide/telemetry.md',
+                    '/guide/monitors.md'
                 ]
             },
             {
69 changes: 69 additions & 0 deletions docs/guide/monitors.md
@@ -0,0 +1,69 @@
# Cron Monitoring
GitHub Backup is designed to run unattended on a schedule, which makes it
important to know when a backup run fails to start or complete. To support this,
GitHub Backup can report the state of each scheduled run to an HTTP-based cron
monitoring service such as [Sentry Cron Monitors](https://docs.sentry.io/product/crons/)
or [healthchecks.io](https://healthchecks.io/).

Whenever a backup run starts or completes, GitHub Backup will make a simple HTTP
`GET` request to the URL you've configured for that state, allowing your
monitoring service to track whether your backups are running as expected and to
alert you if they stop.

## Configuration
Monitoring is configured under the top-level `ping` key in your configuration
file. You may provide a separate URL for each of the `start`, `success`, and
`failure` states, and any state you leave out is simply not reported.

```yaml
schedule: "0 * * * *"

ping:
  # Fetched when a backup run starts.
  start: https://example.com/monitor/start
  # Fetched when a backup run completes successfully.
  success: https://example.com/monitor/success
  # Fetched when a backup run completes with one or more errors.
  failure: https://example.com/monitor/failure

backups:
  - kind: github/repo
    from: user
    to: /backup/github
    credentials: !Token your_access_token
```

A run is reported as a `failure` if any backup policy reports one or more errors,
and as a `success` otherwise.
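
The success/failure rule can be sketched as follows. This is a minimal illustration that assumes an atomic error counter like the one this change adds to `LoggingPairingHandler`; `RunOutcome`, `ErrorCounter`, and `classify` are hypothetical names. The `Cancelled` case reflects the behaviour in `src/main.rs`, where an interrupted run reports neither state.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical outcome of a single scheduled run (not a type from the project).
#[derive(Debug, PartialEq, Eq)]
enum RunOutcome {
    Success,
    Failure,
    /// The run was interrupted, so neither the success nor failure URL is fetched.
    Cancelled,
}

/// Shared counter that is incremented once for every reported error.
#[derive(Default)]
struct ErrorCounter {
    errors: AtomicUsize,
}

impl ErrorCounter {
    fn record_error(&self) {
        self.errors.fetch_add(1, Ordering::Relaxed);
    }

    fn errors(&self) -> usize {
        self.errors.load(Ordering::Relaxed)
    }
}

/// A single error from any policy turns the whole run into a failure.
fn classify(counter: &ErrorCounter, cancelled: bool) -> RunOutcome {
    if cancelled {
        RunOutcome::Cancelled
    } else if counter.errors() > 0 {
        RunOutcome::Failure
    } else {
        RunOutcome::Success
    }
}

fn main() {
    let counter = ErrorCounter::default();
    println!("{:?}", classify(&counter, false));

    counter.record_error();
    println!("{:?}", classify(&counter, false));
    println!("{:?}", classify(&counter, true));
}
```

Because the counter is shared across every policy in a run, one failing repository is enough to report the run as a `failure`, even if every other policy succeeded.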

::: tip
Reporting is best-effort. If the monitoring service can't be reached, a warning is
logged and the backup run continues unaffected, so a flaky monitor can never
interrupt or fail an otherwise healthy backup.
:::
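
One way to picture best-effort reporting is the sketch below. It is not the project's `Pinger` implementation: `ping_best_effort` is a hypothetical helper, and the `fetch` closure stands in for the real HTTP `GET` so that the example needs no HTTP client. The point is that a failed request is turned into a warning message instead of an error that could abort the run.

```rust
/// Hypothetical best-effort reporter. `fetch` stands in for the real HTTP GET.
///
/// Returns a warning message if the ping failed, and `None` if the ping
/// succeeded or no URL was configured for this state.
fn ping_best_effort<F>(url: Option<&str>, fetch: F) -> Option<String>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    // A state with no configured URL is not reported at all.
    let url = url?;

    match fetch(url) {
        Ok(()) => None,
        // The failure is downgraded to a warning and never propagated.
        Err(err) => Some(format!("failed to ping {url}: {err}")),
    }
}

fn main() {
    // Simulate an unreachable monitoring service.
    let warning = ping_best_effort(Some("https://example.com/monitor/start"), |_| {
        Err("connection refused".to_string())
    });

    if let Some(message) = warning {
        eprintln!("warning: {message}");
    }

    // The backup itself carries on regardless of the ping result.
    println!("backup continues");
}
```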

## Examples

### Sentry
[Sentry's Cron Monitors](https://docs.sentry.io/product/crons/getting-started/http/)
expose a check-in URL which accepts a `status` query parameter. You can point each
state at the same URL while varying the `status` value to report the lifecycle of
your backups.

```yaml
ping:
  start: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=in_progress
  success: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=ok
  failure: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=error
```

### healthchecks.io
[healthchecks.io](https://healthchecks.io/) provides a base ping URL, with
`/start` and `/fail` suffixes used to signal the start and failure of a run.

```yaml
ping:
  start: https://hc-ping.com/your-uuid/start
  success: https://hc-ping.com/your-uuid
  failure: https://hc-ping.com/your-uuid/fail
```
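
If you generate configuration programmatically, the suffix convention above can be captured in a small helper. The sketch below is purely illustrative: `PingUrls` and `healthchecks_urls` are hypothetical names and are not part of GitHub Backup.

```rust
/// The three URLs expected under the `ping` key (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct PingUrls {
    start: String,
    success: String,
    failure: String,
}

/// Derives the start, success, and failure URLs from a healthchecks.io base ping URL.
fn healthchecks_urls(base: &str) -> PingUrls {
    // Tolerate a trailing slash so the suffixes are not doubled up.
    let base = base.trim_end_matches('/');

    PingUrls {
        start: format!("{base}/start"),
        // The bare ping URL signals success.
        success: base.to_string(),
        failure: format!("{base}/fail"),
    }
}

fn main() {
    let urls = healthchecks_urls("https://hc-ping.com/your-uuid");
    println!("start:   {}", urls.start);
    println!("success: {}", urls.success);
    println!("failure: {}", urls.failure);
}
```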
9 changes: 9 additions & 0 deletions examples/config.yaml
@@ -1,5 +1,14 @@
schedule: "0 * * * *"

# Optionally report the status of each backup run to an HTTP-based cron
# monitoring service (such as Sentry Crons or healthchecks.io). Each URL is
# fetched with a simple HTTP GET request when the corresponding state is
# reached, and any state you leave out is simply not reported.
ping:
  start: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=in_progress
  success: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=ok
  failure: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=error

backups:
  # Backup all the repositories that the provided credentials have access to
  - kind: github/repo
37 changes: 36 additions & 1 deletion src/config.rs
@@ -2,13 +2,16 @@ use human_errors::ResultExt;
 use serde::{Deserialize, Deserializer};
 use std::str::FromStr;

-use crate::{Args, policy::BackupPolicy};
+use crate::{Args, ping::PingConfig, policy::BackupPolicy};

 #[derive(Deserialize)]
 pub struct Config {
     #[serde(default, deserialize_with = "deserialize_cron")]
     pub schedule: Option<croner::Cron>,

+    #[serde(default)]
+    pub ping: PingConfig,
+
     #[serde(default)]
     pub backups: Vec<BackupPolicy>,
 }
@@ -63,6 +66,38 @@ mod tests {
         assert!(config.schedule.is_none());
     }

+    #[test]
+    fn deserialize_ping_not_provided() {
+        let config: Config = serde_yaml::from_str("").unwrap();
+        assert_eq!(config.ping, crate::ping::PingConfig::default());
+    }
+
+    #[test]
+    fn deserialize_ping() {
+        let config: Config = serde_yaml::from_str(
+            r#"
+ping:
+  start: https://example.com/start
+  success: https://example.com/success
+  failure: https://example.com/failure
+"#,
+        )
+        .unwrap();
+
+        assert_eq!(
+            config.ping.start.as_deref(),
+            Some("https://example.com/start")
+        );
+        assert_eq!(
+            config.ping.success.as_deref(),
+            Some("https://example.com/success")
+        );
+        assert_eq!(
+            config.ping.failure.as_deref(),
+            Some("https://example.com/failure")
+        );
+    }
+
     #[test]
     #[cfg_attr(feature = "pure_tests", ignore)]
     fn deserialize_example_config() {
69 changes: 58 additions & 11 deletions src/main.rs
@@ -2,7 +2,8 @@ use clap::Parser;
 use engines::BackupState;
 use human_errors::Error;
 use pairing::PairingHandler;
-use std::sync::atomic::AtomicBool;
+use ping::Pinger;
+use std::sync::atomic::{AtomicBool, AtomicUsize};
 use std::time::Duration;
 use tracing_batteries::prelude::*;
 use tracing_batteries::{OpenTelemetry, Session, Umami};
@@ -16,6 +17,7 @@ mod entities;
 mod errors;
 pub(crate) mod helpers;
 mod pairing;
+mod ping;
 mod policy;
 mod sources;
 mod target;
@@ -51,6 +53,8 @@ pub struct Args {
 async fn run(args: Args, session: &Session) -> Result<(), Error> {
     let config = config::Config::try_from(&args)?;

+    let pinger = Pinger::new(config.ping.clone());
+
     let github_repo = pairing::Pairing::new(
         sources::GitHubRepoSource::default(),
         engines::RepoEngine::new(),
@@ -78,6 +82,10 @@ async fn run(args: Args, session: &Session) -> Result<(), Error> {
             .as_ref()
             .and_then(|s| s.find_next_occurrence(&chrono::Utc::now(), false).ok());

+        let handler = LoggingPairingHandler::default();
+
+        pinger.on_start().await;
+
         {
             let _span = info_span!("backup.all").entered();

@@ -88,21 +96,15 @@ async fn run(args: Args, session: &Session) -> Result<(), Error> {
                 match policy.kind.as_str() {
                     k if k == GitHubArtifactKind::Repo.as_str() => {
                         info!("Backing up repositories for {}", &policy);
-                        github_repo
-                            .run(policy, &LoggingPairingHandler, &CANCEL)
-                            .await;
+                        github_repo.run(policy, &handler, &CANCEL).await;
                     }
                     k if k == GitHubArtifactKind::Release.as_str() => {
                         info!("Backing up release artifacts for {}", &policy);
-                        github_release
-                            .run(policy, &LoggingPairingHandler, &CANCEL)
-                            .await;
+                        github_release.run(policy, &handler, &CANCEL).await;
                     }
                     k if k == GitHubArtifactKind::Gist.as_str() => {
                         info!("Backing up gist artifacts for {}", &policy);
-                        github_gist
-                            .run(policy, &LoggingPairingHandler, &CANCEL)
-                            .await;
+                        github_gist.run(policy, &handler, &CANCEL).await;
                     }
                     _ => {
                         error!("Unknown policy kind: {}", policy.kind);
@@ -112,9 +114,17 @@ async fn run(args: Args, session: &Session) -> Result<(), Error> {
         }

         if CANCEL.load(std::sync::atomic::Ordering::Relaxed) {
+            // The run was interrupted (e.g. by SIGINT), so we deliberately avoid
+            // reporting either success or failure to the cron monitor.
             break;
         }

+        if handler.errors() > 0 {
+            pinger.on_failure().await;
+        } else {
+            pinger.on_success().await;
+        }
+
         if let Some(next_run) = next_run {
             info!("Next backup scheduled for: {}", next_run);

@@ -131,7 +141,19 @@ async fn run(args: Args, session: &Session) -> Result<(), Error> {
     Ok(())
 }

-pub struct LoggingPairingHandler;
+#[derive(Default)]
+pub struct LoggingPairingHandler {
+    errors: AtomicUsize,
+}
+
+impl LoggingPairingHandler {
+    /// The total number of errors observed across every policy reported to this
+    /// handler, used to decide whether a backup run should be reported as a
+    /// success or a failure to the cron monitor.
+    fn errors(&self) -> usize {
+        self.errors.load(std::sync::atomic::Ordering::Relaxed)
+    }
+}

 impl<E: BackupEntity> PairingHandler<E> for LoggingPairingHandler {
     fn on_complete(&self, entity: E, state: BackupState) {
@@ -144,6 +166,8 @@ impl<E: BackupEntity> PairingHandler<E> for LoggingPairingHandler {
     }

     fn on_error(&self, error: Error) {
+        self.errors
+            .fetch_add(1, std::sync::atomic::Ordering::Relaxed);
         warn!("Error: {}", error);
     }

@@ -186,3 +210,26 @@ async fn main() {
         session.shutdown();
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::entities::GitRepo;
+
+    #[test]
+    fn logging_handler_counts_errors() {
+        let handler = LoggingPairingHandler::default();
+        assert_eq!(handler.errors(), 0);
+
+        // Each reported error should be accumulated so that a run with any
+        // failures can be reported to the cron monitor as a failure.
+        PairingHandler::<GitRepo>::on_error(&handler, human_errors::user("boom", &[]));
+        PairingHandler::<GitRepo>::on_error(&handler, human_errors::user("boom", &[]));
+        assert_eq!(handler.errors(), 2);
+
+        // Successful completions must not affect the error count.
+        let repo = GitRepo::new("octocat/Hello-World", "https://example.com/repo.git", None);
+        handler.on_complete(repo, BackupState::Skipped);
+        assert_eq!(handler.errors(), 2);
+    }
+}