Feat(MachineLearning/PAC): PAC Learning Definitions and Sample Complexity Lower Bounds #483
Open
SamuelSchlesinger wants to merge 7 commits into leanprover:main from
Conversation
Introduce the core PAC learning model: ConceptClass, LabeledSample, Learner, hypothesisError, sampleOf, and seenElements. Adds module index entry and bibliography references for [EHKV1989] and [Valiant1984].
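The core definitions might look roughly like the following Lean sketch. The names (`ConceptClass`, `LabeledSample`, `Learner`, `hypothesisError`, `sampleOf`) come from the commit message; the exact signatures here are my assumptions, not the PR's code:

```lean
import Mathlib

open MeasureTheory

variable {X : Type*} [MeasurableSpace X]

/-- A concept class is a set of concepts, i.e. subsets of the domain `X`. -/
abbrev ConceptClass (X : Type*) := Set (Set X)

/-- A labeled sample of size `m`: each of the `m` draws is a point
together with its Boolean label. -/
abbrev LabeledSample (X : Type*) (m : ℕ) := Fin m → X × Bool

/-- A learner maps a labeled sample to a hypothesis (a subset of `X`). -/
abbrev Learner (X : Type*) (m : ℕ) := LabeledSample X m → Set X

/-- Label the sample points `xs` according to the target concept `c`. -/
def sampleOf (c : Set X) [DecidablePred (· ∈ c)] {m : ℕ} (xs : Fin m → X) :
    LabeledSample X m := fun i => (xs i, decide (xs i ∈ c))

/-- The error of hypothesis `h` against target `c` under distribution `μ`:
the measure of their symmetric difference. -/
noncomputable def hypothesisError (μ : Measure X) (h c : Set X) : ℝ≥0∞ :=
  μ (symmDiff h c)
```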
Define SetShatters and vcDim for concept classes. Bridge to Mathlib's Finset.Shatters via finsetShatters_iff_setShatters.
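A plausible shape for these definitions, sketched in Lean (the definitions are my assumptions; the PR's versions, and the exact trace convention matching Mathlib's `Finset.Shatters`, may differ):

```lean
/-- Sketch: `C` shatters `s` if every subset of `s` is the trace of some
concept in `C` on `s`. -/
def SetShatters {X : Type*} (C : Set (Set X)) (s : Set X) : Prop :=
  ∀ t ⊆ s, ∃ c ∈ C, c ∩ s = t

/-- Sketch: the VC dimension as a supremum in `ℕ∞` over shattered finite sets. -/
noncomputable def vcDim {X : Type*} (C : Set (Set X)) : ℕ∞ :=
  ⨆ (s : Finset X) (_ : SetShatters C ↑s), (s.card : ℕ∞)
```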
Reusable lemmas for the EHKV proof: Bernoulli's inequality, product measure support, seenElements measurability, and sample agreement.
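Bernoulli's inequality is the analytic ingredient here: for a real $x \ge -1$ and a natural number $n$,

$$(1 + x)^n \ge 1 + nx.$$

In lower-bound arguments of this kind it is applied with $x$ negative, bounding $(1-\alpha)^m \ge 1 - m\alpha$, which turns a product over $m$ independent draws into a bound linear in $m$. (Mathlib's `one_add_mul_le_pow` proves this under the weaker hypothesis $-2 \le x$.)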
Combinatorial core of the EHKV lower bound: an involution on concepts pairs each with its complement on unseen points, showing at least half the concepts force large error on any bad sample.
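The pairing argument runs roughly as follows (my paraphrase of the standard EHKV argument; the PR's formal statement may be organized differently). Fix a shattered set $S$ and a sample whose seen points form $\mathrm{seen} \subseteq S$. Pair each concept $c$ with its flip on the unseen points:

$$c' = c \,\triangle\, (S \setminus \mathrm{seen}).$$

Because $S$ is shattered, $c \mapsto c'$ is an involution on the concepts restricted to $S$; moreover $c$ and $c'$ label the seen points identically while disagreeing on every unseen point. Hence for any hypothesis $h$,

$$\operatorname{err}_\mu(h, c) + \operatorname{err}_\mu(h, c') \ge \mu(S \setminus \mathrm{seen}),$$

so at least one concept in each pair, i.e. at least half of all concepts, forces error at least $\mu(S \setminus \mathrm{seen})/2$ on that sample.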
Construct the discrete probability measure on the shattered set used in the EHKV proof: heavy mass on one point, uniform on the rest.
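This is the standard EHKV construction (the constants below are my recollection of the paper; the PR's exact values may differ). On a shattered set $\{x_0, x_1, \dots, x_{d-1}\}$, put

$$\mu(\{x_0\}) = 1 - 8\varepsilon, \qquad \mu(\{x_i\}) = \frac{8\varepsilon}{d-1} \quad (1 \le i \le d-1).$$

The heavy point makes a sample of size $m$ likely to miss many of the light points, while the light points together carry enough mass ($8\varepsilon$) that mislabeling them costs more than $\varepsilon$ in error.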
Assemble the full EHKV proof: Markov bound on bad samples, involution pairing, and adversarial measure yield the lower bound m ≥ (VCdim(C) - 1) / (32ε) for any (ε, δ)-learner.
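The final statement might be packaged along these lines. This is a hypothetical shape only: `IsPACLearner` stands in for whatever predicate the PR uses for an $(\varepsilon, \delta)$-learner, and the coercions and side conditions are guesses:

```lean
-- Hypothetical statement shape; names, bundling, and hypotheses are assumptions.
theorem ehkv_sample_lower_bound
    {X : Type*} [MeasurableSpace X] (C : Set (Set X)) {d m : ℕ} {ε δ : ℝ}
    (hd : (d : ℕ∞) ≤ vcDim C)     -- C shatters some set of size d
    (hε : 0 < ε) (hδ : 0 < δ)
    (hL : ∃ L : Learner X m, IsPACLearner C L ε δ) :
    ((d : ℝ) - 1) / (32 * ε) ≤ m := by
  sorry
```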
Author
If this is too much to review in one pass, I am happy to split it into a sequence of PRs that build it up. I personally think reviewing commit by commit works well, but I am happy to do whatever folks prefer. I am making a proof-golfing pass right now; I hope at least some of these lemmas can be simplified.
- Replace verbose `show ... from by rw [...]` with direct `← lemma` rewrites
- Use `tauto` to close the symmetric case analysis in `hypothesisError_eq_of_inter_eq`
- Simplify `sampleOf_eq_of_agree` to a one-line `simp`
- Shorten the `SetShatters.subset` proof using `▸` and `id`
- Replace `Pi.one_apply` for indicator simplification in `adversarialMeasure_singleton`
- Use dot notation `.le`/`.trans` over `le_of_lt`/`le_trans`
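For instance, the first item turns a proof-term detour into a single rewrite. A toy illustration (not code from the PR):

```lean
-- Verbose: restate the goal with `show ... from`, then rewrite.
example (a b c : ℕ) (h : b = a) : a + c = b + c :=
  show a + c = b + c from by rw [← h]

-- Golfed: rewrite directly with the reversed hypothesis.
example (a b c : ℕ) (h : b = a) : a + c = b + c := by rw [← h]
```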
This PR defines the Probably Approximately Correct (PAC) learning model. It then formalizes the information-theoretic lower bound on sample complexity for PAC learning proved in "A general lower bound on the number of examples needed for learning" (Andrzej Ehrenfeucht, David Haussler, Michael Kearns, Leslie Valiant, 1989).
The PR is organized so that it can be reviewed commit by commit.