-
Notifications
You must be signed in to change notification settings - Fork 0
[Example] Apply AI review to Erdos problem 7 #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: erdos-all
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,151 @@ | ||
| # Review: Erdos Problem 7 | ||
|
|
||
| **File:** `FormalConjectures/ErdosProblems/7.lean` | ||
| **Reviewer:** Claude (automated) | ||
| **Date:** 2026-03-09 | ||
|
|
||
| --- | ||
|
|
||
| ## 1. Code Reuse | ||
|
|
||
| **Rating: Significant reuse opportunity missed.** | ||
|
|
||
| The file defines `IsCoveringSystem` and `HasDistinctModuli` locally in the `Erdos7` namespace. However, `FormalConjecturesForMathlib/NumberTheory/CoveringSystem.lean` already provides: | ||
|
|
||
| - `CoveringSystem R` — a structure over a general `CommSemiring R` with a finite index type, residues, ideals as moduli, a union-covers proof, and non-triviality. | ||
| - `StrictCoveringSystem R` — extends `CoveringSystem R` with `injective_moduli`, which encodes the "distinct moduli" constraint. | ||
|
|
||
| A `StrictCoveringSystem ℤ` is mathematically equivalent to `IsCoveringSystem S ∧ HasDistinctModuli S` as defined in Problem 7. Problem 273 already demonstrates using `StrictCoveringSystem` directly in a theorem statement. | ||
|
|
||
| Furthermore, Problems 2 and 8 define literally identical copies of `IsCoveringSystem` in their own namespaces. This is a codebase-wide consistency issue — at minimum, these definitions should be consolidated. | ||
|
|
||
| **Recommendation:** Restate `erdos_7` using `StrictCoveringSystem ℤ` with an additional constraint that each `moduli i` is a principal ideal generated by an odd number. This would align with the shared library and Problem 273's approach. If the `Finset`-based formulation is preferred for concreteness, it should be factored into a shared file and reused across Problems 2, 7, 8, 27, etc. | ||
|
|
||
| --- | ||
|
|
||
| ## 2. Citations | ||
|
|
||
| **Rating: Mostly accurate, minor gaps.** | ||
|
|
||
| The docstring cites: | ||
| - **[HN19]** Hough, B. and Nielsen, N. — cited as "Solution of the minimum modulus problem for covering systems", Ann. of Math. (2019). | ||
| - **[Ba22]** Balister, P., Bollobás, B., Morris, R., Sahasrabudhe, J. and Tiba, M. — cited as "Covering systems, matchings, and odd covering systems" (2022). | ||
|
|
||
| Per erdosproblems.com/7: | ||
| - The [HN19] citation matches (listed as `HoNi19` on the website). | ||
| - The [Ba22] citation matches (listed as `BBMST22` on the website, with all five authors). The docstring correctly lists all five authors. | ||
|
|
||
| **Missing from the docstring:** | ||
| - The website notes that Selfridge showed the problem would have a positive answer if a covering system with pairwise non-divisible moduli existed, connecting this to Problem #586. This context is not mentioned. | ||
| - The website notes a $25 prize offered by Erdos and a separate $2000 prize by Selfridge for an explicit example. Prize information is not expected in formalizations but adds useful context. | ||
| - The problem is attributed to both Erdos and Selfridge (sometimes with Schinzel). The docstring attributes it only to Erdos. | ||
|
|
||
| **Recommendation:** Add a brief note mentioning the Selfridge connection and Problem 586. Consider using the website's citation keys (HoNi19, BBMST22) for consistency with the source. | ||
|
|
||
| --- | ||
|
|
||
| ## 3. Variants | ||
|
|
||
| **Rating: Incomplete — one important variant missing.** | ||
|
|
||
| The docstring mentions two results: | ||
| 1. [HN19]'s result that at least one modulus must be divisible by 2 or 3. | ||
| 2. [Ba22]'s result that no odd distinct covering system exists if moduli are squarefree. | ||
|
|
||
| The website identifies a **stronger variant**: "Is there a distinct covering system with all moduli odd **and squarefree**?" This was disproved by BBMST22. | ||
|
|
||
| While the docstring mentions the squarefree result in passing as a "known result," it is not formalized as a separate theorem. Given that this is a resolved variant (answered negatively), it would be a natural candidate for a companion theorem: | ||
|
|
||
| ```lean | ||
| theorem erdos_7_squarefree : ¬ ∃ S : Finset (ℤ × ℕ), | ||
| IsCoveringSystem S ∧ HasDistinctModuli S ∧ | ||
| (∀ p ∈ S, Odd p.2) ∧ (∀ p ∈ S, Squarefree p.2) := by | ||
| sorry | ||
| ``` | ||
|
|
||
| Additionally, the Selfridge conditional result (connection to Problem 586 — pairwise non-divisible moduli) is another variant worth noting. | ||
|
|
||
| --- | ||
|
|
||
| ## 4. Readability | ||
|
|
||
| **Rating: Good, with minor suggestions.** | ||
|
|
||
| **Strengths:** | ||
| - Clean, minimal definitions with clear docstrings. | ||
| - The `IsCoveringSystem` definition is straightforward and easy to parse. | ||
| - Good use of LaTeX in docstrings. | ||
|
|
||
| **Suggestions:** | ||
| - The `HasDistinctModuli` name is slightly ambiguous — it could also mean "the set of moduli is non-trivial" in casual reading. A name like `HasPairwiseDistinctModuli` or `HasInjectiveModuli` would be more precise, though this is minor. | ||
| - The covering condition `(p.2 : ℤ) ∣ (n - p.1)` is equivalent to `n ≡ p.1 [ZMOD p.2]` but less immediately recognizable. Problems 204 and 277 use `Int.ModEq` notation (`z ≡ s.1 [ZMOD s.2]`), which is arguably more readable. However, the divisibility formulation is more direct for Lean proofs, so this is a stylistic choice. | ||
| - The theorem `erdos_7` uses `answer(sorry)`, which is the standard pattern for open problems with unknown truth values. This is correct and readable. | ||
|
|
||
| --- | ||
|
|
||
| ## 5. Formalizability | ||
|
|
||
| **Rating: High — the statement is precise and unambiguous.** | ||
|
|
||
| The mathematical statement "Is there a distinct covering system all of whose moduli are odd?" is fully precise: | ||
| - "Covering system" has a standard definition in combinatorial number theory. | ||
| - "Distinct" (meaning distinct moduli) is standard. | ||
| - "Odd moduli" is unambiguous. | ||
|
|
||
| There is **no meaningful ambiguity** in this problem statement. The only design choice is representational: how to encode covering systems in Lean (Finset-based, function-based, or structure-based). The chosen `Finset (ℤ × ℕ)` representation is concrete and natural. | ||
|
|
||
| One minor subtlety: the formalization allows a modulus of 1 (since `0 < p.2` permits `p.2 = 1`, and `1` is odd). A single congruence with modulus 1 trivially covers all integers, but the `HasDistinctModuli` constraint means at most one modulus-1 congruence can appear. Since modulus 1 is odd, the existence claim would be trivially true if the distinct-moduli constraint allowed it. However, `HasDistinctModuli` as defined only requires that elements with the same modulus are equal — it does not forbid modulus 1. So the set `{(0, 1)}` satisfies all three conditions (`IsCoveringSystem`, `HasDistinctModuli`, and all moduli odd). **This means `erdos_7` as formalized is trivially true**, which is almost certainly not the intended formalization. | ||
|
|
||
| The standard definition of a covering system in this context requires all moduli to be **greater than 1** (or equivalently ≥ 2). Problem 586 uses `2 ≤ n i` for exactly this reason. The website's description and the mathematical literature uniformly assume moduli ≥ 2 in covering system problems. | ||
|
|
||
| **This is a critical correctness issue** — see Section 6 below. | ||
|
|
||
| --- | ||
|
|
||
| ## 6. Correctness | ||
|
|
||
| **Rating: Incorrect — admits trivial solution.** | ||
|
|
||
| ### Critical Issue: Modulus 1 is permitted | ||
|
|
||
| As analyzed in Section 5, the set `S = {(0, 1)}` satisfies: | ||
| - `IsCoveringSystem S`: S is nonempty, `0 < 1`, and every integer `n` satisfies `1 ∣ (n - 0)`. | ||
| - `HasDistinctModuli S`: vacuously, the single element has a unique modulus. | ||
| - `∀ p ∈ S, Odd p.2`: `Odd 1` is true. | ||
|
|
||
| Therefore `erdos_7` is trivially provable with `answer(sorry) = True`, which contradicts the problem's open status. | ||
|
|
||
| **Fix:** Add the constraint `1 < p.2` (or equivalently `2 ≤ p.2`) to either `IsCoveringSystem` or the theorem statement: | ||
|
|
||
| ```lean | ||
| theorem erdos_7 : answer(sorry) ↔ | ||
| ∃ S : Finset (ℤ × ℕ), IsCoveringSystem S ∧ HasDistinctModuli S ∧ | ||
| ∀ p ∈ S, Odd p.2 ∧ 1 < p.2 := by | ||
| sorry | ||
| ``` | ||
|
|
||
| Alternatively, modify `IsCoveringSystem` to require `1 < p.2` instead of `0 < p.2`. | ||
|
|
||
| ### Secondary Issue: `HasDistinctModuli` is stronger than necessary | ||
|
|
||
| The definition `∀ p ∈ S, ∀ q ∈ S, p.2 = q.2 → p = q` says that elements of `S` are uniquely determined by their modulus. This means not only are the moduli distinct, but **there is at most one residue per modulus**. This is indeed the standard meaning of "distinct covering system" (each modulus appears exactly once), so this is correct. However, it's worth noting that this rules out systems like `{(0, 3), (1, 3)}` which have the same modulus but different residues — which is the intended behavior. | ||
|
|
||
| ### Note on `IsCoveringSystem` requiring `Nonempty` | ||
|
|
||
| The `S.Nonempty` condition is redundant given the covering property (if `S` were empty, no integer would be covered). This is not a correctness issue but a minor redundancy. | ||
|
|
||
| --- | ||
|
|
||
| ## Summary | ||
|
|
||
| | Criterion | Rating | Key Issue | | ||
| |-----------|--------|-----------| | ||
| | Code Reuse | Needs improvement | Should use `StrictCoveringSystem` or shared definitions | | ||
| | Citations | Good | Minor gaps (Selfridge attribution, Problem 586 connection) | | ||
| | Variants | Incomplete | Squarefree variant should be formalized separately | | ||
| | Readability | Good | Minor naming suggestions | | ||
| | Formalizability | High | Statement is mathematically precise | | ||
| | Correctness | **Incorrect** | **Modulus 1 makes the problem trivially true** | | ||
|
|
||
| ### Priority Fix | ||
| The modulus-1 issue must be resolved. The covering system definition should require `1 < p.2` (moduli ≥ 2) to match the standard mathematical definition and the problem's open status. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| # Review Math | ||
|
|
||
| Assume the role of a PhD-level mathematician. | ||
|
|
||
| The goal is to review a particular Erdos Problem formalization (problem number NUM) to answer the following questions. Produce a review document called ai-review/NUM.md. | ||
|
|
||
| 1. Code reuse - Can any code from the existing codebase be repurposed? Look in FormalConjecturesForMathlib to determine if an existing implementation would work just as well. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe also add that if a definition is general enough and / or used across many problems, then he can also move that definition to the FormalConjecturesForMathlib Folder for future usage.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree with this suggestion, though I think I'll decouple it into a separate step like, "review all files, identify common implementations of definitions, and abstract them out" |
||
| 2. Citations - Fetch data from https://www.erdosproblems.com/NUM to ensure any citations included in docstrings are documented as they exist on the website as opposed to shorthand references. | ||
| 3. Variants - Are all variants of the problem captured by the formalization? | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "Are also all variants of the problem formalized", might be better as he might misunderstand the other sentence, in the sense that he should find one formalized statement which is general enough to encompass all variants or so. |
||
| 4. Readability - Could the code be made more readable? | ||
| 5. Formalizability - Is the problem as stated precise enough to be obviously formalizable? Provide an assessment of the ambiguity of the statement. | ||
| 6. Correctness - Is the formalization as-implemented correct and complete from a mathematical perspective? Would an experienced mathematician identify any obvious flaws? Is any incompleteness or incorrectness attributable to ambiguity in the statement itself? | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a good keyword is faithful here, correct could be misinterpreted as "Does the Lean compile?". So something along the lines, "Is the formalization faithful to the Natural Language Statement from the source material?".
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fun lesson from all this - when I was experimenting with method, I had tried to tell Claude Haiku to work on a batch of 80 problems with a single prompt, and it produced 80 lean files with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, thats why we need AI Safety and Human Reviews :D There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Other point, which one could test is that he should write it in the docstring, if a statement had a bit of ambiguity and what way he chose to solve that ambiguity. We might have to be wary that it doesnt get out of hand, but one could experiment with it. |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding a additional step asking for sensible category test statements? Those might help increasing the confidence that the formalization are correct. But this can also be done later and for now are not necessarily needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to google "lean category test statements", looks like there would be some way within lean to more rigorously confirm that the code was implemented faithfully? If so, agree, is a good idea and can be implemented independently, will look into further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I should have specified what I meant by [category test] is that we have a custom categories in Formal Conjectures. Here is for example a case, where category test statements are used for the Dedekind Number Definitions.
They can increase reviewability by giving further evidence of the correctness / faithfulness of the mathematicial definitions. If sensible of course, not every definition can be tested sensible.