Hey everyone — thanks for checking out Hefesto!
I built this after getting frustrated with traditional linters (flake8, pylint, etc.) that only catch syntax or style issues, not logical drift. In large Python projects, I noticed we often had multiple implementations of the same business rule — all valid syntactically, but inconsistent logically.
Example: apply_discount() in one file used price * 0.8, while another used price * 0.85 for the same user type. Tests passed, linting passed — yet production behavior diverged. Hefesto was designed to detect exactly this kind of semantic mismatch before commit.
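To make that concrete, here is a hypothetical pair of implementations showing the kind of drift I mean (the file names, rates, and the user-type check are made up for illustration):

```python
# pricing/checkout.py (hypothetical)
def apply_discount(price: float, user_type: str) -> float:
    # 20% off for premium users
    if user_type == "premium":
        return price * 0.8
    return price


# billing/invoices.py (hypothetical)
def apply_discount(price: float, user_type: str) -> float:
    # 15% off for premium users: same rule, different constant
    if user_type == "premium":
        return price * 0.85
    return price
```

Each version is syntactically valid and passes its own module's tests, so nothing flags the inconsistency until the two are compared side by side.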
How it works:
Parses your codebase, extracts function-level representations.
Uses sentence-transformers to build semantic embeddings for each function.
Compares them to spot near-duplicates with divergent logic (a rough sketch of this pipeline follows the list).
Optionally uses an OpenAI model to propose normalized fixes or highlight intent mismatches (second sketch below).
Runs as a FastAPI service or CLI (hefesto analyze --project myapp/) that integrates with pre-commit or CI/CD.
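To give a feel for the parse/embed/compare steps, here is a minimal sketch, not Hefesto's actual implementation. It assumes the sentence-transformers package with the all-MiniLM-L6-v2 model, uses Python's ast module to pull out function sources, and the similarity threshold is an arbitrary placeholder:

```python
import ast
from pathlib import Path

from sentence_transformers import SentenceTransformer, util  # assumed dependency


def extract_functions(project_dir: str) -> list[tuple[str, str]]:
    """Return (location, source) pairs for every function definition found."""
    functions = []
    for path in Path(project_dir).rglob("*.py"):
        source = path.read_text(encoding="utf-8")
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip files that don't parse
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                snippet = ast.get_source_segment(source, node)
                if snippet:
                    functions.append((f"{path}:{node.name}", snippet))
    return functions


def find_suspects(project_dir: str, threshold: float = 0.85):
    """Flag function pairs whose embeddings are very similar but whose bodies differ."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    funcs = extract_functions(project_dir)
    embeddings = model.encode([src for _, src in funcs], convert_to_tensor=True)
    scores = util.cos_sim(embeddings, embeddings)

    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            name_i, src_i = funcs[i]
            name_j, src_j = funcs[j]
            if scores[i][j] >= threshold and src_i != src_j:
                yield name_i, name_j, float(scores[i][j])


if __name__ == "__main__":
    for a, b, score in find_suspects("myapp/"):
        print(f"possible semantic drift ({score:.2f}): {a} <-> {b}")
```

The all-pairs comparison here is quadratic, which is exactly where the >100K LOC scaling question below comes in; an approximate-nearest-neighbour index would be the obvious next step for large repos.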
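And for the optional LLM step, a rough sketch of how a flagged pair could be sent to the OpenAI API for a normalization suggestion (the model name and prompt wording are placeholders, not what Hefesto ships with):

```python
from openai import OpenAI  # assumes the openai>=1.0 client

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def propose_fix(name_a: str, src_a: str, name_b: str, src_b: str) -> str:
    """Ask the model whether two near-duplicate functions implement the same rule,
    and if so, suggest a single normalized version. Prompt wording is illustrative."""
    prompt = (
        "These two functions look like implementations of the same business rule, "
        "but their logic diverges. Explain the mismatch and propose one "
        f"normalized version.\n\n# {name_a}\n{src_a}\n\n# {name_b}\n{src_b}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```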
I’m especially interested in feedback on:
scaling to large repos (>100K LOC),
balancing false positives vs. meaningful matches,
and whether it makes sense to generalize to TypeScript or Go next.
Repo (MIT): https://github.com/artvepa80/Agents-Hefesto
Happy to answer any technical questions or share benchmarks if people are curious.