FregeLogic at SemEval 2026 Task 11: A hybrid neuro-symbolic architecture for content-robust syllogistic validity prediction
2026
We present FregeLogic, a hybrid neurosymbolic system for SemEval-2026 Task 11 (Subtask 1), which addresses syllogistic validity prediction while reducing content effects on predictions. Our approach combines an ensemble of five LLM classifiers, spanning three open-weights models (Llama 4 Maverick, Llama 4 Scout, and Qwen3-32B) paired with varied prompting strategies, with a Z3 SMT solver that serves as a formal logic tiebreaker. The central hypothesis is that LLM disagreement within the ensemble signals likely content-biased errors, where real-world believability interferes with logical judgment. By deferring to Z3's structurally grounded formal verification on these disputed cases, our system achieves 94.3% accuracy with a content effect of 2.85 and a combined score of 41.88 in nested 5-fold cross-validation on the dataset (N=960). This represents a 2.76-point improvement in combined score over the pure ensemble (39.12), with a 0.9% accuracy gain, driven by a 16% reduction in content effect (3.39 → 2.85). Adopting structured-output API calls for Z3 extraction reduced failure rates from about 22% to near zero, and an Aristotelian encoding with existence axioms was validated against task annotations. Our results suggest that targeted neurosymbolic integration, applying formal methods precisely where ensemble consensus is lowest, can improve the combined accuracy-plus-content-effect metric used by this task.
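The routing idea described above, where the solver is consulted only when the ensemble disagrees, can be illustrated with a minimal sketch. This is not the system's actual code: the function name `route_prediction`, the reading of "disagreement" as any non-unanimous vote, and the majority-vote fallback when no solver verdict is available are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def route_prediction(votes: Sequence[bool], solver_verdict: Optional[bool]) -> bool:
    """Return a final validity label for one syllogism.

    votes: validity labels from the ensemble classifiers (hypothetical input).
    solver_verdict: the formal solver's label, or None if extraction failed
        (hypothetical convention, not taken from the paper).
    """
    if not votes:
        raise ValueError("at least one classifier vote is required")

    counts = Counter(votes)

    # Unanimous ensemble: trust the consensus and skip the solver.
    if len(counts) == 1:
        return votes[0]

    # Disputed case (assumed here to mean any non-unanimous vote):
    # defer to the formal verdict when one is available.
    if solver_verdict is not None:
        return solver_verdict

    # Assumed fallback: strict majority vote; an even split yields False.
    return counts[True] > counts[False]


if __name__ == "__main__":
    # Four of five classifiers say "valid", but the solver says "invalid".
    print(route_prediction([True, True, False, True, True], solver_verdict=False))
```

Under these assumptions the solver can only change the outcome on disputed items, so any gain or loss relative to the pure ensemble is confined to the cases the central hypothesis flags as likely content-biased.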