Skip to content

Fix: Comprehensive XGBoost 2.0+ base_score compatibility (NoneType & List parsing) for high-performance transpilation#598

Open
Virtuoso-8051 wants to merge 1 commit intoBayesWitnesses:masterfrom
Virtuoso-8051:fix-xgboost2-base-score
Open

Fix: Comprehensive XGBoost 2.0+ base_score compatibility (NoneType & List parsing) for high-performance transpilation#598
Virtuoso-8051 wants to merge 1 commit intoBayesWitnesses:masterfrom
Virtuoso-8051:fix-xgboost2-base-score

Conversation

@Virtuoso-8051
Copy link
Copy Markdown

@Virtuoso-8051 Virtuoso-8051 commented Apr 3, 2026

This PR resolves the persistent TypeError: unsupported operand type(s) for /: 'float' and 'NoneType' encountered when transpiling models trained with XGBoost 2.0 and higher (up to 3.2.0).

While PR #596 correctly identified the issue with XGBoost's new automatic parameter estimation, this PR provides a more comprehensive fix. It handles both NoneType defaults and the scenario where newer XGBoost architectures serialize base_score as a single-element list.

The Fix:
Updated _assemble_bin_class_output in m2cgen/assemblers/boosting.py to safely extract the float value:

Python

    current_score = self._base_score
    
    if current_score is None:
        current_score = 0.5
    elif isinstance(current_score, list):
        current_score = current_score[0]

    base_score = -math.log(1.0 / current_score - 1.0)

Real-World Validation & Performance Impact:
This fix is critical for modern MLOps pipelines. Being locked to XGBoost 1.7.6 to avoid this bug creates severe training bottlenecks.

I validated this patch in a bare-metal C++ hardware simulation project: AI-Driven CPU Branch Predictor

Dataset: 31.8 million execution trace rows.

Model: 100-tree XGBoost binary classifier.

Performance Gain: By allowing the pipeline to natively upgrade to XGBoost 3.2.0, multi-threaded training time dropped from 201.68 seconds (v1.7.6) down to 98.67 seconds (v3.2.0) on the exact same hardware.

Transpilation Success: The transpiled bare-metal C++ logic successfully compiled and executed against the 31.8M row trace with a mathematically perfect 86.76% accuracy.

This patch allows developers to leverage modern XGBoost training speeds without breaking the m2cgen extraction pipeline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant