Softmax update by bugracyln · Pull Request #1494 · fastmachinelearning/hls4ml

bugracyln · 2026-06-25T18:50:11Z

Description

📝 Please include a summary of the change.

The softmax table generation logic was updated. The implementation for writing the softmax tables was revised, and memory attributes were added to enable a more efficient FPGA compilation flow. In addition, the templates were modified to use weights directly from the configuration.

Please also include relevant motivation and context.

The primary motivation for these changes was to bring the oneAPI backend closer to the Vivado backend in terms of implementation.

Memory attributes were added to enable memory banking on the FPGA, allowing for more efficient memory access. The weights are now copied directly into the configuration so that the compiler can recognise the entire table as a set of fixed values. This enables the memory to be implemented more efficiently, resulting in improved resource utilisation during FPGA compilation.
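As a rough illustration of what "copying the table into the configuration as fixed values" can look like, the sketch below generates exp-table values in Python and renders them as a C++ header holding a `static constexpr` array. The function names (`exp_table_values`, `render_table_header`), the input range, and the header layout are assumptions made for illustration; this is not the writer code from this PR.

```python
import math


def exp_table_values(size, x_min=-8.0):
    """Sample exp(x) at `size` evenly spaced points in [x_min, 0).

    The range and spacing are illustrative assumptions, not the PR's scheme.
    """
    step = -x_min / size
    return [math.exp(x_min + i * step) for i in range(size)]


def render_table_header(name, values, ctype='float'):
    """Render the values as a C++ header holding a compile-time constant array."""
    guard = f'{name.upper()}_H_'
    body = ', '.join(f'{v:.8f}' for v in values)
    return (
        f'#ifndef {guard}\n'
        f'#define {guard}\n\n'
        f'static constexpr {ctype} {name}[{len(values)}] = {{{body}}};\n\n'
        f'#endif\n'
    )


# Hypothetical table name; a real writer would derive it from the layer name.
header = render_table_header('softmax1_exp_table', exp_table_values(4))
print(header)
```

Because the array is a literal in the generated source, the compiler sees every entry as a fixed value, which is the property the description above relies on.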

List any dependencies that are required for this change.

N/A

Type of change

For a new feature or function, please create an issue first to discuss it
with us before submitting a pull request.

Note: Please delete options that are not relevant.

Bug fix (non-breaking change that fixes an issue)
Documentation update
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality not to work as expected)
A new research paper code implementation
Other (Specify)

Tests

📝 Please describe the tests that you ran to verify your changes.

The changes were primarily verified using black-box tests on an isolated softmax unit. Testing was performed for both quantised and non-quantised implementations. For the quantised version, both configurations, with and without exp and inv table quantisers (QuantiserConfig(...)), were tested.

Additional testing included:

- Generating FPGA RTL reports.
- Building the emulator.
- Performing a hardware compilation using the new Intel oneAPI compiler.

This PR currently supports only the Intel oneAPI compiler. Support for the Altera HLS compiler will be added in a future PR.

The implementation was also evaluated with different table sizes, and the resulting RTL reports were inspected to verify improvements in resource utilisation.

Provide instructions so we can reproduce.

A Python test file and a Keras model containing only a single softmax layer (Softmax or QSoftmax) were used. For the quantised implementation, the input and output quantisers for the exp and inv lookup tables were configured using QuantiserConfig(...). Tests were run with both the quantisers enabled and disabled.

The test configuration included:

- Standard Softmax and QSoftmax models.
- Explicit input and output quantisation for the exp and inv tables.
- FPGA RTL generation.
- Emulator build.
- Hardware compilation with the Altera HLS (newer version of Intel oneAPI) compiler.

Please also list any relevant details for your test configuration.

Test Configuration:

Checklist

I have read the guidelines for contributing.
I have commented my code, particularly in hard-to-understand areas.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have installed and run pre-commit on the files I edited or added.
I have added tests that prove my fix is effective or that my feature works.


jmitrevs · 2026-06-25T18:56:40Z

We should probably squash the commits when merging. This should also be coordinated with #1476. We should see how best to do that.

JanFSchulte · 2026-06-25T19:03:00Z

Could we also get any form of description of what this is? I was about to close it as spam before I noticed that this was related to work by Lauri.

calad0i · 2026-06-25T21:07:07Z

I was also about to close it as spam, until I saw the please-test tag... Could you add a description?

Copilot

Pull request overview

Updates the oneAPI backend softmax implementation to generate per-layer lookup tables as compile-time constants and wire them into the generated configuration, aiming to align the oneAPI flow more closely with the Vivado backend and improve FPGA compilation/resource utilization.

Changes:

- Generate per-softmax-layer exp/inv lookup tables as headers and auto-include them into parameters.h.
- Update oneAPI softmax C++ templates to use lookup tables from CONFIG_T instead of #included .tb fragments.
- Extend oneAPI softmax config generation with per-table sizing/type plumbing and add a multidimensional softmax helper.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `hls4ml/writer/oneapi_writer.py` | Generates and includes per-layer softmax exp/inv table headers during oneAPI project emission. |
| `hls4ml/templates/oneapi/firmware/parameters.h` | Adds an insertion point for writer-generated softmax table includes. |
| `hls4ml/templates/oneapi/firmware/nnet_utils/nnet_activation.h` | Reworks stable softmax to use `CONFIG_T` tables and adds a multidim helper implementation. |
| `hls4ml/templates/oneapi/firmware/nnet_utils/nnet_activation_stream.h` | Reworks streaming stable softmax to use `CONFIG_T` tables and cleans up type aliases. |
| `hls4ml/backends/oneapi/passes/core_templates.py` | Extends softmax config generation with exp/inv table wiring and sizing/typing logic. |
| `hls4ml/backends/oneapi/oneapi_backend.py` | Removes the prior oneAPI softmax multidimensional `io_parallel` restriction. |


+                    ac_type = layer.get_attr('inp_norm_t')
+
+                    if ac_type is not None:
+                        try:
+                            fp_bits = ac_type.precision.integer + ac_type.precision.fractional
+                            fp_integer = ac_type.precision.integer
+                            fp_signed = ac_type.precision.signed
+                        except Exception:
+                            # FixedPrecisionType wasn't correctly stored in layer attributes, use default values
+                            pass
+                        if fp_signed is False:
+                            raise Exception('Softmax types need to be signed')



+    using {exp_table_name}_arr_t = nnet::array<exp_table_t, exp_table_size>;
+    using {inv_table_name}_arr_t = nnet::array<inv_table_t, inv_table_size>;
+    static constexpr const {exp_table_name}_arr_t exp_table = {exp_table_name};
+    static constexpr const {inv_table_name}_arr_t invert_table = {inv_table_name};


+// *************************************************
+//       Multidimensional Softmax
+// *************************************************
+
+// Helper to remap the config for the core softmax function
+template <class CONFIG_T> struct softmax_multidim_slice_config : CONFIG_T {
+    static constexpr unsigned n_in = CONFIG_T::n_slice;
+};
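One way to read the slice-remapping helper above: the core 1-D softmax is applied independently to each group of `n_slice` values. The Python sketch below models that behaviour on a flat list. It assumes the slices are contiguous chunks along the last axis, which the hunk itself does not show, and the function names are illustrative only.

```python
import math


def softmax_1d(xs):
    """Numerically stable softmax over a flat list (subtracts the max first)."""
    x_max = max(xs)
    exps = [math.exp(x - x_max) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_multidim(flat, n_slice):
    """Apply softmax_1d independently to each consecutive chunk of n_slice values."""
    if len(flat) % n_slice != 0:
        raise ValueError('input length must be a multiple of n_slice')
    out = []
    for start in range(0, len(flat), n_slice):
        out.extend(softmax_1d(flat[start:start + n_slice]))
    return out


# Two slices of three values each; every slice is normalised on its own.
result = softmax_multidim([1.0, 2.0, 3.0, 0.0, 0.0, 0.0], n_slice=3)
```

Each chunk of the result sums to 1, independently of the other chunks.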


    @layer_optimizer(Activation)
    def init_activation(self, layer):
        if layer.get_attr('activation') == 'tanh':
            layer.set_attr('activation', 'dense_tanh')
        if layer.get_attr('recurrent_activation') == 'tanh':
            layer.set_attr('recurrent_activation', 'dense_tanh')

-    @layer_optimizer(Softmax)
-    def init_softmax(self, layer):
-        if layer.model.config.get_config_value('IOType') == 'io_parallel':
-            assert len(layer.get_input_variable().shape) == 1, (
-                'Softmax with io_parallel strategy cannot be used on multidimensional tensors.'
-            )
-
    @layer_optimizer(Embedding)


jmitrevs · 2026-07-01T16:02:56Z

+        if params['type'] == 'softmax':
+            # The lookup input (x - x_max) is always <= 0, so only the negative half
+            if 'exp_table_size' in params and params['exp_table_size'] is not None:
+                params['exp_table_size'] //= 2


I would not divide this in half. The table size as given already takes the unsignedness into account. It doesn't make sense to expect people to give you twice the size of what they want implemented.
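A minimal sketch of the reviewer's point, under the assumption that the table is addressed by the magnitude of `x - x_max`: if every entry already corresponds to a non-positive input, the requested `exp_table_size` is exactly the number of entries built, with no halving. The helper names and the input range `x_range` are hypothetical, not taken from the PR.

```python
import math


def build_exp_table(exp_table_size, x_range=8.0):
    """Build exactly exp_table_size entries of exp(-i * step), i = 0 .. size-1.

    Every entry covers a non-positive input, so the size is used as given.
    x_range is an illustrative assumption.
    """
    step = x_range / exp_table_size
    return [math.exp(-i * step) for i in range(exp_table_size)]


def lookup_exp(table, d, x_range=8.0):
    """Look up exp(d) for d = x - x_max <= 0, saturating at the last entry."""
    if d > 0:
        raise ValueError('d = x - x_max must be non-positive')
    step = x_range / len(table)
    index = min(int(-d / step), len(table) - 1)
    return table[index]


table = build_exp_table(1024)      # the user asked for 1024 entries and gets 1024
approx = lookup_exp(table, -1.0)   # close to math.exp(-1.0)
```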

jmitrevs · 2026-07-01T16:08:38Z

+                params['exp_table_size'] //= 2
+            else:
+                # Use the default precision
+                params['exp_table_size'] = 2 ** (params['table_t'].precision.width - 1)


The default parameters should be defined in https://github.com/fastmachinelearning/hls4ml/blob/main/hls4ml/backends/fpga/fpga_backend.py#L130 and similar places. Note that the defaults are updated in #1476. I would remove all these updates here. You set the defaults in the attribute, not when reading the attributes.
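The pattern being asked for can be sketched as follows: defaults are attached to the attribute declaration and resolved once, so the code that reads the attributes needs no fallback branches. `AttrSpec`, `resolve_attrs`, `format_config`, and the value 1024 are hypothetical stand-ins for illustration, not the hls4ml attribute API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrSpec:
    """Hypothetical stand-in for a backend attribute declaration."""
    name: str
    default: int


# Defaults live with the declaration (values here are illustrative).
SOFTMAX_ATTRS = [AttrSpec('exp_table_size', 1024), AttrSpec('inv_table_size', 1024)]


def resolve_attrs(user_config, specs=SOFTMAX_ATTRS):
    """Fill in declared defaults once, when the layer is set up."""
    return {spec.name: user_config.get(spec.name, spec.default) for spec in specs}


def format_config(attrs):
    """Reader side: every attribute is guaranteed to be present, so no fallbacks."""
    return f"exp={attrs['exp_table_size']}, inv={attrs['inv_table_size']}"


print(format_config(resolve_attrs({'exp_table_size': 512})))
```

With this split, a template-formatting step never has to test for `None` or patch in sizes derived from other parameters.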

jmitrevs · 2026-07-01T16:17:11Z

+            params.setdefault('table_size', params['exp_table_size'])  # Not sure if necessary
+
+            # Determine accumulator type if present, else derive it yourself based on the input size.
+            if params['accum_t'].name == 'model_default_t':


This should be in infer_precision.py. I think the change is in #1476 already so probably not needed.

jmitrevs · 2026-07-01T16:20:18Z

+                # the signed fixed-point input range is ever addressed.
+                # Therefore only half of the full address space is required.
+                table_size = (
+                    int(layer.get_attr('exp_table_size')) // 2


Again the //2 should be removed. The exp_table_size already takes that into account. It doesn't make sense to pass twice that value. Also, at this point it should be required to be defined. It should not be None.

jmitrevs · 2026-07-01T16:25:13Z

+                        except Exception:
+                            # FixedPrecisionType wasn't correctly stored in layer attributes, use default values
+                            pass
+                        if fp_signed is False:


The table type being signed is something that will go away. The defaults are all unsigned. You can either leave as is for now, or try to handle the usual, unsigned kind. See #1476 and the Vivado implementation.

jmitrevs · 2026-07-01T16:25:20Z

-        table_size = self.__get_table_size(model, 'softmax')
+                    ac_type = layer.get_attr('inp_norm_t')
+
+                    if ac_type is not None:


This doesn't make sense. fp_bits is just ac_type.precision.width. It can't be None. If it's old leftover code, that's fine, but it will go away once #1476 incorporates oneAPI.
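The arithmetic behind this remark, as a small sketch: if the fractional bit count is defined as the total width minus the integer bits, then `integer + fractional` is the width by construction. `FixedPrecision` below is a stand-in class written for this example, assuming that definition; it is not the hls4ml precision type itself.

```python
from dataclasses import dataclass


@dataclass
class FixedPrecision:
    """Stand-in for a fixed-point precision: total width and integer bits."""
    width: int
    integer: int
    signed: bool = True

    @property
    def fractional(self):
        # Assumed definition: the bits that are not integer bits.
        return self.width - self.integer


p = FixedPrecision(width=18, integer=8)
fp_bits = p.integer + p.fractional
assert fp_bits == p.width  # the sum is the width, so it can be read directly
```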

jmitrevs · 2026-07-01T16:30:12Z

+                    ac_type = layer.get_attr('inv_inp_t')
+
+                    if ac_type is not None:
+                        try:
+                            fp_bits = ac_type.precision.integer + ac_type.precision.fractional
+                            fp_integer = ac_type.precision.integer
+                            fp_signed = ac_type.precision.signed
+                        except Exception:
+                            # FixedPrecisionType wasn't correctly stored in layer attributes, use default values
+                            pass
+                        if fp_signed is False:
+                            raise Exception('Softmax types need to be signed')


This will all go away with #1476, so either incorporate things from there or ignore it for now.

jmitrevs · 2026-07-01T16:38:49Z

            copyfile(srcpath, dstpath)

-    def __get_table_size(self, model, activation):
+    def __get_table_size(self, model, activation, table_name='table_size'):


I made this comment in the other PR. What does a table name being called table_size mean? They seem to be different things. The naming needs to be updated. It would be equally funny to have table name be table_dimension, for example.

laurilaatu and others added 30 commits January 26, 2026 20:37

weights for dense

3d463b3

hgq2 homogeneous quant fix

d678573

Merge branch 'hgq2_homo_quant' of github.com:calad0i/hls4ml into oneapi_weights

77258bc

Changes required for oneAPI MHA

59bd96f

Original weight implementation

dbb207b

Merge branch 'main' of github.com:fastmachinelearning/hls4ml into oneapi_qmha

0c59255

Restore oneAPI weight placement

51efff0

pre-commit

6067bea

Merge branch 'main' into oneapi_qmha

06fda4e

Merge branch 'main' into oneapi_qmha

bf38a6b

Merge branch 'main' into oneapi_qmha

e27fd11

Merge branch 'main' into oneapi_qmha

9f4a448

softmax multidim templates

16ca197

Merge branch 'oneapi_qmha' of github.com:laurilaatu/hls4ml into oneapi_qmha

564b692

pre-commit

974e75a

uncomment

060c398

Merge branch 'main' into oneapi_qmha

f78558c

int_inp_t to config

772b93a

Merge branch 'oneapi_qmha' of github.com:laurilaatu/hls4ml into oneapi_qmha

d2b8921

Merge branch 'main' into oneapi_qmha

a1ad891

Merge branch 'main' into oneapi_qmha

d65544d

Merge branch 'main' into oneapi_qmha

2d6a5cc

softmax fixed

c3a4584

Merge branch 'main' into oneapi_qmha

9b1cf17

table generation cleanup

31b7ad6

Merge pull request fastmachinelearning#4 from bugracyln/smax_fix

70b19d1

Merge branch 'main' into oneapi_qmha

29bdbb3

Fix formatting of inp_norm_t name string

cab4cbc

pre-commit for core templates

42ece34

pre-commit all

7e2798a

bugracyln and others added 3 commits June 25, 2026 19:15

softmax update

bd4778e

minor syntax fix

3946858

Merge branch 'oneapi_qmha' into softmax_updated

be76917

jmitrevs added the `please test` label Jun 25, 2026

Merge branch 'main' into softmax_updated

189f64a

jmitrevs self-requested a review June 25, 2026 18:58

jmitrevs added `please test` and removed `please test` labels Jun 25, 2026

jmitrevs added `please test` and removed `please test` labels Jun 26, 2026

bugracyln and others added 3 commits June 28, 2026 16:59

default case handling improvement

e0aba71

minor improvements to default rollback and added comments

584c4f7

Merge branch 'main' into softmax_updated

711083a

jmitrevs added `please test` and removed `please test` labels Jun 30, 2026

jmitrevs requested a review from Copilot July 1, 2026 12:45

Copilot started reviewing on behalf of jmitrevs July 1, 2026 12:45

Copilot AI reviewed Jul 1, 2026


jmitrevs reviewed Jul 1, 2026


6 participants
