8382713: [VectorAPI] Perform late inlining of failed vector intrinsics#30876
8382713: [VectorAPI] Perform late inlining of failed vector intrinsics#30876jatin-bhateja wants to merge 6 commits intoopenjdk:masterfrom
Conversation
|
/label add hotspot-compiler-dev |
|
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
@jatin-bhateja |
|
The total number of required reviews for this PR has been set to 2 based on the presence of this label: |
|
@jatin-bhateja To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command. Applicable Labels
|
Webrevs
|
| } | ||
| }; | ||
|
|
||
| bool LateInlineVectorCallGenerator::inline_fallback() const { |
There was a problem hiding this comment.
What's the purpose of this method? All vector intrinsics do have fallback implementation. If there are any cases added later, then they don't have to rely on LateInlineVectorCallGenerator.
|
Hi @iwanowww , your comments have been addressed. |
|
I modified BackSholes benchmark to use FloatVector.SPECIES_512, and then explicitly passed CommandLine: java -jar target/benchmarks.jar -f 1 -i 5 -wi 1 -w 30 -jvmArgs "-XX:UseAVX=2 --add-modules=jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:+InlineVectorFallback" BlackScholes.vector_black_scholes |
|
Hi @iwanowww , your comments have been addressed. |
iwanowww
left a comment
There was a problem hiding this comment.
Overall, looks good. Minor suggestions follow.
| product(bool, EnableVectorAggressiveReboxing, false, EXPERIMENTAL, \ | ||
| "Enables aggressive reboxing of vectors") \ | ||
| \ | ||
| product(bool, InlineVectorFallback, true, DIAGNOSTIC, \ |
There was a problem hiding this comment.
Let's call it IncrementalInlineVector and put it next to IncrementalInline et al.
|
|
||
| if (failing()) return; | ||
|
|
||
| if (_late_inlines.length() == 0 && _vector_late_inlines.length() > 0) { |
There was a problem hiding this comment.
Can you extract it into a helper method? Otherwise, the patch looks good. I'll submit it for testing.
There was a problem hiding this comment.
I suggest to rename transfer_vector_late_inlines() to process_vector_late_inlines() and move _vector_late_inlines.length() > 0 guard there.
|
Hi @iwanowww , your comments have been addressed, please share the results of your test run. |
|
Unfortunately, I see multiple failures in Vector API-related tests. They failed mostly on linux-aarch64, but there were few linux-x64 failures [1] as well. I'll take a closer look, but it seems the problem on linux-aarch64 is that fallback implementations are unconditionally inlined and it causes problems (multiple tests on 512-bit vectors fail due memory exhaustion [2]). [1] In particular:
[2] jdk/incubator/vector/ByteVector512LoadStoreTests.java |
Hi @iwanowww , this failure is related to use of UseAVX=0, here fromBitsCoerced is not intrinsified, earlier it remained as CallStaticJavaNode but now it gets inlined, new inlined context has graph shape which infers CMoveI and CmpI and test failed since IR rule don't expect these nodes, one target agnostic fix is to guard these IR rules with -XX:-IncrementalInlineVector flag, but it will defeat the purpose of this test since IncrementalInlineVector is default on. Since test runs on multiple targets guarding by UseAVX > 0 may not be desirable. Let me know what do you think ?
With the patch, the vector intrinsic fallback inlining generates more code in the compilation unit, this effects inlining of This increase the outcout of VectorMaskCmpNode and inhabits optimization which folds XorVMask (VectorMaskCmp, maskAll(true) Increasing InlineSmallCode to 10000 allows intoArray to be inlined, mask is not boxed, VectorMaskCmp has outcnt=1, XorVMask is folded and Test Passes
Over all there is a tradeoff of unconditionally inlining vector intrinsic since most of them a bulky and it may impact inlining decisions within their calling context. Do you think its beneficial to limit the scope of inlining to only few intrinsics initially e.g. Please let me know your views. |
|
@jatin-bhateja Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information. |
I don't see why it defeats the purpose of the test. It's an IR test and limiting possible IR shapes is fine.
Do we miss
I think regular inlining heuristics should be applied to vector fallback implementations. |
Currently, we attempt lazy intrinsification of vector intrinsics during incremental inlining stage, in case intrinsification fail due to non-constant context expected by the inline expander, a static call is generated, this incurs a call overhead penalty.
As per following comments from @iwanowww on JDK-8303762 pull request
#24104 (comment)
We should attempt procedure inlining of failed vector intrinsics to avoid penalties associated with call overhead, for vector operations whose fall back implementation uses other vector APIs it will also save boxing penalty.
Patch address this concern by adding a new hybrid call generator (LateInlineVectorCallGenerator ) which encapsulates both intrinsic and parser call generator. During incremental inlining, the intrinsic gets multiple chances to succeed. If all attempts fail, the fallback implementation is inlined instead, absorbing call over head penalties.
Please review and share your feedback.
Best Regards,
Jatin
Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30876/head:pull/30876$ git checkout pull/30876Update a local copy of the PR:
$ git checkout pull/30876$ git pull https://git.openjdk.org/jdk.git pull/30876/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 30876View PR using the GUI difftool:
$ git pr show -t 30876Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30876.diff
Using Webrev
Link to Webrev Comment