merge main into amd-staging by ronlieb · Pull Request #2543 · ROCm/llvm-project

ronlieb · 2026-05-14T15:39:37Z

No description provided.

This commit adds initial documentation for the instrumentor to the HTML/man pages and provides a script that helps new users set up the config and stubs file interactively. The script and docs were created with Claude (AI), then proofread, tested, and modified.

A previous commit switched us to use the value of AT_EXECFN, an entry in the aux vector, as the executable path. As it turns out, if a symlink is used to launch a program, the symlink path will be in the AT_EXECFN string in core file memory. The PRPSINFO also contains a basename of the program, and it will also be the symlink basename.

The best source of information for the executable name is the NT_FILE note, which always has the resolved path to the executable. The executable name is now found reliably by locating the NT_FILE entry for the main executable: the entry whose address range contains the AT_PHDR aux vector value, which is the address of the program headers for the main executable. If no such NT_FILE entry can be found, we fall back to the AT_EXECFN entry from memory and then to the basename in the PRPSINFO.

This patch also creates a placeholder as the main executable when the executable can't be found, so users can see which executable they need to track down in order to load the core file.

The tests added check the order of precedence by creating a core file with:

- NT_FILE entry with a path of "/path/nt_file_foo"
- AT_EXECFN in the aux vector with a path of "/path/execfn_foo"
- NT_PRPSINFO entry with a path of "prpsinfo_foo"

We then test that the correct entry is found as the best path option is removed from the core file.
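The precedence described above can be sketched in a few lines. This is only an illustration of the lookup order, not LLDB's implementation: `NtFileEntry`, `pick_executable_path`, and the simplified note layout are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NtFileEntry:
    """One mapping from the NT_FILE note (hypothetical, simplified layout)."""
    start: int
    end: int
    path: str


def pick_executable_path(
    nt_file: Sequence[NtFileEntry],
    at_phdr: Optional[int],
    at_execfn: Optional[str],
    prpsinfo_name: Optional[str],
) -> Optional[str]:
    # 1. Prefer the NT_FILE mapping whose address range contains AT_PHDR,
    #    since that mapping holds the main executable's program headers.
    if at_phdr is not None:
        for entry in nt_file:
            if entry.start <= at_phdr < entry.end:
                return entry.path
    # 2. Fall back to the AT_EXECFN string (may be a symlink path).
    if at_execfn:
        return at_execfn
    # 3. Last resort: the basename recorded in PRPSINFO.
    if prpsinfo_name:
        return prpsinfo_name
    # Nothing usable: the caller would create a placeholder module.
    return None
```

Removing the best remaining source one at a time, as the tests do, should walk through the three paths in order.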

When passing multiple flags to check_cxx_compiler_flag, we must separate them as a semicolon-separated list, not with spaces. These checks sometimes succeed incorrectly because "-Werror -mcrc" has a different return value than "-Werror" "-mcrc" on some systems.

This issue was verified with LLVM_ENABLE_PROJECTS=llvm;compiler-rt, and I'm uncertain whether it exists in runtime CMake builds. Nonetheless, it's still a bug.

See: https://cmake.org/cmake/help/latest/module/CheckCXXCompilerFlag.html

This issue was identified downstream in ChromiumOS. ChromiumOS Bug: https://issuetracker.google.com/507177988
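A rough Python model of why the separator matters: CMake list elements are delimited by semicolons, so a space-separated string stays a single element and reaches the compiler as one argument. `cmake_list_to_args` is a hypothetical helper, and it ignores CMake's escaping rules for semicolons.

```python
from typing import List


def cmake_list_to_args(value: str) -> List[str]:
    """Split a CMake-style list: only semicolons separate elements."""
    return [item for item in value.split(";") if item]


# Space-separated: one element, so the compiler sees a single odd argument.
wrong = cmake_list_to_args("-Werror -mcrc")
# Semicolon-separated: two elements, passed as two distinct flags.
right = cmake_list_to_args("-Werror;-mcrc")

print(wrong)
print(right)
```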

before

```SystemVerilog
(* x = "x" *) foreach(x[x]) x = x;
```

after

```SystemVerilog
(* x = "x" *) foreach (x[x]) x = x;
```

The code for handling statements like the `foreach` preceded the part for handling the attributes inside `(* *)`, so some statements following attributes were formatted incorrectly. The patch moves the statement handling down. The loop in the code was also unnecessary.

This fixes 882d025. Co-authored-by: Google Bazel Bot <[email protected]>

Add a test case to verify that initFlags() correctly reads the SCUDO_ALLOCATION_RING_BUFFER_SIZE environment variable and updates the corresponding flag. This increases line coverage for flags.cpp to 100%.

…VM IR input (llvm#197566)

1. Replace the C++ source test that required compiling with %clangxx and separate Input files with self-contained .ll tests using split-file.
2. Split the test into two files:
   - clang-sycl-linker.ll: basic tool behavior (link, dev libs, AOT, errors)
   - clang-sycl-linker-split-mode.ll: device code split mode handling

Co-Authored-By: Claude

Add first-class support for building test inferiors without debug info, instead of having to pass `-g0` in the Makefile or the build dictionary.

```
def test(self):
    self.build(debug_info="none")
```

rdar://164923931

Summary: There are two ways to put multiple binaries in the section: either use the version two multi-binary support or just concatenate them. This PR changes the llvm-offload-binary tool to use the multi-binary support rather than directly concatenating them. The motivation is to save space and make it easier to support compression in the future. Compression would be a flag in the header, and it is only really valuable if it can combine the architecture variants. ELF section compression is a little spotty but would be another good solution.

This operator creates a new ``list`` containing the same elements as *list* but in sorted order. To determine the order, TableGen binds the variable *var* to each element and evaluates the *key* expression, which presumably refers to *var*. The key must produce a ``string`` or integer value (``bit``, ``bits``, or ``int``); all keys must be of the same type. Elements with equal keys preserve their original relative order, resulting in a stable sort. For example, to sort a list of records by their ``Name`` field::

    list<Thing> sorted = !sort(t, Things, t.Name);
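The stable-sort behavior described above can be illustrated with an analogue in Python, whose built-in `sorted` is also stable. This is not TableGen; the `Thing` record and its `ident` field are hypothetical, added only to make the preserved order visible.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    name: str   # plays the role of the ``Name`` field used as the sort key
    ident: int  # hypothetical field that records the original position


things = [Thing("b", 1), Thing("a", 2), Thing("b", 3), Thing("a", 4)]

# Bind each element to `t` and evaluate the key expression, as !sort does.
sorted_things = sorted(things, key=lambda t: t.name)

# Elements with equal keys keep their original relative order.
print([(t.name, t.ident) for t in sorted_things])
```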

…WARD_SLASH is ON (llvm#184556) This patch fixes several LLVM test failures on Windows that occur when the LLVM_WINDOWS_PREFER_FORWARD_SLASH CMake option is enabled. The failures were caused by tests either hardcoding backslash expectations in FileCheck or constructing paths with strict backslashes in C++ unit tests, both of which break when the environment is configured to prefer forward slashes.

Specific changes:

- `llvm-cov` lit tests: Changed the path separators with `-DSEP=%{fs-sep}`.
- `llvm-objdump` lit test: Relaxed `source-interleave-prefix-windows.test` to accept either forward or backward slashes using the `{{[/\\]}}` regex. This makes the path matching resilient to the underlying separator preference without losing precision.
- CommandLineTest.cpp: Conditionalized the TestRoot variable to use `C:/` instead of `C:\` based on the build configuration.
- Path.cpp (makeLongFormPath test):
  - Updated the OneDir string literal to conditionally use `/` or `\`.
  - Updated the ContainsDotAndDotDot lambda to check for `.` and `..` components with the correct separator style based on the build configuration.
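To show what the `[/\\]` character class inside the FileCheck `{{...}}` block accepts, here is a small sketch using Python's `re` module rather than FileCheck itself. The path `C:\src\main.c` is a made-up example, not taken from the test.

```python
import re

# Character class matching exactly one forward slash or one backslash.
SEP = r"[/\\]"

# Hypothetical path pattern; only the separators are relaxed, the path
# components themselves must still match exactly.
pattern = re.compile(r"C:" + SEP + r"src" + SEP + r"main\.c")

forward = pattern.fullmatch("C:/src/main.c") is not None
backward = pattern.fullmatch("C:\\src\\main.c") is not None
other = pattern.fullmatch("C:-src-main.c") is not None

print(forward, backward, other)
```

Because only a single separator character is relaxed at each position, the match stays as precise as the original backslash-only expectation.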

This change defines 4 new output patterns, `PAIR8`, `EVEN8`, `AEXT8`, and `TRUNC4`, and uses them to implement the lowering of the intrinsics `int_ppc_amo_l[dw]at` and `int_ppc_amo_l[dw]at_cond` in TableGen. As a result, the output pattern that generates the instructions becomes more understandable, and the C++ code can be removed.

…#197096) This PR adds BF16 to I8 saturating FP to int convert custom lowering.

…trinsic (llvm#197380) Fix HLSL builtin to SPIR-V intrinsic lowering: most intrinsics calls must use CallingConv::C. Relates to llvm#197608 which tries to add CallingConv CHECK to IR Verifier.

In 2021, Augusto changed the Target::ReadMemory API from taking a `prefer_file_cache` argument to taking a `force_live_memory` argument, with opposite meanings - where we used to pass true, the callers now needed to pass false. The default argument was false, so many callers omitted the argument altogether after the change. One of the edits to UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly unintentionally swapped the intended behavior -- this method which reads the bytes of a function's instructions for emulation should get the bytes from the local binary, if possible, else read from live memory. But it was changed to force reading from live memory unconditionally. This leads to an extra memory read for every function we see for the first time in a single `lldb` process run (the UnwindTable they are added to is part of the Module, and kept in the global Module cache). It's not a major perf regression, but these are extra memory reads that we don't need to be doing. I audited all the other changes in the 2021 PR and this was the only mistake like this. rdar://177026608
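The effect of the flag can be modeled with a toy reader. This is a sketch under stated assumptions, not the `Target::ReadMemory` implementation: `ToyTarget`, its dictionaries, and the `live_reads` counter are hypothetical, and only the `force_live_memory` name comes from the description above.

```python
from typing import Dict


class ToyTarget:
    """Toy model: bytes may come from the local binary or from the process."""

    def __init__(self, file_bytes: Dict[int, bytes], live_bytes: Dict[int, bytes]):
        self.file_bytes = file_bytes  # contents available from the local binary
        self.live_bytes = live_bytes  # contents of the live process
        self.live_reads = 0           # counts the expensive live reads

    def read_memory(self, addr: int, force_live_memory: bool = False) -> bytes:
        # Default behavior: use the local binary when it has the bytes.
        if not force_live_memory and addr in self.file_bytes:
            return self.file_bytes[addr]
        # Otherwise (or when forced) pay for a live memory read.
        self.live_reads += 1
        return self.live_bytes[addr]


target = ToyTarget({0x1000: b"\x90"}, {0x1000: b"\x90"})
target.read_memory(0x1000)                          # served from the file
reads_default = target.live_reads
target.read_memory(0x1000, force_live_memory=True)  # the unintended extra read
reads_forced = target.live_reads
```

In this model the forced call returns the same bytes but adds one live read per function, which matches the cost described above.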

This is the last patch for global/namespace thread-local variables. This patch emits the final 'init' function, which calls all other init functions, plus does the guard variable for the unordered variants.

This is a pretty trivial set of adjustments that have to happen when emitting a materialized temporary, and is effectively a clone of classic codegen. Our output is effectively identical (other than some minor re-ordering problems).

…lvm#197563)

…llvm#197474) On MSVC, Profile-* tests must link with the same CRT model as the clang_rt.profile static archive they exercise. When that archive pulls in RTInterception / RTSanitizerCommon object libraries, those are built with MultiThreadedDLL (/MD), so the .objs reference `__imp_*` symbols. The test binary defaults to /MT and fails to link with LNK2019 (`__imp__stricmp` from `interception_win.cpp`) and LNK4098 default-lib conflicts. Match the DLL CRT on the test side so test executables and the static archive use the same runtime. The change is gated on `COMPILER_RT_HAS_INTERCEPTION` and `!COMPILER_RT_PROFILE_BAREMETAL`, so configurations that don't pull interception into profile are unaffected. Split out as NFC from llvm#177665 per review feedback.

The [RFC](https://discourse.llvm.org/t/rfc-remove-80-column-limit-in-documentation-files/89678/41) on removing the 80-column limit was accepted, so we should no longer enforce that rule in clang-tidy's code-linter workflow.

Right now it takes the validation path of an inline constant if the value fits, even though it is forced to literal encoding.

…ed} (llvm#197518) A device-typed dummy with `!dir$ ignore_tkr(m)` is meant to be an overload discriminator (only selected for actuals with an explicit `device/managed/unified` attribute). Skip the host->device relaxation in AreCompatibleCUDADataAttrs when `IgnoreTKR::Managed` is set so unattributed host actuals no longer bind to such a dummy. Also document the §3.2.3 matching distance table next to GetMatchingDistance and add LIT tests for the full Table 2 grid and the ignore_tkr(m) carve-out.

This PR allows duplicate OpenACC `private` and `firstprivate` clauses, while maintaining the restriction on `reduction` clauses.

This method needs to match the set of cases handled in parseSummaryEntry.

…97565) Add support for DWARF opcodes seen in GCC-generated binaries:

- DW_FORM_ref_udata: ULEB128-encoded CU-relative DIE reference.
- DW_OP_regval_type (0xa5): DWARF5 expression opcode with operands (SizeLEB, BaseTypeRef). The BaseTypeRef was not being updated when DIEs were relocated because cloneExpression only handled (Size1, BaseTypeRef) patterns. Generalized the first-operand copying to use raw bytes from the data stream instead of assuming a single byte.

Fixes llvm#188250

Assisted-by: Claude Opus 4.6/4.7
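Both items above depend on ULEB128 operands, whose encoded length varies with the value. That is why copying a fixed single byte is not enough. A minimal decoder sketch follows; `decode_uleb128` and `resolve_ref_udata` are illustrative names, not functions from the patch.

```python
from typing import Tuple


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one ULEB128 value; return (value, number of bytes consumed)."""
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated ULEB128 value")
        byte = data[pos]
        pos += 1
        # The low 7 bits carry payload, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the final byte of the value.
        if (byte & 0x80) == 0:
            return result, pos - offset
        shift += 7


def resolve_ref_udata(cu_offset: int, data: bytes, offset: int = 0) -> int:
    """A DW_FORM_ref_udata value is an offset relative to its CU's start."""
    value, _ = decode_uleb128(data, offset)
    return cu_offset + value
```

For example, the three bytes `E5 8E 26` decode to 624485, while any value below 128 takes a single byte.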

- `llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp` - `llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp`

For this PR: llvm#196325

This defines some always-legal actions, removing our dependency on the Legacy ruleset in AArch64.

…m#197611) This fixes the CTAD template parameter transforms so they produce template template parameters which have correct depth for their own template parameters. This also stops calling SubstDecl directly on the non-type template parameters, so that a template parameter with correct position is produced directly, instead of manually fixing that up later. This helps llvm#197598 by making it possible to add assertions that the positions are always valid.

See https://discourse.llvm.org/t/rfc-forming-a-static-analysis-working-group-in-the-clang-ecosystem/90719/17

Register references in debug instructions can affect LiveRegUnits analysis. Skip over debug instructions. Tests in this PR would fail due to calls to LiveRegUnits::stepBackward in RegisterScavenging, DeadMachineInstructionElim, and AArch64InstrInfo.cpp getOutlinableRanges(). Other call-sites to stepBackward may also pass debug instructions to LiveRegUnits::stepBackward, but LIT testing did not fail when -debugify-and-strip-all-safe was enabled by default. --------- Signed-off-by: John Lu <[email protected]>
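A toy backward liveness walk shows why debug instructions have to be skipped. This is a sketch, not the `LiveRegUnits` implementation: `Instr`, `live_before`, and the register names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Sequence, Set


@dataclass(frozen=True)
class Instr:
    defs: FrozenSet[str] = frozenset()
    uses: FrozenSet[str] = frozenset()
    is_debug: bool = False


def live_before(block: Sequence[Instr], live_out: Iterable[str]) -> Set[str]:
    """Walk the block backward and return the registers live at its entry."""
    live = set(live_out)
    for instr in reversed(block):
        # Debug instructions must not influence the analysis.
        if instr.is_debug:
            continue
        live -= instr.defs  # a definition ends the liveness of its register
        live |= instr.uses  # a use makes the register live above this point
    return live


real = Instr(defs=frozenset({"x0"}), uses=frozenset({"x1"}))
debug = Instr(uses=frozenset({"x2"}), is_debug=True)

without_debug = live_before([real], {"x0"})
with_debug = live_before([real, debug], {"x0"})
```

With the skip in place both walks give the same result, so stripping or adding debug info does not change the computed live set.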

Update the base crypto builtins and LLVM intrinsics to drop the mma_ prefix. Also fix the builtin definitions for dmsha2hash, dmsha3hash, and dmxxshapad to use the correct immediate constraints.

…7448) This PR continues other work I've been doing trying to remove unnecessary extra passes from the RUN lines in order to make it easier to map the expected vectoriser output to the CHECK lines. As a result it has exposed some potential optimisations that we may be able to perform in VPlan. Here is a summary of the changes I've noticed:

1. instcombine likes to canonicalise GEPs into certain forms. I'm not sure if there is value in VPlan trying to guess what the canonical form should be.
2. In tests like sve-cond-inv-loads.ll, etc. the pattern sub(urem) is often replaced with and(sub). This is potentially something the vectoriser could improve, although I don't know if it would change the cost model.
3. There is poor codegen in gather_nxv4i32_ind64_stride2 in the file sve-gather-scatter.ll, which is due to VPScalarIVStepsRecipe::execute. I have a PR that attempts to clean up this IR: llvm#197169.
4. Simple missing fold in sve-inductions.ll for icmp(and(x,1), 0) -> trunc(x) to i1
5. Missing nsw flag - see sve-interleaved-accesses.ll. I think this might be due to the range of vscale.
6. Missing fold in sve-interleaved-masked-accesses.ll for select(icmp(slt, x, y), y, x) -> smax
7. Missing folds for reverse transformations of uniform operations, e.g. see sve-vector-reverse.ll for things like reverse(fadd(reverse(x)))
8. Removal of xor when used by the exit condition - see sve-vfabi.ll. There isn't much we can do about this because VPlan requires successors to be in a certain order, therefore the non-zero cost xor instruction has to be present.
9. See PR llvm#196562 for fixes to some poor code in uniform-args-call-variants.ll
10. instcombine tends to narrow reduction PHI nodes generated by VPlan when there are extends involved. See reduction_i8 in reduction-small-size.ll. In this case perhaps the original scalar loop is simply not in a canonical form to start with?
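The folds in items 4, 6, and 7 can be checked on small inputs. The sketch below models the IR operations in plain Python under two assumptions not stated in the list: the comparison in item 4 uses the `ne` predicate, and the second `fadd` operand in item 7 is a uniform splat.

```python
from itertools import product

BITS = 4  # small width so every value can be checked exhaustively
VALUES = range(1 << BITS)


def as_signed(x: int, bits: int = BITS) -> int:
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def trunc(x: int, bits: int) -> int:
    """Keep only the low `bits` bits, like an IR trunc."""
    return x & ((1 << bits) - 1)


# Item 4 (assumed `ne`): icmp ne (and x, 1), 0  ==  trunc x to i1
fold_icmp_and = all(((x & 1) != 0) == bool(trunc(x, 1)) for x in VALUES)

# Item 6: select(icmp slt x, y), y, x  ==  smax(x, y)
fold_smax = all(
    as_signed(y if as_signed(x) < as_signed(y) else x)
    == max(as_signed(x), as_signed(y))
    for x, y in product(VALUES, repeat=2)
)

# Item 7 (assumed uniform splat operand):
# reverse(fadd(reverse(v), splat))  ==  fadd(v, splat)
vec = [1.5, -2.0, 0.25, 8.0]
splat = 0.5
lhs = [a + splat for a in reversed(vec)][::-1]
rhs = [a + splat for a in vec]
fold_reverse = lhs == rhs

print(fold_icmp_and, fold_smax, fold_reverse)
```

This only confirms that the rewrites preserve values; it says nothing about whether they are profitable under the vectoriser's cost model.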

…#197663)

…7677) Use DemandedElts + KnownBits to match hidden identity patterns - helps especially with reduction patterns padded by legalisation.

Once llvm#197455 has landed, I'm intending to convert this (plus SMIN/SMAX/UMIN/UMAX and the existing ISD::ADD case) to use isIdentityElement directly.

Otherwise we end up with errors like the following when building with bazel:

```c++
In file included from external/+_repo_rules+llvm-project/libc/src/__support/CPP/type_traits/is_move_constructible.h:12:
external/+_repo_rules+llvm-project/libc/src/__support/CPP/type_traits/is_constructible.h:32:14: error: no template named 'bool_constant'
   32 |     : public bool_constant<__is_constructible(T, Args...)> {};
```

Use hasMadNC64_32Insts() (backed by SubtargetFeature) for MAD 64_32 no-carry and drop the old helper.

Linkage was renamed + a capability added following review in KhronosGroup/SPIRV-Registry#401

…ry_constant_folding (llvm#197641) This reduces the time it takes to instantiate `std::format` from ~160ms to ~120ms in my testing.

…m#197689) Clearer than having to know that first is a CPU and second is the feature list.

This fills in always legal rules, to remove the dependency on the legacy ruleset. The trunc rule might make some differences but it looks like i64 zext / sext are not well supported at the moment. This is not guaranteed to be all the rules, just the ones that appear in tests.

…lvm#197692)

…ver (llvm#197669)

…have valid positions (llvm#197598) Some tests are violating these assertions, so they are commented out. For the test in `clang/test/SemaTemplate/concepts.cpp`, that was broken by llvm#195995 and needs a partial revert at least.

rampitec and others added 30 commits May 13, 2026 16:37

[AMDGPU] Fix conflicted literal test. NFC. (llvm#197587)

111ec2f

[Instrumentor][FIX] Fix oversight in docs heading (llvm#197594)

3ccc276

[Bazel] Fixes 882d025 (llvm#197593)

fe787a8

This fixes 882d025. Co-authored-by: Google Bazel Bot <[email protected]>

[BOLT][NFCI] Drop CFG profile attachment in DataAggregator (llvm#195986)

37c5916

[scudo] Add test for initFlags()

c5f8414

Add a test case to verify that initFlags() correctly reads the SCUDO_ALLOCATION_RING_BUFFER_SIZE environment variable and updates the corresponding flag. This increases line coverage for flags.cpp to 100%.

[clang-format][NFC] Correct comment (llvm#197592)

8ebd857

[AMDGPU] Add lit64 machine verifier (llvm#196457)

d2a57ec

[X86][AVX10.2] Add BF16 to (U/S)8 saturating FP to int lowering (llvm…

f996980

…#197096) This PR adds BF16 to I8 saturating FP to int convert custom lowering.

[Clang][HLSL] Use EmitIntrinsicCall instead of EmitRuntimeCall for in…

7c7a47e

…trinsic (llvm#197380) Fix HLSL builtin to SPIR-V intrinsic lowering: most intrinsics calls must use CallingConv::C. Relates to llvm#197608 which tries to add CallingConv CHECK to IR Verifier.

[CIR] Lower 'init' functions for global TLS (llvm#197460)

2ee4669

This is the last patch for global/namespace thread-local variables. This patch emits the final 'init' function, which calls all other init functions, plus does the guard variable for the unordered variants.

[CIR]Materialize temp adjustments (llvm#197585)

f7f6040

This is a pretty trivial set of adjustments that have to happen when emitting a materialized temporary, and is effectively a clone of classic codegen. Our output is effectively identical (other than some minor re-ordering problems).

[flang][cuda] Use wider cudaMemcpy2D rows for descriptor transfers (l…

b638763

…lvm#197563)

[AMDGPU] Validate forced lit() immediate (llvm#196623)

e2b5048

Right now it takes the validation path of an inline constant if the value fits, even though it is forced to literal encoding.

[flang][openacc] allow duplicate data sharing clauses (llvm#197019)

83ae5cc

This PR allows duplicate OpenACC `private` and `firstprivate` clauses, while maintaining the restriction on `reduction` clauses.

Handle typeidCompatibleVTable in skipModuleSummaryEntry (llvm#196849)

6cdd328

This method needs to match the set of cases handled in parseSummaryEntry.

shiltian and others added 22 commits May 14, 2026 13:31

[NFC] Format two AMDGPU files (llvm#197672)

4e2ad71

- `llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp` - `llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp`

[AMDGPU][NFC] Autogenerate checks in andorn2.ll (llvm#197613)

f816732

For this PR: llvm#196325

[VPlan] Simplify BCast with onlyScalarsUsed (llvm#195444)

c4054b5

[AArch64][GlobalISel] Add always legal action builders. (llvm#197238)

ca3a210

This defines some always-legal actions, removing our dependency on the Legacy ruleset in AArch64.

[docs] Add the Clang Static Analysis WG to sync-ups (llvm#197679)

e42de9d

See https://discourse.llvm.org/t/rfc-forming-a-static-analysis-working-group-in-the-clang-ecosystem/90719/17

[PowerPC] Update base crypto builtins and intrinsics (llvm#197017)

d2de1d2

Update the base crypto builtins and LLVM intrinsics to drop the mma_ prefix. Also fix the builtin definitions for dmsha2hash, dmsha3hash, and dmxxshapad to use the correct immediate constraints.

[lldb][windows] fix x86_64 arg register mapping for lldb-server (llvm…

2a110fe

…#197663)

[AMDGPU][NFC] Remove redundant hasMadU64U32NoCarry helper (llvm#197682)

f2a9f41

Use hasMadNC64_32Insts() (backed by SubtargetFeature) for MAD 64_32 no-carry and drop the old helper.

Adjust SPV_AMD_weak_linkage (llvm#197484)

0f79ba2

Linkage was renamed + a capability added following review in KhronosGroup/SPIRV-Registry#401

[libc++] Replace ranges::find_first_of with std::find_first_of in __t…

692b8fd

…ry_constant_folding (llvm#197641) This reduces the time it takes to instantiate `std::format` from ~160ms to ~120ms in my testing.

[clang][AArch64] Use structured bindings in feature parsing code (llv…

0ac83dc

…m#197689) Clearer than having to know that first is a CPU and second is the feature list.

[docs] Add "LLVM Memory Safety" and "Lifetime Safety" working Groups (l…

277372b

…lvm#197692)

[lldb][windows] Keep int3 breakpoints inside the debugger on lldb-ser…

00559c2

…ver (llvm#197669)

[LV] Add store to test case to prevent dead code. nfc (llvm#197703)

7bfb4d9

merge main into amd-staging

37acaa0

ronlieb requested review from a team, dpalermo and skganesan008 May 14, 2026 15:39

ronlieb requested a review from vangthao95 as a code owner May 14, 2026 15:39

ronlieb removed the request for review from vangthao95 May 14, 2026 15:39

dpalermo approved these changes May 14, 2026

ronlieb merged commit fa79039 into amd-staging May 14, 2026
104 of 110 checks passed

ronlieb deleted the amd/merge/upstream_merge_20260514101317 branch May 14, 2026 20:57

ronlieb merged 96 commits into amd-staging from amd/merge/upstream_merge_20260514101317

20 participants
