merge main into amd-staging #2543

Merged
ronlieb merged 96 commits into amd-staging from amd/merge/upstream_merge_20260514101317
May 14, 2026

Conversation

Collaborator

@ronlieb ronlieb commented May 14, 2026

No description provided.

rampitec and others added 30 commits May 13, 2026 16:37
This commit adds initial documentation for the instrumentor to the
html/man pages and provides a script that helps new users set up the
config and stubs file interactively.

The script and docs have been created with Claude (AI) but
proofread/tested and modified afterwards.
A previous commit switched us to use the value of the AT_EXECFN, which
is an entry in the aux vector, as the executable path. As it turns out,
if a symlink is used to launch a program, the symlink path will be in
the AT_EXECFN string in core file memory. The PRPSINFO also contains a
basename of the program, and it will also be the symlink basename. The
best source of information to figure out the executable name is from the
NT_FILE note. This always has the resolved path to the executable.

Now the executable name is found in a reliable way starting with finding
the NT_FILE entry for the main executable. This can reliably be done by
finding the NT_FILE entry whose address contains the AT_PHDR aux vector
value. This value is the address of the program headers for the main
executable. If there is no NT_FILE entry we can find, we fall back to
the AT_EXECFN entry from memory and then fall back to the basename in the
PRPSINFO. This patch also creates a placeholder as the main executable
when the executable can't be found to ensure users can see which
executable they will need to track down in order to load the core file.

The tests added will test the order of precedence. It does this by
creating a core file with:
- NT_FILE entry with a path of "/path/nt_file_foo"
- AT_EXECFN in the aux vector with a path of "/path/execfn_foo"
- NT_PRPSINFO entry with a path of "prpsinfo_foo"

We then test that the correct entry is found as each preferred path
option is removed from the core file in turn.
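
The order of precedence described above can be sketched roughly as follows. This is a minimal Python model, not LLDB's actual code: `NtFileEntry` and `pick_executable_path` are hypothetical names, and the placeholder module created when the file cannot be located is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NtFileEntry:
    """One file mapping from the NT_FILE note (hypothetical model)."""
    start: int
    end: int
    path: str


def pick_executable_path(
    nt_file: List[NtFileEntry],
    at_phdr: Optional[int],
    at_execfn: Optional[str],
    prpsinfo_name: Optional[str],
) -> Optional[str]:
    # 1. Prefer the NT_FILE entry whose address range contains AT_PHDR;
    #    NT_FILE holds the resolved path, not the symlink.
    if at_phdr is not None:
        for entry in nt_file:
            if entry.start <= at_phdr < entry.end:
                return entry.path
    # 2. Fall back to AT_EXECFN, which may be a symlink path.
    if at_execfn:
        return at_execfn
    # 3. Fall back to the basename stored in PRPSINFO.
    if prpsinfo_name:
        return prpsinfo_name
    # Nothing usable was found; the caller decides what to do next.
    return None
```

Dropping the NT_FILE entry, then AT_EXECFN, mirrors how the tests step through the fallback chain.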
When passing multiple flags to check_cxx_compiler_flag, we must pass
them as a semicolon-separated list, not separated by spaces. These
checks sometimes succeed incorrectly because "-Werror -mcrc" has a
different return value than "-Werror" "-mcrc" on some systems.

This issue was verified with LLVM_ENABLE_PROJECTS=llvm;compiler-rt,
and I'm uncertain whether it exists in runtime CMake builds.
Nonetheless, it's still a bug.

See:
https://cmake.org/cmake/help/latest/module/CheckCXXCompilerFlag.html

This issue was identified downstream in ChromiumOS.

ChromiumOS Bug:
https://issuetracker.google.com/507177988
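
To illustrate why the separator matters for the multi-flag check above, here is a deliberately simplified Python model of CMake list splitting. It only captures the rule that list elements are delimited by semicolons; escaping and the way the flags finally reach the compiler are not modelled, and `cmake_list_to_args` is a hypothetical helper name.

```python
def cmake_list_to_args(value: str) -> list:
    """Split a CMake-style list string into elements (semicolons only)."""
    return value.split(";")


# A space-separated string stays a single element containing a space.
space_separated = cmake_list_to_args("-Werror -mcrc")

# A semicolon-separated string becomes two separate elements.
semicolon_separated = cmake_list_to_args("-Werror;-mcrc")
```

Under this model the first form yields one argument and the second yields two, which is the distinction the fix relies on.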
before

```SystemVerilog
(* x = "x" *) foreach(x[x]) x = x;
```

after

```SystemVerilog
(* x = "x" *) foreach (x[x])
  x = x;
```

The code for handling statements like the `foreach` preceded the part
for handling the attributes inside `(* *)`. So there was a problem with
some of the statements following attributes. The patch moves the part
for the statements down. The loop in the code was also unnecessary.
This fixes 882d025.

Co-authored-by: Google Bazel Bot <[email protected]>
Add a test case to verify that initFlags() correctly reads the
SCUDO_ALLOCATION_RING_BUFFER_SIZE environment variable and updates the
corresponding flag. This increases line coverage for flags.cpp to 100%.
…VM IR input (llvm#197566)

1. Replace the C++ source test that required compiling with %clangxx and
   separate Input files with self-contained .ll tests using split-file.

2. Split the test into two files:
   - clang-sycl-linker.ll: basic tool behavior (link, dev libs, AOT,
     errors)
   - clang-sycl-linker-split-mode.ll: device code split mode handling

Co-Authored-By: Claude
Add first class support for building test inferiors without debug info,
instead of having to pass `-g0` in the Makefile or the build dictionary.

```
def test(self):
    self.build(debug_info="none")
```

rdar://164923931
Summary:
There are two ways to put multiple binaries in the section: either
use the version two multi-binary support or just concatenate them. This
PR changes the llvm-offload-binary tool to use the multi-binary support
rather than directly concatenating them.

The motivation for this is to save space and make it easier to support
compression in the future. Compression would be a flag in the header and
the compression is only really valuable if it can combine the
architecture variants. ELF section compression is a little spotty but
would be another good solution.
This operator creates a new ``list`` containing the same elements as *list*
but in sorted order. To determine the order, TableGen binds the variable
*var* to each element and evaluates the *key* expression, which presumably
refers to *var*. The key must produce a ``string`` or integer value
(``bit``, ``bits``, or ``int``); all keys must be of the same type. Elements
with equal keys preserve their original relative order, resulting in a
stable sort.

For example, to sort a list of records by their ``Name`` field::

`  list<Thing> sorted = !sort(t, Things, t.Name);`
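
The stable-sort behaviour described above can be illustrated with an analogy in Python, whose built-in `sorted` is also stable. This is not TableGen; the `Thing` class and its `tag` field are hypothetical and exist only to make the original order visible.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    name: str  # plays the role of the ``Name`` field used as the key
    tag: int   # records the original position (illustration only)


things = [Thing("b", 1), Thing("a", 2), Thing("b", 3), Thing("a", 4)]

# Bind each element to `t` and evaluate the key expression `t.name`.
sorted_things = sorted(things, key=lambda t: t.name)

# Elements with equal keys keep their original relative order.
tags_in_sorted_order = [t.tag for t in sorted_things]
```

Here the two "a" records stay in order 2, 4 and the two "b" records stay in order 1, 3.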
…WARD_SLASH is ON (llvm#184556)

This patch fixes several LLVM test failures on Windows that occur when
the LLVM_WINDOWS_PREFER_FORWARD_SLASH CMake option is enabled.

The failures were caused by tests either hardcoding backslash
expectations in FileCheck or constructing paths with strict backslashes
in C++ unit tests, both of which break when the environment is
configured to prefer forward slashes.

Specific changes:
- `llvm-cov` lit tests: Changed the path separators with
`-DSEP=%{fs-sep}`.
- `llvm-objdump` lit test: Relaxed
`source-interleave-prefix-windows.test` to accept either forward or
backward slashes using the `{{[/\\]}}` regex. This makes the path
matching resilient to the underlying separator preference without losing
precision.
- CommandLineTest.cpp: Conditionalized the TestRoot variable to use
`C:/` instead of `C:\` based on the build configuration.
- Path.cpp (makeLongFormPath test):
  - Updated the OneDir string literal to conditionally use `/` or `\`.
  - Updated the ContainsDotAndDotDot lambda to check for `.` and `..`
    components with the correct separator style based on the build
    configuration.
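
As a rough illustration of the separator-agnostic matching used in the `llvm-objdump` test, the character class inside `{{[/\\]}}` behaves like the following Python regular expression. The path `C:/src/main.c` is a made-up example, not one taken from the test.

```python
import re

# Same character class as inside FileCheck's {{[/\\]}}: '/' or '\'.
SEP = r"[/\\]"
pattern = re.compile(r"C:" + SEP + r"src" + SEP + r"main\.c")


def matches(path: str) -> bool:
    """Return True if the whole path matches with either separator."""
    return pattern.fullmatch(path) is not None
```

Only the separator is relaxed; every other character in the path must still match exactly, which is why precision is not lost.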
This change defines 4 new output patterns, `PAIR8`, `EVEN8`, `AEXT8`, and
`TRUNC4`, and uses them to implement the lowering of the intrinsics
`int_ppc_amo_l[dw]at` and `int_ppc_amo_l[dw]at_cond` in TableGen. As a
result, the output pattern that generates the instructions becomes more
understandable, and the C++ code can be removed.
…#197096)

This PR adds BF16 to I8 saturating FP to int convert custom lowering.
…trinsic (llvm#197380)

Fix HLSL builtin to SPIR-V intrinsic lowering: most intrinsics calls
must use CallingConv::C.

Relates to llvm#197608 which tries to add CallingConv CHECK to IR Verifier.
In 2021, Augusto changed the Target::ReadMemory API from taking a
`prefer_file_cache` argument to taking a `force_live_memory` argument,
with opposite meanings - where we used to pass true, the callers now
needed to pass false. The default argument was false, so many callers
omitted the argument altogether after the change.

One of the edits to
UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly
unintentionally swapped the intended behavior -- this method which reads
the bytes of a function's instructions for emulation should get the
bytes from the local binary, if possible, else read from live memory.
But it was changed to force reading from live memory unconditionally.
This leads to an extra memory read for every function we see for the
first time in a single `lldb` process run (the UnwindTable they are
added to is part of the Module, and kept in the global Module cache).

It's not a major perf regression, but these are extra memory reads that
we don't need to be doing.

I audited all the other changes in the 2021 PR and this was the only
mistake like this.
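
The flag inversion described above is easy to get wrong, so here is a small Python model of it. This is a sketch under assumptions, not lldb's `Target::ReadMemory`: the function names, the dictionary-based "file cache" and "live memory", and the returned source label are all hypothetical.

```python
def read_memory(addr, size, file_cache, live_memory, force_live_memory=False):
    """New-style API: use the local binary unless live memory is forced."""
    if not force_live_memory and addr in file_cache:
        return file_cache[addr][:size], "file"
    return live_memory[addr][:size], "live"


def read_memory_old(addr, size, file_cache, live_memory, prefer_file_cache):
    """Old-style flag, expressed through the new one with inverted meaning."""
    return read_memory(addr, size, file_cache, live_memory,
                       force_live_memory=not prefer_file_cache)


# Example data: the bytes on disk and in the process differ (made-up values).
file_cache = {0x1000: b"\x90\x90"}
live_memory = {0x1000: b"\xcc\xcc"}
```

A caller that used to pass `prefer_file_cache=True` must now pass `force_live_memory=False` (or omit it) to keep reading from the local binary first.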

rdar://177026608
This is the last patch for global/namespace thread-local variables. This
patch emits the final 'init' function, which calls all other init
functions and handles the guard variable for the unordered variants.
These are fairly trivial adjustments that have to happen when
emitting a materialized temporary, and are effectively a clone of classic
codegen. Our output is effectively identical (other than some minor
re-ordering problems).
…llvm#197474)

On MSVC, Profile-* tests must link with the same CRT model as the
clang_rt.profile static archive they exercise. When that archive pulls
in RTInterception / RTSanitizerCommon object libraries, those are built
with MultiThreadedDLL (/MD), so the .objs reference `__imp_*` symbols.
The test binary defaults to /MT and fails to link with LNK2019
(`__imp__stricmp` from `interception_win.cpp`) and LNK4098 default-lib
conflicts.

Match the DLL CRT on the test side so test executables and the static
archive use the same runtime. The change is gated on
`COMPILER_RT_HAS_INTERCEPTION` and `!COMPILER_RT_PROFILE_BAREMETAL`, so
configurations that don't pull interception into profile are unaffected.

Split out as NFC from llvm#177665 per review feedback.
The
[RFC](https://discourse.llvm.org/t/rfc-remove-80-column-limit-in-documentation-files/89678/41)
on removing the 80-column limit was accepted, so we should no longer
enforce that rule in clang-tidy's code-linter workflow.
Right now it takes the validation path of an inline constant if the
value fits, even though it is forced to literal encoding.
…ed} (llvm#197518)

A device-typed dummy with `!dir$ ignore_tkr(m)` is meant to be an
overload discriminator (only selected for actuals with an explicit
`device/managed/unified` attribute). Skip the host->device relaxation in
AreCompatibleCUDADataAttrs when `IgnoreTKR::Managed` is set so
unattributed host actuals no longer bind to such a dummy.

Also document the §3.2.3 matching distance table next to
GetMatchingDistance and add LIT tests for the full Table 2 grid
and the ignore_tkr(m) carve-out.
This PR allows duplicate OpenACC `private` and `firstprivate` clauses
while maintaining the restriction on `reduction` clauses.
This method needs to match the set of cases handled in parseSummaryEntry.
…97565)

Add support for DWARF opcodes seen in GCC-generated binaries:

- DW_FORM_ref_udata: ULEB128-encoded CU-relative DIE reference.

- DW_OP_regval_type (0xa5): DWARF5 expression opcode with operands
(SizeLEB, BaseTypeRef). The BaseTypeRef was not being updated when DIEs
were relocated because cloneExpression only handled (Size1, BaseTypeRef)
patterns. Generalized the first-operand copying to use raw bytes from
the data stream instead of assuming a single byte.
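
Both DW_FORM_ref_udata and the SizeLEB operand are ULEB128-encoded, so the number of bytes they occupy varies. A minimal Python decoder, shown here only as an illustration of the encoding (it is not the code from this patch, and `decode_uleb128` is a hypothetical helper name):

```python
def decode_uleb128(data: bytes, offset: int = 0):
    """Decode one ULEB128 value; return (value, offset past the value)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        # The low 7 bits carry payload, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the final byte of the value.
        if byte & 0x80 == 0:
            return result, offset
        shift += 7
```

The returned offset shows how many raw bytes the operand consumed, which is why copying a fixed single byte is not sufficient for such operands.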

Fixes llvm#188250

Assisted-by: Claude Opus 4.6/4.7
shiltian and others added 22 commits May 14, 2026 13:31
- `llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp`
- `llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp`
This defines some always-legal actions, removing our dependency on the
Legacy ruleset in aarch64.
…m#197611)

This fixes the CTAD template parameter transforms so they produce
template template parameters which have correct depth for their own
template parameters.

This also stops calling SubstDecl directly on the non-type template
parameters, so that a template parameter with correct position is
produced directly, instead of manually fixing that up later. This helps
llvm#197598 by making it possible to add assertions that the positions are
always valid.
Register references in debug instructions can affect LiveRegUnits
analysis. Skip over debug instructions.

Tests in this PR would fail due to calls to LiveRegUnits::stepBackward
in RegisterScavenging, DeadMachineInstructionElim, and
AArch64InstrInfo.cpp getOutlinableRanges().

Other call-sites to stepBackward may also pass debug instructions to
LiveRegUnits::stepBackward, but LIT testing did not fail when
-debugify-and-strip-all-safe was enabled by default.
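
The effect of skipping debug instructions can be sketched with a toy backward liveness walk in Python. This is a simplified model, not the LiveRegUnits implementation: `Instr`, `live_before`, and the use of plain strings for register units are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Instr:
    defs: Set[str] = field(default_factory=set)
    uses: Set[str] = field(default_factory=set)
    is_debug: bool = False


def live_before(instrs: List[Instr], live_out: Set[str]) -> Set[str]:
    """Walk the block backwards and return the registers live on entry."""
    live = set(live_out)
    for instr in reversed(instrs):
        if instr.is_debug:
            # Debug instructions must not influence liveness.
            continue
        live -= instr.defs   # a definition kills the register
        live |= instr.uses   # a use makes it live above this point
    return live
```

Without the `is_debug` check, a register referenced only by a debug instruction would be reported as live, so results would differ between builds with and without debug info.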

---------

Signed-off-by: John Lu <[email protected]>
Update the base crypto builtins and LLVM intrinsics to drop the mma_
prefix. Also fix the builtin definitions for dmsha2hash, dmsha3hash,
and dmxxshapad to use the correct immediate constraints.
…7448)

This PR continues other work I've been doing trying to remove
unnecessary extra passes from the RUN lines in order to make it easier
to map the expected vectoriser output to the CHECK lines. As a result it
has exposed some potential optimisations that we may be able to perform
in VPlan.

Here is a summary of the changes I've noticed:

1. instcombine likes to canonicalise GEPs into certain forms. I'm not
sure if there is value in VPlan trying to guess what the canonical form
should be.
2. In tests like sve-cond-inv-loads.ll, etc. the pattern sub(urem) is
often replaced with and(sub). This is potentially something the
vectoriser could improve although I don't know if it would change the
cost model.
3. There is poor codegen in gather_nxv4i32_ind64_stride2 in the file
sve-gather-scatter.ll, which is due to
VPScalarIVStepsRecipe::execute. I have a PR that attempts to clean up
this IR: llvm#197169.
4. Simple missing fold in sve-inductions.ll for icmp(and(x,1), 0) ->
trunc(x) to i1
5. Missing nsw flag - see sve-interleaved-accesses.ll. I think this
might be due to the range of vscale.
6. Missing fold in sve-interleaved-masked-accesses.ll for
select(icmp(slt, x, y), y, x) -> smax
7. Missing folds for reverse transformations of uniform operations, e.g.
see sve-vector-reverse.ll for things like reverse(fadd(reverse(x)))
8. Removal of xor when used by the exit condition - see sve-vfabi.ll.
There isn't much we can do about this because VPlan requires successors
to be in a certain order, therefore the non-zero cost xor instruction
has to be present.
9. See PR llvm#196562 for fixes to some poor code in
uniform-args-call-variants.ll
10. instcombine tends to narrow reduction PHI nodes generated by VPlan
when there are extends involved. See reduction_i8 in
reduction-small-size.ll. In this case perhaps the original scalar loop
is simply not in a canonical form to start with?
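
Two of the folds in the list (items 4 and 6) can be sanity-checked by brute force over 8-bit values. The Python sketch below models IR semantics only loosely, and it assumes the `icmp` predicate in item 4 is `ne`, which the summary does not state; all function names are hypothetical.

```python
def icmp_ne_and1_zero(x: int) -> bool:
    """icmp ne (and x, 1), 0  (predicate assumed to be `ne`)."""
    return (x & 1) != 0


def trunc_to_i1(x: int) -> bool:
    """trunc x to i1: keep only the lowest bit."""
    return bool(x % 2)


def select_slt(x: int, y: int) -> int:
    """select (icmp slt x, y), y, x."""
    return y if x < y else x


def smax(x: int, y: int) -> int:
    """Signed maximum."""
    return max(x, y)


def folds_hold_for_i8() -> bool:
    """Exhaustively compare both folds over the i8 value range."""
    unsigned_ok = all(icmp_ne_and1_zero(x) == trunc_to_i1(x) for x in range(256))
    signed_vals = range(-128, 128)
    smax_ok = all(select_slt(x, y) == smax(x, y)
                  for x in signed_vals for y in signed_vals)
    return unsigned_ok and smax_ok
```

This only shows that the two rewrites are value-preserving; whether they are profitable in VPlan is a separate question that the summary leaves open.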
…7677)

Use DemandedElts + KnownBits to match hidden identity patterns - helps
especially with reduction patterns padded by legalisation

Once llvm#197455 has landed, I'm intending to convert this (plus
SMIN/SMAX/UMIN/UMAX and the existing ISD::ADD case) to use
isIdentityElement directly.
Otherwise we end up with errors like the following when building with
bazel:
```c++
In file included from external/+_repo_rules+llvm-project/libc/src/__support/CPP/type_traits/is_move_constructible.h:12:
external/+_repo_rules+llvm-project/libc/src/__support/CPP/type_traits/is_constructible.h:32:14: error: no template named 'bool_constant'
   32 |     : public bool_constant<__is_constructible(T, Args...)> {};
```
Use hasMadNC64_32Insts() (backed by SubtargetFeature) for MAD 64_32
no-carry and drop the old helper.
Linkage was renamed + a capability added following review in
KhronosGroup/SPIRV-Registry#401
…ry_constant_folding (llvm#197641)

This reduces the time it takes to instantiate `std::format` from ~160ms
to ~120ms in my testing.
…m#197689)

Clearer than having to know that first is a CPU and second is the
feature list.
This fills in always legal rules, to remove the dependency on the legacy
ruleset. The trunc rule might make some differences but it looks like
i64 zext / sext are not well supported at the moment. This is not
guaranteed to be all the rules, just the ones that appear in tests.
…have valid positions (llvm#197598)

Some tests are violating these assertions, so they are commented out.

For the test in `clang/test/SemaTemplate/concepts.cpp`, that was broken
by llvm#195995 and needs a partial revert at least.
@ronlieb ronlieb requested review from a team, dpalermo and skganesan008 May 14, 2026 15:39
@ronlieb ronlieb requested a review from vangthao95 as a code owner May 14, 2026 15:39
@ronlieb ronlieb removed the request for review from vangthao95 May 14, 2026 15:39
@ronlieb ronlieb merged commit fa79039 into amd-staging May 14, 2026
104 of 110 checks passed
@ronlieb ronlieb deleted the amd/merge/upstream_merge_20260514101317 branch May 14, 2026 20:57