wnf_at is not thread-safe on shared cells

# `wnf_at` is not thread-safe on shared cells

Under `-T>=2` HVM4 returns different normal forms on different runs
for programs that share reducible subgraphs across multiple consumers.
The interaction calculus is confluent, so this is an implementation
bug in the C runtime, not a calculus-level issue.

## Minimal reproduction

```hvm
// repro.hvm
@name = "POLY_INPUT"
@main = #P{%env(@name), %env(@name)}
```

Build the runtime and run:

```
$ cd clang && clang -O2 -o main main.c

$ env POLY_INPUT=val1 ./main -C1 -T1 repro.hvm
#P{#OK{"val1"},#OK{"val1"}}                 # deterministic

$ for i in 1 2 3 4 5 6 7 8; do
    env POLY_INPUT=val1 ./main -C1 -T4 repro.hvm
  done
#P{#ERR{"ERROR(env): invalid `name`; expected #NIL or #CON(#CHR{NUM}, tail)"}, #OK{"val1"}}
#P{#OK{"val1"},#OK{"val1"}}
#P{#ERR{"ERROR(env): invalid `name`; ..."},#ERR{"ERROR(env): invalid `name`; ..."}}
#P{#OK{"val1"},#OK{"val1"}}
#P{#OK{"val1"},#OK{"val1"}}
#P{#ERR{"ERROR(env): invalid `name`; ..."},#OK{"val1"}}
#P{#OK{"val1"},#OK{"val1"}}
#P{#OK{"val1"},#OK{"val1"}}
```

Roughly 50% failure rate; also non-deterministic failure pattern
(sometimes left, sometimes right, sometimes both).

The empirical signature is consistent: the primitive `%env`'s walker
chain (`env_go_name → env_go_chr → env_go_num → env_go_io`) observes
the character list in an inconsistent state — truncated or with `BJV`
placeholders at interior positions — and returns `invalid \`name\``.

## Root cause

The hot path in `clang/wnf/_.c:831-858`:

```c
fn Term wnf_at(u64 loc) {
  Term cur = heap_read(loc);        // (1) atomic load
  switch (term_tag(cur)) {
    /* WNF-already cases ... */
    case C00 ... C16: return cur;
    ...
  }
  Term res = wnf(cur);              // (2) reduce off-heap
  if (res != cur) {
    heap_set(loc, res);             // (3) atomic store
  }
  return res;
}
```

Steps (1) and (3) are individually atomic, but the `(read → reduce →
write-back)` triple **is not**.  Two worker threads both enter
`wnf_at(loc)` with the same `cur`, both reduce it independently to
equivalent `res` values, both write back.  Individually each write is
a valid result (confluence still holds), but consumers that read the
heap cells pointed to by `res` now see heap locations that were freshly
allocated by one thread and have already been rewritten by the other's
reduction, producing `BJV`/`DP0`/`DP1` intermediates escaping into
primitive frames.

Compare the `DP0`/`DP1` path on the same hot loop (`clang/wnf/_.c:108`)
which *does* use `heap_take`:

```c
case DP0:
case DP1: {
  u64 loc = term_val(next);
  Term cell = heap_take(loc);       // atomic exchange: takes the cell
  ...
}
```

`heap_take` (`clang/heap/take.c:11` — `__atomic_exchange_n ... ACQUIRE`)
gives one thread exclusive ownership of the cell for the duration of
the reduction.  The other thread sees `cell == 0` and is expected to
wait / retry.  Every reduction step that commits a result back to
the heap needs this discipline, not just `DP0`/`DP1`.



## Why user-level fixes don't help

- Wrapping an argument in `!!x = val; body` (strict let, `clang/parse/term/dup.c:74-76`)
  forces only the top-level WNF — not the char cells underneath.
  The sub-cells still race in `wnf_at`.
- Rebuilding the argument through a recursive strict fold (the same
  pattern used by `@force_poly_bytes` in callers' runtime preambles)
  *does* visit every sub-cell, but each visit still goes through the
  racy `wnf_at` path — at `-T>=4` the fold itself deadlocks because
  two workers' write-backs to the same cell contend infinitely.
- Putting a `pthread_mutex` around a primitive doesn't help because
  HVM4 dispatches sub-primitives from worker threads independent of
  the caller's frame; the race is between the primitive's internal
  `wnf` calls and other workers reducing the same shared subgraph.
- Calling `eval_normalize` inside a primitive (`clang/eval/normalize.c`)
  spawns its own pthread pool, which compounds the same race
  because the inner SNF workers hit the same non-atomic `wnf_at`.

## Proposed direction

Make every cache write-back in the reducers atomic with respect to
concurrent reads of the same cell.  Concretely:

1. `wnf_at` (and the analogous cache steps at the end of interaction
   rules in `clang/wnf/*.c`) should `heap_take` the cell before
   reducing and write the result back through `heap_set_rel` (release-
   ordered store already exists as `clang/heap/set_rel.c`).  When a
   second thread sees the cell `0`, it either spins on `heap_read`
   until the writer publishes the result, or re-enters the original
   term through the ALO mechanism.
2. Every reduction rule that currently does `... = wnf(x); heap_set(loc, ...)`
   needs the same lock-free discipline.  A grep for `heap_set` inside
   `clang/wnf/*.c` is the full patch surface (≈20 sites).
3. Regression test: run the repro above for `-T1..-T8` in the test
   harness and require byte-identical output.

The edits are mechanical but dense; the tricky part is picking the
right memory-order between the `heap_take` / `heap_set_rel` pair and
avoiding starvation when many workers hit the same cell.  A reader-
waits-on-publication scheme parallels the existing `term_sub_get` /
`term_sub_set` SUB-bit protocol used in the `DP0`/`DP1` path.

## Scope

The fix is internal to `clang/wnf/` and `clang/heap/`.  No changes
needed to the language surface, the parser, or any primitive's
semantics.  Single-threaded performance should not regress
appreciably: `heap_take` has identical cost to `heap_read` + first
write on the fast path.


Happy to begin implementing this if a maintainer 👍 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

wnf_at is not thread-safe on shared cells #57

`wnf_at` is not thread-safe on shared cells

Minimal reproduction

Root cause

Why user-level fixes don't help

Proposed direction

Scope

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

wnf_at is not thread-safe on shared cells #57

Description

wnf_at is not thread-safe on shared cells

Minimal reproduction

Root cause

Why user-level fixes don't help

Proposed direction

Scope

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`wnf_at` is not thread-safe on shared cells