Add loci-analysis workflow from overlay

loci-dev · loci-dev · commit 3a9c4ae9ebc2 · 2026-03-08T07:43:29.000Z
diff --git a/.github/workflows/loci-analysis.yml b/.github/workflows/loci-analysis.yml
@@ -0,0 +1,69 @@
+name: LOCI Analysis
+on:
+  push:
+    branches:
+      - loci/main-*
+  pull_request:
+    types: [opened, synchronize, reopened]
+
+jobs:
+  loci:
+    if: vars.UPSTREAM_REPO != ''
+    runs-on: ubuntu-latest
+
+    env:
+      LOCI_PROJECT: 'gitoxide'
+      LOCI_API_KEY: '${{ secrets.LOCI_API_KEY }}'
+      LOCI_BACKEND_URL: '${{ vars.LOCI_BACKEND_URL }}'
+      GH_TOKEN: ${{ secrets.MIRROR_REPOS_WRITE_PAT }}
+      CC_aarch64_unknown_linux_gnu: "aarch64-linux-gnu-gcc"
+      CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER: "aarch64-linux-gnu-gcc"
+      RUSTFLAGS: "-C debuginfo=2"
+      OPENSSL_STATIC: "1"
+
+    environment: ${{ vars.LOCI_ENV || 'PROD__AL_DEMO' }}
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          ref: ${{ (github.event_name == 'pull_request' && github.event.pull_request.head.sha) || github.sha }}
+
+      - name: Compute target
+        id: target
+        if: github.event_name == 'push'
+        run: |
+          branch="${{ github.ref_name }}"
+          sha="${branch#loci/main-}"
+          echo "value=main@${sha}" >> "$GITHUB_OUTPUT"
+
+      - name: Compute base
+        id: base
+        if: github.event_name == 'pull_request'
+        run: |
+          git remote add upstream "https://github.com/${{ vars.UPSTREAM_REPO }}.git" 2>/dev/null || true
+          git fetch upstream
+          upstream_default=$(gh api "repos/${{ vars.UPSTREAM_REPO }}" --jq .default_branch)
+          merge_base=$(git merge-base HEAD "upstream/${upstream_default}")
+          short_sha="${merge_base:0:7}"
+          echo "value=main@${short_sha}" >> "$GITHUB_OUTPUT"
+
+      - name: Install dependencies
+        run: sudo apt-get update && sudo apt-get install -y gcc-aarch64-linux-gnu && rustup target add aarch64-unknown-linux-gnu
+        shell: bash -euo pipefail {0}
+
+      - name: Build
+        run: cargo build --target aarch64-unknown-linux-gnu --release --bins --no-default-features --features max-pure
+        shell: bash -euo pipefail {0}
+
+      - name: Upload
+        uses: auroralabs-loci/loci-action@v1
+        with:
+          mode: upload
+          binaries: |
+                target/aarch64-unknown-linux-gnu/release/ein
+                target/aarch64-unknown-linux-gnu/release/gix
+          project: '${{ env.LOCI_PROJECT }}'
+          target: ${{ steps.target.outputs.value || '' }}
+          base: ${{ steps.base.outputs.value || '' }}
diff --git a/pulls.ndjson b/pulls.ndjson
@@ -0,0 +1 @@
+{"pull_number":"2461","title":"perf(EntryMode::extract_from_bytes): add happy path check","body":"Since the position of the space in the entrymode is often 6, we can add an explicit check for this case and skip some of the operations performed in the loop, making the benchmark a little faster.\r\n\r\nBenches (`cargo bench --bench decode-objects -- TreeRef`) when compared to `main`:\r\n```\r\nTreeRef()               time:   [91.447 ns 91.539 ns 91.631 ns]\r\n                        change: [−4.4580% −4.3152% −4.1764%] (p = 0.00 < 0.05)\r\n                        Performance has improved.\r\n\r\nTreeRefIter()           time:   [34.566 ns 34.611 ns 34.661 ns]\r\n                        change: [−15.910% −15.735% −15.567%] (p = 0.00 < 0.05)\r\n                        Performance has improved.\r\n```\r\n\r\nObviously the usefulness of this change relies on two things: the case of index 6 being the space is indeed the happy path (and from what I can find on the internet, that does seem to be the default case), and whether this micro-optimization is worth the increased code complexity.\r\n\r\nAdditionally, we can skip some subtraction & logic stuff if the octal value is computed immediately and used, which saves a few cycles.\r\n\r\nThe 2nd improvement (which is independent of the first) is the use of `iter().position()` instead of `ByteSlice::find_byte` in `decode::fast_entry`. It yielded the following improvements (compared to only the happy-path fix) for me\r\n\r\n```\r\nTreeRef()               time:   [84.198 ns 84.299 ns 84.405 ns]\r\n                        change: [−13.030% −12.865% −12.676%] (p = 0.00 < 0.05)\r\n                        Performance has improved.\r\n\r\nTreeRefIter()           time:   [26.710 ns 26.887 ns 27.067 ns]\r\n                        change: [−35.780% −35.469% −35.121%] (p = 0.00 < 0.05)\r\n                        Performance has improved.\r\n``` \r\n\r\nThis large a speedup was actually a little unexpected, ~but as indicated in the commit message, I guess there's some \"we were blocking the compiler from optimizing/vectorizing for us\" that is now removed.~. Looking at the compiler output in compiler explorer actually does not support this theory. I'm not entirely sure what `TREE` looks like, but perhaps this is just a false positive: the names used in the benchmark are too small to benefit from the `memchr` implementation that `find_byte` uses, so using a basic simple loop (which is what the `iter().position()` compiles to) is faster.","pull_head_sha":"afe37536a6807b9f76c42f2d27e46c698cdb27ae","loci_pr_branch":"loci/pr-2461-mini-optimize","short_merge_base":"15c835a","loci_main_branch":"loci/main-15c835a","use_loci_base":0}
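
The two ideas in the PR body above can be sketched roughly as follows. This is an illustrative reconstruction, not the gitoxide implementation: the function names `parse_mode`, `octal`, and `split_name` are hypothetical, and the entry layout (`<octal mode> <name>\0<hash>`) is assumed from the description. The first function tries the common case of a space at index 6 before falling back to a scan; the second locates the NUL terminator with a plain `iter().position()` loop, which the PR body suggests may beat a `memchr`-style search on short names.

```rust
/// Convert ASCII octal digits to a number; `None` on empty or non-octal input.
fn octal(digits: &[u8]) -> Option<u32> {
    if digits.is_empty() {
        return None;
    }
    let mut value = 0u32;
    for &b in digits {
        if !(b'0'..=b'7').contains(&b) {
            return None;
        }
        // Accumulate immediately, three bits per octal digit.
        value = (value << 3) | u32::from(b - b'0');
    }
    Some(value)
}

/// Hypothetical helper: parse the leading mode of a tree entry and
/// return it together with the bytes after the separating space.
fn parse_mode(input: &[u8]) -> Option<(u32, &[u8])> {
    // Happy path: six octal digits followed by a space at index 6.
    if input.len() > 6 && input[6] == b' ' {
        if let Some(mode) = octal(&input[..6]) {
            return Some((mode, &input[7..]));
        }
        // Otherwise fall through, e.g. a 5-digit mode whose name starts with a space.
    }
    // General path: look for the first space within the first seven bytes.
    let space = input.iter().take(7).position(|&b| b == b' ')?;
    let mode = octal(&input[..space])?;
    Some((mode, &input[space + 1..]))
}

/// Hypothetical helper: split `name\0rest` using a simple linear scan.
fn split_name(rest: &[u8]) -> Option<(&[u8], &[u8])> {
    let nul = rest.iter().position(|&b| b == 0)?;
    Some((&rest[..nul], &rest[nul + 1..]))
}
```

Calling `parse_mode` on a regular-file entry such as `100644 a.txt\0...` takes the happy path, while a directory entry such as `40000 dir\0...` goes through the fallback scan. Whether the linear scan in `split_name` is actually faster depends on name length, so the benchmark figures quoted in the PR body should not be assumed to carry over to this sketch.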
