Commit 3a9c4ae

Add loci-analysis workflow from overlay
1 parent 6183fd0 commit 3a9c4ae

2 files changed: +70 -0 lines changed

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+name: LOCI Analysis
+on:
+  push:
+    branches:
+      - loci/main-*
+  pull_request:
+    types: [opened, synchronize, reopened]
+
+jobs:
+  loci:
+    if: vars.UPSTREAM_REPO != ''
+    runs-on: ubuntu-latest
+
+    env:
+      LOCI_PROJECT: 'gitoxide'
+      LOCI_API_KEY: '${{ secrets.LOCI_API_KEY }}'
+      LOCI_BACKEND_URL: '${{ vars.LOCI_BACKEND_URL }}'
+      GH_TOKEN: ${{ secrets.MIRROR_REPOS_WRITE_PAT }}
+      CC_aarch64_unknown_linux_gnu: "aarch64-linux-gnu-gcc"
+      CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER: "aarch64-linux-gnu-gcc"
+      RUSTFLAGS: "-C debuginfo=2"
+      OPENSSL_STATIC: "1"
+
+    environment: ${{ vars.LOCI_ENV || 'PROD__AL_DEMO' }}
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          ref: ${{ (github.event_name == 'pull_request' && github.event.pull_request.head.sha) || github.sha }}
+
+      - name: Compute target
+        id: target
+        if: github.event_name == 'push'
+        run: |
+          branch="${{ github.ref_name }}"
+          sha="${branch#loci/main-}"
+          echo "value=main@${sha}" >> "$GITHUB_OUTPUT"
+
+      - name: Compute base
+        id: base
+        if: github.event_name == 'pull_request'
+        run: |
+          git remote add upstream "https://github.com/${{ vars.UPSTREAM_REPO }}.git" 2>/dev/null || true
+          git fetch upstream
+          upstream_default=$(gh api "repos/${{ vars.UPSTREAM_REPO }}" --jq .default_branch)
+          merge_base=$(git merge-base HEAD "upstream/${upstream_default}")
+          short_sha="${merge_base:0:7}"
+          echo "value=main@${short_sha}" >> "$GITHUB_OUTPUT"
+
+      - name: Install dependencies
+        run: sudo apt install gcc-aarch64-linux-gnu && rustup target add aarch64-unknown-linux-gnu
+        shell: bash -euo pipefail {0}
+
+      - name: Build
+        run: cargo build --target aarch64-unknown-linux-gnu --release --bins --no-default-features --features max-pure
+        shell: bash -euo pipefail {0}
+
+      - name: Upload
+        uses: auroralabs-loci/loci-action@v1
+        with:
+          mode: upload
+          binaries: |
+            target/aarch64-unknown-linux-gnu/release/ein
+            target/aarch64-unknown-linux-gnu/release/gix
+          project: '${{ env.LOCI_PROJECT }}'
+          target: ${{ steps.target.outputs.value || ''}}
+          base: ${{ steps.base.outputs.value || '' }}
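
Reviewer note: the "Compute target" and "Compute base" steps both produce a `main@<short sha>` label, one from the branch name and one from a merge-base SHA. The sketch below isolates just that string handling so it can be tried locally. The function names `compute_target` and `compute_base` are hypothetical and do not appear in the workflow, and the 40-character SHA is a made-up placeholder. It assumes bash, since `${var:0:7}` is not POSIX sh.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; compute_target and compute_base are hypothetical
# helper names, not part of the workflow in this commit.
set -euo pipefail

# Mirrors "Compute target": strip the "loci/main-" prefix from the branch name.
compute_target() {
  local branch="$1"
  local sha="${branch#loci/main-}"
  echo "main@${sha}"
}

# Mirrors the tail of "Compute base": keep the first 7 characters of a full SHA.
compute_base() {
  local merge_base="$1"
  local short_sha="${merge_base:0:7}"
  echo "main@${short_sha}"
}

# Branch name taken from the pulls.ndjson record in this commit.
compute_target "loci/main-15c835a"                          # prints main@15c835a
# Placeholder SHA, not a real commit.
compute_base "0123456789abcdef0123456789abcdef01234567"    # prints main@0123456
```

Because each step is gated on the event type, only one of the two outputs is set in a given run. The Upload step's `|| ''` fallbacks pass an empty string for the other.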

pulls.ndjson

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+{"pull_number":"2461","title":"perf(EntryMode::extract_from_bytes): add happy path check","body":"Since the position of the space in the entrymode is often 6, we can add an explicit check for this case and skip some of the operations performed in the loop, making the benchmark a little faster.\r\n\r\nBenches (`cargo bench --bench decode-objects -- TreeRef`) when compared to `main`:\r\n```\r\nTreeRef() time: [91.447 ns 91.539 ns 91.631 ns]\r\n change: [−4.4580% −4.3152% −4.1764%] (p = 0.00 < 0.05)\r\n Performance has improved.\r\n\r\nTreeRefIter() time: [34.566 ns 34.611 ns 34.661 ns]\r\n change: [−15.910% −15.735% −15.567%] (p = 0.00 < 0.05)\r\n Performance has improved.\r\n```\r\n\r\nObviously the usefulness of this change relies on two things: the case of index 6 being the space is indeed the happy path (and from what I can find on the internet, that does seem to be the default case), and whether this micro-optimization is worth the increased code complexity.\r\n\r\nAdditionally, we can skip some subtraction & logic stuff if the octal value is computed immediately and used, which saves a few cycles.\r\n\r\nThe 2nd improvement (which is independent of the first) is the use of `iter().position()` instead of `ByteSlice::find_byte` in `decode::fast_entry`. It yielded the following improvements (compared to only the happy-path fix) for me\r\n\r\n```\r\nTreeRef() time: [84.198 ns 84.299 ns 84.405 ns]\r\n change: [−13.030% −12.865% −12.676%] (p = 0.00 < 0.05)\r\n Performance has improved.\r\n\r\nTreeRefIter() time: [26.710 ns 26.887 ns 27.067 ns]\r\n change: [−35.780% −35.469% −35.121%] (p = 0.00 < 0.05)\r\n Performance has improved.\r\n``` \r\n\r\nThis large a speedup was actually a little unexpected, ~but as indicated in the commit message, I guess there's some \"we were blocking the compiler from optimizing/vectorizing for us\" that is now removed.~. Looking at the compiler output in compiler explorer actually does not support this theory. I'm not entirely sure what `TREE` looks like, but perhaps this is just a false positive: the names used in the benchmark are too small to benefit from the `memchr` implementation that `find_byte` uses, so using a basic simple loop (which is what the `iter().position()` compiles to) is faster.","pull_head_sha":"afe37536a6807b9f76c42f2d27e46c698cdb27ae","loci_pr_branch":"loci/pr-2461-mini-optimize","short_merge_base":"15c835a","loci_main_branch":"loci/main-15c835a","use_loci_base":0}
