Add string slicing support #2639

Merged
mikefarah merged 8 commits into master from copilot/add-string-slice-support on Apr 6, 2026

Contributor

Copilot AI commented Mar 27, 2026

yq supported array slicing (.[1:3]) but not string slicing — .field[4:] on a string value silently returned the wrong result or panicked.

Changes

  • operator_slice.go — Added sliceStringNode (rune-based for Unicode correctness); uses the shared clampSliceIndex helper for bounds clamping. sliceArrayOperator now routes !!str scalar nodes to it instead of the array path. The original node's tag and style are preserved on the replacement.
  • operator_traverse_path.go — Fixed traverseArrayOperator to evaluate the LHS before calling sliceArrayOperator. Previously it passed the raw document context, so .country[4:] would operate on the whole map rather than the country value.
  • lexer.go — Extended the implicit-0 insertion (for omitted start index in [:N]) to fire on openCollect ([) in addition to traverseArrayCollect (.[), enabling .field[:N] syntax. The condition is tightened to require that openCollect follows a traversal token (operationToken, closeCollect, or closeCollectObject), preventing standalone [: "b"] map literals from being silently misinterpreted.
  • operator_slice_test.go — Added string slicing scenarios: start-only, end-only, both, negative indices, out-of-bounds clamping (including underflow), full [:], and Unicode. Regression tests for very-negative array indices (OSS-Fuzz) are included.
  • doc/operators/headers/slice-array.md — Header updated to "Slice Array or String" (removed "Splice" which implied unsupported in-place mutation).
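
The rune-based slicing described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from operator_slice.go: `clampIndex` and `sliceRunes` are hypothetical names, and `clampIndex` only stands in for yq's `clampSliceIndex`, whose real signature is not shown in this PR.

```go
package main

import "fmt"

// clampIndex is a hypothetical stand-in for yq's clampSliceIndex helper.
// It resolves a negative index relative to length, then clamps the
// result into [0, length].
func clampIndex(index, length int) int {
	if index < 0 {
		index += length
	}
	if index < 0 {
		return 0
	}
	if index > length {
		return length
	}
	return index
}

// sliceRunes returns s[start:end] counted in runes rather than bytes,
// so a multi-byte character occupies exactly one index position.
func sliceRunes(s string, start, end int) string {
	runes := []rune(s)
	from := clampIndex(start, len(runes))
	to := clampIndex(end, len(runes))
	if from >= to {
		return ""
	}
	return string(runes[from:to])
}

func main() {
	fmt.Println(sliceRunes("Australia", 4, 9))     // ralia
	fmt.Println(sliceRunes("Australia", 1, -1))    // ustrali
	fmt.Println(sliceRunes("h\u00e9llo", 1, 3))    // él
	fmt.Println(sliceRunes("h\u00e9llo", -100, 2)) // hé
}
```

Converting to `[]rune` before indexing is what keeps `é` from being split in the middle of its two-byte UTF-8 encoding.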

Example

# sample.yml
country: Australia
yq '.country[4:]'   sample.yml  # ralia
yq '.country[:5]'   sample.yml  # Austr
yq '.country[-5:]'  sample.yml  # ralia
yq '.country[1:-1]' sample.yml  # ustrali

Unicode indices are rune-based:

echo 'greeting: héllo' | yq '.greeting[1:3]'  # él

Copilot AI changed the title from "[WIP] Add support for string slicing in yq" to "Add string slicing support" Mar 27, 2026
Copilot AI requested a review from mikefarah March 27, 2026 10:15
then
```bash
yq '.country[0:5]' sample.yml
```
Owner

Test says without the second number, but both numbers are present

description: "Slicing strings - without the second number",
subdescription: "Finishes at the end of the string",
document: `country: Australia`,
expression: `.country[0:5]`,
Owner

yep this has the second number...should this be .country[5:] ?

Owner

@copilot fix this ^;

also the build is failing with:

Error: pkg/yqlib/operator_slice.go:19:98: sliceStringNode - result 1 (error) is always nil (unparam)
func sliceStringNode(lhsNode *CandidateNode, firstNumber int, secondNumber int) (*CandidateNode, error) {

Contributor Author

Fixed in 9a9399a:

  • Changed "Slicing strings - without the second number" expression from .country[0:5] to .country[5:] (now actually omits the second number, giving alia)
  • Changed the first "Slicing strings" test to use both numbers: .country[0:5] → Austr
  • Removed the always-nil error return from sliceStringNode to fix the build warning

Contributor

Copilot AI left a comment

Pull request overview

Adds correct string-slicing behavior to yq’s slice operator and fixes traversal so slice operations apply to the evaluated LHS value (e.g. .country[4:] slices the country string, not the root document).

Changes:

  • Add rune-based string slicing support and route !!str scalar nodes through it.
  • Fix traverseArrayOperator to evaluate/pipeline the LHS before applying slice logic.
  • Extend lexer post-processing to support omitted-start string slicing syntax like .field[:N].

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| project-words.txt | Adds new words used by tests/docs to the project dictionary. |
| pkg/yqlib/operator_traverse_path.go | Ensures slice operations receive the evaluated LHS context. |
| pkg/yqlib/operator_slice.go | Introduces sliceStringNode and routes string scalars through it. |
| pkg/yqlib/lexer.go | Enables implicit 0 insertion for [:N] after [ (not just after .[). |
| pkg/yqlib/operator_slice_test.go | Adds coverage for string slicing (including Unicode and bounds cases). |
| pkg/yqlib/doc/operators/slice-array.md | Updates generated operator docs to include string slicing examples. |
| pkg/yqlib/doc/operators/headers/slice-array.md | Updates the slice operator header to reflect string support. |

Comment on lines +77 to +80
relativeFirstNumber := firstNumber
if relativeFirstNumber < 0 {
	relativeFirstNumber = len(lhsNode.Content) + firstNumber
}

Copilot AI Mar 28, 2026

Array slicing can still panic when the start index is more negative than the array length (e.g. .[-100:]): relativeFirstNumber is adjusted by len(lhsNode.Content)+firstNumber but is not clamped back to 0, so the loop can index lhsNode.Content with a negative i. Clamp relativeFirstNumber (and similarly relativeSecondNumber) into [0,len] before iterating to avoid runtime panics and match the existing out-of-bounds clamping behavior for other cases.

Owner

@copilot apply changes based on this feedback

Contributor Author

Fixed in 341e252. Array slice now clamps both relativeFirstNumber and relativeSecondNumber into [0, len(Content)] after adjusting for negative indices, preventing a panic on expressions like .[-100:]. Also added test cases for both directions of out-of-bounds negative indices.
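
To make the failure mode concrete, here is a small sketch on a plain Go slice rather than on yq's CandidateNode. The names `adjustOnly` and `sliceClamped` are hypothetical and do not appear in the PR; the sketch only assumes the behaviour described in this thread (adjust negative indices by the length, then clamp into [0, len]).

```go
package main

import "fmt"

// adjustOnly reproduces the original adjustment without clamping:
// a start index more negative than the length stays negative.
func adjustOnly(first, length int) int {
	if first < 0 {
		return length + first
	}
	return first
}

// clamp limits v to the closed range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// sliceClamped adjusts negative indices relative to the length and then
// clamps both bounds into [0, len(items)] before indexing any element.
func sliceClamped(items []string, first, second int) []string {
	n := len(items)
	from := clamp(adjustOnly(first, n), 0, n)
	to := clamp(adjustOnly(second, n), 0, n)
	out := []string{}
	for i := from; i < to; i++ {
		out = append(out, items[i])
	}
	return out
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Println(adjustOnly(-100, 3))          // -97: indexing with this would panic
	fmt.Println(sliceClamped(items, -100, 3)) // [a b c]
	fmt.Println(sliceClamped(items, 0, -100)) // []
}
```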

mikefarah marked this pull request as ready for review March 28, 2026 08:19
@jandubois
Contributor

Hi Mike, while I was working on some OSS-Fuzz bug reports against yqlib I've also been running my Deep Review tool against this PR because I was curious what it would find: https://jandubois.github.io/yq/20260405-232458-pr-2639.html

For the I2 finding (index clamping) you could use the clampSliceIndex function from #2646 that I filed earlier. It deals with the same issue for regular slices.

Haven't looked through the review in detail yet, but thought you might be interested.

@mikefarah
Owner

@copilot resolve conflicts with main; you should be able to use the new "clampSliceIndex" function; and action the following feedback

S2. Lexer openCollect change affects standalone [: expr] (Claude, Gemini)

The added openCollect condition correctly enables .field[:N] but also fires on standalone [ (array literal start). [: "b"] now silently produces [{0: b}] instead of an error. Unlikely to affect real users, but broadens accepted syntax unintentionally. (suggestion, gap)
pkg/yqlib/lexer.go

if tokenIsOpType(currentToken, createMapOpType) {
	log.Debugf("tokenIsOpType: createMapOpType")
	// check the previous token is '[', means we are slice, but dont have a first number
	if index > 0 && (tokens[index-1].TokenType == traverseArrayCollect || tokens[index-1].TokenType == openCollect) {
		log.Debugf("previous token is : traverseArrayOpType")
		// need to put the number 0 before this token, as that is implied
		postProcessedTokens = append(postProcessedTokens, &token{TokenType: operationToken, Operation: createValueOperation(0, "0")})
	}
}

Fix: Tighten the condition to require that openCollect follows a traversal token.
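
One way to picture the tightened condition is the sketch below. It is a simplified model, not yq's lexer: the token kinds are reduced to a bare enum, `createMap` stands in for an operation token whose operation type is createMapOpType, and `needsImplicitZero` is a hypothetical helper name.

```go
package main

import "fmt"

type tokenType int

const (
	operationToken       tokenType = iota // a path or value such as .field or 5
	openCollect                           // [
	closeCollect                          // ]
	closeCollectObject                    // }
	traverseArrayCollect                  // .[
	createMap                             // the ':' in a slice or map entry
)

// needsImplicitZero reports whether the ':' at tokens[index] begins a
// slice whose first number was omitted, so that a 0 should be inserted.
func needsImplicitZero(tokens []tokenType, index int) bool {
	if index < 1 || tokens[index] != createMap {
		return false
	}
	switch tokens[index-1] {
	case traverseArrayCollect:
		// .[:N] is always a slice.
		return true
	case openCollect:
		// [:N] is a slice only when the '[' follows something traversable.
		if index < 2 {
			return false
		}
		switch tokens[index-2] {
		case operationToken, closeCollect, closeCollectObject:
			return true
		}
	}
	return false
}

func main() {
	fieldSlice := []tokenType{operationToken, openCollect, createMap, operationToken, closeCollect}
	standalone := []tokenType{openCollect, createMap, operationToken, closeCollect}
	fmt.Println(needsImplicitZero(fieldSlice, 2)) // true:  .field[:5]
	fmt.Println(needsImplicitZero(standalone, 1)) // false: [: "b"]
}
```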

S3. Doc header "Splice" misleading for strings (Claude)

The header was updated from "Slice/Splice Array" to "Slice/Splice Array or String". "Splice" implies in-place mutation, which is not supported for strings. Consider "Slice Array or String" instead.

S4. Documentation uses only ASCII examples (Claude)

All four string slicing doc sections use country: Australia (ASCII-only). Since the feature's distinguishing value is rune-based Unicode handling, one documented example with a multibyte string would demonstrate this.

Contributor Author

Copilot AI commented Apr 6, 2026

@copilot resolve conflicts with main; you should be able to use the new "clampSliceIndex" function; and action the following feedback

...

Done in adb150d (and merge commit 1afa3c7):

  • Merge master: Integrated clampSliceIndex from master and refactored sliceStringNode to use it, removing the duplicated clamping logic.
  • S2 (lexer): Tightened the openCollect condition to only insert an implicit 0 when the [ follows an operationToken, closeCollect, or closeCollectObject — so [: "b"] map literals are no longer silently accepted.
  • S3 (header): Changed "Slice/Splice Array or String" → "Slice Array or String".
  • S4 (Unicode doc): The Unicode test (greeting: héllo, .greeting[1:3] → él) now has description/subdescription and is included in the generated docs.

@mikefarah
Owner

@copilot build failed; pkg/yqlib/doc/operators/slice-array.md:169:28 - Unknown word (multibyte)

Contributor Author

Copilot AI commented Apr 6, 2026

@copilot build failed; pkg/yqlib/doc/operators/slice-array.md:169:28 - Unknown word (multibyte)

Fixed in 7bc2d0c — changed "multibyte" to "multi-byte" in the Unicode test subdescription. Spelling check now passes with 0 issues.

mikefarah merged commit 3b2423e into master Apr 6, 2026
5 checks passed
@jandubois
Contributor

Hi Mike, thanks for the quick updates!

I'm wondering what your plans are regarding the length byte vs rune inconsistency? Changing how length works is potentially a breaking change, but I think one can argue that it is a bug fix and that strings always were supposed to be composed of runes / codepoints, and not bytes.

Also, here is an automated check of the review against the merged state:


Follow-up: PR #2639 Post-Merge Finding Status

Checked against: upstream/master at 3b2423e8 (squash-merge of PR #2639) plus 80139ae1 (negative-index fix from fix-slice-negative-index branch, merged as PR #2646)

Addressed

I2 — Inconsistent index-clamping structure. Fixed. sliceStringNode now calls the shared clampSliceIndex helper introduced by PR #2646. Both the string and array paths use identical clamping logic.

I3 — Missing string edge-case tests. Partially fixed. The merged PR adds tests for explicit indices, omitted start/end, negative indices, both ends, full slice [:], out-of-bounds clamping, and one Unicode case (héllo[1:3]). Still missing: very negative string indices (.country[-100:], .country[:-100]), empty string slicing, and non-string scalar slicing.

S1 — Commit structure. Fixed. Squash-merged into a single commit.

S2 — Lexer openCollect side-effect on [: expr]. Fixed. The condition now requires index >= 2 and verifies the preceding token is an operation or close-bracket, so standalone [: "b"] no longer silently inserts 0.

S3 — "Splice" in doc header. Fixed. Header changed to "Slice Array or String".

S4 — ASCII-only documentation examples. Fixed. Added a Unicode example (héllo[1:3]) with a subdescription explaining rune-based indexing.

Not addressed

I1 — Byte-vs-rune length inconsistency. operator_length.go is unchanged. "café" | length still returns 5 (bytes) while slicing treats the same string as 4 characters. Verified on upstream/master:

$ echo '"café"' | yq '(. | length) as $n | .[($n - 2):]'
é           # wrong — expected "fé"

$ echo '"café"' | yq '.[-2:]'
fé          # correct

This is the most impactful remaining issue: any expression that feeds length output into a slice bound produces wrong results on multibyte strings.
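
The mismatch is easy to reproduce outside yq. The sketch below uses plain Go with hypothetical helper names (`byteLength`, `runeLength`, `tailFrom`); it assumes only what the finding states, namely that length counts bytes while slicing counts runes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLength counts UTF-8 bytes, which is what the finding says
// `length` currently reports for strings.
func byteLength(s string) int { return len(s) }

// runeLength counts code points, which is the unit the slice operator uses.
func runeLength(s string) int { return utf8.RuneCountInString(s) }

// tailFrom returns the runes of s from the rune offset start onwards.
// start is assumed to lie within [0, runeLength(s)].
func tailFrom(s string, start int) string {
	return string([]rune(s)[start:])
}

func main() {
	s := "caf\u00e9" // "café"; é is two bytes in UTF-8

	fmt.Println(byteLength(s), runeLength(s)) // 5 4

	// Feeding the byte length into a rune-based slice starts one position late.
	fmt.Println(tailFrom(s, byteLength(s)-2)) // é
	fmt.Println(tailFrom(s, runeLength(s)-2)) // fé
}
```

With an ASCII-only string the two lengths agree, which is why the inconsistency only shows up on multi-byte input.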

I4 — Non-string scalar slicing produces invalid output. No guard was added before the array fallthrough. .value[0:2] on a !!int scalar still produces !!int [].

I5 — Slice index expressions evaluated against LHS instead of root context. sliceArrayOperator still receives lhsContext for both node iteration and index evaluation. .a[0:.max] still fails with "cannot index array with 'max'" instead of resolving .max from the document root.

Summary

Six of nine findings addressed (all four suggestions, I2, and partially I3). The three remaining important issues (I1, I4, I5) are pre-existing gaps that the PR exposed rather than introduced — none are regressions.

jandubois added a commit to jandubois/yq that referenced this pull request Apr 6, 2026