
Commit 3e60ec2

Optimize INVALID_NUMBER_FOLLOWED_BY_NAME_REGEXP
Claude Code helped optimize this regexp, which could be slow depending on the document. Specifically, backtracking in the original regexp was the main performance issue.

This does two things:

1. Adds a very simple, fast pre-check to gate whether the main regexp should run at all
2. Prevents as much backtracking as possible

Benchmarks:

| Document Type                  | Before | After  | Speedup |
|--------------------------------|--------|--------|---------|
| Typical large (125KB)          | 0.72s  | 0.06s  | 12x     |
| Colon-heavy (35KB)             | 0.24s  | 0.007s | 34x     |
| Pathological worst-case (26KB) | 1.64s  | 0.25s  | 6.6x    |
| No digits (17KB)               | 0.10s  | 0.002s | 54x     |
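To illustrate the backtracking point, here is a minimal sketch of how possessive quantifiers and atomic groups behave in Ruby regexps. The constant names and sample strings are hypothetical and are not part of this commit; the sketch only shows that these constructs refuse to give back characters they have already consumed, which is what removes retry work from a failing match.

```ruby
# Greedy quantifier: [0-9]+ first takes all five digits, then gives one back
# so that the literal "5" and the end anchor can still match.
GREEDY_DIGITS = /\A[0-9]+5\z/

# Possessive quantifier: [0-9]++ never gives back what it consumed,
# so nothing is left for the trailing "5" and the attempt fails immediately.
POSSESSIVE_DIGITS = /\A[0-9]++5\z/

# An atomic group (?>...) applies the same rule to a whole subpattern.
GREEDY_GROUP = /\A(?:a+)ab\z/
ATOMIC_GROUP = /\A(?>a+)ab\z/

puts GREEDY_DIGITS.match?("12345")      # true
puts POSSESSIVE_DIGITS.match?("12345")  # false
puts GREEDY_GROUP.match?("aaab")        # true
puts ATOMIC_GROUP.match?("aaab")        # false
```

The trade-off is that a possessive or atomic pattern can reject input that the greedy form would accept, so it is only a safe swap where giving characters back could never lead to a valid match, as with a run of digits followed by a non-digit.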
1 parent 5065756 commit 3e60ec2

1 file changed

Lines changed: 18 additions & 12 deletions


lib/graphql/language.rb

@@ -79,22 +79,28 @@ def self.escape_single_quoted_newlines(query_str)

     LEADING_REGEX = Regexp.union(" ", *Lexer::Punctuation.constants.map { |const| Lexer::Punctuation.const_get(const) })

+    # Optimized pattern using:
+    # - Possessive quantifiers (*+, ++) to prevent backtracking in number patterns
+    # - Atomic group (?>...) for IGNORE to prevent backtracking
+    # - Single unified number pattern instead of three alternatives
+    EFFICIENT_NUMBER_REGEXP = /-?(?:0|[1-9][0-9]*+)(?:\.[0-9]++)?(?:[eE][+-]?[0-9]++)?/
+    EFFICIENT_IGNORE_REGEXP = /(?>[, \r\n\t]+|\#[^\n]*$)*/
+
+    MAYBE_INVALID_NUMBER = /\d[_a-zA-Z]/
+
     INVALID_NUMBER_FOLLOWED_BY_NAME_REGEXP = %r{
       (?<leading>#{LEADING_REGEX})
-      (
-        ((?<num>#{Lexer::INT_REGEXP}(#{Lexer::FLOAT_EXP_REGEXP})?)(?<name>#{Lexer::IDENTIFIER_REGEXP})#{Lexer::IGNORE_REGEXP}:)
-        |
-        ((?<num>#{Lexer::INT_REGEXP}#{Lexer::FLOAT_DECIMAL_REGEXP}#{Lexer::FLOAT_EXP_REGEXP})(?<name>#{Lexer::IDENTIFIER_REGEXP})#{Lexer::IGNORE_REGEXP}:)
-        |
-        ((?<num>#{Lexer::INT_REGEXP}#{Lexer::FLOAT_DECIMAL_REGEXP})(?<name>#{Lexer::IDENTIFIER_REGEXP})#{Lexer::IGNORE_REGEXP}:)
-      )}x
+      (?<num>#{EFFICIENT_NUMBER_REGEXP})
+      (?<name>#{Lexer::IDENTIFIER_REGEXP})
+      #{EFFICIENT_IGNORE_REGEXP}
+      :
+    }x

     def self.add_space_between_numbers_and_names(query_str)
-      if query_str.match?(INVALID_NUMBER_FOLLOWED_BY_NAME_REGEXP)
-        query_str.gsub(INVALID_NUMBER_FOLLOWED_BY_NAME_REGEXP, "\\k<leading>\\k<num> \\k<name>:")
-      else
-        query_str
-      end
+      # Fast check for digit followed by identifier char. If this doesn't match, skip the more expensive regexp entirely.
+      return query_str unless query_str.match?(MAYBE_INVALID_NUMBER)
+      return query_str unless query_str.match?(INVALID_NUMBER_FOLLOWED_BY_NAME_REGEXP)
+      query_str.gsub(INVALID_NUMBER_FOLLOWED_BY_NAME_REGEXP, "\\k<leading>\\k<num> \\k<name>:")
     end
   end
 end
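The gated rewrite can be tried in isolation with the standalone sketch below. It copies the three new patterns from the diff, but `LEADING` and `IDENTIFIER` are simplified, hypothetical stand-ins for `LEADING_REGEX` and `Lexer::IDENTIFIER_REGEXP`, whose definitions are not shown in this commit. The module and method names are likewise illustrative, so treat this as an approximation of the behavior rather than the library's code.

```ruby
module NumberNameSketch
  # Hypothetical stand-ins (assumptions, not the library's definitions).
  LEADING = Regexp.union(" ", "(", "{", "[", ":", ",")
  IDENTIFIER = /[_A-Za-z][_0-9A-Za-z]*/

  # Copied from the diff.
  NUMBER = /-?(?:0|[1-9][0-9]*+)(?:\.[0-9]++)?(?:[eE][+-]?[0-9]++)?/
  IGNORE = /(?>[, \r\n\t]+|\#[^\n]*$)*/
  MAYBE_INVALID_NUMBER = /\d[_a-zA-Z]/

  NUMBER_THEN_NAME = %r{
    (?<leading>#{LEADING})
    (?<num>#{NUMBER})
    (?<name>#{IDENTIFIER})
    #{IGNORE}
    :
  }x

  def self.add_space(query_str)
    # Cheap gate: without a digit directly followed by an identifier
    # character, the full pattern cannot match, so skip it entirely.
    return query_str unless query_str.match?(MAYBE_INVALID_NUMBER)
    # Second gate: return the original string untouched when nothing matches.
    return query_str unless query_str.match?(NUMBER_THEN_NAME)
    query_str.gsub(NUMBER_THEN_NAME, "\\k<leading>\\k<num> \\k<name>:")
  end
end

puts NumberNameSketch.add_space("{ f(a:1b:2) }")          # { f(a:1 b:2) }
puts NumberNameSketch.add_space("{ f(a: 1.5e3abc: 2) }")  # { f(a: 1.5e3 abc: 2) }
puts NumberNameSketch.add_space("{ user { name } }")      # unchanged, rejected by the gate
```

Both early returns hand back the original string object, so inputs that need no rewrite avoid the allocation that `gsub` would make.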
