
MDEV-39412: parse error reading tabs in ranges #5049

Open
bsrikanth-mariadb wants to merge 1 commit into bb-12.3-MDEV-39368-test-replay from bb-12.3-mdev-39412-parse-error-reading-tabs-in-ranges


bsrikanth-mariadb (Contributor) commented May 7, 2026

Note:
when reading from information_schema.optimizer_context, one level of unescaping
is already done, i.e. \\t becomes \t and \\\\t becomes \\t.
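The single level of unescaping mentioned in the note can be modelled in a few lines. This is a simplified, hypothetical sketch of how the SQL parser treats backslash sequences inside a '...' literal in the default sql_mode, not MariaDB's actual lexer: it handles \t, \n, \r and \b, and for any other escaped character (including \\, \' and \") it keeps the character and drops the backslash. The \0, \Z, \% and \_ special cases are ignored, and the name sql_literal_unescape is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a simplified model of the one level of unescaping that
// the SQL parser applies to backslash sequences in a '...' string literal.
// Only a subset of sequences is modelled; for any other character the
// backslash is dropped (special cases such as \0, \Z, \% and \_ are ignored).
std::string sql_literal_unescape(const std::string &in)
{
  std::string out;
  for (std::string::size_type i= 0; i < in.size(); i++)
  {
    if (in[i] != '\\' || i + 1 == in.size())
    {
      out+= in[i];               // ordinary character (or trailing backslash)
      continue;
    }
    char c= in[++i];             // character following the backslash
    switch (c)
    {
    case 't': out+= '\t'; break; // \t -> TAB
    case 'n': out+= '\n'; break; // \n -> newline
    case 'r': out+= '\r'; break; // \r -> carriage return
    case 'b': out+= '\b'; break; // \b -> backspace
    default:  out+= c;    break; // \\ -> \, \' -> ', \" -> ", \x -> x
    }
  }
  return out;
}
```

With this model, the two-character input \\t comes out as \t (a backslash followed by t), and \\\\t comes out as \\t, matching the examples in the note.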

With respect to the MDEV, there are two problems:

1.
When reading from the SQL script file, the JSON parser is not able to parse
the range value in json_read_value() (json_lib.c):
"ranges": [
            "(b\t\t\t\t\t\t) <= (b) <= (b???????)"
          ],
It is mainly the \t\t sequences that it fails on, hence a warning.
It also stops loading the context into memory. Since the new table is
created with no data and without the context, we get
"Impossible WHERE noticed after reading const tables".

2.
read_string() in sql_json_lib.cc makes an unescaping call while parsing
the context, so \\t was becoming \t.
However, print_range() in opt_range.cc already escapes the values:
the value "b\t\t\t" was in fact produced as "b\\t\\t\\t".
Later, the range values from the query are compared with those from the context.

A mismatch is found here because the escaping is present on one side
and was removed on the other.
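Problem 1 is consistent with a basic JSON rule: once the SQL parser has turned each \t in the range string into a real TAB byte, the string contains raw control characters, which RFC 8259 (section 7) does not allow inside a JSON string. The helper below is a hypothetical stand-alone illustration of that rule, not the logic of json_read_value(); its name is an assumption, and it does not check the hex digits of \uXXXX.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper illustrating RFC 8259, section 7: inside a JSON string,
// the quotation mark, the backslash and the control characters U+0000..U+001F
// must be escaped. `body` is the text between the surrounding double quotes.
// The four hex digits after \u are not validated in this sketch.
bool json_string_body_is_valid(const std::string &body)
{
  const std::string allowed_escapes= "\"\\/bfnrtu";
  for (std::string::size_type i= 0; i < body.size(); i++)
  {
    unsigned char c= static_cast<unsigned char>(body[i]);
    if (c < 0x20)
      return false;              // raw control character, e.g. a real TAB
    if (c == '"')
      return false;              // an unescaped quote would end the string
    if (c == '\\')
    {
      if (i + 1 == body.size() ||
          allowed_escapes.find(body[i + 1]) == std::string::npos)
        return false;            // dangling backslash or invalid escape
      i++;                       // skip the escaped character
    }
  }
  return true;
}
```

The same rule means \' must never reach the JSON parser either: it is not one of the escapes that JSON allows.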

Solution
========
There are two levels of unescaping: 1. during the SQL parse of the context
taken from information_schema, and 2. in read_string(). So we need two
levels of escaping.

The first is done in dump_mrr_info_calls(), using
json_escape_to_string().

The second is done at the end of store_optimizer_context(), for the entire
opt_context, using a newly introduced function, escape_json_for_sql_literal(),
which escapes only backslashes and single quotes.
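A minimal sketch of the second escaping level, assuming it behaves as described for escape_json_for_sql_literal(): only backslash and single quote are escaped. The names sql_literal_escape and make_set_statement are hypothetical, std::string stands in for MariaDB's String class, and the statement shape is only an illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the escaping described above: escape only
// backslashes and single quotes, so that one level of SQL-literal
// unescaping turns the literal back into exactly `json`.
std::string sql_literal_escape(const std::string &json)
{
  std::string out;
  out.reserve(json.size() + json.size() / 8);
  for (char c : json)
  {
    if (c == '\\' || c == '\'')
      out+= '\\';                // \ becomes \\  and  ' becomes \'
    out+= c;
  }
  return out;
}

// Hypothetical: wrap the escaped JSON in a SET statement for the SQL script.
std::string make_set_statement(const std::string &var, const std::string &json)
{
  return "set " + var + "='" + sql_literal_escape(json) + "';";
}
```

Escaping only these two characters leaves everything else in the JSON, including newlines and indentation, byte-for-byte unchanged inside the literal. The sketch assumes the default sql_mode: with NO_BACKSLASH_ESCAPES enabled, backslashes are not escape characters and a single quote would have to be doubled ('') instead.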

gemini-code-assist (bot) left a comment

Code Review

This pull request addresses MDEV-39412 by implementing JSON escaping for range information in optimizer context dumps and adding relevant regression tests. Feedback highlights that removing the unescaping logic from the JSON reader breaks standard behavior and should be reverted. Additionally, an optimization was suggested to move a buffer allocation outside a loop in the context dumping logic to improve efficiency.

Comment thread on sql/sql_json_lib.cc (outdated)
  }

- value= strdup_root(mem_root, val_buf.c_ptr_safe());
+ value= strmake_root(mem_root, (const char *) je->value, je->value_len);
Priority: high

Removing the unescaping logic from read_string breaks the standard behavior of a JSON parser. JSON strings are expected to be unescaped to correctly handle sequences like \n, \t, \", and \\.

The issue described in the PR (where \\t becomes \t but should remain \\t) is actually addressed by the double-escaping introduced in dump_mrr_info_calls. With double-escaping, the JSON will contain \\\\t, which after SQL unescaping becomes \\t, and then the JSON reader's unescaping will turn it into \t (the literal backslash and 't').

By removing unescaping here, you are making the JSON reader return the raw, escaped content of the JSON string, which is non-standard and will break other potential users of this utility function. Please restore the unescaping logic.

  StringBuffer<128> val_buf;
  if (json_unescape_to_string((const char *) je->value, je->value_len,
                              &val_buf))
  {
    err_buf->append(STRING_WITH_LEN("un-escaping error of "));
    err_buf->append(read_elem_key, strlen(read_elem_key));
    err_buf->append(STRING_WITH_LEN(" element"));
    return true;
  }

  value= strdup_root(mem_root, val_buf.c_ptr_safe());
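The walk-through above (\\\\t in the script, \\t after the SQL parser, \t after the JSON reader) can be checked against a small model of standard JSON string unescaping. This is a hypothetical stand-alone sketch, not MariaDB's json_unescape_to_string(); the name json_unescape and its std::optional interface are assumptions, and \uXXXX is deliberately left out.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the unescaping a JSON reader applies to the body of a
// JSON string (RFC 8259, section 7). \uXXXX is not handled, to keep the
// sketch short; it and any invalid sequence yield std::nullopt.
std::optional<std::string> json_unescape(const std::string &in)
{
  std::string out;
  for (std::string::size_type i= 0; i < in.size(); i++)
  {
    if (in[i] != '\\')
    {
      out+= in[i];               // ordinary character
      continue;
    }
    if (++i == in.size())
      return std::nullopt;       // dangling backslash
    switch (in[i])
    {
    case '"':  out+= '"';  break;
    case '\\': out+= '\\'; break;
    case '/':  out+= '/';  break;
    case 'b':  out+= '\b'; break;
    case 'f':  out+= '\f'; break;
    case 'n':  out+= '\n'; break;
    case 'r':  out+= '\r'; break;
    case 't':  out+= '\t'; break;
    default:   return std::nullopt; // \uXXXX (not modelled) or invalid, e.g. \'
    }
  }
  return out;
}
```

A reader that skips this step hands back the raw escaped text (\\t instead of \t), which is the non-standard behaviour the review objects to.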

Comment thread on sql/opt_context_store_replay.cc, lines +63 to +64
rc= json_read_object(&je, array, &err_buf) ||
strcmp(parsed_esc_str, esc_str_val);
Priority: medium

This test change validates that the reader does NOT unescape, which is incorrect for a JSON library. For example, a JSON string "a\\bc" should be unescaped to a\bc (where \b is a backspace character if following standard JSON rules, or literal backslash and 'b' if escaped as \\b). Once read_string is restored to its correct behavior, this test should be updated to verify that unescaping is handled correctly.

Reply (Member):

Confirmed. We should address Gemini's input here.
It's good that the unit test exposes this.

Reply (Contributor, Author):

Addressed it.

spetrunia (Member)

Discussion take-aways:

  1. Make the JSON writer do escaping (try using json_escape()).

  2. The "Json parser" layer we've developed for this task should do unescaping in read_string().

  3. In store_optimizer_context(),

    const char *SET_REPLAY_CONTEXT_VAR=
        "set optimizer_replay_context=\'opt_context\'";
    String *s= const_cast<String *>(ctx_writer.output.get_string());
    sql_script.append(SET_REPLAY_CONTEXT_VAR, strlen(SET_REPLAY_CONTEXT_VAR));
    sql_script.append(*s);

Here we should do extra escaping that reverses the unescaping done by the SQL parser.

  4. (This is outside the scope of this bug, but look at it nevertheless.) Check why we get question-mark substitution when the string representation of the range is produced. It currently has "(a) <= (a???????)"; it shouldn't do that.

bsrikanth-mariadb (Contributor, Author)


Well, although the test case reported in this Jira succeeded with this change, two new problems emerge with this approach:

  1. Existing tests such as opt_context_store_stats fail with a parsing error.
  2. The opt_context loses all of its formatting.

spetrunia (Member)

Hi @bsrikanth-mariadb,
I don't have a clear understanding of the issues you've mentioned. Please provide more details. If nothing else, at least attach the current diff here as a file, along with examples of what gets broken.

spetrunia (Member)

Note to self: there are String::append_for_single_quote() / String::append_for_single_quote_using_mb(). They seem to do quoting that is the reverse of the unquoting that the SQL parser does for the string constant in set @context='...'.

bsrikanth-mariadb (Contributor, Author)


The extra escaping that we added to counter the unescaping done by the SQL parser (used when reading the SQL script file) makes the tests in opt_context_store_stats fail. The reason is that these tests read the context from information_schema.optimizer_context for validation, and we now get, for example: "Syntax error in JSON text in argument 1 to function 'json_extract' at position 393".

bsrikanth-mariadb force-pushed the bb-12.3-mdev-39412-parse-error-reading-tabs-in-ranges branch from 37d9189 to a924ecc on May 21, 2026 05:37
bsrikanth-mariadb marked this pull request as draft on May 21, 2026 07:11
bsrikanth-mariadb force-pushed the bb-12.3-mdev-39412-parse-error-reading-tabs-in-ranges branch from a924ecc to 43d1a9e on May 21, 2026 09:23
bsrikanth-mariadb (Contributor, Author)


This is taken care of now. I changed the logic as described in the commit message.

bsrikanth-mariadb force-pushed the bb-12.3-mdev-39412-parse-error-reading-tabs-in-ranges branch from 43d1a9e to c9919e0 on May 21, 2026 11:42
bsrikanth-mariadb marked this pull request as ready for review on May 21, 2026 11:42
bsrikanth-mariadb force-pushed the bb-12.3-mdev-39412-parse-error-reading-tabs-in-ranges branch from c9919e0 to 5f06c84 on May 22, 2026 11:16