MDEV-39412: parse error reading tabs in ranges#5049
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses MDEV-39412 by implementing JSON escaping for range information in optimizer context dumps and adding relevant regression tests. Feedback highlights that removing the unescaping logic from the JSON reader breaks standard behavior and should be reverted. Additionally, an optimization was suggested to move a buffer allocation outside a loop in the context dumping logic to improve efficiency.
| } | ||
|
|
||
| value= strdup_root(mem_root, val_buf.c_ptr_safe()); | ||
| value= strmake_root(mem_root, (const char *) je->value, je->value_len); |
There was a problem hiding this comment.
Removing the unescaping logic from read_string breaks the standard behavior of a JSON parser. JSON strings are expected to be unescaped to correctly handle sequences like \n, \t, \", and \\.
The issue described in the PR (where \\t becomes \t but should remain \\t) is actually addressed by the double-escaping introduced in dump_mrr_info_calls. With double-escaping, the JSON will contain \\\\t, which after SQL unescaping becomes \\t, and then the JSON reader's unescaping will turn it into \t (the literal backslash and 't').
By removing unescaping here, you are making the JSON reader return the raw, escaped content of the JSON string, which is non-standard and will break other potential users of this utility function. Please restore the unescaping logic.
StringBuffer<128> val_buf;
if (json_unescape_to_string((const char *) je->value, je->value_len,
&val_buf))
{
err_buf->append(STRING_WITH_LEN("un-escaping error of "));
err_buf->append(read_elem_key, strlen(read_elem_key));
err_buf->append(STRING_WITH_LEN(" element"));
return true;
}
value= strdup_root(mem_root, val_buf.c_ptr_safe());| rc= json_read_object(&je, array, &err_buf) || | ||
| strcmp(parsed_esc_str, esc_str_val); |
There was a problem hiding this comment.
This test change validates that the reader does NOT unescape, which is incorrect for a JSON library. For example, a JSON string "a\\bc" should be unescaped to a\bc (where \b is a backspace character if following standard JSON rules, or literal backslash and 'b' if escaped as \\b). Once read_string is restored to its correct behavior, this test should be updated to verify that unescaping is handled correctly.
There was a problem hiding this comment.
Confirm. We should address Gemini's input here.
It's good that unit test exposes this.
There was a problem hiding this comment.
addressed it.
|
Discussion take-aways:
const char *SET_REPLAY_CONTEXT_VAR=
"set optimizer_replay_context=\'opt_context\'";
String *s= const_cast<String *>(ctx_writer.output.get_string());
sql_script.append(SET_OPT_CONTEXT_VAR, strlen(SET_OPT_CONTEXT_VAR));
sql_script.append(*s);here we should do extra escaping to do the opposite what the SQL parser unescaping does...
|
Well, although, the test case reported in this Jira succeeded with this change, 2 new problems emerge with this approach: -
|
|
Hi @bsrikanth-mariadb, |
|
Note to self : there are |
The extra escaping that we have added to counter the un-escaping being doing by sql_parse (used when reading the sql_script file) is making the tests in opt_context_store_stats fail. Reason being, the tests here read the context from information_schema.optimizer_context for validation. However, we now get for eg: - "Syntax error in JSON text in argument 1 to function 'json_extract' at position 393" |
37d9189 to
a924ecc
Compare
a924ecc to
43d1a9e
Compare
This is taken care now. Changed the logic as described in the commit comment. |
43d1a9e to
c9919e0
Compare
Note:
while reading from information_schema.optimizer_context one level of unescaping
is already done i.e. (\\t becomes \t or \\\\t becomes \\t)
w.r.t the MDEV, there are 2 problems: -
1.
When reading from the sql script file, json parser is not able to parse
the range value in json_read_value() from json_lib.c
"ranges": [
"(b\t\t\t\t\t\t) <= (b) <= (b???????)"
],
mainly the \t\t stuff, and hence a warning.
It also stops loading the context into memory.
Since, a new table is created with empty data, and without context,
we get Impossible WHERE noticed after reading const tables
2.
There is unescaping call being made in read_string() from sql_json_lib.cc
while parsing of the context. With this \\t was becoming \t.
However, print_range() from opt_range.cc already does escaping of the values.
The value "b\t\t\t" was in fact produced as "\b\\t\\t\\t".
Later, we try to compare range values from the query and the context.
Here a mismatch is found because, in one case there is escaping,
and in the other case escaping got removed.
Solution
========
Since, there are 2 levels of unescaping being performed, 1. during sql
parse of the context from information_schema, and 2. during
read_string. So, we need to have 2 levels of escaping.
First is done in the dump_mrr_info_calls() - here
json_escape_to_string() is used.
Second is done at the end of store_optimizer_context(), for the entire
opt_context. Here a newly introduced function escape_json_for_sql_literal()
is used which does escaping only for backslash, and single quote.
c9919e0 to
5f06c84
Compare
Uh oh!
There was an error while loading. Please reload this page.