Commit a29d39b

Fix batched insert regex to support backtick-quoted identifiers with special characters (#1403)
## Summary

- Fixed `INSERT_DETAILS_PATTERN` regex in `InsertStatementParser` to support backtick-quoted identifiers containing special characters (hyphens, spaces, escaped backticks, etc.) in table/schema/catalog names
- Added a `warn` log in `PreparedStatementBatchExecutor.canUseBatchedInsert()` when parsing fails and the driver falls back to individual execution
- Added unit tests for the new regex behavior

## Root Cause

The regex capture group for the table name matched characters individually, so backtick-quoted identifiers with special characters failed to match. The fix changes the group to match whole dot-separated segments, where each segment is either a backtick-quoted identifier (allowing any character inside, including escaped backtick pairs) or a plain identifier.

## Test plan

- [x] All `InsertStatementParserTest` tests pass
- [x] Verified against a real Databricks warehouse with various table name formats
- [x] Warn log confirmed to appear when fallback occurs

Fixes #1398

This pull request was AI-assisted by Isaac.

--------

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
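The root cause described above can be illustrated with a minimal sketch. The two patterns below are copied from this commit's diff; the class name, `matches` helper, and `main` method are illustrative scaffolding, not part of the driver.

```java
import java.util.regex.Pattern;

// Contrast the old and new INSERT_DETAILS_PATTERN from this commit on a
// hyphenated, backtick-quoted table name.
public class InsertPatternDemo {
  // Old: the table-name group is a single character class, so '-' inside
  // backticks is rejected and the whole statement fails to match.
  static final Pattern OLD =
      Pattern.compile(
          "^\\s*INSERT\\s+INTO\\s+([\\w`\\.]+)\\s*\\(([^)]+)\\)\\s+VALUES\\s*\\(",
          Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // New: each dot-separated segment is either a backtick-quoted identifier
  // (any character inside, with '``' as an escaped backtick) or \w+.
  static final Pattern NEW =
      Pattern.compile(
          "^\\s*INSERT\\s+INTO\\s+((?:`(?:[^`]|``)+`|\\w+)(?:\\.(?:`(?:[^`]|``)+`|\\w+))*)\\s*\\(([^)]+)\\)\\s+VALUES\\s*\\(",
          Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  static boolean matches(Pattern p, String sql) {
    return p.matcher(sql).find();
  }

  public static void main(String[] args) {
    String sql = "INSERT INTO catalog.schema.`my-table` (id, name) VALUES (?, ?)";
    System.out.println("old: " + matches(OLD, sql)); // false: '-' breaks the match
    System.out.println("new: " + matches(NEW, sql)); // true
  }
}
```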
1 parent 5742436 commit a29d39b

4 files changed: 72 additions & 1 deletion

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -9,6 +9,7 @@
 ### Updated
 
 ### Fixed
+- Fixed `EnableBatchedInserts` silently falling back to individual execution when table or schema names contain special characters (e.g., hyphens) inside backtick-quoted identifiers. Added a warn log when the fallback occurs.
 - Fixed `IntervalConverter` crash (`IllegalArgumentException: Invalid interval metadata`) when INTERVAL columns are returned via CloudFetch. Arrow metadata from CloudFetch uses underscored format (`INTERVAL_YEAR_MONTH`, `INTERVAL_DAY_TIME`) which the driver's regex did not accept.
 - Fixed primitive types within complex types (ARRAY, MAP, STRUCT) not being correctly parsed when Arrow serialization uses alternate formats: TIMESTAMP/TIMESTAMP_NTZ as epoch microseconds or component arrays, and BINARY as base64-encoded strings.
 - Fixed `PARSE_SYNTAX_ERROR` for column names containing special characters (e.g., dots) when `EnableBatchedInserts` is enabled, by re-quoting column names with backticks in reconstructed multi-row INSERT statements.
```

src/main/java/com/databricks/jdbc/api/impl/PreparedStatementBatchExecutor.java

Lines changed: 4 additions & 0 deletions
```diff
@@ -72,6 +72,10 @@ private boolean canUseBatchedInsert() {
       return true;
     } catch (Exception e) {
       // Not a valid INSERT statement suitable for batching
+      LOGGER.warn(
+          "EnableBatchedInserts is enabled but the INSERT statement could not be parsed for"
+              + " batching, falling back to individual execution: {}",
+          e.getMessage());
       return false;
     }
   }
```

src/main/java/com/databricks/jdbc/common/util/InsertStatementParser.java

Lines changed: 4 additions & 1 deletion
```diff
@@ -18,9 +18,12 @@
 public class InsertStatementParser {
 
   // Pattern to extract table and columns from INSERT INTO table (col1, col2, ...) VALUES format
+  // Table name group matches dot-separated segments where each segment is either a
+  // backtick-quoted identifier (allowing any character inside, including escaped backticks ``)
+  // or an unquoted identifier (\w+).
   private static final Pattern INSERT_DETAILS_PATTERN =
       Pattern.compile(
-          "^\\s*INSERT\\s+INTO\\s+([\\w`\\.]+)\\s*\\(([^)]+)\\)\\s+VALUES\\s*\\(",
+          "^\\s*INSERT\\s+INTO\\s+((?:`(?:[^`]|``)+`|\\w+)(?:\\.(?:`(?:[^`]|``)+`|\\w+))*)\\s*\\(([^)]+)\\)\\s+VALUES\\s*\\(",
           Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
 
   /** Represents the parsed components of an INSERT statement. */
```
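The updated pattern's capture groups can be exercised in isolation. The regex below is copied from the diff; the surrounding class and `extract` helper are illustrative, not the driver's API (which parses into an `InsertInfo` via `InsertStatementParser.parseInsert`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the full dot-separated table reference (quoted segments
// intact, including escaped backticks); group 2 captures the raw column list.
public class InsertGroupsDemo {
  static final Pattern INSERT_DETAILS_PATTERN =
      Pattern.compile(
          "^\\s*INSERT\\s+INTO\\s+((?:`(?:[^`]|``)+`|\\w+)(?:\\.(?:`(?:[^`]|``)+`|\\w+))*)\\s*\\(([^)]+)\\)\\s+VALUES\\s*\\(",
          Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // Returns {tableName, columnList}, or null when the statement does not
  // match (the case where the driver falls back to individual execution).
  static String[] extract(String sql) {
    Matcher m = INSERT_DETAILS_PATTERN.matcher(sql);
    return m.find() ? new String[] {m.group(1), m.group(2)} : null;
  }

  public static void main(String[] args) {
    String[] parts =
        extract("INSERT INTO `my-catalog`.`my-schema`.`my``table` (id, name) VALUES (?, ?)");
    System.out.println(parts[0]); // `my-catalog`.`my-schema`.`my``table`
    System.out.println(parts[1]); // id, name
  }
}
```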

src/test/java/com/databricks/jdbc/common/util/InsertStatementParserTest.java

Lines changed: 63 additions & 0 deletions
```diff
@@ -284,4 +284,67 @@ private String generateLargeInsert(int columnCount) {
 
     return "INSERT INTO large_table (" + columns + ") VALUES (" + values + ")";
   }
+
+  @Test
+  void testParseInsertWithHyphenatedTableName() {
+    String sql = "INSERT INTO catalog.schema.`my-table` (id, name, value) VALUES (?, ?, ?)";
+    InsertInfo info = InsertStatementParser.parseInsert(sql);
+
+    assertNotNull(info);
+    assertEquals("catalog.schema.`my-table`", info.getTableName());
+    assertEquals(Arrays.asList("id", "name", "value"), info.getColumns());
+  }
+
+  @Test
+  void testParseInsertWithSpacesInTableName() {
+    String sql = "INSERT INTO `my table` (id, name) VALUES (?, ?)";
+    InsertInfo info = InsertStatementParser.parseInsert(sql);
+
+    assertNotNull(info);
+    assertEquals("`my table`", info.getTableName());
+    assertEquals(Arrays.asList("id", "name"), info.getColumns());
+  }
+
+  @Test
+  void testParseInsertWithAllSegmentsQuoted() {
+    String sql = "INSERT INTO `my-catalog`.`my-schema`.`my-table` (id, name) VALUES (?, ?)";
+    InsertInfo info = InsertStatementParser.parseInsert(sql);
+
+    assertNotNull(info);
+    assertEquals("`my-catalog`.`my-schema`.`my-table`", info.getTableName());
+    assertEquals(Arrays.asList("id", "name"), info.getColumns());
+  }
+
+  @Test
+  void testParseInsertWithMixedQuotedAndUnquotedSegments() {
+    String sql = "INSERT INTO catalog.`my-schema`.normal_table (id, name) VALUES (?, ?)";
+    InsertInfo info = InsertStatementParser.parseInsert(sql);
+
+    assertNotNull(info);
+    assertEquals("catalog.`my-schema`.normal_table", info.getTableName());
+    assertEquals(Arrays.asList("id", "name"), info.getColumns());
+  }
+
+  @Test
+  void testGenerateMultiRowInsertWithHyphenatedTableName() throws Exception {
+    String sql = "INSERT INTO catalog.schema.`my-table` (id, name, value) VALUES (?, ?, ?)";
+    InsertInfo info = InsertStatementParser.parseInsert(sql);
+
+    assertNotNull(info);
+    String multiRowSql = InsertStatementParser.generateMultiRowInsert(info, 3);
+    String expected =
+        "INSERT INTO catalog.schema.`my-table` (`id`, `name`, `value`) VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?)";
+    assertEquals(expected, multiRowSql);
+  }
+
+  @Test
+  void testParseInsertWithEscapedBackticksInTableName() {
+    // Table names containing literal backticks use doubled backticks as escape: `my``table`
+    String sql = "INSERT INTO catalog.schema.`my``table` (id, name) VALUES (?, ?)";
+    InsertInfo info = InsertStatementParser.parseInsert(sql);
+
+    assertNotNull(info);
+    assertEquals("catalog.schema.`my``table`", info.getTableName());
+    assertEquals(Arrays.asList("id", "name"), info.getColumns());
+  }
 }
```
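The multi-row rewrite these tests verify can be sketched independently of the driver. The helper name and signature below are illustrative (the driver's actual entry point is `InsertStatementParser.generateMultiRowInsert`); what is grounded in the tests is the output shape: columns re-quoted with backticks and the placeholder tuple repeated once per row.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the multi-row INSERT reconstruction.
public class MultiRowInsertDemo {
  static String multiRowInsert(String table, List<String> columns, int rows) {
    // Re-quote each column with backticks (the related PARSE_SYNTAX_ERROR fix
    // noted in the changelog above).
    String quotedCols =
        columns.stream().map(c -> "`" + c + "`").collect(Collectors.joining(", "));
    // One "(?, ?, ...)" tuple per row, sized to the column count.
    String tuple = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
    String tuples = String.join(", ", Collections.nCopies(rows, tuple));
    return "INSERT INTO " + table + " (" + quotedCols + ") VALUES " + tuples;
  }

  public static void main(String[] args) {
    // Reproduces the expected string from
    // testGenerateMultiRowInsertWithHyphenatedTableName above.
    System.out.println(
        multiRowInsert("catalog.schema.`my-table`", List.of("id", "name", "value"), 3));
  }
}
```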
