[BUG] Leaked Socket prevents CRaC checkpointing #1325

@jnd77

Describe the bug

This is a follow-up to issue #1233, and to its last comment in particular.
I still occasionally end up with an open Socket, but it happens at random and I have not managed to come up with a simple reproducer.

To Reproduce

I investigated, and here are the findings:

  1. DatabricksClientConfiguratorManager#instances is always empty after your fix, so that part is good.
  2. DatabricksHttpClientFactory#instances is not always empty; sometimes it still holds a DatabricksHttpClient of type TELEMETRY.
    I suspect a race condition: the connection closes and calls DatabricksHttpClientFactory#removeClient(context), and afterwards a different thread calls DatabricksHttpClientFactory#getClient(context, TELEMETRY) with that connection's context, which creates a new client that nothing ever closes.
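
To make the suspected interleaving concrete, here is a minimal, self-contained sketch. It does not use the driver at all: `FakeFactory`, `FakeClient` and `ClientType` are hypothetical stand-ins that only assume the factory caches clients per (context, type) and creates them lazily on `getClient`. The two calls are run sequentially in the order I suspect the two threads run them in, so the result is deterministic.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LateGetClientDemo {
  enum ClientType { COMMON, TELEMETRY }

  // Hypothetical stand-in for DatabricksHttpClient; "open" models a held socket.
  static final class FakeClient implements AutoCloseable {
    volatile boolean open = true;

    @Override
    public void close() {
      open = false;
    }
  }

  // Hypothetical stand-in for the factory: one lazily created client per (context, type).
  static final class FakeFactory {
    final Map<SimpleEntry<String, ClientType>, FakeClient> instances = new ConcurrentHashMap<>();

    FakeClient getClient(String context, ClientType type) {
      return instances.computeIfAbsent(new SimpleEntry<>(context, type), k -> new FakeClient());
    }

    // Closes and removes every client registered for the given context.
    void removeClient(String context) {
      instances.entrySet().removeIf(e -> {
        if (!e.getKey().getKey().equals(context)) {
          return false;
        }
        e.getValue().close();
        return true;
      });
    }
  }

  // Replays the suspected ordering and returns how many clients are still open.
  static int openClientsAfterLateGet() {
    FakeFactory factory = new FakeFactory();
    String context = "connection-1";
    factory.getClient(context, ClientType.COMMON);
    factory.getClient(context, ClientType.TELEMETRY);
    factory.removeClient(context);                    // connection close
    factory.getClient(context, ClientType.TELEMETRY); // late call from another thread
    return (int) factory.instances.values().stream().filter(c -> c.open).count();
  }

  public static void main(String[] args) {
    System.out.println("open clients after close: " + openClientsAfterLateGet());
  }
}
```

In this model the late `getClient` call leaves exactly one open client in the map after the connection is gone, which matches what I observe in `DatabricksHttpClientFactory#instances`. Whether the real factory behaves this way is my assumption, not something I have confirmed in the driver source.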

As a workaround, I had to resort to this reflection hack:

    try {
      // Reach into the factory's private "instances" map via reflection.
      final Field instancesField = DatabricksHttpClientFactory.class.getDeclaredField("instances");
      instancesField.setAccessible(true);
      // Unchecked cast: the map type is taken from the driver's field declaration.
      final Map<SimpleEntry<String, HttpClientType>, DatabricksHttpClient> openedClients =
          (Map<SimpleEntry<String, HttpClientType>, DatabricksHttpClient>)
              instancesField.get(DatabricksHttpClientFactory.getInstance());
      // Close every client that is still registered, best effort.
      openedClients
          .values()
          .forEach(
              databricksHttpClient -> {
                try {
                  databricksHttpClient.close();
                } catch (final IOException e) {
                  // Do not throw
                }
              });
    } catch (final Exception e) {
      // Do nothing and hope checkpoint can happen
    }

Expected behavior

I would expect all the sockets to be closed when all connections have been closed.

Client Environment

  • OS: Ubuntu
  • Java version: Java 21
  • Java vendor: Azul
  • Driver Version: 3.3.1

