feat: Add timestamp nanosecond primitive types #653

Open
zhjwpku wants to merge 5 commits into apache:main from zhjwpku:iceberg-v3-timestamp_ns-timestamptz_ns_types

Collaborator

zhjwpku commented May 17, 2026

No description provided.

Collaborator Author

zhjwpku commented May 17, 2026

I chose TypeId::kTimestampNs over TypeId::kTimestampNano (Java uses Nano) to align with the spec. @evindj Please help review the timestamp parsing part when you have time. I changed the fractional seconds handling a bit.

zhjwpku requested a review from wgtmac May 17, 2026 04:33

template <>
int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) {
  return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
}
Member

According to the Iceberg V3 spec and the Java implementation (BucketTimestampNano.java), nanosecond timestamps must be converted to microseconds (divided by 1000) before hashing. This ensures that bucket partitioning is consistent between microsecond and nanosecond precision types for the same logical time.

return BucketUtils::HashLong(std::get<int64_t>(literal.value()) / 1000);

Collaborator Author

Fixed.

Member

I think the original review comment contained a hallucination. Sorry about that.

The actual workflow is as below:

  private static class BucketTimestampNano extends Bucket<Long>
      implements SerializableFunction<Long, Integer> {

    private BucketTimestampNano(int numBuckets) {
      super(numBuckets);
    }

    @Override
    protected int hash(Long nanos) {
      return BucketUtil.hash(DateTimeUtil.nanosToMicros(nanos));
    }
  }

We can see that it also calls floorDiv inside:

  public static long nanosToMicros(long nanos) {
    return Math.floorDiv(nanos, NANOS_PER_MICRO);
  }

So my original (AI) suggestion was wrong. Please follow the same approach and use floorDiv here.
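
For reference, a minimal C++ sketch of the floor-division approach described above. The names FloorDiv, NanosToMicros, and kNanosPerMicro are illustrative assumptions that mirror the Java helpers; they are not confirmed to be the project's actual API. The result would be passed to BucketUtils::HashLong in place of the raw nanosecond value.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constant, mirroring Java's NANOS_PER_MICRO.
constexpr int64_t kNanosPerMicro = 1000;

// Floor division for a positive divisor: rounds toward negative infinity,
// which is what Java's Math.floorDiv does. Plain `/` rounds toward zero.
inline int64_t FloorDiv(int64_t value, int64_t divisor) {
  assert(divisor > 0);
  int64_t quotient = value / divisor;
  // A negative remainder means `/` rounded up toward zero; step back down.
  if (value % divisor < 0) {
    --quotient;
  }
  return quotient;
}

// Hypothetical counterpart of DateTimeUtil.nanosToMicros.
inline int64_t NanosToMicros(int64_t nanos) {
  return FloorDiv(nanos, kNanosPerMicro);
}
```

With this helper, a nanosecond value and the microsecond value for the same logical instant reduce to the same input before hashing, including for pre-1970 timestamps.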

Member

Perhaps it is worth adding a dedicated utility class/file for temporal types just like Java for reuse.

Collaborator Author

We already have TemporalUtils (temporal_util.cc); I think we can enrich that.

Comment thread src/iceberg/util/transform_util.cc Outdated
Member

wgtmac left a comment

Thanks for adding timestamp_ns and timestamptz_ns! Here are a few findings based on Java parity and the Iceberg Spec.

Comment thread src/iceberg/expression/literal.cc Outdated
return Literal::Date(std::get<int32_t>(days.value()));
}
case TypeId::kTimestamp:
return source_is_nanos ? Literal::Timestamp(timestamp_val / 1000)
Member

C++ integer division truncates toward zero, causing incorrect results for negative timestamps (pre-1970) not evenly divisible by 1000. Java uses Math.floorDiv. We should use a floor division helper here.
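
To make the discrepancy concrete, here is a small sketch contrasting the two rounding modes for a pre-1970 value. TruncatedMicros and FlooredMicros are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// What `timestamp_val / 1000` computes: C++ rounds the quotient toward zero.
constexpr int64_t TruncatedMicros(int64_t nanos) { return nanos / 1000; }

// Floor alternative: rounds toward negative infinity, like Math.floorDiv.
constexpr int64_t FlooredMicros(int64_t nanos) {
  return nanos / 1000 - (nanos % 1000 < 0 ? 1 : 0);
}

// -1 ns is one nanosecond before the epoch.
static_assert(TruncatedMicros(-1) == 0);  // moves forward onto the epoch
static_assert(FlooredMicros(-1) == -1);   // stays before the epoch
// Non-negative inputs and exact multiples of 1000 are unaffected.
static_assert(TruncatedMicros(1999) == FlooredMicros(1999));
static_assert(TruncatedMicros(-2000) == FlooredMicros(-2000));
```

Only negative values that are not multiples of 1000 differ, which is why the problem is easy to miss without tests on negative timestamps.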

Comment thread src/iceberg/expression/literal.cc Outdated
case TypeId::kTimestampNs:
return source_is_nanos ? Literal::TimestampNs(timestamp_val)
: Literal::TimestampNs(timestamp_val * 1000);
case TypeId::kTimestampTzNs:
Member

Casting from Timestamp(Ns) to TimestampTz(Ns) is allowed here, but Java (TimestampNanoLiteral.to(Type)) explicitly returns null (blocking this promotion) because timezone information is missing. Should we return NotSupported to match Java?

Collaborator Author

I checked the logic in the Java implementation. It seems to me that zoned vs. non-zoned is not distinguished at the literal cast level; in Java, zoned and non-zoned timestamps share the same TypeID.

.expected_string = "1684137600000000001"},
BasicLiteralTestParam{.test_name = "TimestampTzNs",
.literal = Literal::TimestampTzNs(1684137600000000001LL),
.expected_type_id = TypeId::kTimestampTzNs,
Member

Consider adding cast tests for TimestampNs and TimestampTzNs (e.g., from String, and cross-casting between TimestampNs and Timestamp), especially for negative timestamps, to ensure rounding parity with Java.

Comment thread src/iceberg/type_fwd.h
Comment on lines 47 to +50
kTimestamp,
kTimestampTz,
kTimestampNs,
kTimestampTzNs,
Member

Should we sort them as

  kTimestamp,
  kTimestampNs,
  kTimestampTz,
  kTimestampTzNs,

Collaborator Author

We don't seem to strictly follow alphabetical order, but rather the ordering defined in the spec?

Comment thread src/iceberg/type_fwd.h
class TimestampBase;
class TimestampType;
class TimestampTzType;
class TimestampNsType;
Member

ditto

Comment thread src/iceberg/expression/literal.cc Outdated
case TypeId::kTimestampTzNs:
return rhs == TypeId::kLong || rhs == TypeId::kTimestamp ||
rhs == TypeId::kTimestampTz;
rhs == TypeId::kTimestampTz || rhs == TypeId::kTimestampNs ||
Member

This looks incorrect to me. Should we be strict and allow comparison only between identical types? It also looks dangerous to compare a timestamp value against a long value. Should we remove that support as well?

Collaborator Author

Yeah, in Java, BaseLiteral.equals requires the lhs and rhs to have the same getClass(). Since zoned and non-zoned timestamps appear to share the same class, I think we should allow kTimestamp/kTimestampTz and kTimestampNs/kTimestampTzNs to be comparable.

return quotient;
}

Result<int64_t> MultiplyExact(int64_t lhs, int64_t rhs) {
Member

Collaborator Author

Yeah, I also noticed that most of the utilities in transform_util.h seem to belong more naturally in a temporal_util. I'll create a separate PR to address these issues.

Comment thread src/iceberg/type_fwd.h
kTime,
kTimestamp,
kTimestampTz,
kTimestampNs,
Member

This needs the same v3 gate as Java. Otherwise a v2 table can accept and write a v3-only schema.
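
A rough sketch of what such a gate could look like, assuming a per-type minimum format version. The TypeId enum below is a reduced stand-in, and MinFormatVersion and IsAllowedInFormatVersion are hypothetical names, not the project's API.

```cpp
#include <cstdint>

// Reduced stand-in for the project's TypeId; only the values needed here.
enum class TypeId { kTimestamp, kTimestampTz, kTimestampNs, kTimestampTzNs };

// Minimum table format version in which a type may appear.
// Nanosecond timestamps were introduced in format version 3.
constexpr int8_t MinFormatVersion(TypeId id) {
  switch (id) {
    case TypeId::kTimestampNs:
    case TypeId::kTimestampTzNs:
      return 3;
    default:
      return 1;
  }
}

// A schema validator would reject any field for which this returns false.
constexpr bool IsAllowedInFormatVersion(TypeId id, int8_t format_version) {
  return format_version >= MinFormatVersion(id);
}
```

A check like this would run when a schema is attached to table metadata, so that a v2 table is rejected before any v3-only type is written.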

Collaborator Author

fixed.

Comment thread src/iceberg/util/transform_util.cc Outdated
return InvalidArgument("Invalid timestamptz string (missing timezone suffix): '{}'",
str);
}
return static_cast<int64_t>(days) * kNanosPerDay + time_nanos;
Member

This can overflow past the int64 nanos boundary. Please use checked arithmetic here and in the timezone offset path.
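
One portable way to do the checked arithmetic is sketched below. It returns std::optional rather than the project's Result type, and DaysAndNanosToEpochNanos is a hypothetical name; both are assumptions made to keep the example self-contained.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

constexpr int64_t kNanosPerDay = 86'400'000'000'000LL;

// Computes days * kNanosPerDay + time_nanos, returning std::nullopt when an
// intermediate or final value does not fit in int64_t.
// Note: the multiplication check is conservative. It rejects a day count whose
// product overflows even if adding time_nanos would bring the sum back in range.
inline std::optional<int64_t> DaysAndNanosToEpochNanos(int32_t days,
                                                       int64_t time_nanos) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
  const int64_t d = days;
  // Check the multiplication before performing it.
  if (d > kMax / kNanosPerDay || d < kMin / kNanosPerDay) {
    return std::nullopt;
  }
  const int64_t day_nanos = d * kNanosPerDay;
  // Check the addition in the direction it can overflow.
  if (time_nanos > 0 && day_nanos > kMax - time_nanos) return std::nullopt;
  if (time_nanos < 0 && day_nanos < kMin - time_nanos) return std::nullopt;
  return day_nanos + time_nanos;
}
```

The same pattern applies to the timezone offset path: check each multiplication and addition before performing it, and surface an error instead of wrapping.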

Collaborator Author

fixed
