Pluggable ingestion serializers with json_ingestion #1079
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../spec_support/subdir_dot_rspec |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../config/site/yardopts |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../Gemfile |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| The MIT License (MIT) | ||
|
|
||
| Copyright (c) 2024 - 2026 Block, Inc. | ||
|
|
||
| Permission is hereby granted, free of charge, to any person obtaining a copy | ||
| of this software and associated documentation files (the "Software"), to deal | ||
| in the Software without restriction, including without limitation the rights | ||
| to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
| copies of the Software, and to permit persons to whom the Software is | ||
| furnished to do so, subject to the following conditions: | ||
|
|
||
| The above copyright notice and this permission notice shall be included in | ||
| all copies or substantial portions of the Software. | ||
|
|
||
| THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
| IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
| FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
| AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
| LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
| OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
| THE SOFTWARE. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| # ElasticGraph::JSONIngestion | ||
|
|
||
| Pluggable JSON Schema ingestion serializer for ElasticGraph. | ||
|
|
||
| This gem extracts the JSON Schema generation and validation logic from ElasticGraph's core into a | ||
| pluggable extension, following the same pattern as `elasticgraph-warehouse` and `elasticgraph-apollo`. | ||
| This is the first step toward supporting alternative ingestion serializers (e.g., Protocol Buffers). | ||
|
|
||
| Higher-level schema-definition entry points use the JSON Schema serializer by default for backward | ||
| compatibility, so existing users do not need configuration changes. | ||
|
|
||
| ## Dependency Diagram | ||
|
|
||
| ```mermaid | ||
| graph LR; | ||
| classDef targetGemStyle fill:#FADBD8,stroke:#EC7063,color:#000,stroke-width:2px; | ||
| classDef otherEgGemStyle fill:#A9DFBF,stroke:#2ECC71,color:#000; | ||
| classDef externalGemStyle fill:#E0EFFF,stroke:#70A1D7,color:#2980B9; | ||
| elasticgraph-json_ingestion["elasticgraph-json_ingestion"]; | ||
| class elasticgraph-json_ingestion targetGemStyle; | ||
| elasticgraph-support["elasticgraph-support"]; | ||
| elasticgraph-json_ingestion --> elasticgraph-support; | ||
| class elasticgraph-support otherEgGemStyle; | ||
| elasticgraph-schema_definition["elasticgraph-schema_definition"]; | ||
| elasticgraph-schema_definition --> elasticgraph-json_ingestion; | ||
| class elasticgraph-schema_definition otherEgGemStyle; | ||
| ``` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| # Copyright 2024 - 2026 Block, Inc. | ||
| # | ||
| # Use of this source code is governed by an MIT-style | ||
| # license that can be found in the LICENSE file or at | ||
| # https://opensource.org/licenses/MIT. | ||
| # | ||
| # frozen_string_literal: true | ||
|
|
||
| require_relative "../elasticgraph-support/lib/elastic_graph/version" | ||
|
|
||
| Gem::Specification.new do |spec| | ||
| spec.name = "elasticgraph-json_ingestion" | ||
| spec.version = ElasticGraph::VERSION | ||
| spec.authors = ["Josh Wilson", "Myron Marston", "Block Engineering"] | ||
| spec.email = ["joshuaw@squareup.com"] | ||
| spec.homepage = "https://block.github.io/elasticgraph/" | ||
| spec.license = "MIT" | ||
| spec.summary = "Pluggable JSON Schema ingestion serializer for ElasticGraph." | ||
|
|
||
| spec.metadata = { | ||
| "bug_tracker_uri" => "https://github.com/block/elasticgraph/issues", | ||
| "changelog_uri" => "https://github.com/block/elasticgraph/releases/tag/v#{ElasticGraph::VERSION}", | ||
| "documentation_uri" => "https://block.github.io/elasticgraph/api-docs/v#{ElasticGraph::VERSION}/", | ||
| "homepage_uri" => "https://block.github.io/elasticgraph/", | ||
| "source_code_uri" => "https://github.com/block/elasticgraph/tree/v#{ElasticGraph::VERSION}/#{spec.name}", | ||
| "gem_category" => "extension" | ||
| } | ||
|
|
||
| spec.files = Dir.chdir(File.expand_path(__dir__)) do | ||
| `git ls-files -z`.split("\x0").reject do |f| | ||
| (f == __FILE__) || f.match(%r{\A(?:(?:test|spec|features|sig)/|\.(?:git|travis|circleci)|appveyor)}) | ||
| end - [".rspec", "Gemfile", ".yardopts"] | ||
| end | ||
|
|
||
| spec.required_ruby_version = [">= 3.4", "< 4.1"] | ||
|
|
||
| # This extension is loaded by `elasticgraph-schema_definition` at schema-definition time, so we intentionally | ||
| # avoid a runtime dependency here to keep the dependency direction acyclic across gems. | ||
| spec.add_development_dependency "elasticgraph-schema_definition", ElasticGraph::VERSION | ||
| spec.add_dependency "elasticgraph-support", ElasticGraph::VERSION | ||
| end | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| # Copyright 2024 - 2026 Block, Inc. | ||
| # | ||
| # Use of this source code is governed by an MIT-style | ||
| # license that can be found in the LICENSE file or at | ||
| # https://opensource.org/licenses/MIT. | ||
| # | ||
| # frozen_string_literal: true | ||
|
|
||
| module ElasticGraph | ||
| # Pluggable JSON Schema ingestion serializer for ElasticGraph. | ||
| # | ||
| # This gem extracts the JSON Schema generation and validation logic from ElasticGraph's | ||
| # core into a pluggable extension, following the same pattern as `elasticgraph-warehouse` | ||
|
Collaborator
This PR does the extraction. The gem merely contains what was extracted. And I think it'll be confusing in the future for readers who aren't aware of the development history (e.g. that the JSON schema support started within core EG and was moved into an extension). This doesn't need to comment on the extraction--it can just say this gem is where JSON ingestion support lives. |
||
| # and `elasticgraph-apollo`. This is the first step toward supporting alternative ingestion | ||
|
Collaborator
Likewise, it'll be confusing for the docs here to talk about this gem as a "first step". This PR is the first step, not the gem itself. |
||
| # serializers (e.g., Protocol Buffers). Higher-level schema-definition entry points use it by | ||
| # default for backward compatibility. | ||
| module JSONIngestion | ||
| end | ||
| end | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,93 @@ | ||||||||||||||||||
| # Copyright 2024 - 2026 Block, Inc. | ||||||||||||||||||
| # | ||||||||||||||||||
| # Use of this source code is governed by an MIT-style | ||||||||||||||||||
| # license that can be found in the LICENSE file or at | ||||||||||||||||||
| # https://opensource.org/licenses/MIT. | ||||||||||||||||||
| # | ||||||||||||||||||
| # frozen_string_literal: true | ||||||||||||||||||
|
|
||||||||||||||||||
| require "elastic_graph/constants" | ||||||||||||||||||
| require "elastic_graph/json_ingestion/schema_definition/factory_extension" | ||||||||||||||||||
|
|
||||||||||||||||||
| module ElasticGraph | ||||||||||||||||||
| module JSONIngestion | ||||||||||||||||||
| # Namespace for all JSON Schema schema definition support. | ||||||||||||||||||
| # | ||||||||||||||||||
| # {SchemaDefinition::APIExtension} is the primary entry point and should be used as a schema definition extension module. | ||||||||||||||||||
| module SchemaDefinition | ||||||||||||||||||
| # Module designed to be extended onto an {ElasticGraph::SchemaDefinition::API} instance | ||||||||||||||||||
| # to add JSON Schema ingestion serializer capabilities. Higher-level schema-definition | ||||||||||||||||||
| # entry points use it by default for backward compatibility, but it can also be explicitly passed in | ||||||||||||||||||
| # `schema_definition_ingestion_serializer_extension_modules` when defining your {ElasticGraph::Local::RakeTasks}. | ||||||||||||||||||
| module APIExtension | ||||||||||||||||||
| # Wires up the factory extension when this module is extended onto an API instance. | ||||||||||||||||||
| # | ||||||||||||||||||
| # @param api [ElasticGraph::SchemaDefinition::API] the API instance to extend | ||||||||||||||||||
| # @return [void] | ||||||||||||||||||
| # @api private | ||||||||||||||||||
| def self.extended(api) | ||||||||||||||||||
| api.instance_variable_get(:@state).ingestion_serializer_state.tap do |state| | ||||||||||||||||||
|
Collaborator
You shouldn't have to do elasticgraph/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/api.rb Lines 52 to 53 in 14cb7cf
|
||||||||||||||||||
| state[:allow_omitted_json_schema_fields] = false unless state.key?(:allow_omitted_json_schema_fields) | ||||||||||||||||||
| state[:allow_extra_json_schema_fields] = true unless state.key?(:allow_extra_json_schema_fields) | ||||||||||||||||||
| state[:reserved_type_names] = (state[:reserved_type_names] || ::Set.new).merge([EVENT_ENVELOPE_JSON_SCHEMA_NAME]) | ||||||||||||||||||
|
Collaborator
In other extensions, we've directly added new fields to elasticgraph/elasticgraph-apollo/lib/elastic_graph/apollo/schema_definition/state_extension.rb Lines 15 to 22 in 14cb7cf
I prefer that pattern, because it's more "strongly typed" than just having a hash. e.g. with a hash if you do |
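One common reason to prefer explicit fields over a hash, and possibly what this comment has in mind, is that a misspelled hash key fails silently while a misspelled accessor raises. The sketch below is only an illustration of that difference: `TypedState` and both variables are hypothetical names, not ElasticGraph's actual state class or the Apollo extension's code.

```ruby
# Hypothetical names throughout; this is not ElasticGraph's real state object.
hash_state = {allow_omitted_json_schema_fields: false}

# A misspelled key on a Hash quietly yields nil, so the bug can go unnoticed.
misspelled_lookup = hash_state[:allow_omited_json_schema_fields] # => nil

# With explicit fields, the same misspelling is an undefined method.
TypedState = Struct.new(
  :allow_omitted_json_schema_fields,
  :allow_extra_json_schema_fields,
  keyword_init: true
)

typed_state = TypedState.new(
  allow_omitted_json_schema_fields: false,
  allow_extra_json_schema_fields: true
)

typo_error =
  begin
    typed_state.allow_omited_json_schema_fields
    nil
  rescue NoMethodError => e
    e
  end
```

Here `typo_error` holds the `NoMethodError`, so the mistake surfaces at the call site instead of propagating as a `nil`.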
||||||||||||||||||
| end | ||||||||||||||||||
|
|
||||||||||||||||||
| api.factory.extend FactoryExtension | ||||||||||||||||||
| end | ||||||||||||||||||
|
|
||||||||||||||||||
| # Defines the version number of the current JSON schema. Importantly, every time a change is made that impacts the JSON schema | ||||||||||||||||||
| # artifact, the version number must be incremented to ensure that each different version of the JSON schema is identified by a unique | ||||||||||||||||||
| # version number. The publisher will then include this version number in published events to identify the version of the schema it | ||||||||||||||||||
| # was using. This avoids the need to deploy the publisher and ElasticGraph indexer at the same time to keep them in sync. | ||||||||||||||||||
| # | ||||||||||||||||||
| # @note While this is an important part of how ElasticGraph is designed to support schema evolution, it can be annoying to constantly | ||||||||||||||||||
| # have to increment this while rapidly changing the schema during prototyping. You can disable the requirement to increment this | ||||||||||||||||||
| # on every JSON schema change by setting `enforce_json_schema_version` to `false` in your `Rakefile`. | ||||||||||||||||||
| # | ||||||||||||||||||
| # @param version [Integer] current version number of the JSON schema artifact | ||||||||||||||||||
| # @return [void] | ||||||||||||||||||
| # @see Local::RakeTasks#enforce_json_schema_version | ||||||||||||||||||
| def json_schema_version(version) | ||||||||||||||||||
| if !version.is_a?(Integer) || version < 1 | ||||||||||||||||||
| raise Errors::SchemaError, "`json_schema_version` must be a positive integer. Specified version: #{version}" | ||||||||||||||||||
| end | ||||||||||||||||||
|
|
||||||||||||||||||
| if @state.ingestion_serializer_state[:json_schema_version] | ||||||||||||||||||
| raise Errors::SchemaError, "`json_schema_version` can only be set once on a schema. Previously-set version: #{@state.ingestion_serializer_state[:json_schema_version]}" | ||||||||||||||||||
| end | ||||||||||||||||||
|
|
||||||||||||||||||
| @state.ingestion_serializer_state[:json_schema_version] = version | ||||||||||||||||||
| @state.ingestion_serializer_state[:json_schema_version_setter_location] = caller_locations(1, 1).to_a.first | ||||||||||||||||||
| nil | ||||||||||||||||||
| end | ||||||||||||||||||
|
|
||||||||||||||||||
| # Defines strictness of the JSON schema validation. By default, the JSON schema will require all fields to be provided by the | ||||||||||||||||||
| # publisher (but they can be nullable) and will ignore extra fields that are not defined in the schema. Use this method to | ||||||||||||||||||
| # configure this behavior. | ||||||||||||||||||
| # | ||||||||||||||||||
| # @param allow_omitted_fields [bool] Whether nullable fields can be omitted from indexing events. | ||||||||||||||||||
| # @param allow_extra_fields [bool] Whether extra fields (i.e. fields beyond those defined in the schema) can be included in indexing events. | ||||||||||||||||||
| # @return [void] | ||||||||||||||||||
| # | ||||||||||||||||||
| # @note If you allow both omitted fields and extra fields, ElasticGraph's JSON schema validation will allow (and ignore) misspelled | ||||||||||||||||||
| # field names in indexing events. For example, if the ElasticGraph schema has a nullable field named `parentId` but the publisher | ||||||||||||||||||
| # accidentally provides it as `parent_id`, ElasticGraph would happily ignore the `parent_id` field entirely, because `parentId` | ||||||||||||||||||
| # is allowed to be omitted and `parent_id` would be treated as an extra field. Therefore, we recommend that you set at most one of | ||||||||||||||||||
| # these to `true`. | ||||||||||||||||||
| def json_schema_strictness(allow_omitted_fields: false, allow_extra_fields: true) | ||||||||||||||||||
| unless [true, false].include?(allow_omitted_fields) | ||||||||||||||||||
| raise Errors::SchemaError, "`allow_omitted_fields` must be true or false" | ||||||||||||||||||
| end | ||||||||||||||||||
|
|
||||||||||||||||||
| unless [true, false].include?(allow_extra_fields) | ||||||||||||||||||
| raise Errors::SchemaError, "`allow_extra_fields` must be true or false" | ||||||||||||||||||
| end | ||||||||||||||||||
|
|
||||||||||||||||||
| @state.ingestion_serializer_state[:allow_omitted_json_schema_fields] = allow_omitted_fields | ||||||||||||||||||
| @state.ingestion_serializer_state[:allow_extra_json_schema_fields] = allow_extra_fields | ||||||||||||||||||
| nil | ||||||||||||||||||
| end | ||||||||||||||||||
| end | ||||||||||||||||||
| end | ||||||||||||||||||
| end | ||||||||||||||||||
| end | ||||||||||||||||||
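The behavior documented in this file can be sketched without ElasticGraph loaded. The block below is a simplified model, not the extension itself: `SketchAPI`, `SchemaError`, and `key_problems` are hypothetical stand-ins, and `key_problems` only models the omitted/extra rules rather than real JSON Schema validation. It shows the set-once version check and the misspelled-field pitfall described in the `json_schema_strictness` note.

```ruby
# Hypothetical stand-ins; none of these names come from ElasticGraph itself.
SchemaError = Class.new(StandardError)

class SketchAPI
  def initialize
    # Defaults mirror the ones applied in `APIExtension.extended` in the diff.
    @state = {
      allow_omitted_json_schema_fields: false,
      allow_extra_json_schema_fields: true
    }
  end

  # Accepts a positive integer, and only once per schema.
  def json_schema_version(version)
    unless version.is_a?(Integer) && version >= 1
      raise SchemaError, "`json_schema_version` must be a positive integer. Specified version: #{version}"
    end

    if @state[:json_schema_version]
      raise SchemaError, "`json_schema_version` can only be set once on a schema."
    end

    @state[:json_schema_version] = version
    nil
  end

  def json_schema_strictness(allow_omitted_fields: false, allow_extra_fields: true)
    [allow_omitted_fields, allow_extra_fields].each do |flag|
      raise SchemaError, "strictness flags must be true or false" unless [true, false].include?(flag)
    end

    @state[:allow_omitted_json_schema_fields] = allow_omitted_fields
    @state[:allow_extra_json_schema_fields] = allow_extra_fields
    nil
  end

  # Simplified model: compares an event's keys with the schema's (nullable) field names.
  def key_problems(event, field_names)
    problems = []
    unless @state[:allow_omitted_json_schema_fields]
      problems.concat((field_names - event.keys).map { |f| "missing: #{f}" })
    end
    unless @state[:allow_extra_json_schema_fields]
      problems.concat((event.keys - field_names).map { |f| "extra: #{f}" })
    end
    problems
  end
end

api = SketchAPI.new
api.json_schema_version(1)

fields = ["id", "parentId"]
event = {"id" => "1", "parent_id" => "7"} # `parent_id` is a misspelling of `parentId`

strict_result = api.key_problems(event, fields) # => ["missing: parentId"]

api.json_schema_strictness(allow_omitted_fields: true, allow_extra_fields: true)
lenient_result = api.key_problems(event, fields) # => []
```

With the defaults, the misspelling is caught because `parentId` is reported as missing. Once both flags are `true`, the same event passes with no problems, which is why the note advises against enabling both.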