-
Notifications
You must be signed in to change notification settings - Fork 91
feat(presto-clp): Add Docker compose setup for Presto cluster that can connect to clp-json. #1132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 38 commits
3a84840
9574059
8dd8795
a8af40c
2015a4a
b78682f
1247bba
1400a90
91040d1
9ec648c
4097478
b1ce135
70a56d7
99ac4e1
6dde297
b22746d
708756d
2eeda5c
8bb98f8
9fc487d
64e8edf
00bd3f1
28f5b70
cccf9fe
04f3d2a
3381eb8
bea5bb6
1f68965
94b0210
22c53e0
2f0de28
d621bf3
7300d20
4bc2f58
80c41d4
9d2146b
ff06c62
cf11694
2618f13
782b87d
8107518
4d896e5
713670a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,6 +12,13 @@ Using object storage | |
| Using CLP to ingest logs from object storage and store archives on object storage. | ||
| ::: | ||
|
|
||
| :::{grid-item-card} | ||
| :link: guides-using-presto | ||
| Using Presto with CLP | ||
| ^^^ | ||
| How to use Presto to query compressed logs in CLP. | ||
| ::: | ||
|
Comment on lines
+15
to
+20
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Maintain link-path conventions for guide cards. Existing cards either use an 🤖 Prompt for AI Agents |
||
|
|
||
| :::{grid-item-card} | ||
| :link: guides-multi-node | ||
| Multi-node deployment | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,178 @@ | ||||||||||||||||||||||||
| # Using Presto with CLP | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| [Presto] is a distributed SQL query engine that can be used to query data stored in CLP (using SQL). | ||||||||||||||||||||||||
| This guide describes how to set up and use Presto with CLP. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| :::{warning} | ||||||||||||||||||||||||
| Currently, only the [clp-json](quick-start/clp-json.md) flavor of CLP supports queries through | ||||||||||||||||||||||||
| Presto. | ||||||||||||||||||||||||
| ::: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| :::{note} | ||||||||||||||||||||||||
| Currently, this integration with Presto is under development and may change in the future. It is | ||||||||||||||||||||||||
| also being maintained in a [fork][yscope-presto] of the Presto project. We are working on merging | ||||||||||||||||||||||||
| these changes into the main Presto repository so that you can use official Presto releases with CLP. | ||||||||||||||||||||||||
| ::: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ## Requirements | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * [CLP][clp-releases] (clp-json) v0.4.0 or higher | ||||||||||||||||||||||||
| * [Docker] v28 or higher | ||||||||||||||||||||||||
| * [Docker Compose][docker-compose] v2.20.2 or higher | ||||||||||||||||||||||||
| * Python | ||||||||||||||||||||||||
| * python3-venv (for the version of Python installed) | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ## Set up | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Using Presto with CLP requires: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * [Setting up CLP](#setting-up-clp) and compressing some logs. | ||||||||||||||||||||||||
| * [Setting up Presto](#setting-up-presto) to query CLP's metadata database and archives. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ### Setting up CLP | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. A | ||||||||||||||||||||||||
| sample dataset that works well with Presto is [postgresql]. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ### Setting up Presto | ||||||||||||||||||||||||
|
Comment on lines
+31
to
+37
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Improve sentence variety to enhance readability Three consecutive sentences begin with similar structures, which reduces readability. Consider varying the sentence beginnings: -Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. A
-sample dataset that works well with Presto is the [postgresql] dataset.
-
-### Setting up Presto
+Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. The [postgresql] dataset serves as an excellent sample for testing with Presto.
+### Setting up Presto📝 Committable suggestion
Suggested change
🧰 Tools🪛 LanguageTool[style] ~31-~31: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE) 🤖 Prompt for AI Agents |
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 1. Clone the CLP repository: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||
| git clone https://github.com/y-scope/clp.git | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 2. Navigate to the `tools/deployment/presto-clp` directory in your terminal. | ||||||||||||||||||||||||
| 3. Run the following script to generate the necessary config for Presto to work with CLP: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||
| scripts/set-up-config.sh <clp-json-dir> | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * `<clp-json-dir>` is the location of the clp-json package you set up in the previous section. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Note that for the metadata filter config (i.e., | ||||||||||||||||||||||||
| `tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json`), it is a config | ||||||||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||||||||||||||||||||||||
| to indicate which columns are used for filtering splits that will be processed by Presto. Here | ||||||||||||||||||||||||
| is an example: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```json | ||||||||||||||||||||||||
| { | ||||||||||||||||||||||||
| "clp.default.default": [ | ||||||||||||||||||||||||
| { | ||||||||||||||||||||||||
| "columnName": "timestamp", | ||||||||||||||||||||||||
| "rangeMapping": { | ||||||||||||||||||||||||
| "lowerBound": "begin_timestamp", | ||||||||||||||||||||||||
| "upperBound": "end_timestamp" | ||||||||||||||||||||||||
| }, | ||||||||||||||||||||||||
| "required": false | ||||||||||||||||||||||||
| } | ||||||||||||||||||||||||
| ] | ||||||||||||||||||||||||
| } | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * `"clp.default.default"` is the filter's scope. A scope can be one of the following: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * A catalog name | ||||||||||||||||||||||||
| * A fully-qualified schema name | ||||||||||||||||||||||||
| * A fully-qualified table name | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Filter configs under a particular scope will apply to all child scopes. For example, filter | ||||||||||||||||||||||||
| configs at the schema level will apply to all tables within that schema. In this example, | ||||||||||||||||||||||||
| the filter will only apply to the `default` table under the `default` schema of the `clp` | ||||||||||||||||||||||||
| catalog. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * `"columnName"` is the data column's name. You can use the column used as `--timestamp-key` | ||||||||||||||||||||||||
| when compressing if you want to filter splits by timestamp. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * `"rangeMapping"` is an optional object with the following properties: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * `"lowerBound"` is the metadata column that represents the lower bound of values in a split | ||||||||||||||||||||||||
| for the data column. | ||||||||||||||||||||||||
| * `"upperBound"` is the metadata column that represents the upper bound of values in a split | ||||||||||||||||||||||||
| for the data column. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| In this example, since in CLP's metadata database, for each split (i.e., archive) there are | ||||||||||||||||||||||||
| two fields `begin_timestamp` and `end_timestamp` to store the earilest and latest timestamps | ||||||||||||||||||||||||
| of the log messages compressed in that split, we have to remap the original data column's | ||||||||||||||||||||||||
| name to these two fields so that it can query the metadata database to retrieve filtered | ||||||||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||||||||||||||||||||||||
| splits. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * `"required"` is an optional field (defaults to false) which indicates whether the filter must | ||||||||||||||||||||||||
| be present in the translated metadata filter SQL query. If a required filter is missing or | ||||||||||||||||||||||||
| cannot be pushed down, the query will be rejected. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 4. Start a Presto cluster by running: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||
| docker compose up | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * To use more than Presto worker, you can use the `--scale` option as follows: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||
|
Comment on lines
+87
to
+89
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Grammar: insert missing word in scaling instruction - * To use more than Presto worker, you can use the `--scale` option as follows:
+ * To use more than one Presto worker, you can use the `--scale` option as follows:📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||
| docker compose up --scale presto-worker=<num-workers> | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * `<num-workers>` is the number of Presto worker nodes you want to run. | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Add clarification about worker scaling The scaling instructions could benefit from explaining when multiple workers would be beneficial. Consider adding context about when to use multiple workers: - * To use more than Presto worker, you can use the `--scale` option as follows:
+ * To use multiple Presto workers (beneficial for larger datasets or improved query parallelism), you can use the `--scale` option as follows:📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ### Stopping the Presto cluster | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| To stop the Presto cluster, use CTRL + C. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| To clean up the Presto cluster entirely: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||
| docker compose rm | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ## Querying your logs through Presto | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| To query your logs through Presto, you can use the Presto CLI: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||
| docker compose exec presto-coordinator \ | ||||||||||||||||||||||||
| presto-cli \ | ||||||||||||||||||||||||
| --catalog clp \ | ||||||||||||||||||||||||
| --schema default | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Each dataset in CLP shows up as a table in Presto. To show all available datasets: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```sql | ||||||||||||||||||||||||
| SHOW TABLES; | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| If you didn't specify a dataset when compressing your logs in CLP, your logs will have been stored | ||||||||||||||||||||||||
| in the `default` dataset. If you also didn't specify any metadata filters, you can query the logs | ||||||||||||||||||||||||
| in this dataset: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```sql | ||||||||||||||||||||||||
| SELECT * FROM default LIMIT 1; | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs | ||||||||||||||||||||||||
| contain the field `foo.bar`, you can query it using: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```sql | ||||||||||||||||||||||||
| SELECT foo.bar FROM default LIMIT 1; | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
Comment on lines
+122
to
+134
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shall we also mention that this only works when the user didn't setup any "required" metadata filters
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea. That should probably go in the section about configuring the metadata filter.
Comment on lines
+129
to
+134
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Define the abbreviation “kv-pairs” for clarity Readers who are new to CLP may not immediately recognise the shorthand. -All kv-pairs in each log event can be queried directly using dot-notation.
+All key-value (KV) pairs in each log event can be queried directly using dot notation.📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ## Limitations | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| The Presto CLP integration has the following limitations at present: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| * Nested fields containing special characters (i.e., any non-alphanumeric characters except `_`; | ||||||||||||||||||||||||
| see [y-scope/presto#8]). To get around this limitation,you will need to preprocess your logs to | ||||||||||||||||||||||||
| remove such special characters. | ||||||||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||||||||||||||||||||||||
| * Only logs stored on the filesystem, rather than S3, can be queried through Presto. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| These limitations will be addressed in a future release of the Presto integration. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| [clp-releases]: https://github.com/y-scope/clp/releases | ||||||||||||||||||||||||
| [docker-compose]: https://docs.docker.com/compose/install/ | ||||||||||||||||||||||||
| [Docker]: https://docs.docker.com/engine/install/ | ||||||||||||||||||||||||
| [postgresql]: https://zenodo.org/records/10516401 | ||||||||||||||||||||||||
| [Presto]: https://prestodb.io/ | ||||||||||||||||||||||||
| [y-scope/presto#8]: https://github.com/y-scope/presto/issues/8 | ||||||||||||||||||||||||
| [yscope-presto]: https://github.com/y-scope/presto | ||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -103,7 +103,8 @@ tasks: | |
| components/package-template/src/etc \ | ||
| docs \ | ||
| taskfile.yaml \ | ||
| taskfiles | ||
| taskfiles \ | ||
| tools/deployment | ||
|
|
||
| check-cpp-format: | ||
| sources: &cpp_source_files | ||
|
|
@@ -772,6 +773,7 @@ tasks: | |
| - "components/clp-py-utils/clp_py_utils" | ||
| - "components/core/tools/scripts/utils" | ||
| - "components/job-orchestration/job_orchestration" | ||
| - "tools/deployment" | ||
| - "tools/scripts" | ||
|
Comment on lines
+776
to
777
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Black/Ruff may recurse through large non-code trees under tools/deployment. Running 🤖 Prompt for AI Agents |
||
| - "docs/conf" | ||
| cmd: |- | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| PRESTO_COORDINATOR_HTTPPORT=8080 | ||
| PRESTO_COORDINATOR_SERVICENAME=presto-coordinator | ||
|
|
||
| # node.properties | ||
| PRESTO_COORDINATOR_NODEPROPERTIES_ENVIRONMENT=production | ||
|
anlowee marked this conversation as resolved.
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # clp.properties | ||
| PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE=mysql | ||
| PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER=mysql | ||
|
|
||
| # config.properties | ||
| PRESTO_COORDINATOR_CONFIGPROPERTIES_QUERY_MAX_MEMORY=1GB | ||
| PRESTO_COORDINATOR_CONFIGPROPERTIES_QUERY_MAX_MEMORY_PER_NODE=1GB | ||
|
|
||
| # jvm.config | ||
| PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE=4G | ||
| PRESTO_COORDINATOR_JVMCONFIG_G1HEAPREGIONSIZE=32M | ||
|
|
||
| # log.properties | ||
| PRESTO_COORDINATOR_LOGPROPERTIES_LEVEL=INFO |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| connector.name=clp | ||
| clp.metadata-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE} | ||
| clp.metadata-db-url=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_URL} | ||
| clp.metadata-db-name=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_NAME} | ||
| clp.metadata-db-user=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_USER} | ||
| clp.metadata-db-password=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_PASSWORD} | ||
| clp.metadata-table-prefix=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_TABLE_PREFIX} | ||
| clp.split-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER} | ||
| clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| coordinator=true | ||
| node-scheduler.include-coordinator=false | ||
| http-server.http.port=${PRESTO_COORDINATOR_HTTPPORT} | ||
| query.max-memory=${PRESTO_COORDINATOR_CONFIGPROPERTIES_QUERY_MAX_MEMORY} | ||
| query.max-memory-per-node=${PRESTO_COORDINATOR_CONFIGPROPERTIES_QUERY_MAX_MEMORY_PER_NODE} | ||
| discovery-server.enabled=true | ||
| discovery.uri=${PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI} | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
🤖 Prompt for AI Agents |
||
| optimizer.optimize-hash-generation=false | ||
| regex-library=RE2J | ||
| use-alternative-function-signatures=true | ||
| inline-sql-functions=false | ||
| nested-data-serialization-enabled=false | ||
| native-execution-enabled=true | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| -server | ||
| -Xmx${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE} | ||
| -XX:+UseG1GC | ||
| -XX:G1HeapRegionSize=${PRESTO_COORDINATOR_JVMCONFIG_G1HEAPREGIONSIZE} | ||
| -XX:+UseGCOverheadLimit | ||
| -XX:+ExplicitGCInvokesConcurrent | ||
| -XX:+HeapDumpOnOutOfMemoryError | ||
| -XX:+ExitOnOutOfMemoryError | ||
| -Djdk.attach.allowAttachSelf=true |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| com.facebook.presto=${PRESTO_COORDINATOR_LOGPROPERTIES_LEVEL} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| { | ||
| } | ||
|
Comment on lines
+1
to
+2
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Provide a minimal example or document expected schema An empty JSON object is syntactically valid, but future maintainers may be unsure what keys are supported. A commented exemplar or pointer to docs beside this file would improve clarity without affecting runtime. 🤖 Prompt for AI Agents
Comment on lines
+1
to
+2
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @anlowee I think we should explain to the user how to configure this for the timestamp field in their logs, right? And also that it may need to be different for each dataset they compress.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we direct them to the related presto-doc section?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's not published and is more general than they need, right? We should write a simplified section for them here. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| node.environment=${PRESTO_COORDINATOR_NODEPROPERTIES_ENVIRONMENT} | ||
| node.id=${PRESTO_COORDINATOR_SERVICENAME} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| set -eu | ||
| set -o pipefail | ||
|
|
||
| readonly PRESTO_CONFIG_DIR="/opt/presto-server/etc" | ||
|
|
||
| # Substitute environment variables in config template | ||
| find /configs -type f | while read -r f; do | ||
| ( | ||
| echo "cat <<EOF" | ||
| cat "$f" | ||
| echo "EOF" | ||
| ) | sh >"${PRESTO_CONFIG_DIR}/$(basename "$f")" | ||
| done | ||
|
|
||
| # Remove existing catalog files that exist in the image and add the CLP catalog | ||
| rm -f "${PRESTO_CONFIG_DIR}/catalog/"* | ||
| mv "${PRESTO_CONFIG_DIR}/clp.properties" "${PRESTO_CONFIG_DIR}/catalog" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| services: | ||
| presto-coordinator: | ||
| image: "ghcr.io/y-scope/presto/coordinator:dev" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Consider using stable image tags instead of 'dev' Using the 'dev' tag in production deployments can lead to unpredictable behavior as the image content may change without notice. Consider using specific version tags or at minimum a more stable tag: - image: "ghcr.io/y-scope/presto/coordinator:dev"
+ image: "ghcr.io/y-scope/presto/coordinator:${PRESTO_VERSION:-latest}"And similarly for the worker: - image: "ghcr.io/y-scope/presto/prestissimo-worker:dev"
+ image: "ghcr.io/y-scope/presto/prestissimo-worker:${PRESTO_VERSION:-latest}"🤖 Prompt for AI Agents |
||
| entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"] | ||
| env_file: | ||
| - ".env" | ||
| - "coordinator-common.env" | ||
| - "coordinator.env" | ||
| volumes: | ||
| - "./coordinator/config-template:/configs:ro" | ||
| - "./coordinator/scripts:/scripts:ro" | ||
| - "coordinator-config:/opt/presto-server/etc" | ||
| networks: | ||
| - "presto" | ||
| healthcheck: | ||
| test: | ||
| - "CMD" | ||
| - "curl" | ||
| - "-f" | ||
| - "${PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI}/v1/info" | ||
| interval: "10s" | ||
| retries: 30 | ||
|
|
||
| presto-worker: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should define a @coderabbitai file an issue
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've filed issue #1504 to track adding a healthcheck to the Presto worker service. The issue has been assigned to you. |
||
| image: "ghcr.io/y-scope/presto/prestissimo-worker:dev" | ||
| depends_on: | ||
| presto-coordinator: | ||
| condition: "service_healthy" | ||
| entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"] | ||
| env_file: | ||
| - ".env" | ||
| - "coordinator-common.env" | ||
| - "worker.env" | ||
| volumes: | ||
| - "./worker/config-template:/configs:ro" | ||
| - "./worker/scripts:/scripts:ro" | ||
| - "${CLP_ARCHIVES_DIR}:${CLP_ARCHIVES_DIR}" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainHost path mount requires documentation The Verify that the CLP_ARCHIVES_DIR environment variable is properly documented and validated: 🏁 Script executed: #!/bin/bash
# Search for documentation or validation of CLP_ARCHIVES_DIR
rg -A 5 -B 5 "CLP_ARCHIVES_DIR"Length of output: 2006 🏁 Script executed: #!/bin/bash
# Inspect environment templates for CLP_ARCHIVES_DIR entries and comments
echo "=== coordinator-common.env ==="
sed -n '1,200p' tools/deployment/presto-clp/coordinator-common.env
echo "---"
echo "=== worker.env ==="
sed -n '1,200p' tools/deployment/presto-clp/worker.envLength of output: 524 Document and validate CLP_ARCHIVES_DIR The • Add
This will make setup instructions clearer and surface any potential security issues before runtime. 🤖 Prompt for AI Agents
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Presto Docker Compose project and the CLP Package Docker Compose project (introduced in #1178) use different conventions for file mounting, which causes Presto queries to return zero rows.
Before #1178, when the components were orchestrated via Python subprocess calls to the With the @coderabbitai create an issues to track the fix for this inconsistency
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've created issue #1496 to track the file mounting inconsistency between the Presto and CLP Package Docker Compose projects. The issue has been assigned to you and includes all the details you provided about the path mismatch problem and its impact on Presto queries. |
||
| - "worker-config:/opt/presto-server/etc" | ||
| networks: | ||
| - "presto" | ||
|
|
||
| volumes: | ||
| coordinator-config: | ||
| worker-config: | ||
|
|
||
| networks: | ||
| presto: | ||
| driver: "bridge" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| /.venv/ |
Uh oh!
There was an error while loading. Please reload this page.