-
Notifications
You must be signed in to change notification settings - Fork 91
feat(presto-clp): Add Docker compose setup for Presto cluster that can connect to clp-json. #1132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
3a84840
9574059
8dd8795
a8af40c
2015a4a
b78682f
1247bba
1400a90
91040d1
9ec648c
4097478
b1ce135
70a56d7
99ac4e1
6dde297
b22746d
708756d
2eeda5c
8bb98f8
9fc487d
64e8edf
00bd3f1
28f5b70
cccf9fe
04f3d2a
3381eb8
bea5bb6
1f68965
94b0210
22c53e0
2f0de28
d621bf3
7300d20
4bc2f58
80c41d4
9d2146b
ff06c62
cf11694
2618f13
782b87d
8107518
4d896e5
713670a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| # Setup local docker stack for presto + clp | ||
|
|
||
| ## Install docker | ||
|
|
||
| Follow the guide here: [docker] | ||
|
|
||
| # Launch clp-package | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Convert secondary titles to H2 and fix punctuation for Markdown lint-cleanliness The document currently contains five -# Launch clp-package
+# ## Launch clp-package
…
-# Create Docker Cluster
+# ## Create Docker Cluster
…
-# Use cli:
+# ## Use CLI
…
-# Delete docker Cluster
+# ## Delete Docker ClusterAlso applies to: 35-41, 48-57, 65-69 🧰 Tools🪛 markdownlint-cli2 (0.17.2)7-7: Multiple top-level headings in the same document (MD025, single-title, single-h1) 🤖 Prompt for AI Agents |
||
|
|
||
| 1. Find the clp-package for test on our official website [clp-json-v0.4.0]. Here is a sample dataset for demo testing: [postgresql dataset]. | ||
|
|
||
| 2. Untar the clp-package and the postgresql dataset. | ||
|
|
||
| 3. Launch: | ||
|
|
||
| ```bash | ||
| # You probably want to run a python 3.9 or newer virtual environment | ||
| sbin/start-clp.sh | ||
| ``` | ||
|
|
||
| 5. Compress: | ||
|
|
||
| ```bash | ||
| # You can also use your own dataset | ||
| sbin/compress.sh --timestamp-key 'timestamp' /path/to/postgresql.log | ||
| ``` | ||
|
|
||
| 6. Use the following command to update `.env`: | ||
|
|
||
| ```bash | ||
| scripts/set-up-config.sh /path/to/clp-json-package | ||
| ``` | ||
|
|
||
| # Create Docker Cluster | ||
|
|
||
| Create a local docker stack: | ||
|
|
||
| ```bash | ||
| docker compose up | ||
| ``` | ||
|
|
||
| To create a docker stack with more than 1 worker (e.g., 3 workers): | ||
| ``` | ||
| docker compose up --scale presto-worker=3 | ||
| ``` | ||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||
|
|
||
| # Use cli: | ||
|
|
||
| After all containers are in "Started" states (check by `docker ps`): | ||
|
|
||
| ```bash | ||
| # On your host | ||
| docker exec -it compose-presto-coordinator-1 sh | ||
|
|
||
| # In presto-coordinator container | ||
| /opt/presto-cli --catalog clp --schema default --server localhost:8080 | ||
| ``` | ||
|
|
||
| Example query: | ||
| ```sql | ||
| SELECT * FROM default LIMIT 1; | ||
| ``` | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Minor: qualify SQL fenced block for syntax highlighting Add -```sql
+```sql
SELECT * FROM default LIMIT 1;In tools/deployment/presto-clp/README.md around lines 61 to 63, the SQL code |
||
|
|
||
| # Delete docker Cluster | ||
|
|
||
| ```bash | ||
| docker compose down | ||
| ``` | ||
|
|
||
|
|
||
|
|
||
| [clp-json-v0.4.0]: https://github.com/y-scope/clp/releases/tag/v0.4.0 | ||
| [docker]: https://docs.docker.com/engine/install | ||
| [postgresql dataset]: https://zenodo.org/records/10516402 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| PRESTO_COORDINATOR_HTTPPORT="8080" | ||
| PRESTO_COORDINATOR_SERVICENAME="presto-coordinator" | ||
|
|
||
| # node.properties | ||
| PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT="production" | ||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # clp.properties | ||
| PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE="mysql" | ||
| PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER="mysql" | ||
|
|
||
| # config.properties | ||
| PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY="1GB" | ||
| PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE="1GB" | ||
|
|
||
| # jvm.config | ||
| PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE="4G" | ||
| PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE="32M" | ||
|
|
||
| # log.properties | ||
| PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL="DEBUG" | ||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| connector.name=clp | ||
| clp.metadata-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE} | ||
| clp.metadata-db-url=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL} | ||
| clp.metadata-db-name=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME} | ||
| clp.metadata-db-user=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER} | ||
| clp.metadata-db-password=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD} | ||
| clp.metadata-table-prefix=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX} | ||
| clp.split-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER} | ||
| clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Several placeholders have no matching definitions – coordinator fails to start.
Add them to 🤖 Prompt for AI Agents |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| coordinator=true | ||
| node-scheduler.include-coordinator=false | ||
| http-server.http.port=${PRESTO_COORDINATOR_HTTPPORT} | ||
| query.max-memory=${PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY} | ||
| query.max-memory-per-node=${PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE} | ||
| discovery-server.enabled=true | ||
| discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT} | ||
| optimizer.optimize-hash-generation=false | ||
| regex-library=RE2J | ||
| use-alternative-function-signatures=true | ||
| inline-sql-functions=false | ||
| nested-data-serialization-enabled=false | ||
| native-execution-enabled=true |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| -server | ||
| -Xmx${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE} | ||
| -XX:+UseG1GC | ||
| -XX:G1HeapRegionSize=${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE} | ||
| -XX:+UseGCOverheadLimit | ||
| -XX:+ExplicitGCInvokesConcurrent | ||
| -XX:+HeapDumpOnOutOfMemoryError | ||
| -XX:+ExitOnOutOfMemoryError | ||
| -Djdk.attach.allowAttachSelf=true |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1 @@ | ||||||
| com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL} | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Avoid quoting the log level value in Docker The extra quotes are not a valid log level and Presto will fall back to the default ( -com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL}
+com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL:-INFO}and remove surrounding quotes in the 📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| { | ||
| } | ||
|
Comment on lines
+1
to
+2
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Provide a minimal example or document expected schema An empty JSON object is syntactically valid, but future maintainers may be unsure what keys are supported. A commented exemplar or pointer to docs beside this file would improve clarity without affecting runtime. 🤖 Prompt for AI Agents
Comment on lines
+1
to
+2
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @anlowee I think we should explain to the user how to configure this for the timestamp field in their logs, right? And also that it may need to be different for each dataset they compress.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we direct them to the related presto-doc section?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's not published and is more general than they need, right? We should write a simplified section for them here. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | ||
| node.id=${PRESTO_COORDINATOR_SERVICENAME} |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,17 @@ | ||||||||||||||||||
| #!/bin/sh | ||||||||||||||||||
|
|
||||||||||||||||||
| # Exit on error | ||||||||||||||||||
| set -e | ||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||||||||||||||||||
|
|
||||||||||||||||||
| PRESTO_CONFIG_DIR="/opt/presto-server/etc" | ||||||||||||||||||
|
|
||||||||||||||||||
| # Substitute environemnt variables in config template | ||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||||||||||||||||||
| find /configs -type f | while read -r f; do | ||||||||||||||||||
| ( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")" | ||||||||||||||||||
| done | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Security risk: Avoid shell evaluation for template processing. The current approach using shell evaluation ( Consider using -# Substitute environemnt variables in config template
-find /configs -type f | while read -r f; do
- ( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
-done
+# Substitute environment variables in config template
+find /configs -type f | while read -r f; do
+ envsubst < "$f" > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
+done📝 Committable suggestion
Suggested change
🧰 Tools🪛 Shellcheck (0.10.0)[info] 10-10: Double quote to prevent globbing and word splitting. (SC2086) 🤖 Prompt for AI Agents |
||||||||||||||||||
|
|
||||||||||||||||||
| # Setup the config directory hierarchy | ||||||||||||||||||
| rm -f ${PRESTO_CONFIG_DIR}/catalog/* | ||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||||||||||||||||||
|
|
||||||||||||||||||
| # Copy over files | ||||||||||||||||||
| mv ${PRESTO_CONFIG_DIR}/clp.properties ${PRESTO_CONFIG_DIR}/catalog | ||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| version: "3.9" | ||
|
|
||
| services: | ||
| presto-coordinator: | ||
| image: "ghcr.io/y-scope/presto/coordinator:dev" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Consider using stable image tags instead of 'dev' Using the 'dev' tag in production deployments can lead to unpredictable behavior as the image content may change without notice. Consider using specific version tags or at minimum a more stable tag: - image: "ghcr.io/y-scope/presto/coordinator:dev"
+ image: "ghcr.io/y-scope/presto/coordinator:${PRESTO_VERSION:-latest}"And similarly for the worker: - image: "ghcr.io/y-scope/presto/prestissimo-worker:dev"
+ image: "ghcr.io/y-scope/presto/prestissimo-worker:${PRESTO_VERSION:-latest}"🤖 Prompt for AI Agents |
||
| entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"] | ||
| env_file: | ||
| - ".env" | ||
| - "coordinator-common.env" | ||
| - "coordinator.env" | ||
| volumes: | ||
| - "./coordinator/scripts:/scripts:ro" | ||
| - "coordinator-config:/opt/presto-server/etc" | ||
| - "./coordinator/config-template:/configs:ro" | ||
| networks: | ||
| - "presto" | ||
|
|
||
| presto-worker: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should define a @coderabbitai file an issue
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've filed issue #1504 to track adding a healthcheck to the Presto worker service. The issue has been assigned to you. |
||
| image: "ghcr.io/y-scope/presto/prestissimo-worker:dev" | ||
| depends_on: | ||
| presto-coordinator: | ||
| condition: "service_started" | ||
| entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"] | ||
| env_file: | ||
| - ".env" | ||
| - "coordinator-common.env" | ||
| - "worker.env" | ||
| volumes: | ||
| - "./worker/scripts:/scripts:ro" | ||
| - "worker-config:/opt/presto-server/etc" | ||
| - "./worker/config-template:/configs:ro" | ||
| - "${CLP_PACKAGE_ARCHIVES}:${CLP_PACKAGE_ARCHIVES}" | ||
| networks: | ||
| - "presto" | ||
|
|
||
| volumes: | ||
| coordinator-config: | ||
| worker-config: | ||
|
|
||
| networks: | ||
| presto: | ||
| driver: "bridge" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| /.venv/ |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,120 @@ | ||||||||||||||||||||||||||||||||||||||||||||
| import argparse | ||||||||||||||||||||||||||||||||||||||||||||
| import logging | ||||||||||||||||||||||||||||||||||||||||||||
| import sys | ||||||||||||||||||||||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||||||||||||||||||||||
| from typing import Optional | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| import yaml | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| # Set up console logging | ||||||||||||||||||||||||||||||||||||||||||||
| logging_console_handler = logging.StreamHandler() | ||||||||||||||||||||||||||||||||||||||||||||
| logging_formatter = logging.Formatter( | ||||||||||||||||||||||||||||||||||||||||||||
| "%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s", datefmt="%Y-%m-%dT%H:%M:%S" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| logging_console_handler.setFormatter(logging_formatter) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| # Set up root logger | ||||||||||||||||||||||||||||||||||||||||||||
| root_logger = logging.getLogger() | ||||||||||||||||||||||||||||||||||||||||||||
| root_logger.setLevel(logging.INFO) | ||||||||||||||||||||||||||||||||||||||||||||
| root_logger.addHandler(logging_console_handler) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| # Create logger | ||||||||||||||||||||||||||||||||||||||||||||
| logger = logging.getLogger(__name__) | ||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+10
to
+23
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Consider using structured logging configuration The logging setup could be more maintainable using dictionary configuration or the logging.basicConfig approach. Consider simplifying the logging setup: -# Set up console logging
-logging_console_handler = logging.StreamHandler()
-logging_formatter = logging.Formatter(
- "%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s", datefmt="%Y-%m-%dT%H:%M:%S"
-)
-logging_console_handler.setFormatter(logging_formatter)
-
-# Set up root logger
-root_logger = logging.getLogger()
-root_logger.setLevel(logging.INFO)
-root_logger.addHandler(logging_console_handler)
-
-# Create logger
-logger = logging.getLogger(__name__)
+# Set up logging
+logging.basicConfig(
+ level=logging.INFO,
+ format="%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s",
+ datefmt="%Y-%m-%dT%H:%M:%S"
+)
+logger = logging.getLogger(__name__)📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| def main(argv=None) -> int: | ||||||||||||||||||||||||||||||||||||||||||||
| if argv is None: | ||||||||||||||||||||||||||||||||||||||||||||
| argv = sys.argv | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| args_parser = argparse.ArgumentParser( | ||||||||||||||||||||||||||||||||||||||||||||
| description="Generates an environment variables file for any user-configured properties." | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| args_parser.add_argument( | ||||||||||||||||||||||||||||||||||||||||||||
| "--clp-package-dir", help="CLP package directory.", required=True, type=Path | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| args_parser.add_argument( | ||||||||||||||||||||||||||||||||||||||||||||
| "--output-file", help="Path for the environment variables file.", required=True, type=Path | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| parsed_args = args_parser.parse_args(argv[1:]) | ||||||||||||||||||||||||||||||||||||||||||||
| clp_package_dir: Path = parsed_args.clp_package_dir.resolve() | ||||||||||||||||||||||||||||||||||||||||||||
| output_file: Path = parsed_args.output_file | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| clp_config_file_path = clp_package_dir / "etc" / "clp-config.yml" | ||||||||||||||||||||||||||||||||||||||||||||
| with open(clp_config_file_path, "r") as clp_config_file: | ||||||||||||||||||||||||||||||||||||||||||||
| clp_config = yaml.safe_load(clp_config_file) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| database_host = _get_config_value(clp_config, "database.host", "localhost") | ||||||||||||||||||||||||||||||||||||||||||||
| database_port = _get_config_value(clp_config, "database.port", 3306) | ||||||||||||||||||||||||||||||||||||||||||||
| database_name = _get_config_value(clp_config, "database.name", "clp-db") | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| clp_archive_output_storage_type = _get_config_value( | ||||||||||||||||||||||||||||||||||||||||||||
| clp_config, "archive_output.storage.type", "fs" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| if "fs" != clp_archive_output_storage_type: | ||||||||||||||||||||||||||||||||||||||||||||
| logger.error( | ||||||||||||||||||||||||||||||||||||||||||||
| "Expected CLP's archive_output.storage.type to be fs but found '%s'. Presto currently only supports" | ||||||||||||||||||||||||||||||||||||||||||||
| " reading archives from the fs storage type.", | ||||||||||||||||||||||||||||||||||||||||||||
| clp_archive_output_storage_type, | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
|
||||||||||||||||||||||||||||||||||||||||||||
| clp_archives_dir = _get_config_value( | ||||||||||||||||||||||||||||||||||||||||||||
| clp_config, | ||||||||||||||||||||||||||||||||||||||||||||
| "archive_output.storage.directory", | ||||||||||||||||||||||||||||||||||||||||||||
| str(clp_package_dir / "var" / "data" / "archives"), | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| credentials_file_path = clp_package_dir / "etc" / "credentials.yml" | ||||||||||||||||||||||||||||||||||||||||||||
| with open(credentials_file_path, "r") as credentials_file: | ||||||||||||||||||||||||||||||||||||||||||||
| credentials = yaml.safe_load(credentials_file) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| database_user = _get_config_value(credentials, "database.user") | ||||||||||||||||||||||||||||||||||||||||||||
| database_password = _get_config_value(credentials, "database.password") | ||||||||||||||||||||||||||||||||||||||||||||
| if not database_user or not database_password: | ||||||||||||||||||||||||||||||||||||||||||||
| logger.error( | ||||||||||||||||||||||||||||||||||||||||||||
| "database.user and database.password must be specified in '%s'.", credentials_file_path | ||||||||||||||||||||||||||||||||||||||||||||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shall we tell user that to launch clp-package first? Because that will generate the credentials. |
||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| return 1 | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| with open(output_file, "w") as env_file: | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write( | ||||||||||||||||||||||||||||||||||||||||||||
| "PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL" | ||||||||||||||||||||||||||||||||||||||||||||
| f"=jdbc:mysql://{database_host}:{database_port}\n" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write( | ||||||||||||||||||||||||||||||||||||||||||||
| f"PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME={database_name}\n" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write( | ||||||||||||||||||||||||||||||||||||||||||||
| f"PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER={database_user}\n" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write( | ||||||||||||||||||||||||||||||||||||||||||||
| f"PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD={database_password}\n" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write(f"PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX=clp_\n") | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write(f"CLP_PACKAGE_ARCHIVES={clp_archives_dir}\n") | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| return 0 | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| def _get_config_value(config: dict, key: str, default_value: Optional[str] = None) -> str: | ||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||
| Gets the value corresponding to `key` from `config` if it exists. | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| :param config: The config. | ||||||||||||||||||||||||||||||||||||||||||||
| :param key: The key to look for in the config, in dot notation (e.g., "database.host"). | ||||||||||||||||||||||||||||||||||||||||||||
| :param default_value: The value to return if `key` doesn't exist in `config`. | ||||||||||||||||||||||||||||||||||||||||||||
| :return: The value corresponding to `key` if it exists, otherwise `default_value`. | ||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| keys = key.split(".") | ||||||||||||||||||||||||||||||||||||||||||||
| value = config | ||||||||||||||||||||||||||||||||||||||||||||
| for k in keys: | ||||||||||||||||||||||||||||||||||||||||||||
| if isinstance(value, dict) and k in value: | ||||||||||||||||||||||||||||||||||||||||||||
| value = value[k] | ||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||
| return default_value | ||||||||||||||||||||||||||||||||||||||||||||
| return value | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| if "__main__" == __name__: | ||||||||||||||||||||||||||||||||||||||||||||
| sys.exit(main(sys.argv)) | ||||||||||||||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1 @@ | ||||||||||
| PyYAML | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Pin dependency version to ensure deterministic builds Unversioned requirements can silently break the deployment if PyYAML ships a backwards-incompatible update. Pin to a tested version range, e.g.: -PyYAML
+PyYAML>=6.0,<7.0and revisit periodically. 📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,28 @@ | ||||||||||||||||||
| #!/usr/bin/env bash | ||||||||||||||||||
|
|
||||||||||||||||||
| set -eu | ||||||||||||||||||
| set -o pipefail | ||||||||||||||||||
|
|
||||||||||||||||||
| script_dir=$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) | ||||||||||||||||||
|
|
||||||||||||||||||
| cUsage="Usage: ${BASH_SOURCE[0]} <clp-package-dir>" | ||||||||||||||||||
| if [ "$#" -lt 1 ] ; then | ||||||||||||||||||
| echo "$cUsage" | ||||||||||||||||||
| exit | ||||||||||||||||||
| fi | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Return non-zero on incorrect usage
- echo "$cUsage"
- exit
+ echo "$cUsage" >&2
+ exit 1📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||
| clp_package_dir=$1 | ||||||||||||||||||
|
|
||||||||||||||||||
|
Comment on lines
+13
to
+14
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Quote positional parameter when capturing it Keeps paths with spaces intact. -clp_package_dir=$1
+clp_package_dir="$1"📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||
| venv_dir=${script_dir}/.venv | ||||||||||||||||||
| if [ ! -d "${venv_dir}" ]; then | ||||||||||||||||||
| echo "Setting up Python venv in '${venv_dir}'..." | ||||||||||||||||||
| python3 -m venv "${script_dir}/.venv" | ||||||||||||||||||
| fi | ||||||||||||||||||
| source "${script_dir}/.venv/bin/activate" | ||||||||||||||||||
|
|
||||||||||||||||||
| echo "Installing required Python packages..." | ||||||||||||||||||
| pip3 install -r "${script_dir}/requirements.txt" | ||||||||||||||||||
|
|
||||||||||||||||||
| echo "Generating environment variables file for user-configured properties..." | ||||||||||||||||||
| python3 "${script_dir}/generate-user-env-vars-file.py" \ | ||||||||||||||||||
| --clp-package-dir "${clp_package_dir}" \ | ||||||||||||||||||
| --output-file "${script_dir}/../.env" | ||||||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the @coderabbitai create an issue to track
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 149 @junhaoliao, I've created issue #1422 to track adding the 🧠 Learnings used |
||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,4 @@ | ||||||||||||||||||
| PRESTO_WORKER_HTTPPORT="8080" | ||||||||||||||||||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||||||||||||||||||
|
|
||||||||||||||||||
| # node.properties | ||||||||||||||||||
| PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location" | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Rename variable for readability & future-proofing Using -PRESTO_WORKER_HTTPPORT="8080"
+PRESTO_WORKER_HTTP_PORT="8080"and update every reference in the generation scripts accordingly. 📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| connector.name=clp |
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should set @coderabbitai file an issue
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've created issue #1503 to track the |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,6 @@ | ||||||
| discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT} | ||||||
| presto.version=REPLACE_ME | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Prefer an env placeholder over hard-coded Replacing -presto.version=REPLACE_ME
+presto.version=${PRESTO_VERSION}The script can then simply export 📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||
| http-server.http.port=${PRESTO_WORKER_HTTPPORT} | ||||||
| shutdown-onset-sec=1 | ||||||
| register-test-functions=false | ||||||
| runtime-metrics-collection-enabled=false | ||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,4 @@ | ||||||||||||||||||
| node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | ||||||||||||||||||
| node.internal-address=REPLACE_ME | ||||||||||||||||||
| node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION} | ||||||||||||||||||
| node.id=REPLACE_ME | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Environment variable name may be misleading for the worker template.
-node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
+node.environment=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT}📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| mutable-config=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Black/Ruff may recurse through large non-code trees under tools/deployment.
Running
black .fromtools/deploymentwill walk every sub-directory (configs, templates, logs). That can add noticeable time and, in worst cases, choke on extremely large data files a user may drop under that tree. Consider narrowing the path totools/deployment/**/*.pyor adding an--extend-excludefor obvious non-code directories (e.g.,config-template,scripts/generated).🤖 Prompt for AI Agents