diff --git a/docs/src/user-guide/guides-overview.md b/docs/src/user-guide/guides-overview.md index 02faefe2a1..20d26e0ab0 100644 --- a/docs/src/user-guide/guides-overview.md +++ b/docs/src/user-guide/guides-overview.md @@ -12,6 +12,13 @@ Using object storage Using CLP to ingest logs from object storage and store archives on object storage. ::: +:::{grid-item-card} +:link: guides-using-presto +Using Presto with CLP +^^^ +How to use Presto to query compressed logs in CLP. +::: + :::{grid-item-card} :link: guides-multi-node Multi-node deployment diff --git a/docs/src/user-guide/guides-using-presto.md b/docs/src/user-guide/guides-using-presto.md new file mode 100644 index 0000000000..59d464bc04 --- /dev/null +++ b/docs/src/user-guide/guides-using-presto.md @@ -0,0 +1,154 @@ +# Using Presto with CLP + +[Presto] is a distributed SQL query engine that can be used to query data stored in CLP. +This guide describes how to set up and use Presto with CLP. + +:::{warning} +Currently, only the [clp-json](quick-start/clp-json.md) flavor of CLP supports queries through +Presto. +::: + +:::{note} +This integration with Presto is under development and may change in the future. It is currently +maintained in a [fork][yscope-presto] of the Presto project. Eventually, these changes will be +merged into the main Presto repository so that you can use official Presto releases with CLP. +::: + +## Requirements + +* [CLP][clp-releases] (clp-json) v0.4.0 or higher +* [Docker] v28 or higher +* [Docker Compose][docker-compose] v2.20.2 or higher +* Python +* python3-venv (for the version of Python installed) + +## Set up + +Using Presto with CLP requires: + +* [Setting up CLP](#setting-up-clp) and compressing some logs. +* [Setting up Presto](#setting-up-presto) to query CLP's metadata database and archives. + +### Setting up CLP + +Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. 
A +sample dataset that works well with Presto is [postgresql]. + +### Setting up Presto + +1. Clone the CLP repository: + + ```bash + git clone https://github.com/y-scope/clp.git + ``` + +2. Navigate to the `tools/deployment/presto-clp` directory in your terminal. +3. Generate the necessary config for Presto to work with CLP: + + ```bash + scripts/set-up-config.sh + ``` + + * Replace `` with the location of the clp-json package you set up in the previous + section. + +4. Configure Presto to use CLP's metadata database as follows: + + * Open and edit `coordinator/config-template/metadata-filter.json`. + * For each dataset you want to query, add a filter config of the form: + + ```json + { + "clp.default.": [ + { + "columnName": "", + "rangeMapping": { + "lowerBound": "begin_timestamp", + "upperBound": "end_timestamp" + }, + "required": false + } + ] + } + ``` + + * Replace `` with the name of the dataset you want to query. (If you didn't specify a + dataset when compressing your logs, they were compressed into the `default` dataset.) + * Replace `` with the timestamp key you specified when compressing logs for + this particular dataset. + * The complete syntax for this file is documented [here][clp-connector-docs]. + +5. Start a Presto cluster by running: + + ```bash + docker compose up + ``` + + * To use more than one Presto worker, you can use the `--scale` option as follows: + + ```bash + docker compose up --scale presto-worker= + ``` + + * Replace `` with the number of Presto worker nodes you want to run. + +### Stopping the Presto cluster + +To stop the Presto cluster, press CTRL + C. + +To clean up the Presto cluster entirely: + +```bash +docker compose rm +``` + +## Querying your logs through Presto + +To query your logs through Presto, you can use the Presto CLI: + +```bash +docker compose exec presto-coordinator \ + presto-cli \ + --catalog clp \ + --schema default +``` + +Each dataset in CLP shows up as a table in Presto. 
To show all available datasets: + +```sql +SHOW TABLES; +``` + +If you didn't specify a dataset when compressing your logs in CLP, your logs will have been stored +in the `default` dataset. To query the logs in this dataset: + +```sql +SELECT * FROM default LIMIT 1; +``` + +All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs +contain the field `foo.bar`, you can query it using: + +```sql +SELECT foo.bar FROM default LIMIT 1; +``` + +## Limitations + +The Presto CLP integration has the following limitations at present: + +* Nested fields containing special characters cannot be queried (see [y-scope/presto#8]). Allowed + characters are alphanumeric characters and underscores. To get around this limitation, you'll + need to preprocess your logs to remove any special characters. +* Only logs stored on the filesystem, rather than S3, can be queried through Presto. + +These limitations will be addressed in a future release of the Presto integration. 
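The preprocessing workaround mentioned in the limitations could look roughly like the sketch below. It is not part of this change set or of CLP: the helper names `sanitize_keys` and `sanitize_line` are hypothetical, the input is assumed to be JSON lines (one log event per line), and replacing disallowed characters with an underscore, rather than deleting them, is an assumption chosen to keep keys readable.

```python
import json
import re
from typing import Any

# Anything other than ASCII letters, digits, and underscores counts as "special" here.
_DISALLOWED = re.compile(r"[^0-9A-Za-z_]")


def sanitize_keys(value: Any, replacement: str = "_") -> Any:
    """Recursively rewrites dict keys so they contain only [0-9A-Za-z_]."""
    if isinstance(value, dict):
        return {
            _DISALLOWED.sub(replacement, str(key)): sanitize_keys(child, replacement)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [sanitize_keys(item, replacement) for item in value]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return value


def sanitize_line(line: str) -> str:
    """Sanitizes the keys of one JSON log event (hypothetical helper)."""
    return json.dumps(sanitize_keys(json.loads(line)))
```

For example, `sanitize_line('{"http-request": {"user.id": 7}}')` yields `{"http_request": {"user_id": 7}}`. Two keys that differ only in a special character (such as `a-b` and `a_b`) collapse into one, with the later value winning, so real logs may need a collision check before compression.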
+ +[clp-connector-docs]: https://docs.yscope.com/presto/connector/clp.html#metadata-filter-config-file +[clp-releases]: https://github.com/y-scope/clp/releases +[docker-compose]: https://docs.docker.com/compose/install/ +[Docker]: https://docs.docker.com/engine/install/ +[postgresql]: https://zenodo.org/records/10516401 +[Presto]: https://prestodb.io/ +[y-scope/presto#8]: https://github.com/y-scope/presto/issues/8 +[yscope-presto]: https://github.com/y-scope/presto diff --git a/docs/src/user-guide/index.md b/docs/src/user-guide/index.md index a637c8a650..4e4efb7479 100644 --- a/docs/src/user-guide/index.md +++ b/docs/src/user-guide/index.md @@ -62,6 +62,7 @@ quick-start/clp-text guides-overview guides-using-object-storage/index guides-multi-node +guides-using-presto ::: :::{toctree} diff --git a/taskfiles/lint.yaml b/taskfiles/lint.yaml index d740b610f3..61c1371169 100644 --- a/taskfiles/lint.yaml +++ b/taskfiles/lint.yaml @@ -103,7 +103,8 @@ tasks: components/package-template/src/etc \ docs \ taskfile.yaml \ - taskfiles + taskfiles \ + tools/deployment check-cpp-format: sources: &cpp_source_files @@ -772,6 +773,7 @@ tasks: - "components/clp-py-utils/clp_py_utils" - "components/core/tools/scripts/utils" - "components/job-orchestration/job_orchestration" + - "tools/deployment" - "tools/scripts" - "docs/conf" cmd: |- diff --git a/tools/deployment/presto-clp/coordinator-common.env b/tools/deployment/presto-clp/coordinator-common.env new file mode 100644 index 0000000000..db2499da1f --- /dev/null +++ b/tools/deployment/presto-clp/coordinator-common.env @@ -0,0 +1,5 @@ +PRESTO_COORDINATOR_HTTPPORT=8080 +PRESTO_COORDINATOR_SERVICENAME=presto-coordinator + +# node.properties +PRESTO_COORDINATOR_NODEPROPERTIES_ENVIRONMENT=production diff --git a/tools/deployment/presto-clp/coordinator.env b/tools/deployment/presto-clp/coordinator.env new file mode 100644 index 0000000000..c246fc7003 --- /dev/null +++ b/tools/deployment/presto-clp/coordinator.env @@ -0,0 +1,14 @@ +# 
clp.properties +PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE=mysql +PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER=mysql + +# config.properties +PRESTO_COORDINATOR_CONFIGPROPERTIES_QUERY_MAX_MEMORY=1GB +PRESTO_COORDINATOR_CONFIGPROPERTIES_QUERY_MAX_MEMORY_PER_NODE=1GB + +# jvm.config +PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE=4G +PRESTO_COORDINATOR_JVMCONFIG_G1HEAPREGIONSIZE=32M + +# log.properties +PRESTO_COORDINATOR_LOGPROPERTIES_LEVEL=INFO diff --git a/tools/deployment/presto-clp/coordinator/config-template/clp.properties b/tools/deployment/presto-clp/coordinator/config-template/clp.properties new file mode 100644 index 0000000000..cefee52d39 --- /dev/null +++ b/tools/deployment/presto-clp/coordinator/config-template/clp.properties @@ -0,0 +1,9 @@ +connector.name=clp +clp.metadata-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE} +clp.metadata-db-url=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_URL} +clp.metadata-db-name=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_NAME} +clp.metadata-db-user=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_USER} +clp.metadata-db-password=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_PASSWORD} +clp.metadata-table-prefix=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_TABLE_PREFIX} +clp.split-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER} +clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json diff --git a/tools/deployment/presto-clp/coordinator/config-template/config.properties b/tools/deployment/presto-clp/coordinator/config-template/config.properties new file mode 100644 index 0000000000..b9da2234f4 --- /dev/null +++ b/tools/deployment/presto-clp/coordinator/config-template/config.properties @@ -0,0 +1,13 @@ +coordinator=true +node-scheduler.include-coordinator=false +http-server.http.port=${PRESTO_COORDINATOR_HTTPPORT} +query.max-memory=${PRESTO_COORDINATOR_CONFIGPROPERTIES_QUERY_MAX_MEMORY} 
+query.max-memory-per-node=${PRESTO_COORDINATOR_CONFIGPROPERTIES_QUERY_MAX_MEMORY_PER_NODE} +discovery-server.enabled=true +discovery.uri=${PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI} +optimizer.optimize-hash-generation=false +regex-library=RE2J +use-alternative-function-signatures=true +inline-sql-functions=false +nested-data-serialization-enabled=false +native-execution-enabled=true diff --git a/tools/deployment/presto-clp/coordinator/config-template/jvm.config b/tools/deployment/presto-clp/coordinator/config-template/jvm.config new file mode 100644 index 0000000000..7a18a0a951 --- /dev/null +++ b/tools/deployment/presto-clp/coordinator/config-template/jvm.config @@ -0,0 +1,9 @@ +-server +-Xmx${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE} +-XX:+UseG1GC +-XX:G1HeapRegionSize=${PRESTO_COORDINATOR_JVMCONFIG_G1HEAPREGIONSIZE} +-XX:+UseGCOverheadLimit +-XX:+ExplicitGCInvokesConcurrent +-XX:+HeapDumpOnOutOfMemoryError +-XX:+ExitOnOutOfMemoryError +-Djdk.attach.allowAttachSelf=true diff --git a/tools/deployment/presto-clp/coordinator/config-template/log.properties b/tools/deployment/presto-clp/coordinator/config-template/log.properties new file mode 100644 index 0000000000..7e79c774f0 --- /dev/null +++ b/tools/deployment/presto-clp/coordinator/config-template/log.properties @@ -0,0 +1 @@ +com.facebook.presto=${PRESTO_COORDINATOR_LOGPROPERTIES_LEVEL} diff --git a/tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json b/tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json new file mode 100644 index 0000000000..2c63c08510 --- /dev/null +++ b/tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json @@ -0,0 +1,2 @@ +{ +} diff --git a/tools/deployment/presto-clp/coordinator/config-template/node.properties b/tools/deployment/presto-clp/coordinator/config-template/node.properties new file mode 100644 index 0000000000..dfde76b128 --- /dev/null +++ 
b/tools/deployment/presto-clp/coordinator/config-template/node.properties @@ -0,0 +1,2 @@ +node.environment=${PRESTO_COORDINATOR_NODEPROPERTIES_ENVIRONMENT} +node.id=${PRESTO_COORDINATOR_SERVICENAME} diff --git a/tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh b/tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh new file mode 100755 index 0000000000..511881a22a --- /dev/null +++ b/tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh @@ -0,0 +1,19 @@ +#!/usr/bin/env bash + +set -eu +set -o pipefail + +readonly PRESTO_CONFIG_DIR="/opt/presto-server/etc" + +# Substitute environment variables in config template +find /configs -type f | while read -r f; do + ( + echo "cat <<EOF" + cat "$f" + echo "EOF" + ) | sh >"${PRESTO_CONFIG_DIR}/$(basename "$f")" +done + +# Remove existing catalog files that exist in the image and add the CLP catalog +rm -f "${PRESTO_CONFIG_DIR}/catalog/"* +mv "${PRESTO_CONFIG_DIR}/clp.properties" "${PRESTO_CONFIG_DIR}/catalog" diff --git a/tools/deployment/presto-clp/docker-compose.yaml b/tools/deployment/presto-clp/docker-compose.yaml new file mode 100644 index 0000000000..0051ce4fbf --- /dev/null +++ b/tools/deployment/presto-clp/docker-compose.yaml @@ -0,0 +1,48 @@ +services: + presto-coordinator: + image: "ghcr.io/y-scope/presto/coordinator:dev" + entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"] + env_file: + - ".env" + - "coordinator-common.env" + - "coordinator.env" + volumes: + - "./coordinator/config-template:/configs:ro" + - "./coordinator/scripts:/scripts:ro" + - "coordinator-config:/opt/presto-server/etc" + networks: + - "presto" + healthcheck: + test: + - "CMD" + - "curl" + - "-f" + - "${PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI}/v1/info" + interval: "10s" + retries: 30 + + presto-worker: + image: "ghcr.io/y-scope/presto/prestissimo-worker:dev" + depends_on: + presto-coordinator: + condition: "service_healthy" + entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && 
/opt/entrypoint.sh"] + env_file: + - ".env" + - "coordinator-common.env" + - "worker.env" + volumes: + - "./worker/config-template:/configs:ro" + - "./worker/scripts:/scripts:ro" + - "${CLP_ARCHIVES_DIR}:${CLP_ARCHIVES_DIR}" + - "worker-config:/opt/presto-server/etc" + networks: + - "presto" + +volumes: + coordinator-config: + worker-config: + +networks: + presto: + driver: "bridge" diff --git a/tools/deployment/presto-clp/scripts/.gitignore b/tools/deployment/presto-clp/scripts/.gitignore new file mode 100644 index 0000000000..ef81b1e243 --- /dev/null +++ b/tools/deployment/presto-clp/scripts/.gitignore @@ -0,0 +1 @@ +/.venv/ diff --git a/tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py b/tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py new file mode 100644 index 0000000000..4e4a8ac046 --- /dev/null +++ b/tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py @@ -0,0 +1,184 @@ +import argparse +import logging +import sys +from pathlib import Path +from typing import Dict, Optional + +import yaml +from dotenv import dotenv_values + +# Set up console logging +logging_console_handler = logging.StreamHandler() +logging_formatter = logging.Formatter( + "%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s", datefmt="%Y-%m-%dT%H:%M:%S" +) +logging_console_handler.setFormatter(logging_formatter) + +# Set up root logger +root_logger = logging.getLogger() +root_logger.setLevel(logging.INFO) +root_logger.addHandler(logging_console_handler) + +# Create logger +logger = logging.getLogger(__name__) + + +def main(argv=None) -> int: + if argv is None: + argv = sys.argv + + args_parser = argparse.ArgumentParser( + description="Generates an environment variables file for any user-configured properties." 
+ ) + args_parser.add_argument( + "--clp-package-dir", help="CLP package directory.", required=True, type=Path + ) + args_parser.add_argument( + "--output-file", help="Path for the environment variables file.", required=True, type=Path + ) + + parsed_args = args_parser.parse_args(argv[1:]) + clp_package_dir: Path = parsed_args.clp_package_dir.resolve() + output_file: Path = parsed_args.output_file + + env_vars: Dict[str, str] = {} + if not _add_clp_env_vars(clp_package_dir, env_vars): + return 1 + + script_dir = Path(__file__).parent.resolve() + if not _add_worker_env_vars(script_dir.parent / "coordinator-common.env", env_vars): + return 1 + + with open(output_file, "w") as output_file_handle: + for key, value in env_vars.items(): + output_file_handle.write(f"{key}={value}\n") + + return 0 + + +def _add_clp_env_vars(clp_package_dir: Path, env_vars: Dict[str, str]) -> bool: + """ + Adds environment variables for CLP config values to `env_vars`. + + :param clp_package_dir: + :param env_vars: + :return: Whether the environment variables were successfully added. + """ + env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_TABLE_PREFIX"] = "clp_" + + clp_config_file_path = clp_package_dir / "etc" / "clp-config.yml" + if not clp_config_file_path.exists(): + logger.error( + "'%s' doesn't exist. Is '%s' the location of the CLP package?", + clp_config_file_path, + clp_package_dir.resolve(), + ) + return False + + with open(clp_config_file_path, "r") as clp_config_file: + clp_config = yaml.safe_load(clp_config_file) + + database_type = _get_config_value(clp_config, "database.type", "mariadb") + if "mariadb" != database_type and "mysql" != database_type: + logger.error( + "CLP's database.type must be either mariadb or mysql but found '%s'. 
Presto" + " currently only supports reading metadata from a mariadb or mysql database.", + database_type, + ) + return False + + database_host = _get_config_value(clp_config, "database.host", "localhost") + database_port = _get_config_value(clp_config, "database.port", str(3306)) + database_name = _get_config_value(clp_config, "database.name", "clp-db") + env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_URL"] = ( + f"jdbc:mysql://{database_host}:{database_port}" + ) + env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_NAME"] = database_name + + clp_archive_output_storage_type = _get_config_value( + clp_config, "archive_output.storage.type", "fs" + ) + if "fs" != clp_archive_output_storage_type: + logger.error( + "Expected CLP's archive_output.storage.type to be fs but found '%s'. Presto" + " currently only supports reading archives from the fs storage type.", + clp_archive_output_storage_type, + ) + return False + + clp_archives_dir = _get_config_value( + clp_config, + "archive_output.storage.directory", + str(clp_package_dir / "var" / "data" / "archives"), + ) + env_vars["CLP_ARCHIVES_DIR"] = clp_archives_dir + + credentials_file_path = clp_package_dir / "etc" / "credentials.yml" + if not credentials_file_path.exists(): + logger.error("'%s' doesn't exist. 
Did you start CLP?", credentials_file_path) + return False + + with open(credentials_file_path, "r") as credentials_file: + credentials = yaml.safe_load(credentials_file) + + database_user = _get_config_value(credentials, "database.user") + database_password = _get_config_value(credentials, "database.password") + if not database_user or not database_password: + logger.error( + "database.user and database.password must be specified in '%s'.", credentials_file_path + ) + return False + env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_USER"] = database_user + env_vars["PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_PASSWORD"] = database_password + + return True + + +def _add_worker_env_vars(coordinator_common_env_file_path: Path, env_vars: Dict[str, str]) -> bool: + """ + Adds environment variables for worker config values to `env_vars`. + + :param coordinator_common_env_file_path: + :param env_vars: + :return: Whether the environment variables were successfully added. + """ + config = dotenv_values(coordinator_common_env_file_path) + + try: + env_vars["PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI"] = ( + f'http://{config["PRESTO_COORDINATOR_SERVICENAME"]}' + f':{config["PRESTO_COORDINATOR_HTTPPORT"]}' + ) + except KeyError as e: + logger.error( + "Missing required key '%s' in '%s'", + e.args[0], + coordinator_common_env_file_path, + ) + return False + + return True + + +def _get_config_value(config: dict, key: str, default_value: Optional[str] = None) -> str: + """ + Gets the value corresponding to `key` from `config` if it exists. + + :param config: The config. + :param key: The key to look for in the config, in dot notation (e.g., "database.host"). + :param default_value: The value to return if `key` doesn't exist in `config`. + :return: The value corresponding to `key` if it exists, otherwise `default_value`. 
+ """ + + keys = key.split(".") + value = config + for k in keys: + if isinstance(value, dict) and k in value: + value = value[k] + else: + return default_value + return value + + +if "__main__" == __name__: + sys.exit(main(sys.argv)) diff --git a/tools/deployment/presto-clp/scripts/requirements.txt b/tools/deployment/presto-clp/scripts/requirements.txt new file mode 100644 index 0000000000..09eeec3cc9 --- /dev/null +++ b/tools/deployment/presto-clp/scripts/requirements.txt @@ -0,0 +1,2 @@ +python-dotenv +PyYAML diff --git a/tools/deployment/presto-clp/scripts/set-up-config.sh b/tools/deployment/presto-clp/scripts/set-up-config.sh new file mode 100755 index 0000000000..6c6fb8cced --- /dev/null +++ b/tools/deployment/presto-clp/scripts/set-up-config.sh @@ -0,0 +1,28 @@ +#!/usr/bin/env bash + +set -eu +set -o pipefail + +script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd) + +cUsage="Usage: ${BASH_SOURCE[0]} " +if [ "$#" -lt 1 ]; then + echo "$cUsage" >&2 + exit 1 +fi +clp_package_dir=$1 + +venv_dir=${script_dir}/.venv +if [ ! -d "${venv_dir}" ]; then + echo "Setting up Python venv in '${venv_dir}'..." + python3 -m venv "${venv_dir}" +fi +source "${venv_dir}/bin/activate" + +echo "Installing required Python packages..." +pip3 install -r "${script_dir}/requirements.txt" + +echo "Generating environment variables file for user-configured properties..." 
+python3 "${script_dir}/generate-user-env-vars-file.py" \ + --clp-package-dir "${clp_package_dir}" \ + --output-file "${script_dir}/../.env" diff --git a/tools/deployment/presto-clp/worker.env b/tools/deployment/presto-clp/worker.env new file mode 100644 index 0000000000..33e4e178f9 --- /dev/null +++ b/tools/deployment/presto-clp/worker.env @@ -0,0 +1,4 @@ +PRESTO_WORKER_HTTP_PORT=8080 + +# node.properties +PRESTO_WORKER_NODEPROPERTIES_LOCATION=worker-location diff --git a/tools/deployment/presto-clp/worker/config-template/clp.properties b/tools/deployment/presto-clp/worker/config-template/clp.properties new file mode 100644 index 0000000000..4ec6f9a4c7 --- /dev/null +++ b/tools/deployment/presto-clp/worker/config-template/clp.properties @@ -0,0 +1 @@ +connector.name=clp diff --git a/tools/deployment/presto-clp/worker/config-template/config.properties b/tools/deployment/presto-clp/worker/config-template/config.properties new file mode 100644 index 0000000000..ca89892d4e --- /dev/null +++ b/tools/deployment/presto-clp/worker/config-template/config.properties @@ -0,0 +1,5 @@ +discovery.uri=${PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI} +http-server.http.port=${PRESTO_WORKER_HTTP_PORT} +shutdown-onset-sec=1 +register-test-functions=false +runtime-metrics-collection-enabled=false diff --git a/tools/deployment/presto-clp/worker/config-template/node.properties b/tools/deployment/presto-clp/worker/config-template/node.properties new file mode 100644 index 0000000000..8b0fef4466 --- /dev/null +++ b/tools/deployment/presto-clp/worker/config-template/node.properties @@ -0,0 +1,2 @@ +node.environment=${PRESTO_COORDINATOR_NODEPROPERTIES_ENVIRONMENT} +node.location=${PRESTO_WORKER_NODEPROPERTIES_LOCATION} diff --git a/tools/deployment/presto-clp/worker/config-template/velox.properties b/tools/deployment/presto-clp/worker/config-template/velox.properties new file mode 100644 index 0000000000..8298bf6790 --- /dev/null +++ 
b/tools/deployment/presto-clp/worker/config-template/velox.properties @@ -0,0 +1 @@ +mutable-config=true diff --git a/tools/deployment/presto-clp/worker/scripts/generate-configs.sh b/tools/deployment/presto-clp/worker/scripts/generate-configs.sh new file mode 100755 index 0000000000..c04ae86182 --- /dev/null +++ b/tools/deployment/presto-clp/worker/scripts/generate-configs.sh @@ -0,0 +1,86 @@ +#!/usr/bin/env bash + +set -eu +set -o pipefail + +# Emits a log event to stderr with an auto-generated ISO timestamp as well as the given level +# and message. +# +# @param $1: Level string +# @param $2: Message to be logged +log() { + local -r LEVEL=$1 + local -r MESSAGE=$2 + echo "$(date --utc --date="now" +"%Y-%m-%dT%H:%M:%SZ") [${LEVEL}] ${MESSAGE}" >&2 +} + +# Gets the Presto coordinator's version or exits on failure. +# +# @param $1 Path to the config.properties file. +# @return The Presto version. +get_coordinator_version() { + local config_properties_file=$1 + + local discovery_uri + discovery_uri=$(awk -F "=" '/^discovery.uri=/ {print $2}' "$config_properties_file") + if response=$( + wget --quiet --output-document - --timeout 10 "${discovery_uri}/v1/info" 2>/dev/null + ); then + version=$(echo "$response" | jq --raw-output '.nodeVersion.version') + if [[ "$version" = "null" ]]; then + log "ERROR" "Presto response is empty or doesn't contain version info." + exit 1 + fi + else + log "ERROR" "Couldn't get Presto version info." + exit 1 + fi + + echo "$version" +} + +# Sets/updates the given kv-pair in the given properties file. +# +# @param $1 Path to the properties file. +# @param $2 The key to set. +# @param $3 The value to set. 
+update_config_file() { + local file_path=$1 + local key=$2 + local value=$3 + + if grep --quiet "^${key}=.*$" "$file_path"; then + sed --in-place "s|^${key}=.*|${key}=${value}|" "$file_path" + else + echo "${key}=${value}" >>"$file_path" + fi + log "INFO" "Set ${key}=${value} in ${file_path}" +} + +apt-get update && apt-get install --assume-yes --no-install-recommends jq wget + +readonly PRESTO_CONFIG_DIR="/opt/presto-server/etc" + +# Substitute environment variables in config template +find /configs -type f | while read -r f; do + ( + echo "cat <<EOF" + cat "$f" + echo "EOF" + ) | sh >"${PRESTO_CONFIG_DIR}/$(basename "$f")" +done + +# Remove existing catalog files that exist in the image and add the CLP catalog +rm -f "${PRESTO_CONFIG_DIR}/catalog/"* +mv "${PRESTO_CONFIG_DIR}/clp.properties" "${PRESTO_CONFIG_DIR}/catalog" + +# Update config.properties +readonly CONFIG_PROPERTIES_FILE="/opt/presto-server/etc/config.properties" +version=$(get_coordinator_version "$CONFIG_PROPERTIES_FILE") +log "INFO" "Detected Presto version: $version" +update_config_file "$CONFIG_PROPERTIES_FILE" "presto.version" "$version" + +# Update node.properties +readonly NODE_PROPERTIES_FILE="/opt/presto-server/etc/node.properties" +update_config_file "$NODE_PROPERTIES_FILE" "node.internal-address" "$(hostname -i)" +update_config_file "$NODE_PROPERTIES_FILE" "node.id" "$(hostname)"
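Review note: both `generate-configs.sh` scripts render every file under `/configs` by substituting environment variables into `${...}` placeholders. The sketch below illustrates that rendering step in Python for readers who want to test templates outside the containers. It is an illustration, not the scripts' method: it swaps in `string.Template` for shell expansion, and the name `render_templates` and its parameters are hypothetical.

```python
import os
from pathlib import Path
from string import Template
from typing import Mapping, Optional


def render_templates(
    template_dir: Path, output_dir: Path, env: Optional[Mapping[str, str]] = None
) -> None:
    """Writes a rendered copy of every file under `template_dir` into `output_dir`."""
    values = os.environ if env is None else env
    output_dir.mkdir(parents=True, exist_ok=True)
    for template_path in sorted(template_dir.rglob("*")):
        if not template_path.is_file():
            continue
        # `substitute` raises KeyError for an undefined variable, which is stricter than
        # default shell expansion (an unset variable normally expands to an empty string).
        rendered = Template(template_path.read_text()).substitute(values)
        # Output is flat (file name only), so templates sharing a name overwrite each other.
        (output_dir / template_path.name).write_text(rendered)
```

Calling `render_templates(Path("coordinator/config-template"), Path("/tmp/presto-etc"))` after exporting the variables from the `.env` files would give a preview of the generated config. Unlike the shell, `string.Template` does not perform command substitution and rejects a stray `$`, so it only approximates the real behaviour for templates that use plain `${VAR}` placeholders.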