diff --git a/cpp-ch/local-engine/Parser/aggregate_function_parser/BloomFilterAggParser.cpp b/cpp-ch/local-engine/Parser/aggregate_function_parser/BloomFilterAggParser.cpp
index 6c85bb374886..f0e7bb6c71e7 100644
--- a/cpp-ch/local-engine/Parser/aggregate_function_parser/BloomFilterAggParser.cpp
+++ b/cpp-ch/local-engine/Parser/aggregate_function_parser/BloomFilterAggParser.cpp
@@ -56,7 +56,7 @@ DB::Array AggregateFunctionParserBloomFilterAgg::parseFunctionParameters(
{
if (func_info.phase == substrait::AGGREGATION_PHASE_INITIAL_TO_INTERMEDIATE || func_info.phase == substrait::AGGREGATION_PHASE_INITIAL_TO_RESULT)
{
- auto get_parameter_field = [](const DB::ActionsDAG::Node * node, size_t /*paramter_index*/) -> DB::Field
+ auto get_parameter_field = [](const DB::ActionsDAG::Node * node, size_t /*parameter_index*/) -> DB::Field
{
Field ret;
node->column->get(0, ret);
diff --git a/docs/developers/HowTo.md b/docs/developers/HowTo.md
index 2ac2d9f44cba..147a64bc426d 100644
--- a/docs/developers/HowTo.md
+++ b/docs/developers/HowTo.md
@@ -156,7 +156,7 @@ gdb ${GLUTEN_HOME}/cpp/build/releases/libgluten.so 'core-Executor task l-2000883
Currently, we have no dedicated memory allocator implemented by jemalloc. User can set environment variable `LD_PRELOAD` for lib jemalloc
to let it override the corresponding C standard functions entirely. It may help alleviate OOM issues.
-`spark.executorEnv.LD_PREALOD=/path/to/libjemalloc.so`
+`spark.executorEnv.LD_PRELOAD=/path/to/libjemalloc.so`
# How to run TPC-H on Velox backend
diff --git a/docs/developers/UsingGperftoolsInCH.md b/docs/developers/UsingGperftoolsInCH.md
index 5a4bbea3fbbc..3923c2b6c307 100644
--- a/docs/developers/UsingGperftoolsInCH.md
+++ b/docs/developers/UsingGperftoolsInCH.md
@@ -11,7 +11,7 @@ We need using gpertools to find the memory or CPU issue. That's what this docume
Install gperftools as described in https://github.com/gperftools/gperftools.
We get the library and the command line tools.
-## Compiler libch.so
+## Compile libch.so
Disable jemalloc `-DENABLE_JEMALLOC=OFF` in cpp-ch/CMakeLists.txt, and recompile libch.so.
## Run Gluten with gperftools
diff --git a/docs/developers/UsingJemallocWithCH.md b/docs/developers/UsingJemallocWithCH.md
index 365a35dd39fe..e38cfa24b449 100644
--- a/docs/developers/UsingJemallocWithCH.md
+++ b/docs/developers/UsingJemallocWithCH.md
@@ -28,7 +28,7 @@ cd $Clickhouse_SOURCE_PATH/contrib/jemalloc && ./autogen.sh && ./configure.sh &&
```
Then we get jeprof in the directory `$Clickhouse_SOURCE_PATH/contrib/jemalloc/bin/jeprof`.
-## Compiler libch.so
+## Compile libch.so
Ensure to enable jemalloc `-DENABLE_JEMALLOC=ON` in cpp-ch/CMakeLists.txt, and compile libch.so.
## Run Gluten with jemalloc heap tools
diff --git a/docs/get-started/ClickHouse.md b/docs/get-started/ClickHouse.md
index 15c06abc0266..c0dd4002fc38 100644
--- a/docs/get-started/ClickHouse.md
+++ b/docs/get-started/ClickHouse.md
@@ -89,10 +89,10 @@ git submodule update --init --recursive
##### build
There are several ways to build the backend library.
-1. Build it direclty
+1. Build it directly
-If you have setup all requirements, you can use following command to build it direclty.
+If you have set up all requirements, you can use the following command to build it directly.
```bash
cd $gluten_root
@@ -340,7 +340,7 @@ You need to add these additional configs to spark:
--config spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY
--config spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY
```
-where S3_ENDPOINT must follow the format of `https://s3.region-code.amazonaws.com`, e.g. `https://s3.us-east-1.amazonaws.com` (or `http://hostname:39090 for MINIO)
+where S3_ENDPOINT must follow the format of `https://s3.region-code.amazonaws.com`, e.g. `https://s3.us-east-1.amazonaws.com` (or `http://hostname:39090` for MinIO)
When you query the parquet files in S3, you need to add the prefix `s3a://` to the path, e.g. `s3a://your_bucket_name/path_to_your_parquet`.
diff --git a/docs/get-started/VeloxGCS.md b/docs/get-started/VeloxGCS.md
index 09e0a927cab4..77fe309a4646 100644
--- a/docs/get-started/VeloxGCS.md
+++ b/docs/get-started/VeloxGCS.md
@@ -10,7 +10,7 @@ Object stores offered by CSPs such as GCS are important for users of Gluten to s
## Installing the gcloud CLI
-To access GCS Objects using Gluten and Velox, first you have to [download an install the gcloud CLI] (https://cloud.google.com/sdk/docs/install).
+To access GCS Objects using Gluten and Velox, first you have to [download and install the gcloud CLI](https://cloud.google.com/sdk/docs/install).
## Configuring GCS using a user account
@@ -22,7 +22,7 @@ After these steps, no specific configuration is required for Gluten, since the a
## Configuring GCS using a credential file
For workloads that need to be fully automated, manually authorizing can be problematic. For such cases it is better to use a json file with the credentials.
-This is described in the [instructions to configure a service account]https://cloud.google.com/sdk/docs/authorizing#service-account.
+This is described in the [instructions to configure a service account](https://cloud.google.com/sdk/docs/authorizing#service-account).
Such json file with the credentials can be passed to Gluten:
diff --git a/docs/velox-configuration.md b/docs/velox-configuration.md
index 2202fed3d5bc..767875bb167e 100644
--- a/docs/velox-configuration.md
+++ b/docs/velox-configuration.md
@@ -15,7 +15,7 @@ nav_order: 16
| spark.gluten.sql.columnar.backend.velox.SplitPreloadPerDriver | 2 | The split preload per task |
| spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinPct | 90 | If partial aggregation aggregationPct greater than this value, partial aggregation may be early abandoned. Note: this option only works when flushable partial aggregation is enabled. Ignored when spark.gluten.sql.columnar.backend.velox.flushablePartialAggregation=false. |
| spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinRows | 100000 | If partial aggregation input rows number greater than this value, partial aggregation may be early abandoned. Note: this option only works when flushable partial aggregation is enabled. Ignored when spark.gluten.sql.columnar.backend.velox.flushablePartialAggregation=false. |
-| spark.gluten.sql.columnar.backend.velox.asyncTimeoutOnTaskStopping | 30000ms | Timeout for asynchronous execution when task is being stopped in Velox backend. It's recommended to set to a number larger than network connection timeout that the possible aysnc tasks are relying on. |
+| spark.gluten.sql.columnar.backend.velox.asyncTimeoutOnTaskStopping | 30000ms | Timeout for asynchronous execution when a task is being stopped in the Velox backend. It's recommended to set this to a value larger than the network connection timeout that any pending async tasks rely on. |
| spark.gluten.sql.columnar.backend.velox.cacheEnabled | false | Enable Velox cache, default off. It's recommended to enablesoft-affinity as well when enable velox cache. |
| spark.gluten.sql.columnar.backend.velox.cachePrefetchMinPct | 0 | Set prefetch cache min pct for velox file scan |
| spark.gluten.sql.columnar.backend.velox.checkUsageLeak | true | Enable check memory usage leak. |
@@ -24,7 +24,7 @@ nav_order: 16
| spark.gluten.sql.columnar.backend.velox.cudf.enableValidation | true | Heuristics you can apply to validate a cuDF/GPU plan and only offload when the entire stage can be fully and profitably executed on GPU |
| spark.gluten.sql.columnar.backend.velox.cudf.memoryPercent | 50 | The initial percent of GPU memory to allocate for memory resource for one thread. |
| spark.gluten.sql.columnar.backend.velox.cudf.memoryResource | async | GPU RMM memory resource. |
-| spark.gluten.sql.columnar.backend.velox.cudf.shuffleMaxPrefetchBytes | 1028MB | Maximum bytes to prefetch in CPU memory during GPU shuffle read while waitingfor GPU available. |
+| spark.gluten.sql.columnar.backend.velox.cudf.shuffleMaxPrefetchBytes | 1028MB | Maximum bytes to prefetch in CPU memory during GPU shuffle read while waiting for the GPU to become available. |
| spark.gluten.sql.columnar.backend.velox.directorySizeGuess | 32KB | Deprecated, rename to spark.gluten.sql.columnar.backend.velox.footerEstimatedSize |
| spark.gluten.sql.columnar.backend.velox.enableTimestampNtzValidation | true | Enable validation fallback for TimestampNTZ type. When true (default), any plan containing TimestampNTZ will fall back to Spark execution. Set to false during development/testing of TimestampNTZ support to allow native execution. |
| spark.gluten.sql.columnar.backend.velox.fileHandleCacheEnabled | false | Disables caching if false. File handle cache should be disabled if files are mutable, i.e. file content may change while file path stays the same. |
@@ -78,7 +78,7 @@ nav_order: 16
| spark.gluten.sql.enable.enhancedFeatures | true | Enable some features including iceberg native write and other features. |
| spark.gluten.sql.rewrite.castArrayToString | true | When true, rewrite `cast(array as String)` to `concat('[', array_join(array, ', ', null), ']')` to allow offloading to Velox. |
| spark.gluten.velox.broadcast.build.targetBytesPerThread | 32MB | It is used to calculate the number of hash table build threads. Based on our testing across various thresholds (1MB to 128MB), we recommend a value of 32MB or 64MB, as these consistently provided the most significant performance gains. |
-| spark.gluten.velox.castFromVarcharAddTrimNode | false | If true, will add a trim node which has the same sementic as vanilla Spark to CAST-from-varchar.Otherwise, do nothing. |
+| spark.gluten.velox.castFromVarcharAddTrimNode | false | If true, will add a trim node which has the same semantics as vanilla Spark to CAST-from-varchar. Otherwise, do nothing. |
## Gluten Velox backend *experimental* configurations
diff --git a/docs/velox-spark-configuration.md b/docs/velox-spark-configuration.md
index 6543ffd8ffe0..d1fe199b3b84 100644
--- a/docs/velox-spark-configuration.md
+++ b/docs/velox-spark-configuration.md
@@ -2,7 +2,7 @@ layout: page
title: Spark configurations status in Gluten Velox Backend
nav_order: 17
-The file lists the if Spark configurations are hornored by Gluten velox backend or not. Table is from Spark4.0 configuration page. The status are:
+This file lists whether Spark configurations are honored by the Gluten Velox backend. The table is from the Spark 4.0 configuration page. The statuses are:
- ✅ Supported
- ❌ Not Supported
- ⚠️ Partial Support