MDEV-39492 Parallel Query: Study how to create worker threads#5099
MDEV-39492 Parallel Query: Study how to create worker threads#5099mariadb-RexJohnston wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements a framework for parallel query execution, introducing worker thread management, background thread creation, and an error propagation system. It adds Performance Schema instrumentation and a new system variable, parallel_worker_threads, to control the number of workers. The code review identified several critical issues, including a race condition in worker status checks, missing synchronization when accessing the global thread list, and potential double-free and memory leak scenarios during error handling. Suggestions were also made to use monotonic time for timeouts and to adopt more specific naming for global macros.
| #ifdef _WIN32 | ||
| #define MAX_THREAD_NAME 256 | ||
| #elif defined(__linux__) | ||
| #define MAX_THREAD_NAME 16 | ||
| #elif defined(__FreeBSD__) || defined(__OpenBSD__) | ||
| #define MAX_THREAD_NAME 19 | ||
| #include <pthread_np.h> | ||
| #elif defined(__apple_build_version__) | ||
| #include <sys/proc_info.h> | ||
| #define MAX_THREAD_NAME MAXTHREADNAMESIZE | ||
| #endif |
cf07846 to
96ccf39
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a framework for Parallel Worker Threads (PWT), adding the parallel_worker_threads system variable and management logic for worker threads. Feedback focuses on replacing placeholder instrumentation macros with specific Performance Schema keys for memory and synchronization. The reviewer also suggested providing more descriptive file headers, using server-consistent time functions, and correcting style violations regarding spacing in function signatures and class declarations.
| START: in lieu of work, wait 10 seconds, push out an error or a warning, | ||
| wait another 10 seconds then exit | ||
| */ | ||
| abs_timeout.tv_sec= time(0)+10; |
There was a problem hiding this comment.
96ccf39 to
30f7c41
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a framework for parallel worker threads (PWT), including the parallel_worker_threads system variable and logic for thread management, error handling, and synchronization. The review identifies several critical issues regarding thread safety, specifically the need to hold LOCK_thread_count when modifying the global server_threads list. Additionally, feedback highlights potential memory corruption from bypassing constructors when using my_malloc for objects inheriting from ilink, and suggests using more idiomatic practices such as THD::awake() for thread signaling and ensuring proper initialization of worker structures.
| *event= (pwt_queued_event*) my_malloc(key_memory_pwt_queued_event, | ||
| sizeof(pwt_queued_event), | ||
| MYF(0)); |
There was a problem hiding this comment.
The pwt_queued_event class inherits from ilink, which has a constructor that initializes its internal pointers. Using my_malloc bypasses this constructor, leaving the ilink part of the object uninitialized. This can lead to memory corruption or crashes when the object is inserted into an I_List. It is safer to use new (std::nothrow) or call the constructor via placement new.
*event= new (std::nothrow) pwt_queued_event();| on a copy of our THD pointer | ||
| */ | ||
| thd_detach_thd(save); | ||
| server_threads.erase(thd); |
There was a problem hiding this comment.
Accessing the global server_threads list requires holding the LOCK_thread_count mutex to prevent race conditions with other threads (e.g., SHOW PROCESSLIST or other thread creations/destructions).
mysql_mutex_lock(&LOCK_thread_count);
server_threads.erase(thd);
mysql_mutex_unlock(&LOCK_thread_count);| workers[i].thd->pwt_worker_info= workers+i; | ||
| workers[i].finished= workers[i].joined= false; | ||
| workers[i].killed= NOT_KILLED; | ||
| server_threads.insert(workers[i].thd); // show processlist |
| if (mysql_thread_create(key_thread_pwt, &workers[i].pthread, nullptr, | ||
| parallel_worker_thread_func, &workers[i])) | ||
| { | ||
| server_threads.erase(workers[i].thd); |
|
|
||
| mysql_mutex_lock(&LOCK_pwt_thread); | ||
| pwt_queued_event *event; | ||
| while ((event= parallel_messages.get())) |
There was a problem hiding this comment.
I_List<pwt_queued_event> parallel_messages;
template
class I_List :private base_ilist
...
inline T* get() ...
so no.
| workers= (pwt_worker *) my_malloc(key_memory_pwt_workers, | ||
| n * sizeof(pwt_worker), | ||
| MYF(0)); |
There was a problem hiding this comment.
The pwt_worker structure contains pointers and other members (like parallel_scan_job) that are not explicitly initialized. Using my_malloc without MY_ZEROFILL leaves these members with garbage values, which could cause issues if they are accessed before being set. Consider using MY_ZEROFILL to ensure a clean state.
workers= (pwt_worker *) my_malloc(key_memory_pwt_workers,
n * sizeof(pwt_worker),
MYF(MY_WME | MY_ZEROFILL));| workers[i].thd->query_string= CSET_STRING(workers[i].info, | ||
| strlen(workers[i].info), | ||
| workers[i].thd->query_charset()); |
There was a problem hiding this comment.
THD::query_string is a LEX_CSTRING, which consists of only two fields: str and length. The CSET_STRING macro is typically used for structures that include metadata (like charset) and will likely cause a compilation error here. You should assign the string and length directly.
workers[i].thd->query_string.str= workers[i].info;
workers[i].thd->query_string.length= strlen(workers[i].info);| mysql_mutex_lock(&workers[i].thd->LOCK_thd_kill); | ||
| workers[i].thd->killed= thd->killed; | ||
| mysql_mutex_unlock(&workers[i].thd->LOCK_thd_kill); |
| // process queue | ||
| mysql_mutex_lock(&LOCK_pwt_thread); | ||
| pwt_queued_event *event; | ||
| while ((event= parallel_messages.get())) |
Introduces parallel_worker_threads variable to control the number of worker threads created by a parallel execution query. 2 new files, sql_parallel_workers.h sql_parallel_workers.cc which contain structures for the creation, management and deletion of parallel worker threads (pwt_ in the name). Main management class created in the stack in JOIN::exec, implemented for the top level select. Current parallel_worker_thread_func sleeps for 10 seconds, generates a warning, signals the main thread, sleeps 10 seconds, signals the main thread again, sets it's finished flag and cleans it's THD. The main thread loops through worker threads, looking for finished thread and cleans them up if they have finished. It then waits for a signal, then processes it's message queue. The thread management data is allocated on the stack in JOIN::exec. Everything else is allocated using my_malloc() and my_free(). Threads are registed in server_threads, so are visible in information_schema.processlist and the show processlist command. We check that a kill query on a parallel worker is passed onto it's manager and the query is properly aborted, and that a kill connection is handled properly in parallel_worker.test.
30f7c41 to
37d424a
Compare
|
There is a compile failure: |
| Register our new threads in server_threads. | ||
|
|
||
| Called from the management thread for applicable queries at the top level. | ||
| */ |
There was a problem hiding this comment.
Please follow the convention and return true when there was an error, return false when everything went ok.
| workers[i].thd->pwt_worker_info= workers+i; | ||
| workers[i].finished= workers[i].joined= false; | ||
| workers[i].killed= NOT_KILLED; | ||
| server_threads.insert(workers[i].thd); // show processlist |
There was a problem hiding this comment.
We've called create_background_thd and the comment there says:
/**
Create a THD that only has auxiliary functions
It will never be added to the global connection list
server_threads. It does not represent any client connection.
Adjust that comment?
| wait_max.tv_sec= time(0)+1; // wait 1s | ||
| mysql_mutex_lock(&LOCK_pwt_manager); | ||
| thd->ENTER_COND(&COND_pwt_new_message, &LOCK_pwt_manager, | ||
| &stage_waiting_for_work_from_sql_thread, &old_stage); |
There was a problem hiding this comment.
This is Replication system's stage. This is confusing, let's add a new stage.
| ANALYZE_START_TRACKING(thd, &explain->time_tracker); | ||
| res= exec_inner(); | ||
| ANALYZE_STOP_TRACKING(thd, &explain->time_tracker); | ||
|
|
||
| if (pwt.workers) | ||
| pwt.join_parallel_workers(thd); | ||
|
|
There was a problem hiding this comment.
It turns out this is executed after the query result is sent to the client.
One can see that the query has returned the result (and so there is no way to return anything any new messages from the workers (e.g. errors) to the client anymore).
There was a problem hiding this comment.
Will provide a patch for this.
|
Please check and take this into your commit: |
|
Looking at I_S.PROCESSLIST, I can see: It's nice that one can see workers here. |
// signal manager there is something in the queue,
mysql_cond_signal(&worker->manager->COND_pwt_new_message);This looks like signaling while not owning the mutex that is used to wait on the condition? At the first glance, looks wrong. Gemini tells this when I ask it "What is the problem of using mysql_cond_signal without locking associated mutex?" What is the problem of using mysql_cond_signal wi....pdf Please clarify. |
I was trying to avoid using mutex's where not strictly required. The state transitions in the manager mean that if we loose the signal, we pick it up on the next iteration. Here is Claude's explanation.
|
Introduces parallel_worker_threads variable to control the number of worker threads created by a parallel execution query.
2 new files, sql_parallel_workers.h sql_parallel_workers.cc which contain structures for the creation, management and deletion of parallel worker threads (pwt_ in the name). Main management class created in the stack in JOIN::exec, implemented for the top level select.
Current parallel_worker_thread_func sleeps for 10 seconds, generates a warning, signals the main thread, sleeps 10 seconds, signals the main thread again, sets it's finished flag and cleans it's THD.
The main thread loops through worker threads, looking for finished thread and cleans them up if they have finished.
It then waits for a signal, then processes it's message queue.
The thread management data is allocated on the stack in JOIN::exec. Everything else is allocated using my_malloc() and my_free().
Threads are registed in server_threads, so are visible in information_schema.processlist and the show processlist command.
We check that a kill query on a parallel worker is passed onto it's manager and the query is properly aborted, and that a kill connection is handled properly in parallel_worker.test.