HBASE-29081: Add HBase Read Replica Cluster feature#8044

Open
anmolnar wants to merge 33 commits into master from HBASE-29081

Conversation

Contributor

@anmolnar anmolnar commented Apr 8, 2026

Hi all,

We would like to propose merging the feature “Read Replica Cluster” into
the main branch.

Background

We’d like to implement an open source version of Amazon’s Read Replica
Cluster on S3 feature for Apache HBase. It adds the ability to run
another HBase cluster on the same cloud storage location in read-only mode,
allowing users to share the read workload between multiple clusters. Due
to the characteristics of the implementation and the lack of automated
synchronization between the active and read-replica clusters, read replicas
are eventually consistent and therefore not suitable for reading the most
recent data. However, we believe that users of open source Apache HBase
can still take advantage of this feature, and there are use cases that
read replicas can help with. Please find more information about the
feature in the linked blog post.

Pros

  • Running multiple clusters in different Availability Zones adds HA to the
    entire workload.
  • No need for data movement or duplication (as in an active-active
    replication setup), which is cost- and time-efficient.
  • No limit on the number of read replica clusters.

Cons

  • Read Replica clusters are eventually consistent: in-memory data is not
    visible from read replicas.
  • Read Replica clusters must be manually refreshed: flush on the active
    cluster, then refresh hfiles/meta on the read replicas.
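The flush-then-refresh contract behind these trade-offs can be sketched with a toy model. All class and method names below are illustrative stand-ins, not HBase APIs: writes sit in the active cluster's memstore until a flush, and the replica sees new HFiles only after an explicit refresh.

```java
import java.util.ArrayList;
import java.util.List;

// Shared cloud storage: the only thing both clusters can see.
class SharedStorage {
  final List<String> hfiles = new ArrayList<>();
}

// Active cluster: writes go to a memstore and reach storage only on flush.
class ActiveCluster {
  final SharedStorage storage;
  final List<String> memstore = new ArrayList<>();
  ActiveCluster(SharedStorage storage) { this.storage = storage; }
  void put(String row) { memstore.add(row); }
  void flush() {
    storage.hfiles.add("hfile[" + String.join(",", memstore) + "]");
    memstore.clear();
  }
}

// Read replica: serves reads from the HFile list it saw at its last refresh.
class ReadReplica {
  final SharedStorage storage;
  List<String> knownHFiles = List.of();
  ReadReplica(SharedStorage storage) { this.storage = storage; }
  void refreshHFiles() { knownHFiles = List.copyOf(storage.hfiles); }
  List<String> scan() { return knownHFiles; }
}

public class ReadReplicaModel {
  public static void main(String[] args) {
    SharedStorage storage = new SharedStorage();
    ActiveCluster active = new ActiveCluster(storage);
    ReadReplica replica = new ReadReplica(storage);

    active.put("row1");
    replica.refreshHFiles();
    System.out.println(replica.scan()); // [] : in-memory data is invisible to the replica

    active.flush();
    System.out.println(replica.scan()); // [] : flushed, but the replica has not refreshed

    replica.refreshHFiles();
    System.out.println(replica.scan()); // [hfile[row1]] : visible after flush + refresh
  }
}
```

A row becomes visible on the replica only after both steps happen, which is exactly why the feature is eventually consistent.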

A detailed description of the design and implementation can be found in the
following document.

Apache HBase Read Replica Cluster Feature

Please review and share your feedback or comments.

Best regards,
Andor

kabhishek4 and others added 30 commits April 8, 2026 09:48
* HBASE-29083: Add global read-only mode to HBase

Add hbase read-only property and ReadOnlyController

(cherry picked from commit 49b678d)

* HBASE-29083. Allow test to update hbase:meta table

* HBASE-29083. Spotless apply

* Refactor code to have only passing tests

* Apply spotless

---------

Co-authored-by: Andor Molnar <[email protected]>
Change-Id: Ia04bb12cdaf580f26cb14d9a34b5963105065faa
* CDPD-84463 Add ruby shell commands for refresh_hfiles

* [CDPD-84466] Add hbase-client API code to refresh_hfiles

* CDPD-84465 Add protobuf messages for refresh_hfiles

* Add refreshHfile function in master rpc service and make call to its function

* CDPD-82553 Add function in Region Server to refresh Hfiles

* Add nonceGroup and nonce for the Master RPC request

* Refactor code with proper name for function

* Add region Server Procedure and callables

* Remove the refreshHFiles function which was intended to be called as an RS RPC

As we will be calling it through procedure framework

* Remove the unwanted comments

* Add line mistakenly removed in admin.proto

* Correct the wrong comment in Event Types

* Apply Spotless

* Address the review comments having small code changes

* Add separate function for master service caller

* Add retry mechanism for refresh_hfiles; throw exception if retry threshold gets breached

Also handle scenario in case the region is not online

* Add tablename into RefreshHFilesTableProcedureStateData

* CDPD-88507, CDPD-88508 Add procedure support for namespace as parameter and no parameter

* nit: Add meaningful name to method and remove comments

* Return exception if user is updating system table or reserved namespaces

* Send exception if tablename or namespace is invalid

Also remove redundant TODOs

* Add gatekeeper method to prevent command execution before master initializes

* Return exception in case both TABLE_NAME and NAMESPACE are provided in arguments

* Run Spotless

* Add unit tests for refreshHfiles Procedure and admin calls

* Make the newly added HFiles available for reading immediately

* Revert "Make the newly added HFiles available for reading immediately"

This reverts commit c25cc9a.

* Address review comments

* Create test base class to avoid code duplication

* Add integration test which enable readonly mode before refresh

* Added test rule and rebased the upstream

* Apply spotless
#7325)

* HBASE-29597 Supply meta table name for replica to the tests in TestMetaTableForReplica class

* HBASE-29597 Supply meta table name for replica to the tests in TestMetaTableForReplica class
)

Change-Id: I2bca05b3f2ef4450bfcbb3b7608b829348c37bde
…filelist is n… (#7361)

* HBASE-29611: With FILE based SFT, the list of HFiles we maintain in .filelist is not getting updated for read replica

Link to JIRA: https://issues.apache.org/jira/browse/HBASE-29611

Description:
Steps to Repro (For detailed steps, check JIRA):
- Create two clusters on the same storage location.
- Create table on active, then refresh meta on the read replica to get the table meta data updated.
- Add some rows and flush on the active cluster, do refresh_hfiles on the read replica and scan table.
- If you now again add the rows in the table on active and do refresh_hfiles then the rows added are not visible in the read replica.

Cause:
The refresh store file is a two step process:
1. Load the existing store file from the .filelist (choose the file with higher timestamp for loading)
2. refresh store file internals (clean up old/compacted files, replace store file in .filelist)

In the current scenario, the read replica initially loads the list of HFiles from the file in .filelist created by the active cluster, but then creates its own file with a greater timestamp, leaving two files in .filelist. On every subsequent flush, the file created by the active cluster gets updated, while the file created by the read replica does not. Since refresh_hfiles loads the file with the higher timestamp, the stale file created by the read replica on its first refresh wins, and it does not contain the updated list of HFiles.

Fix:
Since we always want the file from the active cluster to be loaded when refreshing store files, the read replica must not create a new file in .filelist; this eliminates the timestamp mismatch.

NOTE:
Also, we don't want to initialize the tracker file (StoreFileListFile.java:load()) from the read replica, since the replica never writes it; a check for the read-only property has therefore been added in StoreFileTrackerBase.java:load().
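The timestamp race described above can be reproduced with a toy model (illustrative names, not the actual SFT classes): each cluster owns one tracker file named by its creation timestamp, updates rewrite the contents in place, and load() picks the newest file.

```java
import java.util.Comparator;
import java.util.List;

// One ".filelist" tracker file: the timestamp is fixed at creation (it is
// part of the file name), while the HFile list can be rewritten in place.
class TrackerFile {
  final long timestamp;
  List<String> hfiles;
  TrackerFile(long timestamp, List<String> hfiles) {
    this.timestamp = timestamp;
    this.hfiles = hfiles;
  }
}

public class FileListShadowSketch {
  // Load rule: choose the tracker file with the higher timestamp.
  static TrackerFile load(List<TrackerFile> fileList) {
    return fileList.stream()
      .max(Comparator.comparingLong(f -> f.timestamp))
      .orElseThrow();
  }

  public static void main(String[] args) {
    // t=1: active cluster writes the initial tracker file.
    TrackerFile activeFile = new TrackerFile(1, List.of("hfile-1"));

    // t=2 (buggy behavior): the replica writes its own file on first refresh.
    TrackerFile replicaFile = new TrackerFile(2, List.of("hfile-1"));

    // Active flushes again and rewrites *its* file; the replica copy is untouched.
    activeFile.hfiles = List.of("hfile-1", "hfile-2");

    // The replica's stale file wins the timestamp race: hfile-2 stays invisible.
    System.out.println(load(List.of(activeFile, replicaFile)).hfiles); // [hfile-1]

    // Fix: the replica never writes a tracker file, so the active file is loaded.
    System.out.println(load(List.of(activeFile)).hfiles); // [hfile-1, hfile-2]
  }
}
```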

* Make read-only cluster behave like a secondary replica
Link to JIRA: https://issues.apache.org/jira/browse/HBASE-29644

Description:
Consider a two-cluster setup with one active cluster and one read replica, where the active cluster creates a table with FILE based SFT. If you add a few rows through the active cluster and flush to create a few HFiles, then run refresh_meta from the read replica, a minor compaction is triggered. This must not happen via the read replica: it may create inconsistencies because the active cluster is not aware of that event.

Cause:
Compaction events should be blocked in ReadOnlyController, but the read-only guard was missing from the preCompactSelection() function.

Fix:
Add internalReadOnlyGuard to preCompactSelection() in ReadOnlyController
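A minimal sketch of this fix, using a local stand-in for DoNotRetryIOException and a simplified hook signature: the shared read-only guard is invoked from the compaction-selection hook, just as it is from the write hooks.

```java
import java.io.IOException;
import java.util.List;

// Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException.
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(String msg) { super(msg); }
}

public class ReadOnlyControllerSketch {
  private final boolean readOnly;
  ReadOnlyControllerSketch(boolean readOnly) { this.readOnly = readOnly; }

  // Shared guard used by every mutating hook.
  void internalReadOnlyGuard() throws DoNotRetryIOException {
    if (readOnly) {
      throw new DoNotRetryIOException("Operation not allowed in Read-Only Mode");
    }
  }

  // The hook that was missing the guard: vetoes compaction file selection.
  void preCompactSelection(List<String> candidateFiles) throws DoNotRetryIOException {
    internalReadOnlyGuard();
    // ...normal selection logic would run here on the active cluster...
  }

  public static void main(String[] args) {
    ReadOnlyControllerSketch controller = new ReadOnlyControllerSketch(true);
    try {
      controller.preCompactSelection(List.of("hfile-1", "hfile-2"));
    } catch (DoNotRetryIOException e) {
      System.out.println("compaction blocked: " + e.getMessage());
    }
  }
}
```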
#7437)

* HBASE-29642 Active cluster file is not being updated after promoting a new active cluster

* HBASE-29642 Active cluster file is not being updated after promoting a new active cluster

* HBASE-29642 Active cluster file is not being updated after promoting a new active cluster
…y controller (#7464)

* HBASE-29693: Implement the missing observer functions in the read-only controller

* Remove setter method to set read-only configuration
…r's tables before refreshing meta and hfiles (#7474)

Signed-off-by: Tak Lon (Stephen) Wu <[email protected]>
Signed-off-by: Andor Molnár <[email protected]>
Reviewed by: Kota-SH <[email protected]>
…de (#7554)

* HBASE-29778: Abort the retry operation if not allowed in read-only mode

Currently, if we discover that the operation is not allowed in Read-Only Mode, we throw an exception, but the context does not get aborted, leading to the same exception being thrown multiple times. The root cause is that we throw a plain IOException, which the client treats as retriable, so it retries the same operation and produces multiple similar exceptions.

Aborting could lead to RS instability or corruption, and context.bypass() would let the operation proceed instead of stopping it, hence the safest choice is to throw DoNotRetryIOException.
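The retry behavior described above can be modeled with a small stand-in for the client retry loop (not HBase's actual RPC machinery): a plain IOException burns every retry attempt, while DoNotRetryIOException fails fast.

```java
import java.io.IOException;

// Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException.
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(String msg) { super(msg); }
}

public class RetrySketch {
  interface Op { void run() throws IOException; }

  // Returns the number of attempts made before succeeding or giving up.
  static int callWithRetries(Op op, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        op.run();
        return attempt;
      } catch (DoNotRetryIOException e) {
        return attempt; // abort immediately: retrying cannot help
      } catch (IOException e) {
        // treated as retriable: loop around
      }
    }
    return maxAttempts;
  }

  public static void main(String[] args) {
    // A rejected write surfaced as a plain IOException burns every retry...
    int plain = callWithRetries(() -> { throw new IOException("read-only"); }, 5);
    // ...while DoNotRetryIOException fails fast on the first attempt.
    int fast = callWithRetries(() -> { throw new DoNotRetryIOException("read-only"); }, 5);
    System.out.println(plain + " vs " + fast); // 5 vs 1
  }
}
```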

* Commit to rerun the job
…able (#7555)

* HBASE-29779: Call super coprocessor instead of returning for system tables

* Instead of all system tables, allow the operation only for hbase:meta

Some other system tables, such as acl or namespace, are shared with the
active cluster, so allowing operations on them in a read-only cluster would
make the system inconsistent.

* Invert the methods name and add negation to caller

* Instead of static variable comparison use the API from TableName class

This is done to avoid any conflicts after the changes in
HBASE-29691: Change TableName.META_TABLE_NAME from being a global static
…t uses the filesystem (#7702)

Change-Id: I776f956c830a7f4671cfae265269a21fa61d0bdf
…trollers (#7661)

* HBASE-29841: Split bulky ReadOnlyController into multiple smaller controllers

Currently we have a single ReadOnlyController which needs to be added as a coprocessor for the master, region, and region server. In this task we break ReadOnlyController into multiple smaller controllers to avoid registering methods that are not relevant for a particular role; for example, the master coprocessor should only register methods which may run on the master, not on the region or region server.

* Address review comments
…itialization (#7743)

* HBASE-29756: Programmatically register related co-processor during initialization

* Apply Spotless

* Remove the cached globalReadOnlyMode variable and make manageclusterIDFile static

* Address review comments

* Address review comments

* Make coprocessor addition and removal generic

* Make manageClusterIdFile Idempotent

* Address review comments

* Avoid intelliJ warning about fixed size array creation
…ix file during startup (#7881)

* HBASE-29959 Cluster started in read-only mode mistakenly deletes suffix file during startup

* Move log message to if block.

* Close file input stream

* Change the getter which does not mutate the suffix data
…riter on secondary replicas or in read-only mode (#7920)

* Remove unused variable

* HBASE-29960 java.lang.IllegalStateException: Should not call create writer on secondary replicas or in read-only mode
* HBASE-29958 Improve log messages

* Address review comments

* Update hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java

Co-authored-by: Kota-SH <[email protected]>

* Update hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java

Co-authored-by: Kota-SH <[email protected]>

* Update hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java

Co-authored-by: Kota-SH <[email protected]>

* HBASE-29961 Secondary cluster is unable to replayWAL for meta (#7854)

* Add <blank> when no suffix provided

* Address few review comments

* HBASE-29958. Refactor ActiveClusterSuffix to use protobuf, refactor logging

* HBASE-29958. Remove more redundant logic, test cleanup

* HBASE-29958. Spotless apply

* HBASE-29958. Revert mistake

* HBASE-29958 Improve log messages

* Address Kevin's review comment to address multiple : in active cluster suffix

* As getClusterSuffixFromConfig() changed we need to change the code for file deletion

* Use ActiveClusterSuffix object based comparison instead of byte Array comparison

---------

Co-authored-by: Kota-SH <[email protected]>
Co-authored-by: Andor Molnar <[email protected]>
kgeisz and others added 3 commits April 8, 2026 09:49
* HBASE-29965: Unable to dynamically change readonly flag

Change-Id: I5b5479e37921ea233f586f0f02d2606320e16139

* Refactor repeated code

Change-Id: I9a0269b786f7282686d60ceff47a538d2b0b88fa

* Add docstrings

Change-Id: I3b456e0b2689dfad09d1f5a4b47fe8fd85d06bf9
…ng in FSUtils (#8006)

* HBASE-29993. Refactor cluster id and suffix in FSUtils

* HBASE-29993. Spotless apply

* HBASE-29993. Renaming

* Fix typo

Co-authored-by: Kevin Geiszler <[email protected]>

* HBASE-29993. Spotless apply

---------

Co-authored-by: Kevin Geiszler <[email protected]>
protected void internalReadOnlyGuard() throws DoNotRetryIOException {
  throw new DoNotRetryIOException("Operation not allowed in Read-Only Mode");
}
Contributor

I think it would be nice to subclass DoNotRetryIOException with something more specific to this situation, like WriteAttemptedOnReadOnlyClusterException

Contributor Author

Makes sense to me. cc @sharmaar12


Noted. Will update code accordingly.
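A hedged sketch of the suggested subclass. DoNotRetryIOException is stubbed locally so the example compiles standalone; in HBase it is org.apache.hadoop.hbase.DoNotRetryIOException, and the subclass name comes from the review comment above.

```java
import java.io.IOException;

// Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException.
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(String msg) { super(msg); }
}

// The more specific exception proposed in the review.
class WriteAttemptedOnReadOnlyClusterException extends DoNotRetryIOException {
  WriteAttemptedOnReadOnlyClusterException(String msg) { super(msg); }
}

public class GuardWithSubclassSketch {
  // Mirrors internalReadOnlyGuard() from the diff, throwing the subclass instead.
  static void internalReadOnlyGuard() throws DoNotRetryIOException {
    throw new WriteAttemptedOnReadOnlyClusterException(
      "Operation not allowed in Read-Only Mode");
  }

  public static void main(String[] args) {
    try {
      internalReadOnlyGuard();
    } catch (DoNotRetryIOException e) {
      // Callers catching the parent type still work; the class name now
      // states the cause explicitly.
      System.out.println(e.getClass().getSimpleName());
    }
  }
}
```

Since the subclass extends DoNotRetryIOException, existing retry-abort semantics are preserved; only the diagnostic improves.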

*/
public static final TableName META_TABLE_NAME;
static {
Configuration conf = HBaseConfiguration.create();
Contributor

When possible I try to avoid using HBaseConfiguration.create() because it's so inflexible about what config files/values it will read. How necessary is it to load the meta table name at static initialization time?

Contributor Author

We already have a refactoring patch open to make the meta table name non-static. Until then, I think static initialization is a must.

Comment on lines +53 to +55
} catch (IOException ioe) {
  LOG.warn("Exception while trying to refresh store files: ", ioe);
}
Contributor

Could you talk about the decision to swallow the error here? I'm on the fence if that is the right choice.

LOG.debug("Scanning namespace {}", namespacePath.getName());
List<Path> tableDirs = FSUtils.getLocalTableDirs(fs, namespacePath);

return tableDirs.parallelStream().flatMap(tableDir -> {
Contributor

Delegating into the common ForkJoinPool feels dicey here. I would feel safer if this was a regular stream().

CoprocessorConfigurationUtil.checkConfigurationChange(this.cpHost, newConf,
CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY) && !maintenanceMode
) {
LOG.info("Update the master coprocessor(s) because the configuration has changed");
Contributor

Might be nice to keep this logging?

Consumer<Boolean> stateSetter, CoprocessorReloadTask reloadTask) {

boolean maybeUpdatedReadOnlyMode = ConfigurationUtil.isReadOnlyModeEnabled(newConf);
boolean hasReadOnlyModeChanged = originalIsReadOnlyEnabled != maybeUpdatedReadOnlyMode;
Contributor

I think that if this method/class must know whether the read-only mode has changed, it should track that itself, and depend on the caller to help track it. Maybe you could look at what coprocessors are currently loaded to find out whether read-only mode is enabled without explicitly tracking it.

Comment on lines +141 to +165
public void registerConfigurationObservers(ConfigurationManager configurationManager) {
  Coprocessor foundCp;
  Set<String> coprocessors = this.getCoprocessors();
  for (String cp : coprocessors) {
    foundCp = this.findCoprocessor(cp);
    if (foundCp instanceof ConfigurationObserver) {
      configurationManager.registerObserver((ConfigurationObserver) foundCp);
    }
  }
}

/**
 * Deregisters relevant coprocessors from the {@link ConfigurationManager}. Coprocessors are
 * considered "relevant" if they implement the {@link ConfigurationObserver} interface.
 * @param configurationManager the ConfigurationManager the coprocessors get deregistered from
 */
public void deregisterConfigurationObservers(ConfigurationManager configurationManager) {
  Coprocessor foundCp;
  Set<String> coprocessors = this.getCoprocessors();
  for (String cp : coprocessors) {
    foundCp = this.findCoprocessor(cp);
    if (foundCp instanceof ConfigurationObserver) {
      configurationManager.deregisterObserver((ConfigurationObserver) foundCp);
    }
  }
Contributor

It looks like none of your coprocessors implement ConfigurationObserver. Was this meant as speculative infrastructure, or left in by accident?

CoprocessorConfigurationUtil.maybeUpdateCoprocessors(newConf, this.isGlobalReadOnlyEnabled,
this.cpHost, CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY, this.maintenanceMode,
this.toString(), val -> this.isGlobalReadOnlyEnabled = val,
conf -> initializeCoprocessorHost(newConf));
Contributor

Use the captured conf
