HDFS-17531. RBF: Asynchronous router RPC. #7308

Open
wants to merge 16 commits into
base: trunk

@KeeProMise (Member) commented Jan 21, 2025

Description of PR

See also: https://issues.apache.org/jira/browse/HDFS-17531

I. Overview

The asynchronous router addresses the performance bottlenecks of the synchronous router in high-concurrency, multi-nameservice scenarios. By processing requests asynchronously, it improves the system's concurrency and resource utilization, and it is particularly suited to federated deployments in which requests are forwarded to multiple downstream nameservices (NS).

II. Problems of the Synchronous Router

  • Performance Bottleneck: The throughput of the synchronous router is limited by the number of handler threads. Even when the connection thread could still forward requests to the downstream namenode, each handler must wait for its current request to complete before processing the next one, which caps processing capacity.
  • Thread Resource Waste: Adding handler threads to raise throughput increases context switching, which lowers system efficiency. Meanwhile, a large number of handler threads sit blocked, wasting thread resources.
  • Poor Isolation in Multi-ns: If one downstream nameservice performs poorly, handlers wait on it for a long time and cannot forward requests to the nameservices that are performing normally, so clients see degraded performance across all downstream ns.
  • Ineffective Utilization of Federation Multi-ns Performance: Under high concurrency, requests can back up in the router's request queue while the queues of the downstream services are not fully utilized, so downstream capacity goes unused.
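
The first and third points can be made concrete with a back-of-the-envelope model. The sketch below is only an illustration: the class name, the handler count, and the latencies are all hypothetical, and it assumes every handler blocks for the full downstream round trip, so that throughput is bounded by handlers divided by mean latency (Little's law).

```java
/** Back-of-the-envelope model of a blocking handler pool (illustrative only). */
final class SyncHandlerBound {
  private SyncHandlerBound() {}

  /**
   * Upper bound on requests per second when every handler blocks for the
   * whole downstream round trip: handlers / mean latency (Little's law).
   *
   * @param handlers      number of handler threads (hypothetical value)
   * @param slowFraction  share of requests routed to a slow nameservice
   * @param fastLatencyMs round-trip latency of a healthy nameservice
   * @param slowLatencyMs round-trip latency of the slow nameservice
   */
  static double maxThroughput(int handlers, double slowFraction,
      double fastLatencyMs, double slowLatencyMs) {
    double meanLatencyMs =
        slowFraction * slowLatencyMs + (1.0 - slowFraction) * fastLatencyMs;
    return handlers / (meanLatencyMs / 1000.0);
  }

  public static void main(String[] args) {
    // 100 handlers, 10 ms healthy latency, 1000 ms slow latency (all assumed).
    System.out.printf("all fast: %.0f req/s%n", maxThroughput(100, 0.0, 10, 1000));
    System.out.printf("10%% slow: %.0f req/s%n", maxThroughput(100, 0.1, 10, 1000));
  }
}
```

With these assumed numbers, sending a tenth of the traffic to the slow nameservice lowers the bound for every client from 10,000 to roughly 917 requests per second, even though 90% of requests target healthy nameservices.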

III. Design and Improvements of the Asynchronous Router

The asynchronous router solves the above problems by redesigning the request handling process and introducing an asynchronous processing mechanism. Its core improvements include:

  • Handler: Retrieves requests from the request queue for preliminary processing. If the request fails at this stage (for example, the mount point does not exist), it puts the response directly into the response queue; otherwise, it hands the request to the asynchronous handler thread pool.
  • Async Handler: Puts the request into the call queue (connection.calls) of the connection thread and returns immediately, without blocking to wait for the result.
  • Async Responder: Processes the responses received by the connection thread. If the request must be retried (for example, the downstream service returned a standby exception), it resubmits the request to the asynchronous handler thread pool; otherwise, it puts the response into the response queue.
  • Responder: Retrieves the response from the response queue and returns it to the client.
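
The four roles above can be sketched with plain `CompletableFuture` callbacks. This is a minimal illustration under stated assumptions, not the code in this PR: `AsyncRouterPipelineSketch`, its nested `StandbyException`, the `Downstream` interface, the retry limit, and the pool sizes are all hypothetical, and the request and response queues are replaced by a future that the caller holds.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative sketch only; none of these names come from the Hadoop code base. */
final class AsyncRouterPipelineSketch {
  /** Hypothetical stand-in for a namenode's "I am standby" reply. */
  static final class StandbyException extends RuntimeException {
    StandbyException(String message) { super(message); }
  }

  /** Hypothetical non-blocking downstream call: returns a future immediately. */
  interface Downstream {
    CompletableFuture<String> invoke(String nameservice, String path, int attempt);
  }

  private static final int MAX_RETRIES = 3;   // assumed limit

  private final Map<String, String> mountTable;   // mount point -> nameservice
  private final Downstream downstream;
  private final ExecutorService asyncHandler = daemonPool("async-handler");
  private final ExecutorService asyncResponder = daemonPool("async-responder");

  AsyncRouterPipelineSketch(Map<String, String> mountTable, Downstream downstream) {
    this.mountTable = mountTable;
    this.downstream = downstream;
  }

  /** Handler: validate, then either answer at once or hand off without blocking. */
  CompletableFuture<String> handle(String path) {
    CompletableFuture<String> response = new CompletableFuture<>();
    String nameservice = mountTable.get(path);
    if (nameservice == null) {
      response.completeExceptionally(
          new IllegalArgumentException("No mount point for " + path));
      return response;
    }
    asyncHandler.execute(() -> send(nameservice, path, 0, response));
    return response;
  }

  /** Async handler: issue the call and return; the callback runs on the responder pool. */
  private void send(String ns, String path, int attempt,
      CompletableFuture<String> response) {
    CompletableFuture<String> call;
    try {
      call = downstream.invoke(ns, path, attempt);
    } catch (RuntimeException e) {
      response.completeExceptionally(e);
      return;
    }
    call.whenCompleteAsync((result, error) -> {
      // Futures built with supplyAsync wrap failures in CompletionException.
      Throwable cause = (error instanceof CompletionException && error.getCause() != null)
          ? error.getCause() : error;
      if (cause instanceof StandbyException && attempt < MAX_RETRIES) {
        // Async responder: retryable failure, so resubmit to the async handler pool.
        asyncHandler.execute(() -> send(ns, path, attempt + 1, response));
      } else if (cause != null) {
        response.completeExceptionally(cause);
      } else {
        response.complete(result);   // Responder stage: result reaches the caller.
      }
    }, asyncResponder);
  }

  private static ExecutorService daemonPool(String name) {
    return Executors.newFixedThreadPool(2, runnable -> {
      Thread thread = new Thread(runnable, name);
      thread.setDaemon(true);   // do not keep the JVM alive in this sketch
      return thread;
    });
  }
}
```

The point of the design is that no thread in `handle` or `send` waits for the downstream reply: each stage schedules the next one and returns, so a small number of threads can keep many requests in flight.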

IV. Advantages of the Asynchronous Router

  • High-Concurrency Performance: Because no thread blocks while waiting on a downstream call, the asynchronous router can keep a large number of requests in flight at once, significantly improving the system's concurrent processing capacity.
  • High Resource Utilization: It avoids thread blocking and frequent context switching, reduces thread resource waste, and improves the overall efficiency of the system.
  • Isolation: Each ns is served by its own async handler thread pool, which isolates the downstream services from one another. Even if one service performs poorly, it does not reduce the processing capacity available to the others.
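
The isolation point can be illustrated with one executor per nameservice. The sketch below is a simplified assumption of how such isolation might look, not the implementation from HDFS-17651; `PerNameserviceExecutors` and its methods are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Illustrative sketch only; the class and method names are hypothetical. */
final class PerNameserviceExecutors {
  private final ConcurrentHashMap<String, ExecutorService> pools =
      new ConcurrentHashMap<>();
  private final int threadsPerNameservice;

  PerNameserviceExecutors(int threadsPerNameservice) {
    this.threadsPerNameservice = threadsPerNameservice;
  }

  /** Runs the call on the pool owned by the given nameservice, creating it on first use. */
  <T> CompletableFuture<T> submit(String nameservice, Supplier<T> call) {
    ExecutorService pool = pools.computeIfAbsent(nameservice, this::newPool);
    return CompletableFuture.supplyAsync(call, pool);
  }

  /** Stops accepting new work on every pool; queued tasks still run. */
  void shutdown() {
    pools.values().forEach(ExecutorService::shutdown);
  }

  private ExecutorService newPool(String nameservice) {
    return Executors.newFixedThreadPool(threadsPerNameservice, runnable -> {
      Thread thread = new Thread(runnable, "async-handler-" + nameservice);
      thread.setDaemon(true);   // do not keep the JVM alive in this sketch
      return thread;
    });
  }
}
```

Because work for `ns0` only ever occupies threads in the `ns0` pool, a stalled call there cannot delay a call submitted for `ns1`, which is the behaviour the Isolation bullet describes.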

V. Summary

The asynchronous router removes the performance bottleneck of the traditional synchronous router in high-concurrency scenarios by processing requests asynchronously. It improves the system's concurrency and resource utilization, and it isolates downstream services from one another through the queue mechanism, which improves stability and adaptability. In federated scenarios where multiple downstream services must be served, the asynchronous router is a more efficient and reliable solution.

PRs:
HDFS-17543. [ARR] AsyncUtil makes asynchronous code more concise and easier.
HADOOP-19235. IPC client uses CompletableFuture to support asynchronous operations.
HDFS-17544. [ARR] The router client rpc protocol PB supports asynchrony.
HDFS-17545. [ARR] router async rpc client.
HDFS-17594. [ARR] RouterCacheAdmin supports asynchronous rpc.
HDFS-17597. [ARR] RouterSnapshot supports asynchronous rpc.
HDFS-17595. [ARR] ErasureCoding supports asynchronous rpc.
HDFS-17601. [ARR] RouterRpcServer supports asynchronous rpc.
HDFS-17596. [ARR] RouterStoragePolicy supports asynchronous rpc.
HDFS-17656. [ARR] RouterNamenodeProtocol and RouterUserProtocol supports asynchronous rpc.
HDFS-17659. [ARR] Router Quota supports asynchronous rpc.
HDFS-17672. [ARR] Move asynchronous related classes to the async package.
HADOOP-19361. RPC DeferredMetrics bugfix.
HDFS-17640. [ARR] RouterClientProtocol supports asynchronous rpc.
HDFS-17650. [ARR] The router server-side rpc protocol PB supports asynchrony.
HDFS-17651. [ARR] Async handler executor isolation.
HDFS-17715. [ARR] Add documentation for asynchronous router.

How was this patch tested?

Async Router RPC: single nameservice performance testing report

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

KeeProMise and others added 12 commits January 20, 2025 14:34
…easier. (#6868). Contributed by Jian Zhang.

Reviewed-by: hfutatzhanghb
Signed-off-by: He Xiaoqiao

…ny. (#6870). Contributed by Jian Zhang.

Signed-off-by: He Xiaoqiao

…an Zhang.

Reviewed-by: hfutatzhanghb
Signed-off-by: He Xiaoqiao

…. Contributed by Archie73.

Reviewed-by: Jian Zhang
Signed-off-by: He Xiaoqiao

…Contributed by Wenqi Li.

Reviewed-by: Jian Zhang
Signed-off-by: He Xiaoqiao

…ontributed by hfutatzhanghb.

Reviewed-by: Jian Zhang
Signed-off-by: He Xiaoqiao

… Contributed by hfutatzhanghb.

Reviewed-by: Jian Zhang
Signed-off-by: He Xiaoqiao

). Contributed by hfutatzhanghb.

Reviewed-by: Jian Zhang
Signed-off-by: He Xiaoqiao

…rts asynchronous rpc. (#7159). Contributed by Jian Zhang.

Reviewed-by: Jian Zhang
Signed-off-by: Jian Zhang

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 25 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 6m 4s Maven dependency ordering for branch
+1 💚 mvninstall 33m 12s trunk passed
+1 💚 compile 16m 47s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 compile 15m 20s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 checkstyle 4m 15s trunk passed
+1 💚 mvnsite 5m 11s trunk passed
+1 💚 javadoc 4m 27s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 4m 32s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 9m 58s trunk passed
+1 💚 shadedclient 34m 34s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 35m 0s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
+1 💚 mvninstall 3m 19s the patch passed
+1 💚 compile 16m 36s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javac 16m 36s the patch passed
+1 💚 compile 15m 23s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 javac 15m 23s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 9s /results-checkstyle-root.txt root: The patch generated 1 new + 94 unchanged - 73 fixed = 95 total (was 167)
+1 💚 mvnsite 5m 17s the patch passed
+1 💚 javadoc 4m 18s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 4m 30s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 10m 53s the patch passed
+1 💚 shadedclient 35m 35s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 48s hadoop-common in the patch passed.
+1 💚 unit 2m 45s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 245m 52s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 unit 38m 43s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 1m 10s The patch does not generate ASF License warnings.
547m 56s
Reason Tests
Failed junit tests hadoop.hdfs.TestDecommission
hadoop.hdfs.TestDecommissionWithBackoffMonitor
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7308/1/artifact/out/Dockerfile
GITHUB PR #7308
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux be8bf2d77493 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 616d707
Default Java Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7308/1/testReport/
Max. process+thread count 3444 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7308/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@KeeProMise
Member Author

Hi all, the failed UT is not related to this PR. You can see that another pipeline is successful: https://ci-hadoop.apache.org/blue/organizations/jenkins/hadoop-multibranch/detail/PR-7301/1/pipeline

@Hexiaoqiao
Contributor

@KeeProMise we need to fix checkstyle first and trigger Yetus again. It is better to collect one green result.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 25 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 5m 47s Maven dependency ordering for branch
+1 💚 mvninstall 36m 42s trunk passed
+1 💚 compile 19m 30s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 compile 17m 16s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 checkstyle 4m 40s trunk passed
+1 💚 mvnsite 5m 11s trunk passed
+1 💚 javadoc 4m 28s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 4m 21s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 10m 47s trunk passed
+1 💚 shadedclient 42m 31s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 43m 0s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 32s Maven dependency ordering for patch
+1 💚 mvninstall 3m 21s the patch passed
+1 💚 compile 19m 25s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javac 19m 25s the patch passed
+1 💚 compile 17m 16s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 javac 17m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 36s root: The patch generated 0 new + 94 unchanged - 73 fixed = 94 total (was 167)
+1 💚 mvnsite 5m 13s the patch passed
+1 💚 javadoc 4m 23s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 4m 22s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 10m 52s the patch passed
+1 💚 shadedclient 40m 2s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 41s hadoop-common in the patch passed.
+1 💚 unit 2m 40s hadoop-hdfs-client in the patch passed.
+1 💚 unit 278m 39s hadoop-hdfs in the patch passed.
+1 💚 unit 40m 1s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
608m 24s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7308/2/artifact/out/Dockerfile
GITHUB PR #7308
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux de55d04f3e21 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 15cc381
Default Java Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7308/2/testReport/
Max. process+thread count 3217 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7308/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
