Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-50666][SQL] Support hint for reading in JDBC data source #49564

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wayneguow
Copy link
Contributor

What changes were proposed in this pull request?

This PR aims to add a hint option for JDBC data source. This option is used to specify the hint for reading. It will apply only if the underlying DBMS supports the hint feature. Currently, this option is only supported by OracleDialect and MySQLDialect.

Why are the changes needed?

It's useful for performance tuning when reading from DBMS.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Passed GA and add a new test case.

Was this patch authored or co-authored using generative AI tooling?

No.

@wayneguow wayneguow marked this pull request as draft January 19, 2025 15:36
@wayneguow wayneguow marked this pull request as ready for review January 19, 2025 17:55
@LuciferYang
Copy link
Contributor

cc @wangyum

@@ -321,4 +323,5 @@ object JDBCOptions {
val JDBC_CONNECTION_PROVIDER = newOption("connectionProvider")
val JDBC_PREPARE_QUERY = newOption("prepareQuery")
val JDBC_PREFER_TIMESTAMP_NTZ = newOption("preferTimestampNTZ")
val JDBC_HINT_STRING = newOption("hint")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no way to standardize the type of hints?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we make the option is available for each of the built-in JDBC dialects ?

@LuciferYang
Copy link
Contributor

also cc @beliefer

@@ -406,7 +406,7 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHelper with No
}

options.prepareQuery +
s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
s"SELECT ${options.hint}$columnList FROM ${options.tableOrQuery} $tableSampleClause" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an spec to say that the "HINT" could only appear in the SELECT clause?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @pan3793 , it's just a comment, but Apache Spark's HINT is also in SELECT clause only.

https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html#hints

@@ -249,6 +249,8 @@ class JDBCOptions(
.map(_.toBoolean)
.getOrElse(SQLConf.get.timestampType == TimestampNTZType)

val hint = parameters.get(JDBC_HINT_STRING).map(_ + " ").getOrElse("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the JDBC url starts with jdbc:oracle or jdbc:mysql.

Copy link
Member

@dongjoon-hyun dongjoon-hyun Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for @beliefer 's comment to be robust. (I deleted my previous comment.)

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @wayneguow .

This looks like a valid suggestion if we have a validation for the user-provided hint string is a valid comment like /*+... */ always.

BTW, did you exclude MSSQL Hint because the syntax style is different and not a comment?

SELECT * 

FROM Customers 

WHERE City = 'San Francisco' 

OPTION (USE  INDEX = IX_City); 

@dongjoon-hyun
Copy link
Member

Also, cc @yaooqinn because this could be a nice improvement as a part of the umbrella JIRA, SPARK-47361 (
Improve JDBC data sources).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants