DataprocDriver provides a secure, easily accessible way to connect to Apache Hive on a Dataproc cluster from anywhere via Java Database Connectivity (JDBC) through Component Gateway.
Supported Java Version: 1.8
In order to use this library, you first need to go through the following steps:
- Set up a Cloud project
- Set up authentication

  Note that we recommend using service account login rather than end user login.
- Make sure Component Gateway is enabled for the cluster(s) you are trying to connect to within the project
- Make sure Hive is running in HTTP mode for the cluster(s)
For Step 4 above, we have provided an example initialization action file at gs://hive-http-mode-init-action/hive-http-config.sh that configures Hive to run in HTTP mode during cluster creation.
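As a sketch of what Steps 3 and 4 might look like together at cluster creation time (the cluster name and region below are placeholders; this assumes the gcloud CLI is installed and configured):

```shell
# Create a Dataproc cluster with Component Gateway enabled, using the
# example init action above to configure Hive to run in HTTP mode.
# "my-cluster" and "us-central1" are placeholders -- substitute your own.
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --enable-component-gateway \
    --initialization-actions=gs://hive-http-mode-init-action/hive-http-config.sh
```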
- Clone this repo

  ```
  git clone https://github.com/GoogleCloudDataproc/dataproc-jdbc-connector.git
  cd dataproc-jdbc-connector
  ```

- Build the Dataproc Driver JAR

  ```
  ./gradlew -p jdbc-driver shadowJar
  ```

  Note that this step might take around 45 minutes. The compiled Dataproc Driver JAR will be at `dataproc-jdbc-connector/jdbc-driver/build/libs/jdbc-driver-1.0-SNAPSHOT-all.jar`.

- Build the example-client JAR

  ```
  ./gradlew -p example-client shadowJar
  ```
- Dataproc Driver for Hive accepts JDBC URL strings with the prefix `jdbc:dataproc://hive/`
- Clients can pass in a database name or leave it empty to use the default database:

  `jdbc:dataproc://hive/;` or `jdbc:dataproc://hive/dbName;`
- `projectId` and `region` are required parameters. To find your project and region, refer back to Set up a Cloud project in the Before you begin section above:

  `jdbc:dataproc://hive/;projectId={pid};region={region}`
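To illustrate the URL shape, a tiny helper (hypothetical, not part of the driver) that assembles a minimal URL from the two required parameters might look like:

```java
public class DataprocUrlExample {
    // Hypothetical helper: builds a minimal Dataproc JDBC URL from the
    // required projectId and region; dbName may be "" for the default database.
    static String buildUrl(String dbName, String projectId, String region) {
        return String.format("jdbc:dataproc://hive/%s;projectId=%s;region=%s",
                dbName, projectId, region);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("", "demo-dataproc", "us-central1"));
        // jdbc:dataproc://hive/;projectId=demo-dataproc;region=us-central1
    }
}
```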
- Cluster specific parameters: `clusterName` or `clusterPoolLabel`

  Clients can pick a particular cluster by setting either `clusterName` or `clusterPoolLabel`. Note that `clusterName` has a higher priority than `clusterPoolLabel`; if you pass in both parameters, the DataprocDriver will only look for the cluster by name.

  `clusterName` is an optional parameter that allows the client to specify the name of the cluster to connect to:

  `jdbc:dataproc://hive/;projectId={pid};region={region};clusterName={cluster-name}`

  `clusterPoolLabel` is an optional parameter that supports submitting to a cluster pool. Clients can pass in cluster pool labels, and the Dataproc Driver will pick a healthy cluster (a cluster with `status.state = ACTIVE`) within the pool to connect to. Please do not use `status.state` as one of the cluster pool labels.

  Labels can be specified in the format:

  `clusterPoolLabel=label1=value1[:label2=value2]`

  Example:

  `jdbc:dataproc://hive/;projectId={pid};region={region};clusterPoolLabel=com=google:team=dataproc`
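As an illustration of the `label1=value1[:label2=value2]` format, the following sketch (hypothetical, not the driver's actual code) splits a `clusterPoolLabel` value into individual labels:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelParserExample {
    // Hypothetical sketch: splits "label1=value1[:label2=value2]" into a
    // map of label names to values, preserving the order given.
    static Map<String, String> parseLabels(String labelParam) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (String pair : labelParam.split(":")) {
            // Split on the first '=' only, so values may themselves contain '='
            String[] kv = pair.split("=", 2);
            labels.put(kv[0], kv[1]);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(parseLabels("com=google:team=dataproc"));
        // {com=google, team=dataproc}
    }
}
```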
- DataprocDriver also accepts a semicolon separated list of session variables, Hive configuration variables, or Hive variables that Hive supports:

  `jdbc:dataproc://hive/{dbName};projectId={pid};region={region};clusterName={name};sess_var_list?hive_conf_list#hive_var_list`
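To make the `sess_var_list?hive_conf_list#hive_var_list` structure concrete, a sketch (hypothetical, not the driver's actual parsing code) of splitting that trailing portion on its `?` and `#` separators could look like:

```java
public class VarListSplitterExample {
    // Hypothetical sketch: splits "sess_var_list?hive_conf_list#hive_var_list"
    // into its three parts; any part may be empty if its separator is absent.
    static String[] splitVarLists(String tail) {
        String hiveVars = "";
        int hash = tail.indexOf('#');
        if (hash >= 0) {
            hiveVars = tail.substring(hash + 1);
            tail = tail.substring(0, hash);
        }
        String hiveConf = "";
        int q = tail.indexOf('?');
        if (q >= 0) {
            hiveConf = tail.substring(q + 1);
            tail = tail.substring(0, q);
        }
        return new String[] {tail, hiveConf, hiveVars};
    }

    public static void main(String[] args) {
        String[] parts = splitVarLists("transportMode=http?hive.exec.parallel=true#a=1");
        System.out.println(parts[0]); // transportMode=http
        System.out.println(parts[1]); // hive.exec.parallel=true
        System.out.println(parts[2]); // a=1
    }
}
```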
DataprocDriver uses Google OAuth 2.0 APIs for authentication and authorization.
To connect to a Dataproc cluster through Component Gateway, the Dataproc JDBC Driver includes an authentication token. For security reasons, it puts the token in the `Proxy-Authorization: Bearer` header.
To get the access token set up in the Before you begin step 2 section above, DataprocDriver uses Application Default Credentials.
```
# update the HADOOP_CLASSPATH to include the Dataproc JDBC Driver JAR
export HADOOP_CLASSPATH=`hadoop classpath`:{path-to-driver-jar}jdbc-driver-1.0-SNAPSHOT-all.jar

# tell beeline the class name for the driver using the "-d" option
beeline -d "com.google.cloud.dataproc.jdbc.DataprocDriver" -u "jdbc:dataproc://hive/;projectId={pid};region={region};{other-parameters}"
```
Example:

```
export HADOOP_CLASSPATH=`hadoop classpath`:/usr/local/home/Downloads/dataproc-jdbc-connector/jdbc-driver/build/libs/jdbc-driver-1.0-SNAPSHOT-all.jar
beeline -d "com.google.cloud.dataproc.jdbc.DataprocDriver" -u "jdbc:dataproc://hive/;projectId=demo-dataproc;region=us-central1;clusterName=demo-cluster"
```
We have provided an example client -- an example usage of the JDBC driver from Java that connects to Hive using our Dataproc Driver.
```java
Connection connection = DriverManager.getConnection(
        "jdbc:dataproc://hive/default;projectId=pid;region=us-central1;clusterName=my-cluster");
try (Statement stmt = connection.createStatement()) {
    ResultSet res = stmt.executeQuery("SHOW TABLES");
    // A ResultSet cursor is advanced with next(), not hasNext()
    while (res.next()) {
        System.out.println(res.getString(1));
    }
}
```
To run the example client:

```
# build example client JAR
./gradlew -p example-client shadowJar

# run the JAR; it will prompt you to enter the JDBC URL string
java -jar example-client/build/libs/example-client-1.0-SNAPSHOT-all.jar
```
- To run all unit tests, use:

  `./gradlew test`

- To run a specific unit test, use:

  `./gradlew test --tests="com.google.cloud.dataproc.jdbc.{test name}"`

- To run the system test, use:

  `./gradlew systemTest --tests="com.google.cloud.dataproc.jdbc.DataprocSystemTest" -DprojectId="{projectId}" -Dregion="{region}" -Duser="{user}"`

  Note that projectId and region are the parameters for the specific project you are working on; user can be an arbitrary username, used as a prefix for the names of the clusters created and deleted during testing, to avoid cluster conflicts.