ANALYZE command fails with Spark server #41
gabbasb added a commit to gabbasb/hdfs_fdw that referenced this issue on Aug 23, 2017:
Problem Statement:

Hive and Spark both support HiveQL and are compatible except for the behaviour of the ANALYZE command: in Hive, ANALYZE is a utility command and does not return a result set, whereas in Spark it returns a (possibly empty) result set.

In Hive we get this output:

```
0: jdbc:hive2://localhost:10000/testdb> analyze table names_tab compute statistics;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1488090103001_0007
INFO : The url to track the job: http://localhost:8088/proxy/application_1488090103001_0007/
INFO : Starting Job = job_1488090103001_0007, Tracking URL = http://localhost:8088/proxy/application_1488090103001_0007/
INFO : Kill Command = /home/abbasbutt/Projects/hadoop_fdw/hadoop/bin/hadoop job -kill job_1488090103001_0007
INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO : 2017-08-22 19:08:11,328 Stage-0 map = 0%, reduce = 0%
No rows affected (11.949 seconds)
INFO : 2017-08-22 19:08:15,465 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
INFO : MapReduce Total cumulative CPU time: 930 msec
INFO : Ended Job = job_1488090103001_0007
INFO : Table testdb.names_tab stats: [numFiles=2, numRows=12, totalSize=76, rawDataSize=64]
```

In Spark we get this output:

```
0: jdbc:hive2://localhost:10000/my_spark_db> analyze table junk_table compute statistics;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (1.462 seconds)
```

Solution:

The CREATE SERVER command already has a client_type option, which currently supports a single value, 'hiveserver2'. To support ANALYZE on Spark, client_type can also take the value 'spark'. If client_type is not specified, the default is 'hiveserver2' and ANALYZE fails when the server is actually Spark; when the correct client_type is specified, ANALYZE works fine with Spark. For example:

```
postgres=# CREATE EXTENSION hdfs_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER hdfs_svr FOREIGN DATA WRAPPER hdfs_fdw
             OPTIONS (host '127.0.0.1', port '10000', client_type 'spark');
CREATE SERVER
postgres=# CREATE USER MAPPING FOR abbasbutt SERVER hdfs_svr
             OPTIONS (username 'ldapadm', password 'ldapadm');
CREATE USER MAPPING
postgres=# CREATE FOREIGN TABLE fnt(a int, name varchar(255))
             SERVER hdfs_svr OPTIONS (dbname 'my_spark_db', table_name 'junk_table');
CREATE FOREIGN TABLE
postgres=# ANALYZE fnt;
ANALYZE
```
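For context on why the two behaviours need different handling at the JDBC layer, here is a minimal, hypothetical sketch in plain JDBC (this is not the actual hdfs_fdw client code; the class and method names are made up for illustration, and the Hive JDBC driver is assumed to be on the classpath). `Statement.execute()` reports whether the statement produced a result set, so the same path can serve both client types by draining any result set instead of erroring out:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration: run a utility statement such as ANALYZE
// against either HiveServer2 or the Spark Thrift server. execute()
// returns false for Hive's ANALYZE (no result set) and true for
// Spark's ANALYZE (an empty "Result" relation).
public class AnalyzeCompat {
    public static void runUtility(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            boolean hasResultSet = stmt.execute(sql);
            if (hasResultSet) {
                // Spark case: consume and close the (empty) result set
                // so the statement completes cleanly.
                try (ResultSet rs = stmt.getResultSet()) {
                    while (rs.next()) {
                        /* discard rows */
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Connection details are illustrative only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/my_spark_db", "ldapadm", "ldapadm")) {
            runUtility(conn, "analyze table junk_table compute statistics");
        }
    }
}
```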
The failure originally reported against a Spark server:

```
ANALYZE jobhist;
ERROR: failed to fetch execute query: This function is supposed to execute queries that do not generate any result set
ANALYZE emp(empno);
ERROR: failed to fetch execute query: This function is supposed to execute queries that do not generate any result set
VACUUM ANALYZE emp;
```