-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path4. Partitioning (Intermediate).HQL
63 lines (48 loc) · 1.33 KB
/
4. Partitioning (Intermediate).HQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
a. Create a partitioned table by Contract and load the data from the
original table.
CREATE TABLE telecom_data_partition (
customerID STRING,
gender STRING,
SeniorCitizen INT,
Partner STRING,
Dependents STRING,
tenure INT,
PhoneService STRING,
MultipleLines STRING,
InternetService STRING,
OnlineSecurity STRING,
OnlineBackup STRING,
DeviceProtection STRING,
TechSupport STRING,
StreamingTV STRING,
StreamingMovies STRING,
PaperlessBilling STRING,
PaymentMethod STRING,
MonthlyCharges FLOAT,
TotalCharges FLOAT,
Churn STRING
)
PARTITIONED BY (Contract string);
set hive.exec.dynamic.partition.mode=nonstrict
(((
note : we use this because we are doing dynamic partition
)))
insert overwrite table telecom_data_partition partition(Contract) select*from telecom_data;
----------------
b. Write a HiveQL query to find the number of customers who have
churned in each Contract type using the partitioned table.
select Contract,count(Churn)
from telecom_data_partition
where Churn = 'Yes'
group by Contract;
--------------
c. Find the average MonthlyCharges for each type of Contract using the
partitioned table.
select Contract,avg(MonthlyCharges)
from telecom_data
group by Contract;
--------------
d. Determine the maximum tenure in each Contract type partition.
select Contract,max(tenure) as max_tenure
from telecom_data_partition
group by Contract;