-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path0_correlation_analysis.py
27 lines (18 loc) · 1.35 KB
/
0_correlation_analysis.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import pandas as pd
# Read the CSV data into a DataFrame
data = pd.read_csv('your_data.csv')
# Calculate the correlation matrix
correlation_matrix = data.corr()
# Get the absolute correlation values for each variable pair
abs_correlation = correlation_matrix.abs()
# Find the pairs with high correlation
high_correlation_pairs = (abs_correlation > 0.7) & (abs_correlation < 1.0)
high_correlation_variables = abs_correlation[high_correlation_pairs].stack().reset_index()
high_correlation_variables.columns = ['Variable 1', 'Variable 2', 'Correlation']
# Sort the pairs by correlation strength
sorted_pairs = high_correlation_variables.sort_values('Correlation', ascending=False)
# Display the strongly correlated pairs
print("Strongly Correlated Variable Pairs:")
print(sorted_pairs)
'''In both examples, we read the CSV data into a DataFrame (or data frame in R), calculate the correlation matrix using the corr function, and obtain the absolute correlation values for each variable pair. We then filter for pairs with correlation values above a certain threshold (in this case, > 0.7 and < 1.0) and sort the pairs based on the correlation strength.
The output will provide you with the strongly correlated variable pairs, along with their correlation coefficients. By examining these pairs, you can identify the most interesting variables that exhibit strong correlations.'''