-
Notifications
You must be signed in to change notification settings - Fork 0
/
Oct-27-2016.Rmd
531 lines (455 loc) · 23.5 KB
/
Oct-27-2016.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
---
title: "Generating Cash Flow Expectations For Lending Club"
author: "Bryan Travis Smith, Ph.D"
date: "10/28/2016"
output: html_document
---
# OC R User Group Meet-Up
This a a summary of the presentation I gave at the October [OC R User Group Meet](https://www.meetup.com/OC-RUG/). The focus on the talk was to solve the following problem statment using R.
> I am investing in Lending Club notes. Specifically Grade A, 36 Month loans. I want to generate an expected cash flow for my investment.
## Setup
I want to generate my expectations for grade A, 36 months loan. The main goal is to generate a framework for the solution that it can also be applied to other loans. I am starting off creating a data frame from all the join [Lending Club Data](https://www.lendingclub.com/info/download-data.action) from $Jun \ 30^{th}, 2016$ when I selected the target loans from tranches of loan that are mature.
```{r message=FALSE}
library(dplyr)
library(reshape2)
library(ggplot2)
library(zoo)
library(stringr)
library(DT)
# Previously loaded all LC data into a single data frame and saved as RData
load("/opt/science/Datasets/LendingClub/2016-06-30/lcCombinded.RData")
lcA <- lc %>% mutate(
#Crate a Data Object for the Issue Date
issue_yearmon = as.Date(as.yearmon(issue_d,format="%b-%Y")),
#Create a Numeric Issue Year For Group By Calculations
issue_year = as.numeric(as.character(issue_yearmon,format="%Y")),
#Convert to Date object to calculate the 'Age' of the loan
last_pymnt_yearmon = as.Date(as.yearmon(last_pymnt_d,format="%b-%Y")),
#Converte interest rate to ta numeric value
interest = as.numeric(str_replace(int_rate,"%",""))
) %>%
mutate(
#Make what are roughly 1 month wide in the time between the loan was
#originated and the time it stopped generating cash flows
AgeBucket = cut(as.numeric((last_pymnt_yearmon-issue_yearmon)/365),
breaks=seq(0,6,1/12),
include.lowest=T,
right=F)
) %>%
mutate(
#Converte the Age Bucket to a Numeric Month or Statement value
Age = match(AgeBucket,levels(AgeBucket))
) %>%
mutate(
#Get the total recieved payments for each loan
total_rec = total_rec_prncp+total_rec_int
) %>%
#Limit to the data we are interested in for the problem
#Remove 2007 - stands out in graphs, actually doesn't change the results too much.
filter(term == " 36 months",grade == "A", issue_year < 2013, issue_year > 2007)
```
## Cash Flow Expectation : Paid As Agreed
The simplest expectation is to just assume that everyone will make every schedule payment. The equation for the payment of a fix term loan is:
$$ Payment = \frac{LoanAmount * MonthlyInterestRate}{1 - \frac{1}{(1+MonthlyInterestRate)^{Term}}} $$
You can then use this to get the amortization schedule for the loan.
```{r}
# Generates the expected payment for a fixed term loan.
payment_value <- function(loan_amount,interest_rate,term_number) {
loan_amount*interest_rate/12/(1 - (1+interest_rate/12)^(-1*term_number))
}
#Gereates the scheduled payment behavior for a loan that is paid as agreed
get_amortization <- function(term_number,interest_rate){
amortization <- data.frame(statement = seq(0,term_number,1),
payment = c(0,rep(payment_value(1,interest_rate,term_number),term_number)),
interest_payment = 0,
principal_payment = 0,
start_balance = 0,
end_balance = 1)
for(i in (1:term_number+1)){
amortization[i,'start_balance'] = amortization[i-1,'end_balance']*(1+interest_rate/12)
amortization[i,'interest_payment'] = interest_rate*amortization[i-1,'end_balance']/12
amortization[i,'principal_payment'] = amortization[i,'payment'] - amortization[i,'interest_payment']
amortization[i,'end_balance'] = amortization[i,'start_balance']-amortization[i,'payment']
}
amortization[term_number+1,'end_balance']=0
return(amortization)
}
#Helper function to round numeric values when writing to data table
round_df <- function(df, digits) {
nums <- vapply(df, is.numeric, FUN.VALUE = logical(1))
df[,nums] <- round(df[,nums], digits = digits)
(df)
}
#Printing values
get_amortization(36,0.078) %>%
round_df(4) %>%
datatable(rownames=F)
```
## Results: Paid As Agreed
I am now going to use the Lending Club data to caluclate actual yield of grade A, 36 month loans and compare it to the expectations generated by the above payment schedule. I will do this calcuation by year for each year of mature loans in Lending Club. We are assuming there is no prepayment and charge off behavior, so I will also calculate these to see scale of these behaviors.
```{r}
lcA %>%
# Perform the following calculation for each year
group_by(issue_year) %>%
# Aggregrate for each year
summarise(
#Number of Loans
Count = n(),
#Dollars Loaned Out
DollarsFunded = sum(funded_amnt),
#Dollars collected from payments
DollarsRecieved = sum(total_rec),
#Percent Charge Offs
Chargeoffs = round(sum(ifelse(grepl("Charged Off",loan_status),1,0))/Count*100,1),
#Percent Prepayments
Prepays = round(sum(ifelse(grepl("Fully Paid",loan_status)&(Age < 36),1,0))/Count*100,1),
#The Percent Increase on the Dollars Loans
Yield = round(100*(DollarsRecieved/DollarsFunded-1),1),
#Dollar weighted interest rate for each year
Interest = round(weighted.mean(interest,funded_amnt),1)) %>%
mutate(
#Applys the amortization schedule and sumes payments recieved for each loan
ExpectedYield = apply( (.),
1,
function(x) sum(get_amortization(36,
as.numeric(x['Interest'])/100)$payment))) %>%
mutate(
#Rounds Expected Yield to a percent
ExpectedYield = round(100*(ExpectedYield-1),1)) %>%
#Filter to relevant variables
select(issue_year,DollarsFunded,Prepays,Chargeoffs,Interest,Yield,ExpectedYield) %>%
#Convert to % for display
mutate(Prepays=paste0(Prepays,"%"),
Chargeoffs=paste0(Chargeoffs,"%"),
Interest=paste0(Interest,"%"),
Yield=paste0(Yield,"%"),
ExpectedYield = paste0(ExpectedYield,"%")) %>%
#Create a data table
datatable(rownames=F)
```
## Review: Paid As Agreed
I can see that there are signficant Prepayments and Charge Offs for each year of Lending Club, so I'll try to include these in the expectations.
## Cash Flow Expectation : Account For Charge Offs
I am going to include charge offs in the payment expectations. Once a loan charges off, it no longer makes payments. This makes it straight forward to include in the amortatization schedule because we only need to keep track of how many loans are left, and add the payments of the left over loans. In reality there is also deliquency behavior that should be accounted for, but we will have to do that another time.
To execute this, I need to get get a feel for when the loans charge off in a given traunch of loans.
```{r}
# Charege Offs
lcA %>%
#Perform the following calculations for each year
group_by(issue_year) %>%
## Count the number of leans each year
mutate(count=n()) %>%
## Perform a summary for each year and age of loan
group_by(issue_year,Age) %>%
## Find the number of loans that charge off each month
summarise(bad = sum(ifelse(grepl("Charged Off",loan_status),1,0)),
#Get the count value for the year
#count is a vector of the save value for each group
count = max(count)) %>%
## ORder by the age for the cumsum
arrange(Age) %>%
## Get the cumlative total of bad loans over the course of the term
mutate(total_bad = cumsum(bad)) %>%
## Group by the traunch of loans
group_by(issue_year) %>%
##Get the total number of bad loans for the traunch
mutate(max_total_bad = max(total_bad)) %>%
##Plot the cumlative percentage of chargeoffs for each traunch
ggplot(aes(Age,total_bad/max_total_bad,color=factor(issue_year)))+
geom_line()+
theme_bw()+
xlim(0,36)+
ylab('Percent Total Chargeoffs')
```
The cumlative chargeoff behavior has a characteristic shape, so we can use this shape to include in the expected number of units making payments in a given month. We can generate this expecations by just doing a weighted average on these curves (thought there are more thoughtful ways to do this as well). I'm going to take this average, and smoothing fit, and make a function of out it to include in the amortization of the loan.
```{r}
percentChargeoffFunc <- lcA %>%
#Perform the following calculations for each year
group_by(issue_year) %>%
## Count the number of leans each year
mutate(count=n()) %>%
## Perform a summary for each year and age of loan
group_by(issue_year,Age) %>%
## Find the number of loans that charge off each month
summarise(bad = sum(ifelse(grepl("Charged Off",loan_status),1,0)),
#Get the count value for the year
#count is a vector of the save value for each group
count = max(count)) %>%
## ORder by the age for the cumsum
arrange(Age) %>%
## Limit to 36 months
filter(Age <= 36) %>%
## Get the cumlative total of bad loans over the course of the term
mutate(total_bad = cumsum(bad)) %>%
## Group by the traunch of loans
group_by(issue_year) %>%
##Get the total number of bad loans for the traunch
mutate(max_total_bad = max(total_bad)) %>%
##Calculate percent bad
mutate(percent_bad = total_bad/max_total_bad) %>%
## Group by Age to do the weighted average of all the curves
group_by(Age) %>%
## Calculate the weighted average by the traunch size
summarise(avg_percent_total_bad = weighted.mean(percent_bad,count)) %>%
## Return Approx Function to describe Chargeoff Behavior
(function(df) {
## Make a loess fit of the average to smooth out
mod <- loess(avg_percent_total_bad ~ Age,df,span=1/4)
## make the predictions
pred <- predict(mod,df)
## create a data frame of the age and smoothed predictions
tmp <- data.frame(Age = df$Age,pred_percent_total_bad=pred)
## make sure the new smooth values are between 0 and 1
tmp$pred_percent_total_bad = tmp$pred_percent_total_bad-min(tmp$pred_percent_total_bad)
tmp$pred_percent_total_bad = tmp$pred_percent_total_bad/max(tmp$pred_percent_total_bad)
# return approx function of the smooth function
approxfun(tmp$Age,tmp$pred_percent_total_bad,method='linear',yleft = 0,yright=1)
})
## Plot Function Output
data.frame(Age=seq(0,36),
pco = percentChargeoffFunc(seq(0,36))) %>%
ggplot(aes(Age,pco))+
geom_line()+
theme_bw()+
xlim(0,36)+
ylab('Percent Total Chargeoffs')
```
Now that I have the function, I can include the expected charge off function times the actual scale to update our cash flow expectations.
```{r}
#Ammortizaiton Schedule for loans with Charge Off Expectations
get_amortization_chargeoff <- function(term_number,interest_rate,percent_chargeoff){
amortization <- data.frame(statement=seq(0,term_number,1),
payment=c(0,rep(payment_value(1,interest_rate,term_number),term_number)),
interest_payment = 0,
principal_payment=0,
start_balance=0,
end_balance=1,
#Added a units chargeoff which is the new curve times the measured scale
unit_chargeoff = percent_chargeoff*percentChargeoffFunc(seq(0,36)))
for(i in (1:term_number+1)){
amortization[i,'start_balance'] = amortization[i-1,'end_balance']*(1+interest_rate/12)
amortization[i,'interest_payment'] = interest_rate*amortization[i-1,'end_balance']/12
amortization[i,'principal_payment'] = amortization[i,'payment'] - amortization[i,'interest_payment']
amortization[i,'end_balance'] = amortization[i,'start_balance']-amortization[i,'payment']
}
amortization[term_number+1,'end_balance']=0
#Adds a payment recieved which is the schedule payments times the number of units left on a given statement
amortization <- amortization %>% mutate(payment_recieved = (1-unit_chargeoff)*payment)
return(amortization)
}
#Print output
get_amortization_chargeoff(36,0.07,0.06) %>%
round_df(4) %>%
datatable(rownames=F)
```
## Results: Account For Charge Offs
```{r}
lcA %>%
# Perform the following calculation for each year
group_by(issue_year) %>%
# Aggregrate for each year
summarise(
#Number of Loans
Count = n(),
#Dollars Loaned Out
DollarsFunded = sum(funded_amnt),
#Dollars collected from payments
DollarsRecieved = sum(total_rec),
#Percent Charge Offs
Chargeoffs = round(sum(ifelse(grepl("Charged Off",loan_status),1,0))/Count*100,1),
#Percent Prepayments
Prepays = round(sum(ifelse(grepl("Fully Paid",loan_status)&(Age < 36),1,0))/Count*100,1),
#The Percent Increase on the Dollars Loans
Yield = round(100*(DollarsRecieved/DollarsFunded-1),1),
#Dollar weighted interest rate for each year
Interest = round(weighted.mean(interest,funded_amnt),1)) %>%
mutate(
#Applys the amortization schedule and sumes payments recieved for each loan
ExpectedYield = apply( (.),
1,
function(x) sum(get_amortization_chargeoff(36,
as.numeric(x['Interest'])/100,
as.numeric(x['Chargeoffs'])/100
)$payment_recieved))) %>%
mutate(
#Rounds Expected Yield to a percent
ExpectedYield = round(100*(ExpectedYield-1),1)
) %>%
#Filter to relevant variables
select(issue_year,DollarsFunded,Prepays,Chargeoffs,Interest,Yield,ExpectedYield) %>%
#Convert to % for display
mutate(Prepays=paste0(Prepays,"%"),
Chargeoffs=paste0(Chargeoffs,"%"),
Interest=paste0(Interest,"%"),
Yield=paste0(Yield,"%"),
ExpectedYield = paste0(ExpectedYield,"%")) %>%
#Create a data table
datatable(rownames=F)
```
## Review: Account For Charge Offs
I see that the expected values have decreased from the previous expectations. The expected and actuals are closer in line, but there is room for improvement. I have accounted for the charge offs, but not the prepayment behavior.
## Cash Flow Expectation : Account For Prepayments & Charge Offs
I can now repeat the process I did for including the effect of chargeoffs in our cash flow expectations. I can see if there is a characteristic cumlative behavior, generate a funciton that approximates the behavior, and incorporate it into the ammortization.
```{r}
# Prepays
lcA %>%
## Perform the following calculations for each year
group_by(issue_year) %>%
## Count the number of leans each year
mutate(count=n()) %>%
## Perform a summary for each year and age of loan
group_by(issue_year,Age) %>%
## Find the number of loans that prepayments each month
summarise(prepay = sum(ifelse(grepl("Fully Paid",loan_status),1,0)),
## Get the count value for the year
## count is a vector of the save value for each group
count = max(count)) %>%
## ORder by the age for the cumsum
arrange(Age) %>%
## Can only prepay before the loan is mature
filter(Age < 36) %>%
## Get the cumlative total of prepaid loans over the course of the term
mutate(total_prepay = cumsum(prepay)) %>%
## Group by the traunch of loans
group_by(issue_year) %>%
## Get the total number of prepayments for the traunch
mutate(max_total_prepay = max(total_prepay)) %>%
## Plot the cumlative percentage of prepaid loans for each traunch
ggplot(aes(Age,total_prepay/max_total_prepay,color=factor(issue_year)))+
geom_line()+
theme_bw()+
xlim(0,36)+
ylab('Percent Total Prepayment')
```
The cumlative prepayment behavior over the life does have a characterist shape, so I will generate an approximation function similar to charge offs.
```{r}
percentPrepaymentFunc <- lcA %>%
## Perform the following calculations for each year
group_by(issue_year) %>%
## Count the number of leans each year
mutate(count=n()) %>%
## Perform a summary for each year and age of loan
group_by(issue_year,Age) %>%
## Find the number of loans that prepayments each month
summarise(prepay = sum(ifelse(grepl("Fully Paid",loan_status),1,0)),
## Get the count value for the year
## count is a vector of the save value for each group
count = max(count)) %>%
## ORder by the age for the cumsum
arrange(Age) %>%
## Can only prepay before the loan is mature
filter(Age < 36) %>%
## Get the cumlative total of prepaid loans over the course of the term
mutate(total_prepay = cumsum(prepay)) %>%
## Group by the traunch of loans
group_by(issue_year) %>%
## Get the total number of prepayments for the traunch
mutate(max_total_prepay = max(total_prepay)) %>%
## Calculate the percentage of prepayments that happen in a given
## month out of all of the prepayments that happen in the traunch
mutate(percent_prepay = total_prepay/max_total_prepay) %>%
## Calculate for eeach age
group_by(Age) %>%
## The weighted mean of the percentage by the size of the traunch
summarise(avg_percent_total_prepay = weighted.mean(percent_prepay,count)) %>%
## Return an approx function
(function(df) {
## perform a loess fit to smoooth the function
mod <- loess(avg_percent_total_prepay ~ Age,df,span=1/4)
## predict the results
pred <- predict(mod,df)
## create a data frame to use to make the approx function
tmp <- data.frame(Age = df$Age,pred_percent_total_prepay=pred)
## Make sure the fit produces values between 0 and 1
tmp$pred_percent_total_prepay = tmp$pred_percent_total_prepay-min(tmp$pred_percent_total_prepay)
tmp$pred_percent_total_prepay = tmp$pred_percent_total_prepay/max(tmp$pred_percent_total_prepay)
## Return the approx function
approxfun(tmp$Age,tmp$pred_percent_total_prepay,method='linear',yleft = 0,yright=1)
})
## Plot the approx functions
data.frame(Age=seq(0,36),
ppp = percentPrepaymentFunc(seq(0,36))) %>%
ggplot(aes(Age,ppp))+
geom_line()+
theme_bw()+
xlim(0,36)+
ylab('Percent Total Chargeoffs')
```
## Results: Account For Prepayments & Charge Offs
I have the characteristic prepayment function and charge off function, so I can append it to the ammortization schedule. Like loans that chargeoff, a prepaid loan does not generate payments after it is prepaid. Unlike charged off loans, it does return the prinipal. Once this is included, I suspect I will see higher than previously expected cashflows earlier due to returned principal.
```{r}
#Ammortizaiton Schedule for loans with Charge Off and Prepayment Expectations
get_amortization_chargeoff_prepay <- function(term_number,interest_rate,percent_chargeoff,percent_prepay){
amortization <- data.frame(statement=seq(0,term_number,1),
payment=c(0,rep(payment_value(1,interest_rate,term_number),term_number)),
interest_payment = 0,
principal_payment=0,
start_balance=0,
end_balance=1,
#Units chargeoffed is measures scale * expected percent of total
unit_chargeoff = percent_chargeoff*percentChargeoffFunc(seq(0,36)),
#Units prepyament is measured scale * expected percent of total
unit_prepay = percent_prepay*percentPrepaymentFunc(seq(0,36)))
for(i in (1:term_number+1)){
amortization[i,'start_balance'] = amortization[i-1,'end_balance']*(1+interest_rate/12)
amortization[i,'interest_payment'] = interest_rate*amortization[i-1,'end_balance']/12
amortization[i,'principal_payment'] = amortization[i,'payment'] - amortization[i,'interest_payment']
amortization[i,'end_balance'] = amortization[i,'start_balance']-amortization[i,'payment']
}
amortization[term_number+1,'end_balance']=0
amortization <- amortization %>%
mutate(monthly_prepay = ifelse(is.na(lag(unit_prepay)),0,unit_prepay-lag(unit_prepay))) %>%
mutate(payment_recieved = (1-unit_chargeoff-unit_prepay)*payment+monthly_prepay*(end_balance+payment))
return(amortization)
}
get_amortization_chargeoff_prepay(36,0.07,0.06,0.50) %>%
round_df(4) %>%
datatable(rownames=F)
```
Now I can add the prepayment behavior into our cashflows and compare the expected to the actuals.
```{r}
lcA %>%
# Perform the following calculation for each year
group_by(issue_year) %>%
# Aggregrate for each year
summarise(
#Number of Loans
Count = n(),
#Dollars Loaned Out
DollarsFunded = sum(funded_amnt),
#Dollars collected from payments
DollarsRecieved = sum(total_rec),
#Percent Charge Offs
Chargeoffs = round(sum(ifelse(grepl("Charged Off",loan_status),1,0))/Count*100,1),
#Percent Prepayments
Prepays = round(sum(ifelse(grepl("Fully Paid",loan_status)&(Age < 36),1,0))/Count*100,1),
#The Percent Increase on the Dollars Loans
Yield = round(100*(DollarsRecieved/DollarsFunded-1),1),
#Dollar weighted interest rate for each year
Interest = round(weighted.mean(interest,funded_amnt),1)) %>%
mutate(
#Applys the amortization schedule and sumes payments recieved for each loan
ExpectedYield = apply( (.),
1,
function(x) sum(get_amortization_chargeoff_prepay(36,
as.numeric(x['Interest'])/100,
as.numeric(x['Chargeoffs'])/100,
as.numeric(x['Prepays'])/100
)$payment_recieved))) %>%
mutate(
#Rounds Expected Yield to a percent
ExpectedYield = round(100*(ExpectedYield-1),1)
) %>%
#Filter to relevant variables
select(issue_year,DollarsFunded,Prepays,Chargeoffs,Interest,Yield,ExpectedYield) %>%
#Convert to % for display
mutate(Prepays=paste0(Prepays,"%"),
Chargeoffs=paste0(Chargeoffs,"%"),
Interest=paste0(Interest,"%"),
Yield=paste0(Yield,"%"),
ExpectedYield = paste0(ExpectedYield,"%")) %>%
#Create a data table
datatable(rownames=F)
```
## Review: Account For Prepayments & Charge Offs
Now I have a framework for cashflow expectionats that comes within error of the actuals. This method requires that I generate expections for the expected charge off and prepayment scales to forecast cashflows, but once I have that, the rest is just straight forward math. The cashflow model that comes describes lending club actuals is using the paid as agreed model, then account for charge off and prepayment behavior. Even though there are a number of simplication and assumptions in this framework, it is an improved understanding of the expected cashflows for a grade A, 36 month term loan in Lending Club.