ex2.Rmd

---
title: "Exercise 2"
author: "Jai Broome"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: 
    html_document:
        toc: true
        number_sections: true
---

```{r dependencies, message=FALSE}
require(ggplot2)
require(knitr)
```

```{r read_chunks}
read_chunk("ex2/ex2_chunks.R")
```

# Logistic Regression

## Visualizing the Data

Read in data and initialize theta

```{r read-data}
ex2data1 <- read.table("data/ex2data1.txt", sep = ",")
ex2data1 <- cbind(1, ex2data1)
initial_theta <- rep(0, times = 3)
```

```{r vizualize-data}
g1 <- ggplot(ex2data1, aes(x = V1, y = V2, color = as.factor(V3))) + 
    geom_point() +
    labs(x = "Exam 1 Score",
         y = "Exam 2 Score",
         color = "Legend\n") +
    scale_color_manual(labels = c("Not Admitted", "Admitted"), values = c(1,2))
g1 + ggtitle("Figure 1: Scatter plot of training data")
```


## Implementation

### Sigmoid function

```{r sig}
```

### Cost function and gradient

Note that `costFunction` takes a matrix `M` with the outcome in the last column, and separates it into Xs and Y. Note that I've implemented regularization for the first exercise so the function can be reused for the second part of this exercise.

```{r cost-function}
```

### Learning parameters using `optim`

I used R's `optim` as an alternative to Matlab's `fminunc`. `costFunction` returns both the cost and the gradient.

```{r learn-theta}
ex2data1 <- as.matrix(ex2data1)
newTheta <- optim(par = initial_theta,
      fn = function(x){costFunction(ex2data1, x)$J},
      gr = function(x){costFunction(ex2data1, x)$grad},
      method = "BFGS", control = list(maxit = 400))
```

### Evaluating Logistic Regression

```{r test-scores}
testScores <- c(45, 85)
```

With test scores of `r testScores`, we should see an admission probability of 0.776.

```{r h}
```


```{r numeric-test}
h(newTheta$par, c(1, testScores)) # Include the x0 term
```

While the in-sample error isn't reported in the excercise sheet, the above test makes me reasonably sure that this is working correctly. I'll update this if I hear from anyone who has taken this course for credit.

```{r pred-log-reg}
```

```{r train-accuracy}
ex2data1Pred <- predLogReg(ex2data1, newTheta$par)
sum(ex2data1[, 4]==round(ex2data1Pred, 0)) / nrow(ex2data1)
```

By eyeballing the plot in the exercise sheet, we can see the decision boundary lies about on (50, 75) and (75, 50). By using those points as initial parameters, I find two points on the decision boundary using `optim`.

```{r find-decision-boundary}
```


```{r calc-decision-boundary}
decPoints <- findDecisionBoundary(theta = newTheta$par)
```

The prediction should be 0.5 for both points (definitionally), so we can see if we made a mistake

```{r test-dec-points}
apply(decPoints[, c(1, 3)], 1, function(x){h(c(1, x), newTheta$par)})
```

```{r plot-dec-boundary}
dec.lm <- lm(x ~ newY, decPoints)
g2 <- g1 +
    geom_abline(slope = dec.lm$coefficients[[2]], intercept = dec.lm$coefficients[[1]])
g2 + ggtitle("Figure 2: Training data with decision boundary")
```


# Regularized Logistic Regression

## Visualizing the data

```{r plot-ex2data2}
ex2data2 <- read.table("data/ex2data2.txt", sep = ",")
g3 <- ggplot(ex2data2, aes(V1, V2)) + 
    geom_point(aes(color = as.factor(V3))) +
    labs(x = "Microchip Test 1",
         y = "Microchip Test 2",
         color = "Legend\n") +
    scale_color_manual(labels = c("Fail", "Pass"), values = c(1, 2))
g3 + ggtitle("Figure 3: Plot of training data")
```

## Feature mapping

```{r feature-map}
y <- ex2data2$V3

x1s <- ex2data2[, 1]
x2s <- ex2data2[, 2]

for(i in 2:6){
    x1s <- cbind(x1s, ex2data2[,1] ^ i)
    }
x1s <- cbind(1, x1s)

for(i in 2:6){
    x2s <- cbind(x2s, ex2data2[,1] ^ i)
}
x2s <- cbind(1, x2s)

allxs <- vector()

for(i in 1:ncol(x2s)){
    allxs <- cbind(allxs, x1s[, 1:(8-i)] * x2s[,i])
}

ex2data2.full <- cbind(allxs, y)
```

## Cost function and gradient

From the exercise sheet, we should expect an initial cost of about 0.693. 

```{r test-initial-params}
initial_theta <- rep(0, times = 28)
costFunction(ex2data2.full, initial_theta)$J
```

###  Learning parameters using `optim`

```{r}
newTheta2 <- optim(par = initial_theta,
      fn = function(x){costFunction(ex2data2.full, x)$J},
      gr = function(x){costFunction(ex2data2.full, x)$grad},
      method = "BFGS", control = list(maxit = 1000))
```

Unfortunately, there isn't a numerical check I can do to make sure this is correct.

## Plotting the decision boundary

## Optional (ungraded) exercises