# Optimization {#sec-optim}
```{r}
#| include: false
library(ggplot2)
library(tibble)
library(patchwork)
```
To optimize, we use derivatives and calculus. Optimization means finding the maximum or minimum of a function, and finding the value of the input that gives that extremum. This has obvious uses in engineering, and many tools in the statistical toolkit rely on it. One of the most common ways of estimating a model is "Maximum Likelihood Estimation", which is done by optimizing a function (the likelihood).
Optimization also comes up all the time in Economics, Formal Theory, and Political Economy. A go-to model of human behavior is that people optimize a certain utility function. Humans are not pure utility maximizers, of course, but more nuanced models of optimization -- for example, adding constraints or uncertainty -- will prove to be quite useful.
## Example: Meltzer-Richard {.unnumbered}
A standard backdrop in comparative political economy, the Meltzer-Richard (1981) model states that redistribution of wealth should be higher in societies where the median income is much smaller than the average income. More to the point, an income distribution where the median is very different from the average is typically one of high inequality. In other words, the Meltzer-Richard model says that highly unequal economies will have more redistribution of wealth. Why is that the case? Here is a simplified example that is not the exact model by Meltzer and Richard[^05_optimization-1], but is adapted from Persson and Tabellini[^05_optimization-2].
[^05_optimization-1]: Allan H. Meltzer and Scott F. Richard. ["A Rational Theory of the Size of Government"](https://www.jstor.org/stable/1830813). *Journal of Political Economy* 89:5 (1981), p. 914-927
[^05_optimization-2]: Adapted from Torsten Persson and Guido Tabellini, *Political Economics: Explaining Economic Policy*. MIT Press.
We will assume the following about our model individual and our model democracy.
- Individuals are indexed by $i$, and the total population is normalized to unity ("1") without loss of generality.
- $U(\cdot)$, u for "utility", is a function that is concave and increasing, and expresses the utility gained from public goods. This tells us that its first derivative is *positive*, and its second derivative is **negative**.
- $y_i$ is the income of person $i$
- $W_i$, w for "welfare", is the welfare of person $i$
- $c_i$, c for "consumption", is the consumption utility of person $i$
Also, the government is democratically elected and sets the following redistribution policy:
- $\tau$, t for "tax", is a flat tax rate between 0 and 1 that is applied to everyone's income.
- $g$, "g" for "goods", is the amount of public goods that the government provides.
Suppose an individual's welfare is given by: $$W_i = c_i + U(g)$$
The consumption good is the person's post-tax income.
$$c_i = (1 - \tau) y_i$$
Income varies by person. (In the next section we will cover probability; there we will see that we can express this by saying that $y$ is a random variable with cumulative distribution function $F$, i.e. $y \sim F$.) Every distribution has a mean and a median.
- $E(y)$ is the average income of the society.
- $\text{med}(y)$ is the **median income** of the society.
What will happen in this economy? What will the tax rate be set to? How many public goods will be provided?
We've skipped ahead of some formal theory results about democracy, but hopefully these are conceptually intuitive. First, if a democracy is competitive, there is no slack in the government's budget, and all tax revenue becomes a public good. So we can go ahead and set the constraint:
$$g = \sum_{i} \tau y_i P(y_i) = \tau E(y)$$
We can do this trick because the population is normalized to unity, but this is a general property of the average.
Now given this constraint we can re-write an individual's welfare as
```{=tex}
\begin{align*}
W_i &= \left(1 - \frac{g}{E(y)}\right)y_i + U(g)\\
&= \left(E(y) - g\right) \frac{1}{E(y)} y_i + U(g)\\
&= \left(E(y) - g\right) \frac{y_i}{E(y)} + U(g)\\
\end{align*}
```
When is the individual's welfare maximized, **as a function of the public good**? \begin{align*}
\frac{d}{dg}W_i &= - \frac{y_i}{E(y)} + \frac{d}{dg}U(g)\\
\end{align*}
$\frac{d}{dg}W_i = 0$ when $\frac{d}{dg}U(g) = \frac{y_i}{E(y)}$, and so after expressing the derivative as $U_g = \frac{d}{dg}U(g)$ for simplicity,
$$g_i^\star = {U_g}^{-1}\left(\frac{y_i}{E(y)}\right)$$
Now recall that because we assumed concavity, $U_g$ is a positive-valued but decreasing function. It can be shown that the inverse of such a function is also decreasing. Thus an individual's preferred level of public goods is determined by a single quantity, the person's income divided by the average income, and it is **decreasing** in $y_i$. This is consistent with our intuition that richer people prefer less redistribution.
That was the amount for any given person. The government has to set one value of $g$, however. So what will that be? Now we will use another result, the median voter theorem. This says that under certain general electoral conditions (single-peaked preferences, two parties, majority rule), the policy winner will be that preferred by the median person in the population. Because the only thing that determines a person's preferred level of government is $y_i / E(y)$, we can presume that the median voter, whose income is $\text{med}(y)$ will prevail in their preferred choice of government. Therefore, we will see
$$g^\star = {U_g}^{-1}\left(\frac{\text{med}(y)}{E(y)}\right)$$
What does this say about the level of redistribution we observe in an economy? The higher the average income is relative to the median income, which often (but not always) means *more* inequality, the *more* redistribution there should be.
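To make this comparative static concrete, here is a minimal numerical sketch in R. It assumes a particular concave utility, $U(g) = \log(g)$, so that $U_g(g) = 1/g$ and ${U_g}^{-1}(z) = 1/z$, which gives $g^\star = E(y)/\text{med}(y)$; the income figures are purely hypothetical.

```{r}
# Sketch only: assume U(g) = log(g), so U_g(g) = 1/g and U_g^{-1}(z) = 1/z.
# The median voter's preferred level of public goods is then g* = E(y) / med(y).
g_star <- function(mean_y, median_y) mean_y / median_y

g_star(mean_y = 50000, median_y = 50000)  # symmetric income distribution: g* = 1
g_star(mean_y = 50000, median_y = 30000)  # right-skewed (unequal) distribution: larger g*
```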
## Maxima and Minima
The first derivative, $f'(x)$, quantifies the slope of a function. Therefore, it can be used to check whether the function $f(x)$ is increasing or decreasing at the point $x$.
1. **Increasing:** $f'(x)>0$
2. **Decreasing:** $f'(x)<0$
3. **Neither increasing nor decreasing**: $f'(x)=0$ i.e. a maximum, minimum, or saddle point
So for example, $f(x) = x^2 + 2$ and $f^\prime(x) = 2x$
```{r}
#| echo: false
#| fig-cap: Maxima and Minima
range <- tibble::tibble(x = c(-3, 3))
fx0 <- ggplot(range, aes(x = x)) +
  labs(
    x = expression(x), y = expression(f(x)),
    title = expression(f(x) == x^2 + 2)
  )
fx <- fx0 +
stat_function(fun = function(x) x^2 + 2, linewidth = 0.5) +
expand_limits(y = 0)
fprimex <- fx0 + stat_function(fun = function(x) 2 * x, linewidth = 0.5, linetype = "dashed") +
labs(x = expression(x), y = expression(f ~ plain("'") ~ (x)))
fx + fprimex + plot_layout(nrow = 1)
```
::: {#exr-maxminplot}
## Plotting a maximum and minimum
Plot $f(x)=x^3+ x^2 + 2$, plot its derivative, and identify where the derivative is zero. Is there a maximum or minimum?
:::
```{r}
#| echo: false
#| eval: false
range <- tibble::tibble(x = c(-3, 3))
fx0 <- ggplot(range, aes(x = x)) +
labs(x = expression(x), y = expression(f(x))) +
theme_bw()
fx <- fx0 +
  stat_function(fun = function(x) x^3 + x^2 + 2, linewidth = 0.5) +
  expand_limits(y = 0)
# The derivative of f(x) = x^3 + x^2 + 2 is f'(x) = 3x^2 + 2x
fprimex <- fx0 + stat_function(fun = function(x) 3 * x^2 + 2 * x, linewidth = 0.5, linetype = "dashed") +
  labs(x = expression(x), y = expression(f ~ plain("'") ~ (x)))
fx + fprimex + plot_annotation(expression(f(x) == x^3 + x^2 + 2))
```
The second derivative $f''(x)$ identifies whether the function $f(x)$ at the point $x$ is
1. Concave down: $f''(x)<0$
2. Concave up: $f''(x)>0$
**Maximum (Minimum)**: $x_0$ is a **local maximum (minimum)** if $f(x_0)>f(x)$ ($f(x_0)<f(x)$) for all $x \neq x_0$ within some open interval containing $x_0$. $x_0$ is a **global maximum (minimum)** if $f(x_0)>f(x)$ ($f(x_0)<f(x)$) for all $x \neq x_0$ in the domain of $f$.
Given the function $f$ defined over domain $D$, all of the following are defined as **critical points**:
1. Any interior point of $D$ where $f'(x)=0$.
2. Any interior point of $D$ where $f'(x)$ does not exist.
3. Any endpoint that is in $D$.
The maxima and minima will be a subset of the critical points.
**Second Derivative Test of Maxima/Minima**: We can use the second derivative to tell us whether a point is a maximum or minimum of $f(x)$.
1. Local Maximum: $f'(x)=0$ and $f''(x)<0$
2. Local Minimum: $f'(x)=0$ and $f''(x)>0$
3. Need more info: $f'(x)=0$ and $f''(x)=0$
**Global Maxima and Minima** Sometimes no global max or min exists --- e.g., $f(x)$ not bounded above or below. However, there are three situations where we can fairly easily identify global max or min.
1. **Functions with only one critical point.** If $x_0$ is a local max or min of $f$ and it is the only critical point, then it is the global max or min.
2. **Globally concave up or concave down functions.** If $f''(x)$ is never zero, then there is at most one critical point. That critical point is a global maximum if $f''<0$ and a global minimum if $f''>0$.
3. **Functions over closed and bounded intervals** must have both a global maximum and a global minimum.
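For one-dimensional problems like these, base R's `optimize()` searches a closed, bounded interval numerically; a minimal sketch for $f(x) = x^2 + 2$:

```{r}
f <- function(x) x^2 + 2

# optimize() searches a closed, bounded interval and returns the location
# ($minimum) and value ($objective) of the minimum, here x = 0 and f(x) = 2.
optimize(f, interval = c(-3, 3))
```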
::: {#exm-maxmindraw}
## Maxima and Minima by drawing
Find any critical points and identify whether they are a max, min, or saddle point:
1. $f(x)=x^2+2$
2. $f(x)=x^3+2$
3. $f(x)=|x^2-1|$, $x\in [-2,2]$
:::
## Concavity of a Function
Concavity helps identify the curvature of a function, $f(x)$, in 2 dimensional space.
::: {#def-concave}
## Concave Function
A function $f$ is strictly concave over the set $S$ **if** $\forall x_1,x_2 \in S$ and $\forall a \in (0,1)$, $$f(ax_1 + (1-a)x_2) > af(x_1) + (1-a)f(x_2)$$ *Any* line connecting two points on a concave function will lie *below* the function.
:::
```{r}
#| echo: false
range1 <- tibble(x = c(-4, 4))
fx1 <- ggplot(range1, aes(x = x)) +
  stat_function(fun = function(x) -x^2, linewidth = 0.5) +
  labs(y = expression(f(x)), title = "Concave")
fx2 <- ggplot(range1, aes(x = x)) +
  stat_function(fun = function(x) x^2, linewidth = 0.5) +
  labs(y = expression(f(x)), title = "Convex")
fx1 + fx2 + plot_layout(nrow = 1)
```
::: {#def-convex}
## Convex Function
A function $f$ is strictly convex over the set $S$ **if** $\forall x_1,x_2 \in S$ and $\forall a \in (0,1)$, $$f(ax_1 + (1-a)x_2) < af(x_1) + (1-a)f(x_2)$$
Any line connecting two points on a convex function will lie *above* the function.
:::
Sometimes, concavity and convexity are too strict a requirement. For most purposes of getting solutions, what we call quasi-concavity is enough.
::: {#def-quasiconcave}
## Quasiconcave Function
A function $f$ is quasiconcave over the set $S$ if $\forall x_1,x_2 \in S$ and $\forall a \in (0,1)$, $$f(ax_1 + (1-a)x_2) \ge \min(f(x_1),f(x_2))$$
No matter what two points you select, the *lowest* valued point will always be an end point.
:::
::: {#def-quasiconvex}
## Quasiconvex
A function $f$ is quasiconvex over the set $S$ if $\forall x_1,x_2 \in S$ and $\forall a \in (0,1)$, $$f(ax_1 + (1-a)x_2) \le \max(f(x_1),f(x_2))$$ No matter what two points you select, the *highest* valued point will always be an end point.
:::
**Second Derivative Test of Concavity**: The second derivative can be used to understand concavity.
If $$\begin{array}{lll}
f''(x) < 0 & \Rightarrow & \text{Concave}\\
f''(x) > 0 & \Rightarrow & \text{Convex}
\end{array}$$
### Quadratic Forms {.unnumbered}
A quadratic form is a shorthand way to summarize a function. Quadratic forms are important for analyzing concavity because they
1. approximate the local curvature around a point --- e.g., they are used to identify whether a point is a max, min, or saddle point;
2. are simple to express even in $n$ dimensions; and
3. have a matrix representation.
**Quadratic Form**: A polynomial where each term is a monomial of degree 2 in any number of variables:
```{=tex}
\begin{align*}
\text{One variable: }& Q(x_1) = a_{11}x_1^2\\
\text{Two variables: }& Q(x_1,x_2) = a_{11}x_1^2 + a_{12}x_1x_2 + a_{22}x_2^2\\
\text{N variables: }& Q(x_1,\cdots,x_n)=\sum\limits_{i\le j} a_{ij}x_i x_j
\end{align*}
```
which can be written in matrix terms:
One variable
$$Q(\mathbf{x}) = x_1^\top a_{11} x_1$$
N variables: \begin{align*}
Q(\mathbf{x}) &=\begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}\begin{pmatrix}
a_{11}&\frac{1}{2}a_{12}&\cdots&\frac{1}{2}a_{1n}\\
\frac{1}{2}a_{12}&a_{22}&\cdots&\frac{1}{2}a_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
\frac{1}{2}a_{1n}&\frac{1}{2}a_{2n}&\cdots&a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1\\x_2\\\vdots\\x_n\end{pmatrix}\\
&= \mathbf{x}^\top\mathbf{Ax}
\end{align*}
For example, the Quadratic on $\mathbf{R}^2$: \begin{align*}
Q(x_1,x_2)&=\begin{pmatrix} x_1& x_2 \end{pmatrix} \begin{pmatrix} a_{11}&\frac{1}{2} a_{12}\\
\frac{1}{2}a_{12}&a_{22}\end{pmatrix} \begin{pmatrix} x_1\\x_2 \end{pmatrix} \\
&= a_{11}x_1^2 + a_{12}x_1x_2 + a_{22}x_2^2
\end{align*}
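As a quick numerical check (a sketch, with arbitrarily chosen coefficients), we can verify in R that the matrix form $\mathbf{x}^\top\mathbf{Ax}$ reproduces the polynomial $a_{11}x_1^2 + a_{12}x_1x_2 + a_{22}x_2^2$:

```{r}
# Arbitrary coefficients for Q(x1, x2) = a11*x1^2 + a12*x1*x2 + a22*x2^2
a11 <- 2; a12 <- -3; a22 <- 5
A <- matrix(c(a11, a12 / 2,
              a12 / 2, a22), nrow = 2, byrow = TRUE)
x <- c(1.5, -0.7)

drop(t(x) %*% A %*% x)                           # matrix form x'Ax
a11 * x[1]^2 + a12 * x[1] * x[2] + a22 * x[2]^2  # polynomial form, same value
```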
### Definiteness of Quadratic Forms {.unnumbered}
When the function $f(\mathbf{x})$ has more than two inputs, determining whether it has a maximum or minimum (remember, functions may have many inputs but only one output) is a bit more tedious. Definiteness helps identify the curvature of a function, $Q(\mathbf{x})$, in $n$-dimensional space.
**Definiteness**: By definition, a quadratic form always takes the value zero at the origin: $Q(\mathbf{x})=0$ at $\mathbf{x}=\mathbf{0}$. The definiteness of the matrix $\mathbf{A}$ is determined by whether the quadratic form $Q(\mathbf{x})=\mathbf{x}^\top\mathbf{A}\mathbf{x}$ is always positive (positive definite), always negative (negative definite), or takes both signs (indefinite) over all $\mathbf{x}\ne \mathbf{0}$; if it is never negative (never positive) but can equal zero, it is positive (negative) semidefinite.
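For a symmetric matrix, definiteness can also be read off the eigenvalues: all positive means positive definite, all negative means negative definite, and mixed signs means indefinite. A minimal sketch of that check in base R (the helper `definiteness()` is ours, not a standard function):

```{r}
# Classify the definiteness of a symmetric matrix from its eigenvalues.
definiteness <- function(A, tol = 1e-8) {
  ev <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
  if (all(ev > tol)) "positive definite"
  else if (all(ev < -tol)) "negative definite"
  else if (all(ev >= -tol)) "positive semidefinite"
  else if (all(ev <= tol)) "negative semidefinite"
  else "indefinite"
}

definiteness(diag(c(2, 2)))             # positive definite
definiteness(matrix(c(0, 9, 9, 0), 2))  # indefinite
```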
## FOC and SOC
We can see from a graphical representation that if a point is a local maximum or minimum, it must meet certain conditions regarding its derivatives. These are so commonly used that, in the economic tradition, we refer to them as "First Order Conditions" (FOCs) and "Second Order Conditions" (SOCs).
### First Order Conditions {.unnumbered}
When we examined functions of one variable $x$, we found critical points by taking the first derivative, setting it to zero, and solving for $x$. For functions of $n$ variables, the critical points are found in much the same way, except now we set the partial derivatives equal to zero. Note: We will only consider critical points on the interior of a function's domain.
With a partial derivative, we only took the derivative with respect to one variable at a time. When we take the derivative separately with respect to each element of $\mathbf{x}$ and collect the results, we use the terms gradient (for first derivatives) and Hessian (for second derivatives).
::: {#def-gradient}
## Gradient
Given a function $f(\textbf{x})$ in $n$ variables, the gradient $\nabla f(\mathbf{x})$ (the symbol $\nabla$ is read "nabla") is a column vector, where the $i$th element is the partial derivative of $f(\textbf{x})$ with respect to $x_i$:
$$\nabla f(\mathbf{x}) = \begin{pmatrix}
\frac{\partial f(\mathbf{x})}{\partial x_1}\\ \frac{\partial f(\mathbf{x})}{\partial x_2}\\
\vdots \\ \frac{\partial f(\mathbf{x})}{\partial x_n} \end{pmatrix}$$
:::
Before we know whether a point is a maximum or a minimum, we first check whether it meets the FOC; if it does, it is a "critical point".
::: {#def-criticalpoint}
## Critical Point
$\mathbf{x}^*$ is a critical point if and only if $\nabla f(\mathbf{x}^*)=\mathbf{0}$; that is, every partial derivative of $f(\mathbf{x})$ evaluated at $\mathbf{x}^*$ equals 0. To solve for $\mathbf{x}^*$, find the gradient, set each element equal to 0, and solve the system of equations. $$\mathbf{x}^* = \begin{pmatrix} x_1^*\\x_2^*\\ \vdots \\ x_n^*\end{pmatrix}$$
:::
::: {#exm-gradcp}
Given the function $f(\mathbf{x})=(x_1-1)^2+x_2^2+1$, find (1) the gradient and (2) the critical point of $f(\mathbf{x})$.
:::
::: {#sol-gradcp}
Gradient
```{=tex}
\begin{align*}
\nabla f(\mathbf{x}) &= \begin{pmatrix}\frac{\partial f(\mathbf{x})}{\partial x_1}\\ \frac{\partial f(\mathbf{x})}{\partial x_2} \end{pmatrix}\\
&= \begin{pmatrix} 2(x_1-1)\\ 2x_2 \end{pmatrix}
\end{align*}
```
Critical Point $\mathbf{x}^* =$
```{=tex}
\begin{align*}
&\frac{\partial f(\mathbf{x})}{\partial x_1} = 2(x_1-1) = 0\\
&\Rightarrow x_1^* = 1\\
&\frac{\partial f(\mathbf{x})}{\partial x_2} = 2x_2 = 0\\
&\Rightarrow x_2^* = 0\\
\end{align*}
```
So $$\mathbf{x}^* = (1,0)$$
:::
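We can confirm this numerically with base R's `optim()` (a sketch, not required by the example): starting from an arbitrary point, a quasi-Newton search converges to the critical point $(1, 0)$.

```{r}
f <- function(x) (x[1] - 1)^2 + x[2]^2 + 1
grad_f <- function(x) c(2 * (x[1] - 1), 2 * x[2])

# BFGS uses the analytic gradient; it converges to x* = (1, 0), where f(x*) = 1.
optim(par = c(-2, 3), fn = f, gr = grad_f, method = "BFGS")$par
```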
### Second Order Conditions {.unnumbered}
When we found a critical point for a function of one variable, we used the second derivative as an indicator of the curvature at the point in order to determine whether the point was a min, max, or saddle (second derivative test of concavity). For functions of $n$ variables, we use *second order partial derivatives* as an indicator of curvature.
::: {#def-hessian}
## Hessian
Given a function $f(\mathbf{x})$ in $n$ variables, the hessian $\mathbf{H(x)}$ is an $n\times n$ matrix, where the $(i,j)$th element is the second order partial derivative of $f(\mathbf{x})$ with respect to $x_i$ and $x_j$:
$$\mathbf{H(x)}=\begin{pmatrix} \frac{\partial^2 f(\mathbf{x})}{\partial x_1^2}&\frac{\partial^2f(\mathbf{x})}{\partial x_1 \partial x_2}& \cdots & \frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_1}&\frac{\partial^2f(\mathbf{x})}{\partial x_2^2}&
\cdots & \frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_n}\\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_1}&\frac{\partial^2f(\mathbf{x})}{\partial x_n \partial x_2}&
\cdots & \frac{\partial^2 f(\mathbf{x})}{\partial x_n^2}\end{pmatrix}$$
:::
Note that the Hessian will be a symmetric matrix because $\frac{\partial^2 f(\mathbf{x})}{\partial x_1\partial x_2} = \frac{\partial^2 f(\mathbf{x})}{\partial x_2\partial x_1}$.
Also note that if $f(\mathbf{x})$ is a quadratic form, each element of the Hessian will be a constant.
These definitions will be employed when we determine the **Second Order Conditions** of a function:
Given a function $f(\mathbf{x})$ and a point $\mathbf{x}^*$ such that $\nabla f(\mathbf{x}^*)=0$,
1. Hessian is Positive Definite $\quad \Longrightarrow \quad$ Strict Local Min
2. Hessian is Positive Semidefinite $\forall \mathbf{x}\in B(\mathbf{x}^*,\epsilon)$ $\quad \Longrightarrow \quad$ Local Min
3. Hessian is Negative Definite $\quad \Longrightarrow \quad$ Strict Local Max
4. Hessian is Negative Semidefinite $\forall \mathbf{x}\in B(\mathbf{x}^*,\epsilon)$ $\quad \Longrightarrow \quad$ Local Max
5. Hessian is Indefinite $\quad \Longrightarrow \quad$ Saddle Point
::: {#exm-maxmin2d}
## Max and min with two dimensions
We found that the only critical point of $f(\mathbf{x})=(x_1-1)^2+x_2^2+1$ is at $\mathbf{x}^*=(1,0)$. Is it a min, max, or saddle point?
:::
::: {#sol-maxmin2d}
The Hessian is \begin{align*}
\mathbf{H(x)} &= \begin{pmatrix} 2&0\\0&2 \end{pmatrix}
\end{align*} The Leading principal minors of the Hessian are $M_1=2; M_2=4$. Now we consider Definiteness. Since both leading principal minors are positive, the Hessian is positive definite.
Maxima, Minima, or Saddle Point? Since the Hessian is positive definite and the gradient equals 0, $x^\star = (1,0)$ is a strict local minimum.
Note: An alternative check of definiteness is to ask whether $\mathbf{x}^\top \mathbf{H}(\mathbf{x}^*) \mathbf{x}$ is positive (or negative) for all $\mathbf{x}\ne \mathbf{0}$:
```{=tex}
\begin{align*}
\mathbf{x}^\top \mathbf{H}(\mathbf{x}^*) \mathbf{x} &= \begin{pmatrix} x_1 & x_2 \end{pmatrix}
\begin{pmatrix} 2&0\\0&2 \end{pmatrix}
\begin{pmatrix} x_1\\x_2\end{pmatrix} = 2x_1^2+2x_2^2
\end{align*}
```
For any $\mathbf{x}\ne 0$, $2(x_1^2+x_2^2)>0$, so the Hessian is positive definite and $\mathbf{x}^*$ is a strict local minimum.
:::
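As a numerical cross-check (a base R sketch), we can compute the leading principal minors and the eigenvalues of this Hessian; both confirm that it is positive definite.

```{r}
# Hessian of f(x) = (x1 - 1)^2 + x2^2 + 1, which is constant in x
H <- matrix(c(2, 0,
              0, 2), nrow = 2)

c(M1 = H[1, 1], M2 = det(H))       # leading principal minors: 2 and 4
eigen(H, symmetric = TRUE)$values  # both eigenvalues positive: positive definite
```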
### Definiteness and Concavity {.unnumbered}
Although definiteness helps us to understand the curvature of an n-dimensional function, it does not necessarily tell us whether the function is globally concave or convex.
We need to know whether a function is globally concave or convex to determine whether a critical point is a global min or max. We can use the definiteness of the Hessian to determine whether a function is globally concave or convex:
1. Hessian is Positive Semidefinite $\forall \mathbf{x}$ $\quad \Longrightarrow \quad$ Globally Convex
2. Hessian is Negative Semidefinite $\forall \mathbf{x}$ $\quad \Longrightarrow \quad$ Globally Concave
Notice that the definiteness conditions must be satisfied over the entire domain.
## Global Maxima and Minima
**Global Max/Min Conditions**: Given a function $f(\mathbf{x})$ and a point $\mathbf{x}^*$ such that $\nabla f(\mathbf{x}^*)=0$,
```{=tex}
\begin{enumerate}
\item \parbox[t]{2in}{$f(\mathbf{x})$ Globally Convex} $\quad
\Longrightarrow \quad$ Global Min
\item \parbox[t]{2in}{$f(\mathbf{x})$ Globally Concave} $\quad
\Longrightarrow \quad$ Global Max
\end{enumerate}
```
Note that showing that $\mathbf{H(x^*)}$ is negative semidefinite is not enough to guarantee $\mathbf{x}^*$ is a local max. However, showing that $\mathbf{H(x)}$ is negative semidefinite for all $\mathbf{x}$ guarantees that $\mathbf{x}^*$ is a global max. (The same goes for positive semidefinite and minima.)
Example: Take $f_1(x)=x^4$ and $f_2(x)=-x^4$. Both have $x=0$ as a critical point. Unfortunately, $f''_1(0)=0$ and $f''_2(0)=0$, so we can't tell whether $x=0$ is a min or max for either. However, $f''_1(x)=12x^2$ and $f''_2(x)=-12x^2$. For all $x$, $f''_1(x)\ge 0$ and $f''_2(x)\le 0$ --- i.e., $f_1(x)$ is globally convex and $f_2(x)$ is globally concave. So $x=0$ is a global min of $f_1(x)$ and a global max of $f_2(x)$.
::: {#exr-maxmin}
Given $f(\mathbf{x})=x_1^3-x_2^3+9x_1x_2$, find any maxima or minima.
:::
```{=latex}
\begin{enumerate}
\item First order conditions.
\begin{enumerate}
\item Gradient $\nabla f(\mathbf{x}) = $
$$\begin{pmatrix} \frac{\partial f}{\partial x_1} \\
\frac{\partial f}{\partial x_2}\end{pmatrix} =
\begin{pmatrix} 3x_1^2+9x_2 \\ -3x_2^2+9x_1 \end{pmatrix}$$
\item Critical Points $\mathbf{x^*} =$\\
Set the gradient equal to zero and solve for $x_1$ and
$x_2$.We have two equations and two unknowns. Solving for $x_1$
and $x_2$, we get two critical points: $\mathbf{x}_1^*=(0,0)$ and
$\mathbf{x}_2^*=(3,-3)$.
$$3x_1^2 + 9x_2 = 0 \quad \Rightarrow \quad 9x_2 =
-3x_1^2 \quad \Rightarrow \quad x_2 = -\frac{1}{3}x_1^2$$
$$-3x_2^2 + 9x_1 = 0 \quad \Rightarrow \quad
-3(-\frac{1} {3}x_1^2)^2 + 9x_1 = 0$$
$$\Rightarrow \quad -\frac{1}{3}x_1^4
+ 9x_1 = 0 \quad \Rightarrow \quad x_1^4 = 27x_1 \quad
\Rightarrow \quad x_1 = 0 \text{ or } x_1 = 3$$
$$3(3)^2 + 9x_2 = 0 \quad \Rightarrow \quad x_2 = -3$$
\end{enumerate}
\item Second order conditions.
\begin{enumerate}
\item Hessian $\mathbf{H(x)} = $
$$\begin{pmatrix} 6x_1&9\\9&-6x_2 \end{pmatrix}$$
\item Hessian $\mathbf{H(x_1^*)} = $
$$\begin{pmatrix} 0&9\\9&0\end{pmatrix}$$
\item Leading principal minors of $\mathbf{H(x_1^*)} = $
$$M_1=0; M_2=-81$$\\
\item Definiteness of $\mathbf{H(x_1^*)}$?\\
$\mathbf{H(x_1^*)}$ is indefinite\\
\item Maxima, Minima, or Saddle Point for $\mathbf{x_1^*}$?\\
Since $\mathbf{H(x_1^*)}$ is indefinite, $\mathbf{x}_1^*=(0,0)$
is a saddle point.\\
\item Hessian $\mathbf{H(x_2^*)} = $
$$\begin{pmatrix} 18&9\\9&18\end{pmatrix}$$
\item Leading principal minors of $\mathbf{H(x_2^*)} = $
$$M_1=18; M_2=243$$\\
\item Definiteness of $\mathbf{H(x_2^*)}$?\\
$\mathbf{H(x_2^*)}$ is positive definite\\
\item Maxima, Minima, or Saddle Point for $\mathbf{x}_2^*$?\\
Since $\mathbf{H(x_2^*)}$ is positive definite, $\mathbf{x}_2^*=(3,-3)$ is a strict local minimum\\
\end{enumerate}
\item Global concavity/convexity.
\begin{enumerate}
\item Is f(x) globally concave/convex?\\
No. In evaluating the Hessians for $\mathbf{x}_1^*$
and $\mathbf{x}_2^*$ we saw that the Hessian is not positive
semidefinite at $\mathbf{x} = (0,0)$.\\
\item Are any $\mathbf{x^*}$ global minima or maxima?\\
No. Since the function is not globally concave/convex,
we can't infer that $\mathbf{x}_2^*=(3,-3)$ is a global minimum.
In fact, if we set $x_1=0$, then $f(\mathbf{x})=-x_2^3$, which will
go to $-\infty$ as $x_2\to \infty$.\\
\end{enumerate}
\end{enumerate}
```
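The same exercise can be verified numerically (a sketch using only base R): evaluate the gradient and the eigenvalues of the Hessian at the two critical points found above.

```{r}
grad <- function(x) c(3 * x[1]^2 + 9 * x[2], -3 * x[2]^2 + 9 * x[1])
hess <- function(x) matrix(c(6 * x[1], 9,
                             9, -6 * x[2]), nrow = 2, byrow = TRUE)

for (xstar in list(c(0, 0), c(3, -3))) {
  cat("x* =", xstar,
      "| gradient:", grad(xstar),
      "| Hessian eigenvalues:", round(eigen(hess(xstar))$values, 2), "\n")
}
# (0, 0): eigenvalues 9 and -9 (indefinite) -> saddle point
# (3, -3): eigenvalues 27 and 9 (positive definite) -> strict local minimum
```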
## Constrained Optimization
We have already looked at optimizing a function in one or more dimensions over the whole domain of the function. Often, however, we want to find the maximum or minimum of a function over some restricted part of its domain.
For example, maximizing utility subject to a budget constraint.
![A typical Utility Function with a Budget Constraint](images/constraint.png)
**Types of Constraints**: For a function $f(x_1, \dots, x_n)$, there are two types of constraints that can be imposed:
```{=tex}
\begin{enumerate}
\item \textbf{Equality constraints:} constraints of the form $c(x_1,\dots, x_n) = r$.
Budget constraints are the classic example of equality constraints in social science.
\item \textbf{Inequality constraints:} constraints of the form $c(x_1, \dots, x_n) \leq r$. These might arise from non-negativity constraints or other threshold effects.
\end{enumerate}
```
In any constrained optimization problem, the constrained maximum will always be less than or equal to the unconstrained maximum. If the constrained maximum is less than the unconstrained maximum, then the constraint is binding. Essentially, this means that you can treat your constraint as an equality constraint rather than an inequality constraint.
For example, the budget constraint binds when you spend your entire budget. This generally happens because we believe that utility is strictly increasing in consumption, i.e. you always want more so you spend everything you have.
Any number of constraints can be placed on an optimization problem. When working with multiple constraints, always make sure that the set of constraints is not pathological; it must be possible for all of the constraints to be satisfied simultaneously.
**Set-up for Constrained Optimization:**
$$\max_{x_1,x_2} f(x_1,x_2) \text{ s.t. } c(x_1,x_2)$$ $$\min_{x_1,x_2} f(x_1,x_2) \text{ s.t. } c(x_1,x_2)$$
This tells us to maximize/minimize our function, $f(x_1,x_2)$, with respect to the choice variables, $x_1,x_2$, subject to the constraint.
Example:
$$\max_{x_1,x_2} f(x_1, x_2) = -(x_1^2 + 2x_2^2) \text{ s.t. }x_1 + x_2 = 4$$
It is easy to see that the *unconstrained* maximum occurs at $(x_1, x_2) = (0,0)$, but that does not satisfy the constraint. How should we proceed?
### Equality Constraints {.unnumbered}
Equality constraints are the easiest to deal with because we know that the maximum or minimum has to lie on the (intersection of the) constraint(s).
The trick is to change the problem from a constrained optimization problem in $n$ variables to an unconstrained optimization problem in $n + k$ variables, adding *one* variable for *each* equality constraint. We do this using Lagrange multipliers.
**Lagrangian function**: The Lagrangian function allows us to combine the function we want to optimize and the constraint function into a single function. Once we have this single function, we can proceed as if this were an *unconstrained* optimization problem.
For each constraint, we must include a **Lagrange multiplier** ($\lambda_i$) as an additional variable in the analysis. These terms are the link between the constraint and the Lagrangian function.
Given a *two dimensional* set-up: $$\max_{x_1,x_2}/\min_{x_1,x_2} f(x_1,x_2) \text{ s.t. } c(x_1,x_2) = a$$
We define the Lagrangian function $L(x_1,x_2,\lambda_1)$ as follows: $$L(x_1,x_2,\lambda_1) = f(x_1,x_2) - \lambda_1 (c(x_1,x_2) - a)$$
More generally, in *n dimensions*: $$ L(x_1, \dots, x_n, \lambda_1, \dots, \lambda_k) = f(x_1, \dots, x_n) - \sum_{i=1}^k\lambda_i(c_i(x_1,\dots, x_n) - r_i)$$
**Getting the sign right:** Note that above we subtract the lagrangian term *and* we subtract the constraint constant from the constraint function. Occasionally, you may see the following alternative form of the Lagrangian, which is *equivalent*: $$ L(x_1, \dots, x_n, \lambda_1, \dots, \lambda_k) = f(x_1, \dots, x_n) + \sum_{i=1}^k\lambda_i(r_i - c_i(x_1,\dots, x_n))$$ Here we add the lagrangian term *and* we subtract the constraining function from the constraint constant.
**Using the Lagrangian to Find the Critical Points**: To find the critical points, we take the partial derivatives of the Lagrangian function, $L(x_1, \dots, x_n, \lambda_1, \dots, \lambda_k)$, with respect to each of its variables (all choice variables $\mathbf{x}$ *and* all Lagrange multipliers $\mathbf{\lambda}$). At a critical point, each of these partial derivatives must be equal to zero, so we obtain a system of $n + k$ equations in $n + k$ unknowns:
```{=tex}
\begin{align*}
\frac{\partial L}{\partial x_1} &= \frac{\partial f}{\partial x_1} - \sum_{i = 1}^k\lambda_i\frac{\partial c_i}{\partial x_1} = 0\\
\vdots &= \vdots \nonumber \\
\frac{\partial L}{\partial x_n} &= \frac{\partial f}{\partial x_n} - \sum_{i = 1}^k\lambda_i\frac{\partial c_i}{\partial x_n} = 0\\
\frac{\partial L}{\partial \lambda_1} &= c_1(x_1, \dots, x_n) - r_1 = 0\\
\vdots &= \vdots \nonumber \\
\frac{\partial L}{\partial \lambda_k} &= c_k(x_1, \dots, x_n) - r_k = 0
\end{align*}
```
We can then solve this system of equations, because there are $n+k$ equations and $n+k$ unknowns, to calculate the critical point $(x_1^*,\dots,x_n^*,\lambda_1^*,\dots,\lambda_k^*)$.
**Second-order Conditions:** There may be more than one critical point, so we need to verify that the critical point we find is a maximum/minimum. Just as in unconstrained optimization, we can do this by checking the second-order conditions.
::: {#exm-constropt}
## Constrained optimization with two goods and a budget constraint
Solve the following constrained optimization problem: $$\max_{x_1,x_2} f(x) = -(x_1^2 + 2x_2^2) \text{ s.t. } x_1 + x_2 = 4$$
:::
::: {#sol-constropt}
1. Begin by writing the Lagrangian: $$L(x_1, x_2, \lambda) = -(x_1^2 + 2x_2^2) - \lambda(x_1 + x_2 - 4)$$
2. Take the partial derivatives and set equal to zero:
```{=tex}
\begin{align*}
\frac{\partial L}{\partial x_1} &= -2x_1 - \lambda = 0\\
\frac{\partial L}{\partial x_2} &= -4x_2 - \lambda = 0\\
\frac{\partial L}{\partial \lambda} &= -(x_1 + x_2 - 4) = 0
\end{align*}
```
3. Solve the system of equations: Using the first two partials, we see that $\lambda = -2x_1$ and $\lambda = -4x_2$. Setting these equal shows that $x_1 = 2x_2$. Using the third partial and the above equality, $4 = 2x_2 + x_2$, from which we get $$x_2^* = 4/3, \quad x_1^* = 8/3, \quad \lambda = -16/3$$
4. Therefore, the only critical point is $x_1^* = \frac{8}{3}$ and $x_2^* = \frac{4}{3}$
5. This gives $f(\frac{8}{3}, \frac{4}{3}) = -\frac{96}{9} = -\frac{32}{3}$, which is less than the unconstrained optimum $f(0,0) = 0$
:::
Notice that when we take the partial derivative of L with respect to the Lagrangian multiplier and set it equal to 0, we return exactly our constraint! This is why signs matter.
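Because there is a single equality constraint, one quick numerical check (a sketch, and only one of several possible approaches) is to substitute $x_2 = 4 - x_1$ into $f$ and maximize the resulting one-variable function with `optimize()`:

```{r}
# Substitute the constraint x2 = 4 - x1 into f and maximize over x1 alone.
f_constrained <- function(x1) -(x1^2 + 2 * (4 - x1)^2)

opt <- optimize(f_constrained, interval = c(-10, 10), maximum = TRUE)
c(x1 = opt$maximum, x2 = 4 - opt$maximum, f = opt$objective)
# Should be close to x1* = 8/3, x2* = 4/3, f = -32/3.
```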
## Inequality Constraints
Inequality constraints define the boundary of a region over which we seek to optimize the function. This makes inequality constraints more challenging because we do not know if the maximum/minimum lies along one of the constraints (the constraint binds) or in the interior of the region.
We must introduce more variables in order to turn the problem into an unconstrained optimization.
**Slack:** For each inequality constraint $c_i(x_1, \dots, x_n) \leq a_i$, we define a slack variable $s_i^2$ such that the constraint holds with equality: $c_i(x_1, \dots, x_n) + s_i^2 = a_i$. These slack variables capture how far the constraint is from binding. We use $s^2$ rather than $s$ to ensure that the slack is non-negative.
Slack is just a way to transform our constraints.
Given a two-dimensional set-up with an inequality constraint: $$\max_{x_1,x_2}/\min_{x_1,x_2} f(x_1,x_2) \text{ s.t. } c(x_1,x_2) \le a_1$$
Adding in the slack variable turns the inequality into an equality: $$\max_{x_1,x_2}/\min_{x_1,x_2} f(x_1,x_2) \text{ s.t. } c(x_1,x_2) + s_1^2 = a_1$$
We define the Lagrangian function $L(x_1,x_2,\lambda_1,s_1)$ as follows: $$L(x_1,x_2,\lambda_1,s_1) = f(x_1,x_2) - \lambda_1 ( c(x_1,x_2) + s_1^2 - a_1)$$
More generally, in n dimensions: $$ L(x_1, \dots, x_n, \lambda_1, \dots, \lambda_k, s_1, \dots, s_k) = f(x_1, \dots, x_n) - \sum_{i = 1}^k \lambda_i(c_i(x_1,\dots, x_n) + s_i^2 - a_i)$$
**Finding the Critical Points**: To find the critical points, we take the partial derivatives of the lagrangian function, $L(x_1,\dots,x_n,\lambda_1,\dots,\lambda_k,s_1,\dots,s_k)$, with respect to each of its variables (all choice variables $x$, all lagrangian multipliers $\lambda$, and all slack variables $s$). At a critical point, *each* of these partial derivatives must be equal to zero, so we obtain a system of $n + 2k$ equations in $n + 2k$ unknowns:
```{=tex}
\begin{align*}
\frac{\partial L}{\partial x_1} &= \frac{\partial f}{\partial x_1} - \sum_{i = 1}^k\lambda_i\frac{\partial c_i}{\partial x_1} = 0\\
\vdots & = \vdots \\
\frac{\partial L}{\partial x_n} &= \frac{\partial f}{\partial x_n} - \sum_{i = 1}^k\lambda_i\frac{\partial c_i}{\partial x_n} = 0\\
\frac{\partial L}{\partial \lambda_1} &= c_1(x_1, \dots, x_n) + s_1^2 - a_1 = 0\\
\vdots & = \vdots \\
\frac{\partial L}{\partial \lambda_k} &= c_k(x_1, \dots, x_n) + s_k^2 - a_k = 0\\
\frac{\partial L}{\partial s_1} &= -2s_1\lambda_1 = 0\\
\vdots & = \vdots \\
\frac{\partial L}{\partial s_k} &= -2s_k\lambda_k = 0
\end{align*}
```
**Complementary slackness conditions**: The last set of first order conditions, of the form $-2s_i\lambda_i = 0$ (the partials taken with respect to the slack variables), are known as complementary slackness conditions. These conditions can be satisfied in one of three ways:
1. $\lambda_i = 0$ and $s_i \neq 0$: This implies that the slack is positive and thus *the constraint does not bind*.
2. $\lambda_i \neq 0$ and $s_i = 0$: This implies that there is no slack in the constraint and *the constraint does bind*.
3. $\lambda_i = 0$ and $s_i = 0$: In this case, there is no slack but the *constraint binds trivially*, without changing the optimum.
Example: Find the critical points for the following constrained optimization:
$$\max_{x_1,x_2} f(x) = -(x_1^2 + 2x_2^2) \text{ s.t. } x_1 + x_2 \le 4$$
1. Rewrite with the slack variable:
$$\max_{x_1,x_2} f(x) = -(x_1^2 + 2x_2^2) \text{ s.t. } x_1 + x_2 + s_1^2 = 4$$
2. Write the Lagrangian:
$$L(x_1,x_2,\lambda_1,s_1) = -(x_1^2 + 2x_2^2) - \lambda_1 (x_1 + x_2 + s_1^2 - 4)$$
3. Take the partial derivatives and set equal to 0:
```{=tex}
\begin{align*}
\frac{\partial L}{\partial x_1} = -2x_1 - \lambda_1 &= 0\\
\frac{\partial L}{\partial x_2} = -4x_2 - \lambda_1 &= 0\\
\frac{\partial L}{\partial \lambda_1} = -(x_1 + x_2 + s_1^2 - 4)&= 0\\
\frac{\partial L}{\partial s_1} = -2s_1\lambda_1 &= 0\\
\end{align*}
```
4. Consider all ways that the complementary slackness conditions are solved:
```{=tex}
\begin{center}
\begin{tabular}{|l|cccc|c|}
\hline
Hypothesis & $s_1$ & $\lambda_1$ & $x_1$ & $x_2$ & $f(x_1, x_2)$\\
\hline
$s_1 = 0$ $\lambda_1 = 0$ & \multicolumn{4}{l|}{No solution} & \\
$s_1 \neq 0$ $\lambda_1 = 0$ & 2 & 0 & 0 & 0 & 0\\
$s_1 = 0$ $\lambda_1 \neq 0$ & 0 & $\frac{-16}{3}$ & $\frac{8}{3}$ & $\frac{4}{3}$ & $-\frac{32}{3}$\\
$s_1 \neq 0$ $\lambda_1 \neq 0$ & \multicolumn{4}{l|}{No solution} &\\
\hline
\end{tabular}
\end{center}
```
This shows that there are two critical points: $(0,0)$ and $(\frac{8}{3},\frac{4}{3})$.
5. Find maximum: Looking at the values of $f(x_1,x_2)$ at the critical points, we see that $f(x_1,x_2)$ is maximized at $x_1^* = 0$ and $x_2^*=0$.
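The same answer can be recovered numerically. A minimal sketch with base R's `constrOptim()`, which handles linear inequality constraints written as $\mathbf{u}^\top\mathbf{x} \ge c$ and minimizes by default, so we minimize $-f$ and write $x_1 + x_2 \le 4$ as $-x_1 - x_2 \ge -4$:

```{r}
neg_f <- function(x) x[1]^2 + 2 * x[2]^2      # -f(x), to be minimized
neg_grad <- function(x) c(2 * x[1], 4 * x[2])

res <- constrOptim(theta = c(1, 1), f = neg_f, grad = neg_grad,
                   ui = matrix(c(-1, -1), nrow = 1), ci = -4)
res$par  # close to (0, 0), the constrained maximum of f
```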
::: {#exr-critical-points-constrained-optimization}
Find the critical points for the following constrained optimization: $$\max_{x_1,x_2} f(x) = -(x_1^2 + 2x_2^2) \text{ s.t. }
\begin{array}{l}
x_1 + x_2 \le 4\\
x_1 \ge 0\\
x_2 \ge 0
\end{array}$$
:::
1. Rewrite with the slack variables:
```{=latex}
$$\max_{x_1,x_2} f(x) = -(x_1^2 + 2x_2^2) \text{ s.t. }
\begin{array}{l}
x_1 + x_2 + s_1^2 = 4\\
-x_1 + s_2^2 = 0\\
-x_2 + s_3^2 = 0
\end{array}$$
```
2. Write the Lagrangian:
$$L(x_1, x_2, \lambda_1, \lambda_2, \lambda_3, s_1, s_2, s_3) = -(x_1^2 + 2x_2^2) - \lambda_1(x_1 + x_2 + s_1^2 - 4) - \lambda_2(-x_1 + s_2^2) - \lambda_3(-x_2 + s_3^2)$$

3. Take the partial derivatives and set them equal to zero:

$\frac{\partial L}{\partial x_1} = -2x_1 - \lambda_1 + \lambda_2 = 0$

$\frac{\partial L}{\partial x_2} = -4x_2 - \lambda_1 + \lambda_3 = 0$

$\frac{\partial L}{\partial \lambda_1} = -(x_1 + x_2 + s_1^2 - 4) = 0$

$\frac{\partial L}{\partial \lambda_2} = -(-x_1 + s_2^2) = 0$

$\frac{\partial L}{\partial \lambda_3} = -(-x_2 + s_3^2) = 0$

$\frac{\partial L}{\partial s_1} = -2s_1\lambda_1 = 0$

$\frac{\partial L}{\partial s_2} = -2s_2\lambda_2 = 0$

$\frac{\partial L}{\partial s_3} = -2s_3\lambda_3 = 0$
4. Consider all ways that the complementary slackness conditions are solved:
```{=tex}
\begin{center}
\begin{tabular}{|l|cccccccc|c|}
\hline
Hypothesis & $s_1$ & $s_2$ & $s_3$ & $\lambda_1$ & $\lambda_2$ & $\lambda_3$ & $x_1$ & $x_2$ & $f(x_1, x_2)$\\
\hline
$s_1 = s_2 = s_3 = 0$ & \multicolumn{8}{l|}{No solution} & \\
$s_1 \neq 0, s_2 = s_3 = 0$ & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
$s_2 \neq 0, s_1 = s_3 = 0$ & 0 & 2 & 0 & -8 & 0 & -8 & 4 & 0 & -16\\
$s_3 \neq 0, s_1 = s_2 = 0$ & 0 & 0 & 2 & -16 & -16 & 0 & 0 & 4 & -32\\
$s_1 \neq 0, s_2 \neq 0, s_3 = 0$ &\multicolumn{8}{l|}{No solution} & \\
$s_1 \neq 0, s_3 \neq 0, s_2 = 0$ &\multicolumn{8}{l|}{No solution} & \\
$s_2 \neq 0, s_3 \neq 0, s_1 = 0$ &0 & $\sqrt{\frac{8}{3}}$ & $\sqrt{\frac{4}{3}}$ & $-\frac{16}{3}$ & 0 & 0 & $\frac{8}{3}$& $\frac{4}{3}$ & $-\frac{32}{3}$\\
$s_1 \neq 0, s_2 \neq 0, s_3 \neq 0$ &\multicolumn{8}{l|}{No solution}& \\
\hline
\end{tabular}
\end{center}
```
This shows that there are four critical points: $(0,0)$, $(4,0)$, $(0,4)$, and $(\frac{8}{3},\frac{4}{3})$
5. Find maximum: Looking at the values of $f(x_1,x_2)$ at the critical points, we see that the constrained maximum is located at $(x_1, x_2) = (0,0)$, which is the same as the unconstrained max. The constrained minimum is located at $(x_1, x_2) = (0,4)$, while there is no unconstrained minimum for this problem.
## Kuhn-Tucker Conditions
As you can see, this can be a pain. When dealing explicitly with *non-negativity constraints*, this process is simplified by using the Kuhn-Tucker method.
Because the problem of maximizing a function subject to inequality and non-negativity constraints arises frequently in economics, the **Kuhn-Tucker conditions** provide a method that often makes it easier to both calculate the critical points and identify points that are (local) maxima.
Given a *two-dimensional set-up*: $$\max_{x_1,x_2}/\min_{x_1,x_2} f(x_1,x_2) \text{ s.t. }
\begin{array}{l}
c(x_1,x_2) \le a_1\\
x_1 \ge 0 \\
x_2 \ge 0
\end{array}$$
We define the Lagrangian function $L(x_1,x_2,\lambda_1)$ the same as if we did not have the non-negativity constraints: $$L(x_1,x_2,\lambda_1) = f(x_1,x_2) - \lambda_1(c(x_1,x_2) - a_1)$$
More generally, in n dimensions: $$ L(x_1, \dots, x_n, \lambda_1, \dots, \lambda_k) = f(x_1, \dots, x_n) - \sum_{i=1}^k\lambda_i(c_i(x_1,\dots, x_n) - a_i)$$
**Kuhn-Tucker and Complementary Slackness Conditions**: To find the critical points, we first calculate the Kuhn-Tucker conditions by taking the partial derivatives of the Lagrangian function, $L(x_1,\dots,x_n,\lambda_1,\dots,\lambda_k)$, with respect to each of its variables (all choice variables $x$ and all Lagrange multipliers $\lambda$). We then calculate the *complementary slackness conditions* by multiplying each partial derivative by its respective variable, and we include non-negativity conditions for all variables (choice variables $x$ and Lagrange multipliers $\lambda$).
**Kuhn-Tucker Conditions**
```{=tex}
\begin{align*}
\frac{\partial L}{\partial x_1} \leq 0, & \dots, \frac{\partial L}{\partial x_n} \leq 0\\
\frac{\partial L}{\partial \lambda_1} \geq 0, & \dots, \frac{\partial L}{\partial \lambda_m} \geq 0
\end{align*}
```
**Complementary Slackness Conditions**
```{=tex}
\begin{align*}
x_1\frac{\partial L}{\partial x_1} = 0, & \dots, x_n\frac{\partial L}{\partial x_n} = 0\\
\lambda_1\frac{\partial L}{\partial \lambda_1} = 0, & \dots, \lambda_m \frac{\partial L}{\partial \lambda_m} = 0
\end{align*}
```
**Non-negativity Conditions** \begin{eqnarray*}
x_1 \geq 0 & \dots & x_n \geq 0\\
\lambda_1 \geq 0 & \dots & \lambda_m \geq 0
\end{eqnarray*}
Note that some of these conditions are set equal to 0, while others are set as inequalities!
Note also that to minimize the function $f(x_1, \dots, x_n)$, the simplest thing to do is maximize the function $-f(x_1, \dots, x_n)$; all of the conditions remain the same after reformulating as a maximization problem.
There are additional assumptions (notably, f(x) is quasi-concave and the constraints are convex) that are sufficient to ensure that a point satisfying the Kuhn-Tucker conditions is a global max; if these assumptions do not hold, you may have to check more than one point.
**Finding the Critical Points with Kuhn-Tucker Conditions**: Given the above conditions, to find the critical points we solve the above system of equations. To do so, we must check *all* border and interior solutions to see if they satisfy the above conditions.
In a two-dimensional set-up, this means we must check the following cases:
1. $x_1 = 0, x_2 = 0$ Border Solution
2. $x_1 = 0, x_2 \neq 0$ Border Solution
3. $x_1 \neq 0, x_2 = 0$ Border Solution
4. $x_1 \neq 0, x_2 \neq 0$ Interior Solution
::: {#exm-k-t-2}
## Kuhn-Tucker with two variables
Solve the following optimization problem with inequality constraints $$\max_{x_1,x_2} f(x) = -(x_1^2 + 2x_2^2)$$
```{=tex}
\begin{align*}
\text{ s.t. }
\begin{cases}
&x_1 + x_2 \le 4\\
&x_1 \ge 0\\
&x_2 \ge 0
\end{cases}
\end{align*}
```
:::
1. Write the Lagrangian: $$L(x_1, x_2, \lambda) = -(x_1^2 + 2x_2^2) - \lambda(x_1 + x_2 - 4)$$
2. Find the First Order Conditions:
Kuhn-Tucker Conditions \begin{align*}
\frac{\partial L}{\partial x_1} = -2x_1 - \lambda &\leq 0\\
\frac{\partial L}{\partial x_2} = -4x_2 - \lambda & \leq 0\\
\frac{\partial L}{\partial \lambda} = -(x_1 + x_2 - 4)& \geq 0
\end{align*}
Complementary Slackness Conditions \begin{align*}
x_1\frac{\partial L}{\partial x_1} = x_1(-2x_1 - \lambda) &= 0\\
x_2\frac{\partial L}{\partial x_2} = x_2(-4x_2 - \lambda) &= 0\\
\lambda\frac{\partial L}{\partial \lambda} = -\lambda(x_1 + x_2 - 4)&= 0
\end{align*}
Non-negativity Conditions \begin{align*}
x_1 & \geq 0\\
x_2 & \geq 0\\
\lambda & \geq 0
\end{align*}
3. Consider all border and interior cases:
```{=tex}
\begin{center}
\begin{tabular}{|l|ccc|c|}
\hline
Hypothesis & $\lambda$& $x_1$ & $x_2$ & $f(x_1, x_2)$\\
\hline
$x_1 = 0, x_2 = 0$ &0 & 0 & 0 & 0\\
$x_1 = 0, x_2 \neq 0$ &-16 & 0 & 4 & -32\\
$x_1 \neq 0, x_2 = 0$ &-8 & 4 & 0 & -16\\
$x_1 \neq 0, x_2 \neq 0$ & $-\frac{16}{3}$ & $\frac{8}{3}$ & $\frac{4}{3}$ & $-\frac{32}{3}$\\
\hline
\end{tabular}
\end{center}
```
4. Find Maximum: Three of the candidate points violate the requirement that $\lambda \geq 0$, so the point $(x_1, x_2) = (0,0)$ (with $\lambda = 0$) is the maximum.
::: {#exr-ktlogs}
## Kuhn-Tucker with logs
Solve the constrained optimization problem,
$$\max_{x_1,x_2} f(x) = \frac{1}{3}\log (x_1 + 1) + \frac{2}{3}\log (x_2 + 1) \text{ s.t. }
\begin{array}{l}
x_1 + 2x_2 \leq 4\\
x_1 \geq 0\\
x_2 \geq 0
\end{array}$$
:::
1. Write the Lagrangian: $$L(x_1, x_2, \lambda) = \frac{1}{3}\log(x_1+1) + \frac{2}{3}\log(x_2+1) - \lambda(x_1 + 2x_2 - 4)$$
2. Find the First Order Conditions:
Kuhn-Tucker Conditions
$\frac{\partial L}{\partial x_1} = \frac{1}{3(x_1+1)} - \lambda \leq 0$

$\frac{\partial L}{\partial x_2} = \frac{2}{3(x_2+1)} - 2\lambda \leq 0$

$\frac{\partial L}{\partial \lambda} = -(x_1 + 2x_2 - 4) \geq 0$
Complementary Slackness Conditions
$x_1\frac{\partial L}{\partial x_1} = x_1\left(\frac{1}{3(x_1+1)} - \lambda\right) = 0$

$x_2\frac{\partial L}{\partial x_2} = x_2\left(\frac{2}{3(x_2+1)} - 2\lambda\right) = 0$

$\lambda\frac{\partial L}{\partial \lambda} = -\lambda(x_1 + 2x_2 - 4) = 0$
Non-negativity Conditions
$x_1 \geq 0$

$x_2 \geq 0$

$\lambda \geq 0$
3. Consider all border and interior cases:
```{=tex}
\begin{center}
\begin{tabular}{|l|ccc|c|}
\hline
Hypothesis & $\lambda$& $x_1$ & $x_2$ & $f(x_1, x_2)$\\
\hline
$x_1 = 0, x_2 = 0$ &\multicolumn{3}{l|}{No Solution}& \\
$x_1 = 0, x_2 \neq 0$ &\multicolumn{3}{l|}{No Solution}& \\
$x_1 \neq 0, x_2 = 0$ & \multicolumn{3}{l|}{No Solution}& \\
$x_1 \neq 0, x_2 \neq 0$ & $\frac{1}{7}$ & $\frac{4}{3}$ & $\frac{4}{3}$ & $\log\frac{7}{3}$\\
\hline
\end{tabular}
\end{center}
```
4. Find Maximum: The other three cases have no solution satisfying the conditions, so the point $(\frac{4}{3},\frac{4}{3})$ is the maximum.
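A numerical sketch with `constrOptim()` (again minimizing $-f$, with the three linear constraints written as $\mathbf{u}^\top\mathbf{x} \ge c$) recovers the same point:

```{r}
neg_f <- function(x) -((1 / 3) * log(x[1] + 1) + (2 / 3) * log(x[2] + 1))
neg_grad <- function(x) -c(1 / (3 * (x[1] + 1)), 2 / (3 * (x[2] + 1)))

ui <- rbind(c(-1, -2),  # x1 + 2*x2 <= 4, written as -x1 - 2*x2 >= -4
            c( 1,  0),  # x1 >= 0
            c( 0,  1))  # x2 >= 0
res <- constrOptim(theta = c(1, 1), f = neg_f, grad = neg_grad,
                   ui = ui, ci = c(-4, 0, 0))
res$par     # close to (4/3, 4/3)
-res$value  # close to log(7/3)
```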
## Applications of Quadratic Forms
**Curvature and The Taylor Polynomial as a Quadratic Form**: The Hessian is used in a Taylor polynomial approximation to $f(\mathbf{x})$ and provides information about the curvature of $f({\bf x})$ at $\mathbf{x}$ --- e.g., it tells us whether a critical point $\mathbf{x}^*$ is a min, max, or saddle point.
1. The second order Taylor polynomial about the critical point ${\bf x}^*$ is $$f({\bf x}^*+\mathbf{h})=f({\bf x}^*)+\nabla f({\bf x}^*)^\top \mathbf{h} +\frac{1}{2} \mathbf{h}^\top {\bf H(x^*)} \mathbf{h} + R(\mathbf{h})$$
2. Since we're looking at a critical point, $\nabla f({\bf x}^*)=0$; and for small $\bf h$, $R(\bf h)$ is negligible. Rearranging, we get $$f({\bf x}^*+\bf h)-f({\bf x}^*)\approx \frac{1}{2} \bf h^\top {\bf H(x^*)} \bf h $$
3. The right-hand side here is a quadratic form, so we can determine whether $\mathbf{x}^*$ is a min, max, or saddle point from the definiteness of $\mathbf{H}(\mathbf{x}^*)$.
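A small numerical illustration (a sketch only): near the critical point of $f(x_1, x_2) = (x_1 - 1)^2 + x_2^2 + 1$ from earlier, the change $f(\mathbf{x}^* + \mathbf{h}) - f(\mathbf{x}^*)$ matches the quadratic form $\frac{1}{2}\mathbf{h}^\top\mathbf{H}(\mathbf{x}^*)\mathbf{h}$ (exactly so here, because $f$ itself is quadratic).

```{r}
f <- function(x) (x[1] - 1)^2 + x[2]^2 + 1
xstar <- c(1, 0)                      # critical point: the gradient is zero here
H <- matrix(c(2, 0, 0, 2), nrow = 2)  # Hessian (constant for this f)

h <- c(0.05, -0.02)                   # a small displacement from x*
c(actual = f(xstar + h) - f(xstar),
  quadratic_form = 0.5 * drop(t(h) %*% H %*% h))
```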