# Semidefinite Optimization {#sdp}
## Positive Semidefinite Matrices
A real matrix $A = (A_{ij}) \in \Real{n \times n}$ is symmetric if $A = A\tran$, i.e., $A_{ij} = A_{ji}$ for all $i,j$. Let $\sym{n}$ be the space of all real symmetric matrices.
Any symmetric matrix $A$ defines a **quadratic form** $x\tran A x$. A matrix $A$ is said to be **positive semidefinite** (PSD) if and only if its associated quadratic form is nonnegative, i.e.,
$$
x\tran A x \geq 0, \quad \forall x \in \Real{n}.
$$
We use $\psd{n}$ to denote the set of $n\times n$ PSD matrices. We also write $A \succeq 0$ to denote positive semidefiniteness when the dimension is clear.
There are several equivalent characterizations of positive semidefiniteness.
::: {.theorembox}
::: {.lemma #PositiveSemidefinite name="Positive Semidefinite Matrices"}
Let $A \in \sym{n}$ be a symmetric matrix. The following statements are equivalent:
1. $A$ is positive semidefinite.
2. $x\tran A x \geq 0, \forall x \in \Real{n}$.
3. All eigenvalues of $A$ are nonnegative.
4. All $2^n-1$ principal minors of $A$ are nonnegative.
5. The coefficients of $p_A(\lambda)$ weakly alternate in sign, i.e., $(-1)^{n-k} p_k \geq 0$ for $k=0,\dots,n-1$, where $p_A(\lambda) = \det (A - \lambda \eye_n)$ is the characteristic polynomial of $A$.
6. There exists a factorization $A = BB\tran$, where $B \in \Real{n \times r}$ with $r$ the rank of $A$.
:::
:::
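As a quick numerical illustration of characterizations (3) and (6), the following Matlab snippet (using a small hand-picked matrix, purely for illustration) checks that all eigenvalues are nonnegative and constructs a factor $B$ with $A = BB\tran$.
```matlab
% Check positive semidefiniteness numerically via characterizations (3) and (6).
A = [2 -1 0; -1 2 -1; 0 -1 2];   % a symmetric (in fact positive definite) matrix
lambda = eig(A)                  % (3): all eigenvalues are nonnegative
[V, D] = eig(A);                 % spectral decomposition A = V*D*V'
B = V * sqrt(D);                 % (6): a factor with A = B*B'
norm(A - B*B', 'fro')            % ~ 1e-15, confirming the factorization
```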
Among the equivalent characterizations of PSD matrices, (5) is less well-known, but it can be very useful when we want to convert a PSD constraint into multiple scalar constraints. For example, consider the following subset of $\Real{3}$:
$$
\lcbrace{z \in \Real{3} \lmid X(z) = \begin{bmatrix} 1 & z_1 & z_2 \\ z_1 & z_2 & z_3 \\ z_2 & z_3 & 5 z_2 - 4 \end{bmatrix} \succeq 0 }.
$$
We can first form the characteristic polynomial of $X(z)$, whose coefficients will be functions of $z$, and then invoke (5) to obtain a finite number of scalar inequality constraints. We can then pass these scalar constraints to Mathematica and plot the set as in the following figure [@yang22mp-inexact].
```{r SpectrahedronStride, out.width='50%', fig.show='hold', fig.cap='An example spectrahedron.', fig.align='center', echo=FALSE}
knitr::include_graphics("images/spectrahedron-stride.png")
```
Similarly, we say a matrix $A \in \sym{n}$ is **positive definite** (PD) if its associated quadratic form is positive for all nonzero vectors, i.e.,
$$
x\tran A x > 0, \quad \forall x \in \Real{n}, x \neq 0.
$$
We use $\pd{n}$ to denote the set of $n \times n$ PD matrices, and also write $A \succ 0$ when the dimension is clear.
Below is a set of equivalent characterizations of positive definite matrices.
::: {.theorembox}
::: {.lemma #PositiveDefinite name="Positive Definite Matrices"}
Let $A \in \sym{n}$ be a symmetric matrix. The following statements are equivalent:
1. $A$ is positive definite.
2. $x\tran A x > 0, \forall x \in \Real{n}, x \neq 0$.
3. All eigenvalues of $A$ are strictly positive.
4. All $n$ leading principal minors of $A$ are strictly positive.
5. The coefficients of $p_A(\lambda)$ strictly alternate in sign, i.e., $(-1)^{n-k} p_k > 0$ for $k=0,\dots,n-1$, where $p_A(\lambda) = \det (A - \lambda \eye_n)$ is the characteristic polynomial of $A$.
6. There exists a factorization $A = BB\tran$ with $B$ square and nonsingular (full-rank).
:::
:::
**Schur Complements**. A useful technique to check whether a matrix is positive (semi-)definite is to use Schur complements. Consider a block-partitioned matrix
\begin{equation}
M = \begin{bmatrix} A & B \\ B\tran & C \end{bmatrix},
(\#eq:block-mat-M)
\end{equation}
where $A$ and $C$ are symmetric matrices.
If $A$ is invertible, then the Schur complement of $A$ is
$$
M / A = C - B\tran A\inv B.
$$
Similarly, if $C$ is invertible, then the Schur complement of $C$ is
$$
M / C = A - B C\inv B\tran.
$$
We have the following result relating Schur complements to positive (semi-)definiteness.
::: {.theorembox}
::: {.proposition #SchurPSD name="Schur Complements and PSD"}
Consider the block-partitioned matrix $M$ in \@ref(eq:block-mat-M),
- $M$ is positive definite if and only if both $A$ and $M/A$ are positive definite:
$$
M \succ 0 \Leftrightarrow A \succ 0, M/A = C - B\tran A\inv B \succ 0.
$$
- $M$ is positive definite if and only if both $C$ and $M/C$ are positive definite:
$$
M \succ 0 \Leftrightarrow C \succ 0, M/C = A - B C\inv B\tran \succ 0.
$$
- If $A$ is positive definite, then $M$ is positive semidefinite if and only if $M/A$ is positive semidefinite:
$$
\text{If } A \succ 0, \text{ then } M \succeq 0 \Leftrightarrow M / A \succeq 0.
$$
- If $C$ is positive definite, then $M$ is positive semidefinite if and only if $M/C$ is positive semidefinite:
$$
\text{If } C \succ 0, \text{ then } M \succeq 0 \Leftrightarrow M / C \succeq 0.
$$
:::
:::
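As a sanity check, the following Matlab snippet (with randomly generated data, for illustration only) builds a block matrix whose Schur complement $M/A$ is positive definite by construction and confirms that $M$ is positive definite, in line with the first statement of the proposition.
```matlab
% Numerical sanity check of the Schur complement criterion (illustration only).
rng(0);
A = randn(3); A = A*A' + eye(3);        % a positive definite block A
B = randn(3,2);
C = B'*(A\B) + 0.5*eye(2);              % chosen so that M/A = C - B'*inv(A)*B = 0.5*I
M = [A, B; B', C];
min(eig(C - B'*(A\B)))                  % = 0.5 > 0: the Schur complement M/A is PD
min(eig((M + M')/2))                    % > 0: hence M is PD, as the proposition predicts
```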
### Geometric Properties
The set $\psd{n}$ is a proper cone (cf. Definition \@ref(def:ProperCone)). Its interior is $\pd{n}$. Under the inner product
$$
\inprod{A}{B} = \trace(AB\tran), \quad A,B \in \Real{n \times n},
$$
the PSD cone $\psd{n}$ is self-dual.
Next we want to characterize the faces of the PSD cone. We first present the following lemma, which will turn out to be useful later.
::: {.theorembox}
::: {.lemma #PSDRange name="Range of PSD Matrices"}
Let $A,B \in \psd{n}$, then we have
\begin{equation}
\Range(A) \subseteq \Range(A + B),
(\#eq:Range-PSD)
\end{equation}
where $\Range(A)$ denotes the span of the column vectors of $A$.
:::
:::
::: {.proofbox}
::: {.proof}
For any symmetric matrix $S$, we know
$$
\Range(S) = \ker(S)^{\perp}.
$$
Therefore, to prove \@ref(eq:Range-PSD), it is equivalent to prove
$$
\ker(A) \supseteq \ker(A + B).
$$
Pick any $u \in \ker(A + B)$, we have
$$
(A + B) u = 0 \Rightarrow u \tran (A + B) u = 0 \Rightarrow u\tran A u + u\tran B u = 0 \Rightarrow u\tran A u = u\tran B u = 0,
$$
where the last derivation is due to $A, B \succeq 0$. Now that we have $u\tran A u = 0$, we claim that $Au = 0$ must hold, i.e., $u \in \ker(A)$. To see this, write
$$
u = \sum_{i=1}^n a_i v_i,
$$
where $a_i = \inprod{u}{v_i}$ and $v_i,i=1,\dots,n$ are the orthonormal eigenvectors of $A$ corresponding to eigenvalues $\lambda_i,i=1,\dots,n$. Then we have
$$
Au = \sum_{i=1}^n a_i A v_i = \sum_{i=1}^n a_i \lambda_i v_i,
$$
and
$$
u\tran A u = \sum_{i=1}^n \lambda_i a_i^2 = 0.
$$
Since $\lambda_i \geq 0, a_i^2 \geq 0$, we have
$$
\lambda_i a_i^2 = 0, \forall i = 1,\dots,n.
$$
This indicates that if $\lambda_i > 0$, then $a_i = 0$. Therefore, $a_i$ can only be nonzero for $\lambda_i = 0$, which leads to
$$
Au = \sum_{i=1}^n a_i \lambda_i v_i = 0.
$$
Therefore, $u \in \ker(A)$, proving the result.
:::
:::
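A quick numerical illustration of the lemma (with random low-rank PSD matrices, purely for illustration): the column space of $A$ is contained in that of $A+B$ exactly when appending the columns of $A$ does not increase the rank.
```matlab
% Numerical illustration of Range(A) being a subset of Range(A+B).
rng(1);
U = randn(6,2); A = U*U';        % a rank-2 PSD matrix
V = randn(6,2); B = V*V';        % another rank-2 PSD matrix
rank(A + B)                      % generically 4 < 6
rank([A + B, A]) == rank(A + B)  % true: Range(A) is contained in Range(A+B)
```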
Lemma \@ref(lem:PSDRange) indicates that if $A \succeq B$, then $\Range(B) \subseteq \Range(A)$. What about the reverse?
::: {.theorembox}
::: {.lemma #Extension name="Extend Line Segment"}
Let $A,B \in \psd{n}$, if $\Range(B) \subseteq \Range(A)$, then there must exist $C \in \psd{n}$ such that
$$
A \in (B,C),
$$
i.e., the line segment from $B$ to $A$ can be extended past $A$ within $\psd{n}$.
:::
:::
::: {.proofbox}
::: {.proof}
Since $\Range(B) \subseteq \Range(A)$, we have
$$
\ker(A) \subseteq \ker(B).
$$
Now consider extending the line segment past $A$ to
$$
C_{\alpha} = A + \alpha(A - B) = (1+\alpha) A - \alpha B,
$$
with some $\alpha > 0$. We want to show that there exists $\alpha > 0$ such that $C_{\alpha} \succeq 0$.
Pick $u \in \Real{n}$, then either $u \in \ker(B)$ or $u \not\in \ker(B)$. If $u \in \ker(B)$, then
$$
u\tran C_{\alpha} u = (1+\alpha) u\tran A u - \alpha u\tran B u = (1+\alpha) u\tran A u \geq 0.
$$
If $u \not\in \ker(B)$, then due to $\ker(A) \subseteq \ker(B)$, we have $u \not\in \ker(A)$ as well. As a result, we have
\begin{equation}
u\tran C_{\alpha} u = (1+\alpha) u\tran A u - \alpha u\tran B u = (1+\alpha) u\tran A u \lparen{ 1- \frac{\alpha}{1+\alpha} \frac{u\tran B u}{u\tran A u} }.
(\#eq:prove-extension-1)
\end{equation}
Since
$$
\max_{u: u \not\in \ker(A)} \frac{u\tran B u}{u\tran A u} \leq \frac{\lambda_{\max}(B)}{\lambda_{\min,>0}(A)},
$$
where $\lambda_{\min,>0}(A)$ denotes the minimum positive eigenvalue of $A$, we can always choose $\alpha$ sufficiently small to make \@ref(eq:prove-extension-1) nonnegative. Therefore, there exists $\alpha > 0$ such that $C_{\alpha} \succeq 0$.
:::
:::
In fact, from Lemma \@ref(lem:PSDRange) we can deduce the following corollary.
::: {.theorembox}
::: {.corollary #PSDRange name="Range of PSD Matrices"}
Let $A, B \in \psd{n}$, then we have
$$
\Range(A + B) = \Range(A) + \Range(B),
$$
with "$+$" the Minkowski sum.
:::
:::
::: {.exercisebox}
::: {.exercise}
Let $A,B \in \psd{n}$, show that $\inprod{A}{B} = 0$ if and only if $\Range(A) \perp \Range(B)$.
:::
:::
For a subset $T \subseteq \psd{n}$, we use $\face(T,\psd{n})$ to denote the smallest face of $\psd{n}$ that contains $T$. We first characterize the smallest face that contains a given PSD matrix, i.e., $\face(A,\psd{n})$ for $A\succeq 0$. Clearly, if $A$ is PD, then $\face(A, \psd{n}) = \psd{n}$ is the entire cone. If $A$ is PSD but singular with rank $r < n$, then $A$ has the following spectral decomposition
$$
Q\tran A Q = \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix},
$$
where $\Lambda \in \pd{r}$ is a diagonal matrix with the $r$ nonzero eigenvalues of $A$, and $Q \in \Ogroup(n)$ is orthogonal. If
$$
A = \lambda B + (1-\lambda)C, \quad B,C \in \psd{n},\lambda \in (0,1),
$$
then multiplying both sides by $Q\tran$ and $Q$ we have
$$
\begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} = Q\tran A Q = \lambda Q\tran B Q + (1-\lambda) Q\tran C Q.
$$
Therefore, it must hold that
$$
Q\tran B Q = \begin{bmatrix} B_1 & 0 \\ 0 & 0 \end{bmatrix}, \quad Q\tran C Q = \begin{bmatrix} C_1 & 0 \\ 0 & 0 \end{bmatrix}, \quad B_1 \in \psd{r}, C_1 \in \psd{r},
$$
which is equivalent to
$$
B = Q \begin{bmatrix} B_1 & 0 \\ 0 & 0 \end{bmatrix} Q\tran, C = Q \begin{bmatrix} C_1 & 0 \\ 0 & 0 \end{bmatrix} Q\tran, \quad B_1 \in \psd{r}, C_1 \in \psd{r}.
$$
Note that any such $B$ and $C$ must belong to the set
\begin{equation}
G:= \lcbrace{Q \begin{bmatrix} X & 0 \\ 0 & 0 \end{bmatrix} Q\tran \lmid X \in \psd{r}}.
(\#eq:face-of-A-psd)
\end{equation}
::: {.exercisebox}
::: {.exercise}
Show that $G$ in \@ref(eq:face-of-A-psd) is a face of $\psd{n}$, i.e., (i) $G$ is convex; (ii) $u \in (x,y), u \in G, x,y \in \psd{n} \Rightarrow x,y \in G$.
:::
:::
The derivation above shows that every segment of $\psd{n}$ passing through $A$ has both endpoints in $G$; together with the exercise, $G$ is thus a face of $\psd{n}$ containing $A$, so $\face(A,\psd{n}) \subseteq G$. Conversely, every $D \in G$ satisfies $\Range(D) \subseteq \Range(A)$, so by Lemma \@ref(lem:Extension) the segment from $D$ to $A$ can be extended past $A$ within $\psd{n}$, and hence $D$ must belong to any face containing $A$. As a result, we have $\face(A,\psd{n}) = G$.
More general faces of the PSD cone $\psd{n}$ can be characterized as follows (Theorem 3.7.1 in [@wolkowicz12book-sdp]).
::: {.theorembox}
::: {.theorem #FacePSD name="Faces of the PSD Cone"}
A set $F \subseteq \psd{n}$ is a face if and only if there exists a subspace $L \subseteq \Real{n}$ such that
$$
F = \cbrace{X \in \psd{n} \mid \Range(X) \subseteq L}.
$$
:::
:::
::: {.proofbox}
::: {.proof}
It is easy to prove the "**If**" direction using Lemma \@ref(lem:PSDRange).
First we show $F$ is convex. Pick $A,B \in F$. We have $\Range(A) \subseteq L$ and $\Range(B) \subseteq L$. Let $v_1,\dots,v_m$ be a basis of $L$. We have that, for any $u \in \Real{n}$,
\begin{equation}
\begin{split}
A u \in L & \Rightarrow Au = \sum_{i=1}^m a_i v_i, \\
B u \in L & \Rightarrow Bu = \sum_{i=1}^m b_i v_i.
\end{split}
\end{equation}
So for any $\lambda \in [0,1]$, we have
$$
(\lambda A + (1-\lambda) B) u = \lambda Au + (1-\lambda) Bu = \sum_{i=1}^m (\lambda a_i + (1-\lambda) b_i ) v_i \in L,
$$
implying $\lambda A + (1-\lambda) B \in F$ for any $\lambda \in [0,1]$.
Now we show that:
$$
X \in (A,B), X \in F, A,B \in \psd{n} \Rightarrow A, B \in F.
$$
From $X = \lambda A + (1-\lambda) B$ for some $\lambda \in (0,1)$, and invoking Lemma \@ref(lem:PSDRange), we have
\begin{equation}
\begin{split}
\Range(X) & = \Range(\lambda A + (1-\lambda) B) \supseteq \Range(\lambda A) = \Range(A) \\
\Range(X) & = \Range(\lambda A + (1-\lambda) B) \supseteq \Range((1-\lambda) B) = \Range(B).
\end{split}
\end{equation}
Since $\Range(X) \subseteq L$ due to $X \in F$, we have
$$
\Range(A) \subseteq L, \quad \Range(B) \subseteq L,
$$
leading to $A,B \in F$.
The proof for the "**Only If**" direction can be found in Theorem 3.7.1 of [@wolkowicz12book-sdp].
:::
:::
## Semidefinite Programming
### Spectrahedra
Recall the definition of a polyhedron in \@ref(eq:polyhedron), i.e., a vector $x$ constrained by finitely many linear inequalities. The feasible set of a Linear Program is a polyhedron.
Similarly, we define a **spectrahedron** as a set defined by finitely many **linear matrix inequalities** (LMIs). Spectrahedra are the feasible sets of Semidefinite Programs (SDPs).
A linear matrix inequality has the form
$$
A_0 + \sum_{i=1}^m A_i x_i \succeq 0,
$$
where $A_i \in \sym{n},i=0,\dots,m$ are given symmetric matrices. Correspondingly, a spectrahedron is defined by finitely many LMIs.
::: {.definitionbox}
::: {.definition #Spectrahedron name="Spectrahedron"}
A set $S \subseteq \Real{m}$ is a spectrahedron if it has the form
$$
S = \lcbrace{x \in \Real{m} \lmid A_0 + \sum_{i=1}^m x_i A_i \succeq 0},
$$
for given symmetric matrices $A_0,A_1,\dots,A_m \in \sym{n}$.
:::
:::
Note that there is no loss of generality in defining a spectrahedron using a single LMI. For example, in the case of a set defined by two LMIs:
$$
S = \lcbrace{x \in \Real{m} \lmid A_0 + \sum_{i=1}^m x_i A_i \succeq 0, B_0 + \sum_{i=1}^m x_i B_i \succeq 0 }, A_i \in \sym{n}, B_i \in \sym{d},
$$
we can compress the two LMIs into a single LMI by putting $A_i$ and $B_i$ along the diagonal:
$$
S = \lcbrace{x \in \Real{m} \lmid \begin{bmatrix} A_0 & \\ & B_0 \end{bmatrix} + \sum_{i=1}^m x_i \begin{bmatrix} A_i & \\ & B_i \end{bmatrix} \succeq 0 }.
$$
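In code, this stacking is just a block-diagonal concatenation of the LMI data; below is a minimal Matlab sketch with made-up matrices (one scalar variable, blocks of size $3$ and $2$), purely for illustration.
```matlab
% Combine two LMIs A0 + x*A1 >= 0 and B0 + x*B1 >= 0 into a single LMI
% by stacking the data block-diagonally (toy data, one scalar variable x).
A0 = eye(3);    A1 = [0 1 0; 1 0 0; 0 0 0];
B0 = 2*eye(2);  B1 = [0 1; 1 0];
C0 = blkdiag(A0, B0);            % stacked constant term
C1 = blkdiag(A1, B1);            % stacked coefficient of x
x  = 0.5;
min(eig(C0 + x*C1))              % >= 0 if and only if both original LMIs hold at x
```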
Leveraging (5) of Lemma \@ref(lem:PositiveSemidefinite), we know that a PSD constraint is equivalent to the coefficients of the characteristic polynomial of the given matrix weakly alternating in sign. Therefore, a spectrahedron is defined by finitely many polynomial inequalities, i.e., a spectrahedron is a (convex) **basic semialgebraic set**, as seen in the following example [@blekherman12book-semidefinite].
::: {.examplebox}
::: {.example #EllipticCurve name="Elliptic Curve"}
Consider the spectrahedron in $\Real{2}$ defined by
$$
\lcbrace{(x,y) \in \Real{2} \lmid A(x,y) = \begin{bmatrix} x+1 & 0 & y \\ 0 & 2 & -x-1 \\ y & -x-1 & 2 \end{bmatrix} \succeq 0 }.
$$
To obtain scalar inequalities defining the set, let
$$
p_A(\lambda) = \det (\lambda I - A(x,y)) = \lambda^3 + p_2 \lambda^2 + p_1 \lambda + p_0
$$
be the characteristic polynomial of $A(x,y)$. $A(x,y) \succeq 0$ is then equivalent to the coefficients weakly alternating in sign:
\begin{equation}
\begin{split}
p_2 & = -(x+5) \leq 0, \\
p_1 & = -x^2 + 2x - y^2 + 7 \geq 0, \\
p_0 & = -(3+ x -x^3 -3x^2 - 2y^2) \leq 0.
\end{split}
\end{equation}
We can use the following Matlab script to plot the set shown in Fig. \@ref(fig:EllipticCurve). (The code is also available [here](https://github.com/ComputationalRobotics/Semidefinite-Examples).) As we can see, the spectrahedron is convex, but it is not a polyhedron.
```matlab
% grid the (x,y) plane
x = -2:0.01:2;
y = -2:0.01:2;
[X,Y] = meshgrid(x,y);
% evaluate the three coefficient inequalities on the grid
ineq = (-X - 5 <= 0) & ...
       (-X.^2 + 2*X - Y.^2 + 7 >= 0) & ...
       (3 + X - X.^3 - 3*X.^2 - 2*Y.^2 >= 0);
% color the region where all inequalities hold
h = pcolor(X,Y,double(ineq));
h.EdgeColor = 'none';
```
```{r EllipticCurve, out.width='60%', fig.show='hold', fig.cap='Elliptic Curve.', fig.align='center', echo=FALSE}
knitr::include_graphics("images/elliptic_curve.png")
```
:::
:::
We can use the same technique to visualize the elliptope, a spectrahedron that we will see again later when we study the MAXCUT problem.
::: {.examplebox}
::: {.example #Elliptope name="Elliptope"}
Consider the 3D elliptope defined by
$$
\lcbrace{(x,y,z) \in \Real{3} \lmid A(x,y,z) = \begin{bmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{bmatrix} \succeq 0}.
$$
The characteristic polynomial of $A(x,y,z)$ is
$$
p_A(\lambda) = \lambda^3 - 3 \lambda^2 + (-x^2 - y^2 - z^2 + 3) \lambda + x^2 - 2 xyz + y^2 + z^2 -1.
$$
The coefficients need to weakly alternate in sign (the condition on the $\lambda^2$ coefficient, $-3 \leq 0$, holds trivially), so we have the inequalities
\begin{equation}
\begin{split}
-x^2 - y^2 - z^2 + 3 & \geq 0 \\
x^2 - 2 xyz + y^2 + z^2 -1 & \leq 0
\end{split}
\end{equation}
Using the Matlab script [here](https://github.com/ComputationalRobotics/Semidefinite-Examples/blob/main/elliptope.m), we generate the following plot.
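For completeness, here is a minimal sketch of how such a 3D plot can be produced from the two inequalities above (the linked script may differ in its details):
```matlab
% Plot the elliptope as the region where both coefficient inequalities hold.
[x, y, z] = meshgrid(linspace(-1.1, 1.1, 120));
inside = (-x.^2 - y.^2 - z.^2 + 3 >= 0) & ...
         (x.^2 - 2*x.*y.*z + y.^2 + z.^2 - 1 <= 0);
p = patch(isosurface(x, y, z, double(inside), 0.5));  % boundary of the feasible region
p.FaceColor = 'cyan'; p.EdgeColor = 'none';
daspect([1 1 1]); view(3); camlight; lighting gouraud;
```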
```{r Elliptope, out.width='80%', fig.show='hold', fig.cap='Elliptope.', fig.align='center', echo=FALSE}
knitr::include_graphics("images/elliptope.png")
```
:::
:::
Another example is provided in Fig. \@ref(fig:SpectrahedronStride).
### Formulation and Duality
Semidefinite programs (SDPs) are linear optimization problems over spectrahedra. A standard SDP in **primal** form is written as
\begin{equation}
\boxed{
\begin{split}
p^\star = \min_{X \in \sym{n}} & \quad \inprod{C}{X} \\
\subject & \quad \calA(X) = b, \\
& \quad X \succeq 0
\end{split}
}
(\#eq:SDP-P)
\end{equation}
where $C \in \sym{n}$, $b \in \Real{m}$, and the linear map $\calA: \sym{n} \rightarrow \Real{m}$ is defined as
$$
\calA(X) := \begin{bmatrix} \inprod{A_1}{X} \\
\vdots \\ \inprod{A_m}{X} \end{bmatrix},
$$
for given symmetric matrices $A_1,\dots,A_m \in \sym{n}$.
Recall that $\inprod{C}{X} = \trace(CX)$. The feasible set of \@ref(eq:SDP-P) is the intersection of the PSD cone ($\psd{n}$) and the affine subspace defined by $\calA(X) = b$.
Closely related to the primal SDP \@ref(eq:SDP-P) is the **dual** problem
\begin{equation}
\boxed{
\begin{split}
d^\star = \max_{y \in \Real{m}} & \quad \inprod{b}{y} \\
\subject & \quad C - \calA^* (y) \succeq 0
\end{split}
}
(\#eq:SDP-D)
\end{equation}
where $\calA^{*}: \Real{m} \rightarrow \sym{n}$ is the **adjoint** map defined as
$$
\calA^*(y) := \sum_{i=1}^m y_i A_i.
$$
Observe how the primal-dual SDP pair \@ref(eq:SDP-P)-\@ref(eq:SDP-D) parallels the primal-dual LP pair \@ref(eq:primal-lp)-\@ref(eq:dual-lp).
**Weak duality**. We have a similar weak duality between the primal and dual. Picking any $X$ that is feasible for the primal \@ref(eq:SDP-P) and any $y$ that is feasible for the dual \@ref(eq:SDP-D), we have
$$
\boxed{\inprod{C}{X} - \inprod{b}{y} = \inprod{C}{X} - \inprod{\calA(X)}{y} = \inprod{C - \calA^* (y)}{X} \geq 0,}
$$
where the last inequality holds because both $C - \calA^*(y)$ and $X$ are positive semidefinite. As a result, we have the weak duality
$$
d^\star \leq p^\star.
$$
Similar to the LP case, we will denote $p^\star = +\infty$ if the primal is infeasible, $p^\star = - \infty$ if the primal is unbounded below. We will denote $d^\star = +\infty$ if the dual is unbounded above, and $d^\star = -\infty$ if the dual is infeasible. We say the primal (or the dual) is **solvable** if it admits optimizers. We denote $p^\star - d^\star$ as the **duality gap**.
Recall that Theorem \@ref(thm:LPStrongDuality) states that in LP, if at least one of the primal and dual is feasible, then strong duality holds (i.e., $p^\star = d^\star$, which may be finite or $\pm \infty$). Unfortunately, this does not carry over to SDPs. Let us provide several examples.
::: {.examplebox}
::: {.example #FailureSDPDuality name="Failure of SDP Strong Duality"}
The first example, from [@ramana97mp-exact], shows that even if both primal and dual are feasible, there could exist a nonzero duality gap. Consider the following SDP pair for some $\alpha \geq 0$
$$
\begin{cases}
\min_{X \in \sym{3}} & \alpha X_{11} \\
\subject & X_{22} = 0 \\
& X_{11} + 2 X_{23} = 1 \\
& \begin{bmatrix} X_{11} & X_{12} & X_{13} \\
* & X_{22} & X_{23} \\
* & * & X_{33} \end{bmatrix} \succeq 0
\end{cases},
\begin{cases}
\max_{y \in \Real{2}} & y_2 \\
\subject & \begin{bmatrix} \alpha & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \succeq \begin{bmatrix} y_2 & 0 & 0 \\ 0 & y_1 & y_2 \\ 0 & y_2 & 0 \end{bmatrix}
\end{cases}
$$
To examine the primal feasible set, let us pick the bottom-right $2\times 2$ submatrix of $X$. The determinant of this submatrix needs to be nonnegative (due to (4) of Lemma \@ref(lem:PositiveSemidefinite)):
$$
X_{22} X_{33} - X_{23}^2 \geq 0.
$$
Because $X_{22} = 0$, we have $X_{23} = 0$ and hence $X_{11} = 1$. Therefore, $p^\star = \alpha$ is attained.
To examine the dual feasible set, pick the bottom-right $2 \times 2$ submatrix of
$$
\begin{bmatrix} \alpha - y_2 & 0 & 0 \\ 0 & - y_1 & -y_2 \\ 0 & -y_2 & 0 \end{bmatrix} \succeq 0,
$$
we have $y_2 = 0$. As a result, $d^\star = 0$, so whenever $\alpha > 0$ the duality gap $p^\star - d^\star = \alpha$ is nonzero and strong duality fails.
The second example, from [@todd01an-semidefinite], shows that the duality gap can even be infinite. Consider the primal-dual SDP
$$
\begin{cases}
\min_{X \in \sym{2}} & 0 \\
\subject & X_{11} = 0 \\
& X_{12} = 1 \\
& \begin{bmatrix} X_{11} & X_{12} \\ * & X_{22} \end{bmatrix} \succeq 0
\end{cases},
\begin{cases}
\max_{y \in \Real{2}} & 2 y_2 \\
\subject & \begin{bmatrix} - y_1 & - y_2 \\ - y_2 & 0 \end{bmatrix} \succeq 0
\end{cases}
$$
Clearly, the primal is infeasible because
$$
\begin{bmatrix} 0 & 1 \\ 1 & X_{22} \end{bmatrix}
$$
can never be PSD. So $p^\star = + \infty$. The dual problem, however, is feasible. From the PSD constraint we have $y_2 = 0$ and $d^\star = 0$. Therefore, the duality gap is infinite.
The third example, from [@todd01an-semidefinite], shows that even when the duality gap is zero, the primal or dual problem may not admit optimizers. Consider the primal-dual SDP
$$
\begin{cases}
\min_{X \in \sym{2}} & 2 X_{12} \\
\subject & - X_{11} = -1 \\
& - X_{22} = 0 \\
& \begin{bmatrix} X_{11} & X_{12} \\ * & X_{22} \end{bmatrix} \succeq 0
\end{cases},
\begin{cases}
\max_{y \in \Real{2}} & - y_1 \\
\subject & \begin{bmatrix} y_1 & 1 \\ 1 & y_2 \end{bmatrix} \succeq 0
\end{cases}
$$
To examine the primal feasible set, we have
$$
\begin{bmatrix} 1 & X_{12} \\ X_{12} & 0 \end{bmatrix} \succeq 0
$$
implies $X_{12} = 0$. Hence the primal feasible set only has one point and $p^\star = 0$. The dual feasible set reads
$$
y_1 y_2 \geq 1,\quad y_1 \geq 0, \quad y_2 \geq 0,
$$
and we want to minimize $y_1$. Clearly, $d^\star = 0$ but it is not attainable. Therefore, strong duality holds but the dual problem is not solvable.
A Matlab script that passes these three examples to SDP solvers can be found [here](https://github.com/ComputationalRobotics/Semidefinite-Examples/blob/main/failure_strong_duality.m).
:::
:::
The examples above are somewhat "pathological" and they show that SDPs in general can be more complicated than LPs. It turns out, with the addition of **Slater's condition**, i.e., **strict feasibility** of the primal and dual, we can recover nice results parallel to those of LP.
::: {.theorembox}
::: {.theorem #SDPStrongDuality name="SDP Strong Duality"}
Assume both the primal SDP \@ref(eq:SDP-P) and the dual SDP \@ref(eq:SDP-D) are _strictly feasible_, i.e., there exists $X \succ 0$ such that $\calA(X)=b$ for the primal and there exists $y \in \Real{m}$ such that $C - \calA^* (y) \succ 0$ for the dual. Then strong duality holds, i.e., both problems are solvable (they admit optimizers) and $p^\star = d^\star$ is finite.
Further, a pair of primal-dual feasible points $(X,y)$ is optimal if and only if
$$
\inprod{C}{X} = \inprod{b}{y} \Leftrightarrow \inprod{C - \calA^* (y)}{X} = 0 \Leftrightarrow (C - \calA^* (y)) X = 0.
$$
:::
:::
One can relax the requirement of both primal and dual being strictly feasible to only one of them being strictly feasible, and similar results would hold. Precisely, if the primal is bounded below and strictly feasible, then $p^\star = d^\star$ and the dual is solvable. If the dual is bounded above and strictly feasible, then $p^\star = d^\star$ and the primal is solvable [@nie23book-moment].
::: {.examplebox}
::: {.example #SuccessSDPDuality name="SDP Strong Duality"}
Consider the following primal-dual SDP pair
$$
\begin{cases}
\min_{X \in \sym{2}} & 2 X_{11} + 2 X_{12} \\
\subject & X_{11} + X_{22} = 1 \\
& \begin{bmatrix} X_{11} & X_{12} \\ * & X_{22} \end{bmatrix} \succeq 0
\end{cases},
\begin{cases}
\max_{y \in \Real{}} & y \\
\subject & \begin{bmatrix} 2 - y & 1 \\ 1 & - y \end{bmatrix} \succeq 0
\end{cases}
$$
Choosing
$$
X = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \succ 0
$$
we see the primal is strictly feasible.
Choosing $y = -1$, we have
$$
\begin{bmatrix} 3 & 1 \\ 1 & 1 \end{bmatrix} \succ 0
$$
and the dual is strictly feasible. Therefore, strong duality holds.
In this case, pick the pair of primal-dual feasible points
$$
X^\star = \begin{bmatrix} \frac{2 - \sqrt{2}}{4} & - \frac{1}{2 \sqrt{2}} \\ - \frac{1}{2 \sqrt{2}} & \frac{2 + \sqrt{2}}{4} \end{bmatrix}, \quad y^\star = 1 - \sqrt{2},
$$
we have
$$
\inprod{C}{X^\star} = 1-\sqrt{2} = \inprod{b}{y^\star},
$$
and both $X^\star$ and $y^\star$ are optimal.
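A few lines of Matlab suffice to sanity-check these claims numerically (plain linear algebra, no SDP solver needed):
```matlab
% Sanity check of the claimed optimal primal-dual pair.
C  = [2 1; 1 0];
Xs = [(2-sqrt(2))/4, -1/(2*sqrt(2)); -1/(2*sqrt(2)), (2+sqrt(2))/4];
ys = 1 - sqrt(2);
trace(C*Xs)                     % primal objective = 1 - sqrt(2)
ys                              % dual objective   = 1 - sqrt(2)
trace(Xs)                       % = 1, the primal equality constraint
min(eig(Xs))                    % >= 0 (up to rounding): X* is PSD
min(eig(C - ys*eye(2)))         % >= 0 (up to rounding): y* is dual feasible
```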
:::
:::
### Geometric Properties
## Software for Conic Optimization {#SoftwareConic}
Linear optimization over the nonnegative orthant ($\Real{n}_{+}$), the second-order cone ($\calQ_{n}$), and the positive semidefinite cone ($\psd{n}$) forms the foundation of modern convex optimization, commonly referred to as conic optimization. These three types of cones are self-dual, and there exist efficient algorithms to solve convex optimization problems over them. Popular solvers include SDPT3, SeDuMi, MOSEK, and SDPNAL+.
In this section, we introduce how we should "talk to" the numerical solvers for conic optimization, i.e., how we should pass a mathematically written conic optimization problem to a numerical solver. Note that in many cases, this "transcription" can be done by programming packages such as CVX, CVXPY, YALMIP etc., but I think it is important to understand the standard interface of numerical solvers because (i) it reinforces our understanding of the mathematical basics, (ii) it gets us closer to designing custom numerical solvers for specific problems, and (iii) if you are a heavy convex optimization user you will realize that many of the programming packages are not "efficient" in transcribing the original optimization problem (though they are indeed very general). I have had cases where solving the conic optimization takes a few minutes but transcribing the problem to the solver takes half an hour.
We will use the SeDuMi format as an example. Consider the following general linear convex optimization problem
\begin{equation}
\begin{split}
\max_{y \in \Real{m}} & \quad b\tran y \\
\subject & \quad Fy = g \\
& \quad f \geq Gy \\
& \quad h_i\tran y + \tau_i \geq \norm{H_i y + p_i}_2, i=1,\dots,r \\
& \quad B_{j,0} + \sum_{k=1}^m y_k B_{j,k} \succeq 0, j=1,\dots,s,
\end{split}
(\#eq:general-convex-sedumi)
\end{equation}
for given matrices and vectors
$$
F \in \Real{\ell_1 \times m},G \in \Real{\ell_2 \times m},H_i \in \Real{l_i \times m}, B_{j,k} \in \sym{n_j}, h_i \in \Real{m}, p_i \in \Real{l_i}, \tau_i \in \Real{}, g \in \Real{\ell_1},f \in \Real{\ell_2}.
$$
Define the linear function
$$
\phi(y):= \left(Fy, Gy, \begin{bmatrix} - h_1\tran y \\ - H_1 y \end{bmatrix}, \dots, \begin{bmatrix} - h_r\tran y \\ - H_r y \end{bmatrix}, - \sum_{k=1}^m y_k B_{1,k}, \dots,- \sum_{k=1}^m y_k B_{s,k} \right),
$$
which is a linear map from $\Real{m}$ to the vector space of Cartesian products
$$
V:= \Real{\ell_1} \times \Real{\ell_2} \times \Real{l_1+1} \times \dots \times \Real{l_r + 1} \times \sym{n_1}\times \dots \times \sym{n_s}.
$$
A vector $X \in V$ can be written as a tuple
$$
X = (x_1,x_2,\mathx_1,\dots,\mathx_r,X_1,\dots,X_s).
$$
Given another vector $Y \in V$
$$
Y = (y_1,y_2,\mathy_1,\dots,\mathy_r,Y_1,\dots,Y_s),
$$
the inner product between $X$ and $Y$ is defined as
$$
\inprod{X}{Y} = \inprod{x_1}{y_1} + \inprod{x_2}{y_2} + \sum_{i=1}^r \inprod{\mathx_i}{\mathy_i} + \sum_{j=1}^s \inprod{X_j}{Y_j}.
$$
Let $\calK$ be the Cartesian product of the free cone, the nonnegative orthant, the second-order cone, and the PSD cone
$$
\calK := \Real{\ell_1} \times \Real{\ell_2}_{+} \times \calQ_{l_1} \times \dots \times \calQ_{l_r} \times \psd{n_1} \times \dots \times \psd{n_s}.
$$
Its dual cone is
$$
\calK^* = \{0\}^{\ell_1} \times \Real{\ell_2}_{+} \times \calQ_{l_1} \times \dots \times \calQ_{l_r} \times \psd{n_1} \times \dots \times \psd{n_s}.
$$
Note that all the cones there are self-dual except the free cone $\Real{\ell_1}$, whose dual is the zero cone $\{0\}^{\ell_1}$. Then denote
$$
C := \left( g,f,\begin{bmatrix} \tau_1 \\ p_1 \end{bmatrix},\dots,\begin{bmatrix} \tau_r \\ p_r \end{bmatrix},B_{1,0},\dots,B_{s,0} \right) \in V,
$$
we have that the original optimization \@ref(eq:general-convex-sedumi) is simply the following dual conic problem
\begin{equation}
\max_{y \in \Real{m}} \cbrace{b\tran y \mid C - \phi(y) \in \calK^*}.
(\#eq:dual-conic)
\end{equation}
The linear map $\phi(y)$ can be written as
$$
\phi(y) = y_1 A_1 + \dots + y_m A_m
$$
for vectors $A_1,\dots,A_m$ in the space $V$. Therefore, the primal problem to \@ref(eq:dual-conic) is
\begin{equation}
\min_{X \in V} \cbrace{\inprod{C}{X} \mid \inprod{A_i}{X}=b_i,i=1,\dots,m,X \in \calK }.
(\#eq:primal-conic)
\end{equation}
Let us practice with an example from [@nie23book-moment].
::: {.examplebox}
::: {.example #SeDuMiExample name="SeDuMi Example"}
Consider the following optimization problem as an instance of \@ref(eq:general-convex-sedumi):
\begin{equation}
\begin{split}
\max_{y \in \Real{3}} & \quad y_3 - y_1 \\
\subject & \quad y_1 + y_2 + y_3 = 3\\
& \quad -1 \geq -y_1 - y_2 \\
& \quad -1 \geq -y_2 - y_3 \\
& \quad y_1 + y_3 \geq \sqrt{(y_1-1)^2 + y_2^2 + (y_3 -1)^2} \\
& \quad \begin{bmatrix} 1 & y_1 & y_2 \\ y_1 & 2 & y_3 \\ y_2 & y_3 & 3 \end{bmatrix} \succeq 0
\end{split}
(\#eq:sedumi-example)
\end{equation}
Clearly we have $b = (-1,0,1)$. The linear map $\phi(y)$ is
$$
\phi(y) = \left( y_1+y_2+y_3,\begin{bmatrix} -y_1 - y_2 \\ - y_2 - y_3 \end{bmatrix}, \begin{bmatrix} - y_1 - y_3 \\ -y_1 \\ -y_2 \\ -y_3 \end{bmatrix}, \begin{bmatrix} 0 & -y_1 & -y_2 \\ -y_1 & 0 & -y_3 \\ -y_2 & -y_3 & 0 \end{bmatrix} \right).
$$
The vector space $V=\Real{} \times \Real{2} \times \Real{4} \times \sym{3}$, and the cone $\calK$ is
$$
\calK = \Real{} \times \Real{2}_{+} \times \calQ_3 \times \psd{3}.
$$
The vectors $A_1,A_2,A_3$ are
\begin{equation}
\begin{split}
A_1 &= \left( 1,\begin{bmatrix}-1\\0\end{bmatrix}, \begin{bmatrix}-1\\-1\\0\\0\end{bmatrix},\begin{bmatrix}0&-1&0\\-1&0&0\\0&0&0\end{bmatrix} \right)\\
A_2 &= \left( 1,\begin{bmatrix}-1\\-1\end{bmatrix}, \begin{bmatrix}0\\0\\-1\\0\end{bmatrix},\begin{bmatrix}0&0&-1\\0&0&0\\-1&0&0\end{bmatrix} \right)\\
A_3 &= \left( 1,\begin{bmatrix}0\\-1\end{bmatrix}, \begin{bmatrix}-1\\0\\0\\-1\end{bmatrix},\begin{bmatrix}0&0&0\\0&0&-1\\0&-1&0\end{bmatrix} \right).
\end{split}
\end{equation}
The vector $C$ is
$$
C = \left( 3,\begin{bmatrix}-1\\-1\end{bmatrix}, \begin{bmatrix}0\\-1\\0\\-1\end{bmatrix},\begin{bmatrix}1&0&0\\0&2&0\\0&0&3\end{bmatrix} \right).
$$
To input this problem to SeDuMi, we only need to provide the data $(A,b,C)$ together with the description of the cones $\calK$.
```matlab
% describe dimensions of the cones
K.f = 1; % free cone
K.l = 2; % nonnegative orthant
K.q = 4; % second order cone
K.s = 3; % psd cone
% provide A,b,c
c = [3,-1,-1,0,-1,0,-1,1,0,0,0,2,0,0,0,3];
b = [-1,0,1];
A = [
1,-1,0,-1,-1,0,0,0,-1,0,-1,0,0,0,0,0;
1,-1,-1,0,0,-1,0,0,0,-1,0,0,0,-1,0,0;
1,0,-1,-1,0,0,-1,0,0,0,0,0,-1,0,-1,0
];
% solve using sedumi
[xopt,yopt,info] = sedumi(A,b,c,K);
```
Note that in providing $A$, we vectorize each $A_i$ and place it along the $i$-th row of the matrix $A$.
The above Matlab script gives us the optimal solution
$$
y^\star = (0,1,2).
$$
To run the [code](https://github.com/ComputationalRobotics/Semidefinite-Examples/blob/main/sedumi_example.m), make sure you download [SeDuMi](https://github.com/sqlp/sedumi) and add that to your Matlab path.
:::
:::
## Interior Point Algorithm
A nice property of semidefinite optimization is that it can be solved in polynomial time. The algorithm of choice for solving SDPs is called an **interior point method** (IPM), which is also what popular SDP solvers like SeDuMi, MOSEK, and SDPT3 implement under the hood.
Although it can be difficult to implement an SDP solver as efficient and robust as MOSEK (as it requires a lot of engineering wisdom), it is surprisingly simple to sketch out the basic algorithmic framework, as it is based on [Newton's method for solving a system of nonlinear equations](https://en.wikipedia.org/wiki/Newton%27s_method#:~:text=14%20External%20links-,Description,the%20method%20can%20be%20iterated.). Before introducing the basic algorithm, let me review two useful preliminaries.
::: {.highlightbox}
<p style="text-align: center;"><b>Newton's Method</b></p>
Given a function $f: \Real{} \rightarrow \Real{}$ that is continuously differentiable, Newton's method is designed to find a root of $f(x) = 0$. Given an initial iterate $x^{(0)}$, Newton's method works as follows
$$
x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{f'(x^{(k)})},
$$
where $f'(x^{(k)})$ denotes the derivative of $f$ at the current iterate $x^{(k)}$. This simple algorithm is indeed (in my opinion) the most important foundation of modern numerical optimization [@nocedal99book-numerical]. Under mild conditions, Newton's method has an (at least) quadratic local convergence rate, that is to say, if $|x^{(k)} - x^\star| = \epsilon$, then $|x^{(k+1)} - x^\star| = O(\epsilon^2)$. Of course, there exist pathological cases where even linear convergence is not guaranteed (e.g., when $f'(x^\star) = 0$).
Newton's method can be generalized to find a point at which multiple functions vanish simultaneously. Given a function $F: \Real{n} \rightarrow \Real{n}$ that is continuously differentiable, and an initial iterate $x^{(0)}$, Newton's method reads
\begin{equation}
x^{(k+1)} = x^{(k)} - J_F(x^{(k)})\inv F(x^{(k)}),
(\#eq:newton-method-vector)
\end{equation}
where $J_F(\cdot)$ denotes the Jacobian of $F$. Iteration \@ref(eq:newton-method-vector) is equivalent to
\begin{equation}
\begin{split}
J_F(x^{(k)}) \Delta x^{(k)} & = - F(x^{(k)}) \\
x^{(k+1)} & = x^{(k)} + \Delta x^{(k)}
\end{split}
(\#eq:newton-method-vector-1)
\end{equation}
i.e., one first solves a linear system of equations to find an update direction $\Delta x^{(k)}$, and then takes a step along that direction.
As we will see, IPMs for solving SDPs can be interpreted as applying Newton's method to the perturbed KKT optimality conditions.
:::
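To make the vector iteration \@ref(eq:newton-method-vector-1) concrete, here is a minimal Matlab sketch on a made-up two-dimensional system (the function and its Jacobian are chosen purely for illustration); each iteration roughly squares the residual norm, illustrating the quadratic convergence mentioned above.
```matlab
% Newton's method for F(x) = 0 on a toy system: F(x) = [x1^2 + x2^2 - 4; x1 - x2].
F  = @(x) [x(1)^2 + x(2)^2 - 4; x(1) - x(2)];
JF = @(x) [2*x(1), 2*x(2); 1, -1];    % Jacobian of F
x  = [2; 1];                          % initial iterate
for k = 1:10
    dx = -(JF(x) \ F(x));             % solve J_F(x) * dx = -F(x)
    x  = x + dx;                      % take the Newton step
    if norm(F(x)) < 1e-12, break; end
end
x                                     % converges to [sqrt(2); sqrt(2)]
```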
We introduce another useful preliminary about the symmetric Kronecker product.
::: {.highlightbox}
<p style="text-align: center;"><b>Symmetric Vectorization and Kronecker Product</b></p>
Consider a linear operator on $\Real{n \times n}$:
\begin{equation}
\begin{split}
\Real{n \times n} & \rightarrow \Real{n \times n} \\
K & \mapsto N K M\tran
\end{split}
(\#eq:linear-opt-Rn)
\end{equation}
where $N,M \in \Real{n \times n}$ are given square matrices. This linear map is equivalent to
\begin{equation}
\begin{split}
\Real{n^2} & \rightarrow \Real{n^2} \\
\vectorize(K) & \mapsto (M \kron N) \vectorize(K)
\end{split}
(\#eq:linear-opt-Rn-kron)
\end{equation}
where $\kron$ denotes the usual Kronecker product.
The symmetric vectorization and the symmetric Kronecker product generalize the above linear map from $\Real{n \times n}$ to $\sym{n}$.
Consider a linear operator on $\sym{n}$:
\begin{equation}
\begin{split}
\sym{n} & \rightarrow \sym{n} \\
K & \mapsto \frac{1}{2} \left( N K M\tran + M K N\tran \right)
\end{split}
(\#eq:linear-opt-Sn)
\end{equation}
where $N,M\in\Real{n\times n}$ are given square matrices. This linear map is equivalent to
\begin{equation}
\begin{split}
\Real{n^{\Delta}} & \rightarrow \Real{n^{\Delta}} \\
\svec(K) & \mapsto (M \skron N) \svec(K)
\end{split}
(\#eq:linear-opt-Sn-skron)
\end{equation}
where
$$
n^{\Delta} = \frac{n(n+1)}{2}
$$
is the $n$-th triangle number, $\svec(K)$ denotes the symmetric vectorization of $K$ defined as
$$
\svec(K) = \begin{bmatrix}
K_{11} \\
\sqrt{2} K_{12} \\
\vdots \\
\sqrt{2} K_{1n} \\
K_{22} \\
\vdots \\
\sqrt{2} K_{2n} \\
\vdots \\
K_{nn}
\end{bmatrix},
$$
and $M \skron N$ denotes the symmetric Kronecker product. Note that we have
$$
\inprod{A}{B} = \inprod{\svec(A)}{\svec(B)}, \quad \forall A,B \in \sym{n},
$$
and
$$
M \skron N = N \skron M.
$$
To compute the symmetric Kronecker product, see [@schacke04mthesis-kronecker]. The $\skron$ function is readily implemented in software packages such as SDPT3.
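For concreteness, a minimal (and unoptimized) implementation of $\svec$ following the ordering above might look as follows; this is just a sketch, not the SDPT3 routine.
```matlab
function v = svec(K)
% Symmetric vectorization of a symmetric matrix K, following the ordering
% K11, sqrt(2)K12, ..., sqrt(2)K1n, K22, ..., sqrt(2)K2n, ..., Knn,
% so that svec(A)'*svec(B) = trace(A*B) for symmetric A and B.
n = size(K,1);
v = zeros(n*(n+1)/2, 1);
idx = 0;
for i = 1:n
    for j = i:n
        idx = idx + 1;
        if i == j
            v(idx) = K(i,i);
        else
            v(idx) = sqrt(2)*K(i,j);
        end
    end
end
end
```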
The following property of the symmetric Kronecker product is useful for us later.
::: {.theorembox}
::: {.lemma #SkronEigLemma name="Spectrum of Symmetric Kronecker Product"}
Let $M,N \in \sym{n}$ be two symmetric matrices that commute, i.e., $MN = NM$, and $\alpha_1,\dots,\alpha_n$ and $\beta_1,\dots,\beta_n$ be their eigenvalues with $v_1,\dots,v_n$ a common basis of orthonormal eigenvectors. The $n^\Delta$ eigenvalues of $M \skron N$ are given by
$$
\frac{1}{2}(\alpha_i \beta_j + \beta_i \alpha_j), \quad 1 \leq i \leq j \leq n,
$$
with the corresponding set of orthonormal eigenvectors
$$
\begin{cases}
\svec(v_i v_i\tran) & \text{if } i = j \\
\frac{1}{\sqrt{2}} \svec(v_i v_j\tran + v_j v_i\tran) & \text{if } i < j
\end{cases}.
$$
:::
:::
It is easy to see that, if $M,N$ are two PSD matrices that commute, then $M \skron N$ is also PSD.
For more properties of symmetric vectorization and Kronecker product, see [@schacke04mthesis-kronecker].
:::
Now we are ready to sketch the interior-point algorithm.
### The Central Path
Assuming strict feasibility, strong duality holds between the SDP primal \@ref(eq:SDP-P) and dual \@ref(eq:SDP-D). Then $(X,y,Z)$ is primal-dual optimal if and only if it satisfies the following KKT optimality conditions
\begin{equation}
\begin{split}
\calA(X) = b, \quad X \succeq 0 \\
C - \calA^*(y) - Z = 0, \quad Z \succeq 0 \\
\inprod{X}{Z} = 0 \Leftrightarrow XZ = 0
\end{split}
(\#eq:SDP-KKT)
\end{equation}
The first idea in IPM is to relax the last condition in \@ref(eq:SDP-KKT), i.e., zero duality gap, to a small positive gap, which leads to the notion of a central path.
::: {.definitionbox}
::: {.definition #CentralPath name="Central Path"}
A point $(X^\mu,y^\mu,Z^\mu)$ is said to lie on the central path if there exists $\mu > 0$ such that
\begin{equation}
\begin{split}
\calA(X^\mu) = b, \quad X^\mu \succeq 0 \\
C - \calA^*(y^\mu) - Z^\mu = 0, \quad Z^\mu \succeq 0 \\
X^\mu Z^\mu = \mu \eye
\end{split},
(\#eq:SDP-central-path)
\end{equation}
that is, $(X^\mu,y^\mu,Z^\mu)$ is primal and dual feasible, but attains a nonzero duality gap:
$$
\inprod{C}{X^\mu} - \inprod{b}{y^\mu} = \inprod{C}{X^\mu} - \inprod{\calA(X^\mu)}{y^\mu} = \inprod{C - \calA^*(y^\mu)}{X^\mu} = \inprod{X^\mu}{Z^\mu} = n\mu.
$$
:::
:::
The central path exists and is unique.
::: {.theorembox}
::: {.theorem #UniqueCentralPath name="Central Path"}
Assume the SDP pair \@ref(eq:SDP-P) and \@ref(eq:SDP-D) are strictly feasible. For any $\mu > 0$, $(X^\mu,y^\mu,Z^\mu)$ satisfying \@ref(eq:SDP-central-path) exists and is unique. Moreover,
$$
(X,y,Z) = \lim_{\mu \rightarrow 0} (X^\mu, y^\mu, Z^\mu)
$$
exists and solves \@ref(eq:SDP-P) and \@ref(eq:SDP-D).
:::
:::
Theorem \@ref(thm:UniqueCentralPath) states that the limit of the central path leads to a solution of the SDP. Therefore, the basic concept of IPM is to follow the central path and converge to the optimal solution.
### The AHO Newton Direction
We will focus on the central path equation \@ref(eq:SDP-central-path) and for simplicity of notation, we will rename $(X^\mu,y^\mu,Z^\mu)$ as $(X,y,Z)$. Discarding the positive semidefiniteness condition for now, we can view \@ref(eq:SDP-central-path) as finding a root to the following function
\begin{equation}
F(X,y,Z) = \begin{pmatrix}
\calA^*(y) + Z - C \\
\calA(X) - b \\
X Z - \mu \eye
\end{pmatrix} = 0.
(\#eq:central-path-F-1)
\end{equation}
However, there is an issue with $F$ defined as above. The input of the function $(X,y,Z)$ lives in $\sym{n} \times \Real{m} \times \sym{n}$, but the output lives in $\sym{n} \times \Real{m} \times \Real{n \times n}$ because $XZ$ is not guaranteed to be symmetric (two symmetric matrices may not commute). Therefore, Newton's method cannot be directly applied.
An easy way to fix this issue is to treat both the input and output of $F$ as living in $\Real{n \times n} \times \Real{m} \times \Real{n \times n}$. This could work; however, it leads to nonsymmetric Newton iterates.
[@alizadeh98siopt-primal] proposed a better idea -- to equivalently rewrite \@ref(eq:central-path-F-1) as
\begin{equation}
F(X,y,Z) = \begin{pmatrix}
\calA^*(y) + Z - C \\
\calA(X) - b \\
\frac{1}{2}(X Z + Z X) - \mu \eye
\end{pmatrix} = 0.
(\#eq:central-path-F-2)
\end{equation}
The next proposition states that \@ref(eq:central-path-F-2) and \@ref(eq:central-path-F-1) are indeed equivalent.
::: {.theorembox}
::: {.proposition #SymmetricCentralPath name="Symmetrized Central Path"}
If $X,Z \succeq 0$, then
$$
XZ = \mu \eye \Leftrightarrow XZ + ZX = 2 \mu \eye.
$$
:::
:::
::: {.proofbox}
::: {.proof}
The $\Rightarrow$ direction is obvious. To show the $\Leftarrow$ direction (with $\mu > 0$, as on the central path), write $X = Q \Lambda Q\tran$ with $QQ\tran = \eye$ and $\Lambda$ diagonal, and let $W = Q\tran Z Q \succeq 0$. Then $XZ + ZX = 2\mu \eye$ becomes $\Lambda W + W \Lambda = 2\mu \eye$, i.e., $(\lambda_i + \lambda_j) W_{ij} = 2\mu \delta_{ij}$. Taking $i = j$ gives $\lambda_i W_{ii} = \mu > 0$, so $\lambda_i > 0$ for all $i$; taking $i \neq j$ then gives $W_{ij} = 0$ because $\lambda_i + \lambda_j > 0$. Hence $W$ is diagonal with $W_{ii} = \mu/\lambda_i$, and $XZ = Q \Lambda W Q\tran = \mu \eye$.
:::
:::
Now that the input and output domains of $F$ in \@ref(eq:central-path-F-2) are both $\sym{n} \times \Real{m} \times \sym{n}$, we can apply Newton's method to solve the system of equations. For ease of implementation, let us denote
$$
x = \svec(X), \quad z = \svec(Z), \quad c = \svec(C),
$$
$$
A\tran = \begin{bmatrix}
\svec(A_1) & \svec(A_2) & \cdots & \svec(A_m)
\end{bmatrix}
$$
and rewrite \@ref(eq:central-path-F-2) as
\begin{equation}
F(x,y,z) = \begin{pmatrix}
A\tran y + z - c \\
A x - b \\
\frac{1}{2}\svec(XZ + ZX) - \svec(\mu \eye)
\end{pmatrix} = 0.
(\#eq:central-path-F-3)
\end{equation}
The Jacobian of $F$ reads
\begin{equation}
J_F = \begin{bmatrix}
0 & A\tran & \eye \\
A & 0 & 0 \\
P & 0 & Q
\end{bmatrix}
(\#eq:aho-Jacobian)
\end{equation}
where
$$
P = Z \skron \eye, \quad Q = X \skron \eye,
$$
which follows from the symmetric Kronecker product introduced above:
$$
\frac{1}{2}\svec(XZ + ZX) = (Z \skron \eye ) x = (X \skron \eye) z.
$$
Let
$$
r_d = c - A\tran y - z, \quad r_p = b - Ax, \quad r_c = \svec(\mu \eye ) - \frac{1}{2}\svec(XZ + ZX),
$$
be the dual, primal, and complementarity residuals. Applying Newton's method to \@ref(eq:central-path-F-3) gives us the Newton direction as the solution to the following linear system:
\begin{equation}
\begin{bmatrix}
0 & A\tran & \eye \\
A & 0 & 0 \\
P & 0 & Q
\end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \\ \Delta z \end{bmatrix} = \begin{bmatrix} r_d \\ r_p \\ r_c \end{bmatrix}.
(\#eq:aho-newton-direction)
\end{equation}
One can directly form the block matrix in \@ref(eq:aho-newton-direction) and solve the linear system. However, leveraging the block structure of the linear system, we can do better.
We can first use the third equation to eliminate $\Delta z$:
\begin{equation}
\Delta z = Q\inv (r_c - P \Delta x).
(\#eq:aho-eliminate-z)