---
title: "The Structure of Measurement Related Ill-Posed Inverse Problems in Experimental Particle Physics"
author: Sean Gilligan
output:
  beamer_presentation:
    toc: FALSE
    theme: "Szeged"
    colortheme: "crane"
    citation_package: biblatex
    keep_tex: TRUE
    slide_level: 3
bibliography: citations.bib
header-includes:
- \institute{Oregon State University}
- \usepackage{graphicx}
- \usepackage{caption}
- \usepackage{wasysym}
- \usepackage[normalem]{ulem}
- \usepackage[T1]{fontenc}
- \graphicspath{{images/}}
- \usepackage{hyperref}
- \hypersetup{colorlinks = true,
    linkcolor = black,
    urlcolor = hightide,
    anchorcolor = hightide,
    citecolor = hightide}
- \usepackage[utf8]{inputenc}
- \usepackage{amssymb,amsmath}
- \usepackage{bm}
- \usepackage[font=tiny]{caption}
- \definecolor{beaverorange}{RGB}{215,63,9}
- \definecolor{fadedbeaver}{RGB}{240,110,60}
- \definecolor{hightide}{RGB}{0,133,155}
- \definecolor{pinestand}{RGB}{74,119,60}
- \definecolor{reindeermoss}{RGB}{196,214,164}
- \definecolor{luminance}{RGB}{255,181,0}
- \definecolor{candela}{RGB}{253,210,110}
- \setbeamertemplate{blocks}[rounded]
- \setbeamercolor{alerted text}{fg=black!20!red}
- \setbeamercolor*{palette primary}{fg=black,bg=beaverorange}
- \setbeamercolor*{palette secondary}{bg=fadedbeaver,fg=black}
- \setbeamercolor*{palette tertiary}{fg=black,bg=beaverorange}
- \setbeamercolor*{palette quaternary}{fg=black,bg=beaverorange}
- \setbeamercolor*{sidebar}{fg=black,bg=beaverorange}
- \setbeamercolor*{palette sidebar primary}{fg=black}
- \setbeamercolor*{palette sidebar secondary}{fg=white}
- \setbeamercolor*{palette sidebar tertiary}{fg=black}
- \setbeamercolor*{palette sidebar quaternary}{fg=beaverorange}
- \setbeamercolor*{titlelike}{parent=palette primary}
- \setbeamerfont{frametitle}{series=\bfseries}
- \setbeamercolor{frametitle}{bg=white,fg=beaverorange}
- \setbeamercolor{frametitle right}{bg=beaverorange}
- \setbeamercolor*{block title}{fg=black,bg=luminance}
- \setbeamercolor*{block body}{fg=black,bg=candela}
- \setbeamercolor*{block title example}{fg=black,bg=pinestand}
- \setbeamercolor*{block body example}{fg=black,bg=reindeermoss}
- \setbeamertemplate{title page}{\begin{columns}[c]
    \column{.35\textwidth}
    \begin{center}
    \vspace{0.6cm}
    \includegraphics[width=0.9\textwidth]{images/BeaverOSU.png}\\\vspace{0.3cm}
    \includegraphics[width=0.7\textwidth]{images/NSF_Large.png}
    \end{center}
    \column{.6\textwidth}
    \noindent\setlength{\fboxrule}{3pt}{
    \fcolorbox{beaverorange}{fadedbeaver}{\parbox{\textwidth}{{\Large\inserttitle}}}}\\\vspace{0.8cm}
    \centering{\insertauthor\\\textit{[email protected]}}\\\vspace{0.6cm}
    \centering{\begin{tabular}{rl}
    Statistics Advisor:& James Molyneux\\
    Physics Advisor:& Heidi Schellman
    \end{tabular}}
    \end{columns}}
- \AtBeginSubsection[]{}
- \AtBeginSection[]{\begin{frame}\setlength{\fboxrule}{3pt}\setlength{\fboxsep}{1em}{\fcolorbox{beaverorange}{fadedbeaver}{\makebox[0.9\linewidth][c]{{\Large\insertsection}}}}\end{frame}}
- \AtBeginDocument{\setbeamertemplate{navigation symbols}[horizontal]}
- \AtBeginDocument{\title[Ill-Posed Inverse Problems]{The Structure of Ill-Posed Inverse Problems in Experimental Particle Physics}}
- \expandafter\def\expandafter\insertshorttitle\expandafter{\insertshorttitle\hfill\insertframenumber\,/\,\inserttotalframenumber}
---
```{r, echo = FALSE, message = FALSE, warning = FALSE}
library(tidyverse)
library(ggforce)
library(jpeg)
```
### Overview
\tableofcontents
<!------------------------------------------------------------------------->
# Introduction
## Inverse Problems
### Disclaimer
\begin{columns}
\column{.4\textwidth}
\begin{figure}[!h]
\includegraphics[width=0.9\textwidth]{images/TheTreacheryOfImages_ReneMagritte_1929.png}
\caption*{Magritte, Ren$\acute{\text{e}}$. \emph{The Treachery of Images}. 1929.}
\end{figure}
\column{.6\textwidth}
Mathematical models are the building blocks of the simulacra we construct in our attempts to quantitatively approximate the world we inherit and create.
\vspace{1em}A model is, at best, an imperfect representation. It should never be construed as an intrinsic truth about its subject, and certainly never as the subject itself.
\end{columns}
<!------------------------------------------------------------------------->
### What is an inverse problem?
\begin{columns}
\column{.7\textwidth}
\small{\textbf{Inverse problems}: in this context, these are problems predominantly concerned with reversing the obscuring manipulations that data undergo during measurement, which for counting experiments can result from such things as \cite{Blobel2013}:
\begin{itemize}
\item finite measurement resolution and event migration;
\item statistical fluctuations;
\item nonlinear measurement responses undermining calibration efforts;
\item constraints on event acceptance or interferences with event detection.
\end{itemize}}
\column{.3\textwidth}
\begin{figure}[!h]
\includegraphics[width=0.9\textwidth]{images/TheCrystalBall_JohnWilliamWaterhouse_1902.jpeg}
\caption*{Waterhouse, John William. \emph{The Crystal Ball}. 1902.}
\end{figure}
\end{columns}
<!------------------------------------------------------------------------->
### Clearing things up...
\begin{columns}
\column{.55\textwidth}
\vspace{-3em}
\begin{figure}[!ht]
```{r, echo = FALSE, cache = TRUE, fig.align='left', fig.height=1.75}
# Read the original image and build a Gaussian smearing (blurring) matrix.
flower <- readJPEG("images/flower_smaller.jpg")
n <- dim(flower)[1]
spread <- pnorm(seq(-10.05,10.15,0.1), sd = 1)-pnorm(seq(-10.15,10.05,0.1), sd = 1)
spread <- c(rep(0,n-102),spread,rep(0,n-102))
i0 <- length(spread)-n+1
smat <- diag(0,n)
for(i in 0:(n-1)){
  smat[i+1,] <- spread[(i0-i):(i0+n-1-i)]
}
# Apply the smearing matrix to each color channel.
flower_smeared <- flower
for(i in 1:3){
  flower_smeared[,,i] <- t(smat) %*% flower[,,i] %*% smat
}
# Place the blurred and original images side by side, separated by a white strip.
flowers <- array(1, dim = c(n,2*n+100,3))
for(i in 1:3){
  flowers[,,i] <- cbind(flower_smeared[,,i],array(1,dim = c(n,100,3))[,,i],flower[,,i])
}
dn <- 6
par(mar=c(0,0,0,38), bg=NA)
plot.new()
plot.window(xlim=c(0,1),ylim=c(0,1),asp=(n/(2*n+100)))
rasterImage(flowers[(dn+1):(n-dn),(dn+1):(2*n+100-dn),],0,0,1,1)
```
\vspace{-2em}
\caption*{A rose in the Sunnyside neighborhood of Portland, OR. Original picture taken on July 3, 2022, a few days after the 2021 heat dome, during which the daily max temperatures at the Portland International Airport from June 26 - 28, 2021 were respectively 108$^{\circ}$F, 112$^{\circ}$F, and 116$^{\circ}$F. (source: \href{https://projects.oregonlive.com/weather/temps/}{oregonlive.com})\\Gaussian blurring performed in \texttt{R}.}
\end{figure}
\column{.45\textwidth}
The goal of solving an inverse problem is to reconstruct a previous or unperturbed state that has some desired representation or contains the metrics necessary to perform a particular hypothesis test.
\vspace{1em}Image deblurring, for example, is a ubiquitous application of inverse-problem methods.
\end{columns}
<!------------------------------------------------------------------------->
### The mathematics of inconvenience
\begin{columns}
\column{.5\textwidth}
\small{The mathematics of a smudge on a camera lens are equivalent to those used to describe smearing effects on any other data represented by a metric space. Assuming only linear processes, effects like these can be modeled by:
\begin{itemize}
\item an operator $A$ mapping
\item an element $z$ of some metric space $Z$ (the original state) to
\item an element $u$ of some metric space $U$ (the measured state),
\end{itemize}
and written $A:Z\to U$ or $A(z)=u$.}
\column{.5\textwidth}
\vspace{-1.5em}
\begin{figure}[!ht]
```{r, echo = FALSE, cache = TRUE, fig.align='center', fig.height=1.8, fig.width=2.6}
mean1 <- -1
mean2 <- 1
xmin <- -3
xmax <- 3
dx <- 0.2
continuous_data <- tibble(x = seq(xmin,xmax,0.1),
                          y1 = (1/3)*dnorm(seq(xmin,xmax,0.1),mean1,0.5) +
                               (2/3)*dnorm(seq(xmin,xmax,0.1),mean2,0.5),
                          y2 = (1/3)*dnorm(seq(xmin,xmax,0.1),mean1,sqrt(0.5)) +
                               (2/3)*dnorm(seq(xmin,xmax,0.1),mean2,sqrt(0.5)))
discrete_data <- tibble(x = c(xmin,rep(seq(xmin+dx,xmax-dx,dx),each=2),xmax),
                        y1 = rep(((1/3)*(pnorm(seq(xmin+dx,xmax,dx),mean1,0.5)-
                                         pnorm(seq(xmin,xmax-dx,dx),mean1,0.5)) +
                                  (2/3)*(pnorm(seq(xmin+dx,xmax,dx),mean2,0.5)-
                                         pnorm(seq(xmin,xmax-dx,dx),mean2,0.5))),each=2)/dx,
                        y2 = rep(((1/3)*(pnorm(seq(xmin+dx,xmax,dx),mean1,sqrt(0.5))-
                                         pnorm(seq(xmin,xmax-dx,dx),mean1,sqrt(0.5))) +
                                  (2/3)*(pnorm(seq(xmin+dx,xmax,dx),mean2,sqrt(0.5))-
                                         pnorm(seq(xmin,xmax-dx,dx),mean2,sqrt(0.5)))),each=2)/dx)
n <- 30
spread <- pnorm(seq(-5.25,5.25,0.2), sd = 0.1)-pnorm(seq(-5.45,5.05,0.2), sd = 0.1)
spread <- c(rep(0,n),spread,rep(0,n))
i0 <- (length(spread)+1)/2
smat <- diag(0,n)
for(i in 0:(n-1)){
  smat[,i+1] <- spread[(i0-i):(i0+n-1-i)]
}
#discrete_data$y2[seq(1,length(discrete_data$x),2)] <-
#  t(smat) %*% discrete_data$y2[seq(1,length(discrete_data$x),2)]
#discrete_data$y2[seq(2,length(discrete_data$x),2)] <-
#  t(smat) %*% discrete_data$y2[seq(2,length(discrete_data$x),2)]
par(mfrow = c(2, 2),
    mar = c(0,0.5,1,0.1),
    oma = c(1.5,0.5,0.5,2))
plot(continuous_data$x,continuous_data$y1,type="l",col="blue",
     yaxt = "n", xaxt = "n", ylim=c(0,0.55))
title("True Value", adj = 0, cex.main = 0.9)
axis(2, at = c(0.0,0.1,0.2,0.3,0.4,0.5), tck = -0.03, las = 1, hadj = 0, cex.axis = 0.65)
plot(continuous_data$x,continuous_data$y2,type="l",col="red",
     tck = -0.03, yaxt = "n", xaxt = "n", ylim=c(0,0.55))
title("Measured Value", adj = 0, cex.main = 0.9)
plot(discrete_data$x,discrete_data$y1,type="l",col="blue",ann = FALSE,
     yaxt = "n", xaxt = "n", ylim=c(0,0.55))
axis(2, at = c(0.0,0.1,0.2,0.3,0.4,0.5), tck = -0.03, las = 1, hadj = 0, cex.axis = 0.65)
axis(1, at = -3:3, tck = -0.03, padj = -3.75, cex.axis = 0.55)
plot(discrete_data$x,discrete_data$y2,type="l",col="red",ann = FALSE,
     yaxt = "n", xaxt = "n", ylim=c(0,0.55))
axis(1, at = -3:3, tck = -0.03, padj = -3.75, cex.axis = 0.55)
```
\vspace{-1em}
\caption*{Continuous and discrete representations of the same data with some hypothetical measurement error represented by a Gaussian blur. Calculated and rendered in \texttt{R}.}
\end{figure}
\end{columns}
## Locating the difficulty
### Well-Posedness
Finding $z$ given $A$ and $u$ is trivial when it is a \textbf{well-posed} problem, that is \cite{Tikh1977}:
\begin{enumerate}
\item $\forall u\in U$ there exists a solution $z\in Z$,
\item the solution is unique,
\item and the problem is stable on $U$ and $Z$.
\begin{itemize}
\item The inverse $A^{-1}$ is continuous.
\item Small changes in $u$ $\implies$ small changes in $z$.
\end{itemize}
\end{enumerate}
### Ill-Posedness
\begin{columns}
\column{.4\textwidth}
Any departure from these three criteria results in an \textbf{ill-posed} problem. What this looks like in practice can vary, but some striking examples will be pointed out as they appear, along with the harm they would cause if treated as well-posed.
\column{.6\textwidth}
\vspace{-18em}
\begin{figure}[!ht]
```{r, echo = FALSE, fig.align='center', fig.width=2.6}
ggplot() +
  geom_ellipse(aes(x0 = -1, y0 = 0.2, a = 0.8, b = 0.8, angle = 0)) +
  geom_ellipse(aes(x0 = 1, y0 = 0.2, a = 0.8, b = 0.8, angle = 0)) +
  geom_point(data = tibble(x = c(0.95,1.05,-0.98,-0.94,-0.98),
                           y = c(0.2,0.05,-0.25,-0.35,0.7)),
             mapping = aes(x=x, y=y), shape = 20) +
  geom_text(aes(x=-1.08, y=-0.38, label="italic(z)"), parse = TRUE, size = 3) +
  geom_text(aes(x=-1.31, y=-0.21, label=paste0("italic(z)+delta*italic(z)")),
            parse = TRUE, size = 3) +
  geom_text(aes(x=-1.3, y=0.72, label=paste0("italic(z)+Delta*italic(z)")),
            parse = TRUE, size = 3) +
  geom_text(aes(x=1.18, y=0.06, label="italic(u)"),
            parse = TRUE, size = 3) +
  geom_text(aes(x=1.27, y=0.24, label="italic(u)+delta*italic(u)"),
            parse = TRUE, size = 3) +
  geom_text(aes(x=c(-1,1),y=c(1.15,1.15),label=c("italic(Z)","italic(U)")), parse = TRUE) +
  geom_curve(aes(x=0.9, y=0.2, xend=-0.92, yend=-0.22), curvature=0.2,
             arrow = arrow(length = unit(0.04, "npc")), color = "#5e81b5") +
  geom_curve(aes(x=0.91, y=0.24, xend=-0.92, yend=0.7), curvature=0.2,
             arrow = arrow(length = unit(0.04, "npc")), color = "#eb6235") +
  geom_curve(aes(x=-0.88, y=-0.36, xend=0.99, yend=0.01), curvature=0.2,
             arrow = arrow(length = unit(0.04, "npc"))) +
  geom_text(aes(x=-0.02, y=0.02, label="italic(A^-1)"),
            parse = TRUE, size = 3, color="#5e81b5") +
  geom_text(aes(x=0, y=0.5, label="italic(A^-1)"),
            parse = TRUE, size = 3, color="#eb6235") +
  geom_text(aes(x=0.04, y=-0.47, label="italic(A)"),
            parse = TRUE, size = 3) +
  coord_fixed() + theme_void()
```
\vspace{-19em}
\caption*{For a small change $u\to u + \delta u$, Condition 3 requires for the solution to a \textcolor[HTML]{5e81b5}{well-posed} problem a correspondingly small change $z\to z+\delta z$. A large change $z\to z+\Delta z$ indicates the problem is \textcolor[HTML]{eb6235}{ill-posed}. Figure made in \texttt{R}.}
\end{figure}
\end{columns}
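
### Ill-posedness by the numbers

A minimal \texttt{R} sketch (not part of the original analysis) of Condition 3 failing in practice: for a nearly singular operator $A$, a perturbation of $u$ at the $10^{-4}$ level moves the recovered $z$ by order one. The matrix and vectors here are made up purely for illustration.

```{r, echo = TRUE}
# A nearly singular "response": small changes in u produce large changes in z.
A  <- matrix(c(1, 1, 1, 1.0001), nrow = 2)
u  <- c(2, 2.0001)
z  <- solve(A, u)                    # recovered solution
dz <- solve(A, u + c(0, 1e-4)) - z   # effect of a tiny perturbation of u
kappa(A)                             # condition number: large = unstable inversion
rbind(z = z, dz = dz)
```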
### Unfolding
In particle physics, ill-posed inverse problems are solved to deal with measurement-related errors and uncertainties by way of a process called \textbf{unfolding}, where folding refers to the forward process that produces said errors and uncertainties. This will be the term of choice for the remainder of the presentation.
\vspace{1em}Other labels, both more specific and more general, include \textbf{deconvolution}, \textbf{unsmearing}, and other names based on specific applications in seismology, tomography, and signal processing.
# The Math
## Continuous representation
### The Fredholm integral equation
A true distribution $f_X(x)$, a measured/reconstructed distribution $f_Y(y)$, and a kernel $K(x,y)$ that describes the physical measuring process are related to each other by way of the Fredholm integral equation of the first kind \cite{Blobel2011}:
\begin{align}
f_Y(y)=\int_\mathcal{X}K(x,y)f_X(x)\,dx\label{fred}
\end{align}
## Discrete representation
\footnotesize
The mathematical methods implemented here rely on linear matrix equations. Reformulating for binned, one-dimensional data recasts Equation \eqref{fred} as:
\begin{align}
\bm\nu = \bm{R}\bm\mu.\label{eq:mat}
\end{align}
The vectors $\bm\nu$, $\bm\mu$ and matrix $\bm{R}$ relate to their
continuous counterparts by \cite{Blobel2013}:
\begin{align}
\text{true distribution }f_X(x)&\longrightarrow\bm\mu\,\in\,\mathcal{U}\equiv\{\mathbb{R}^M_{+}\cup\bm{0}\}\text{ the unknown true bin counts,}\nonumber\\
\text{measured distribution }f_Y(y)&\longrightarrow\bm\nu\,\in\,\mathcal{V}\equiv\{\mathbb{R}^N_{+}\cup\bm{0}\}\text{ the measured bin counts,}\nonumber\\
\text{Kernel }K(x,y)&\longrightarrow\bm{R}\;\;\text{the rectangular }N\text{-by-}M\text{ \bf response matrix}\text{.}\nonumber
\end{align}
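
### A toy folding example

A small \texttt{R} sketch (illustration only; all numbers are invented) of Equation \eqref{eq:mat}: true bin counts $\bm\mu$ are folded through a response matrix $\bm{R}$ to give expected measured counts $\bm\nu$.

```{r, echo = TRUE}
# Hypothetical 3x3 response matrix: entry [i, j] = P(measured in bin i | true in bin j).
R <- matrix(c(0.70, 0.15, 0.05,
              0.20, 0.65, 0.15,
              0.05, 0.10, 0.70), nrow = 3, byrow = TRUE)
mu <- c(100, 250, 150)   # hypothetical true bin counts
nu <- R %*% mu           # expected measured bin counts
drop(nu)
colSums(R)               # column sums: the per-bin efficiencies discussed shortly
```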
### Conditional Probability
The components of $\bm{R}$ correspond to conditional probabilities such that \cite{Cowan1998}:
\begin{align}
R_{ij}&=P(\text{measured value in bin }i\vert\text{true value in bin }j)\nonumber\\
&=\frac{P(\text{measured value in bin }i\text{ and true value in bin }j)}{P(\text{true value in bin }j)}\nonumber\\
&=\frac{\int_{\text{bin }i}\int_{\text{bin }j}K(x,y)f_X(x)dx\,dy}{\int_{\text{bin }j}dx\,f_X(x)}\nonumber\\
&\equiv P(\nu_i\vert\mu_j).\label{eq:Rij}
\end{align}
### The Response Matrix
The overall form is then:
\begin{align}
\bm{R}=\begin{pmatrix}
P(\nu_1\vert\mu_1) & P(\nu_1\vert\mu_2) & \dots & P(\nu_1\vert\mu_{M}) \\
P(\nu_2\vert\mu_1) & P(\nu_2\vert\mu_2) & \dots & P(\nu_2\vert\mu_{M}) \\
\vdots & \vdots & \ddots & \vdots \\
P(\nu_{N}\vert\mu_1) & P(\nu_{N}\vert\mu_2) & \dots & P(\nu_{N}\vert\mu_{M})
\end{pmatrix}\label{eq:Rmat}
\end{align}
\begin{align}
\text{where }\;\;\;\;\nu_i = \sum_{j=1}^MR_{ij}\mu_j\;\;\;\;\text{ and }\;\;\;\;\frac{\partial\nu_i}{\partial\mu_j}&=R_{ij}\label{dnudmu}.
\end{align}
### Efficiency
As mentioned in the introduction, constraints on event selection and detector geometry result in some events going uncounted. Per bin, the ratio of the number of observed events to the number of true events is the \textbf{efficiency}, represented by the vector $\bm\epsilon$, whose components can be calculated by \cite{Cowan1998}:
\begin{align}\sum_{i=1}^{N}P(\nu_i\vert\mu_j)=\sum_{i=1}^{N}R_{ij}=\epsilon_j\leq 1.\label{eq:eff}\end{align}
### The Expected, The Estimated, and The Observed
\begin{columns}
\column{.6\textwidth}
All variables so far have involved exact values derived from the exact distributions $f_Y(y)$ and $f_X(x)$. To clarify some notation, the vector $\bm{n}$ is introduced. Altogether we have:
\begin{itemize}
\item $\bm\nu$: the expected number of observed events;
\item $\bm{\hat\nu}$: the estimated mean number of observed events;
\item $\bm{n}$: the number of observed events;
\item $\bm\mu$: the expected true number of events;
\item $\bm{\hat\mu}$: the estimated true number of events.
\end{itemize}
\column{.4\textwidth}
\begin{figure}
\includegraphics[width=0.9\textwidth]{gbu.jpg}
\end{figure}
\end{columns}
## Poisson processes
### The right statistical model
As particle physics experiments are counting experiments, the Poisson distribution is a common choice to model the number of counts in a particular bin. As such, for bin $i$ with expected count $\nu_i$ and random variable $n_i$ we have:
\begin{align}P(n_i\vert\nu_i)=\frac{\nu_i^{n_i}e^{-\nu_i}}{n_i!},\label{eq:pois}\end{align}
with standard estimates:
\begin{align}
\nu_i
&=\text{E}[\hat\nu_i]=\text{E}[n_i];\nonumber\\
&=\text{Var}[\hat\nu_i]=\text{Var}[n_i].\nonumber
\end{align}
### Observed Events Covariance Matrix
Dealing with more than one Poisson distribution results in the need for multivariate estimates. Treating the bins as independent distributions, the estimated covariance matrix of the observed counts $\bm{\hat\Sigma}_{\nu}$ has components:
\begin{align}
\hat\Sigma^{\nu}_{ij}
&=\text{Cov}[\hat\nu_i,\hat\nu_j]\nonumber\\
&=\text{Cov}[n_i,n_j]\nonumber\\
&=\delta_{ij}n_i,\label{eq:cov}
\end{align}
where the Kronecker delta, $\delta_{ij}=\begin{cases}0\;\;\;\text{if }i\neq j\\1\;\;\;\text{if }i=j\end{cases}$.
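
### A quick look at the fluctuations

A short \texttt{R} sketch (toy numbers only) of the Poisson model and the diagonal covariance estimate of Equation \eqref{eq:cov}: the observed counts $\bm{n}$ fluctuate around $\bm\nu$, and $\bm{\hat\Sigma}_\nu$ is estimated by placing $\bm{n}$ on the diagonal.

```{r, echo = TRUE}
set.seed(314)
nu <- c(115, 205, 135)        # hypothetical expected measured counts
n  <- rpois(length(nu), nu)   # one observed realization, bin by bin
n
diag(n)                       # estimated covariance matrix of the observed counts
```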
### Adventures in log-likelihood
Maximum likelihood estimation can be explored as well, with some useful results. Jumping to the end in the interest of time (see the paper), we have:
\begin{align}
\log L(\bm\mu)&=\sum_{i=1}^N\left(n_i\log\nu_i-\nu_i-\log n_i!\right)\label{eq:logL}\\
\frac{\partial\log L}{\partial\mu_k}&=\sum_{i=1}^N\left(\frac{n_i}{\nu_i}-1\right)R_{ik}=0\label{eq:dlogL}\\
\frac{\partial^2\log L}{\partial\mu_k\partial\mu_l}&=-\sum_{i=1}^N\frac{n_i R_{il}R_{ik}}{\nu_i^2}\label{eq:d2logL}
\end{align}
### New and old estimates
Some basic algebra on Equation \eqref{eq:dlogL} shows that the maximum likelihood estimate $\bm{\hat\nu}_{\text{MLE}}$ reproduces $\bm{\hat\nu}=\bm{n}$, which is nice. However, Equation \eqref{eq:d2logL} hints at something more, as the Fisher information is:
\begin{align}\bm{\mathcal{I}}(\bm\mu)
&=-\text{E}\left[\frac{\partial^2\log L}{\partial\mu_k\partial\mu_l}\right]\nonumber\\
&=\text{E}\left[\sum_{i=1}^N\frac{n_i R_{il}R_{ik}}{\nu_i^2}\right].\label{fishinfo}
\end{align}
The Cram$\acute{\text{e}}$r-Rao lower bound defines its inverse as the lower bound on the covariance matrix of an \emph{unbiased} estimator of $\bm{\mu}$ \cite{Cowan1998}.
### Lower bound for an unbiased estimator
Assuming $\bm{R}$ is invertible, for an unbiased estimator $\bm{T}(\bm{n})$:
\begin{align}
\text{Cov}_{\!\bm{\mu}}\!\left[\bm{T}(\bm{n})\right]\geq \bm{\mathcal{I}}(\bm{\mu})^{-1}
&=\left(\sum_{i=1}^N\frac{E[n_i] R_{il}R_{ik}}{\nu_i^2}\right)^{-1}\nonumber\\
&=\left(\sum_{i=1}^N\frac{R_{il}R_{ik}}{\nu_i}\right)^{-1}\nonumber\\
&=\left(\bm{R}^T\bm{\hat\Sigma}_\nu^{-1}\bm{R}\right)^{-1}\nonumber\\
&=\bm{R}^{-1}\bm{\hat\Sigma}_\nu(\bm{R}^{-1})^{T}.\label{lowerB}
\end{align}
We should then be interested in finding something like this if we want an unbiased estimator.
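
### Checking the bound numerically

A numerical check in \texttt{R} (with a made-up square response matrix) that the last two lines of Equation \eqref{lowerB} agree: for an invertible $\bm{R}$, $(\bm{R}^T\bm{\hat\Sigma}_\nu^{-1}\bm{R})^{-1}$ and $\bm{R}^{-1}\bm{\hat\Sigma}_\nu(\bm{R}^{-1})^{T}$ are the same matrix.

```{r, echo = TRUE}
R <- matrix(c(0.70, 0.15, 0.05,
              0.20, 0.65, 0.15,
              0.05, 0.10, 0.70), nrow = 3, byrow = TRUE)
S <- diag(c(115, 205, 135))               # stand-in for the covariance of n
lhs <- solve(t(R) %*% solve(S) %*% R)     # (R^T S^-1 R)^-1
rhs <- solve(R) %*% S %*% t(solve(R))     # R^-1 S (R^-1)^T
max(abs(lhs - rhs))                       # agreement up to rounding error
```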
# A Simulated Toy Model
### Disclaimer
This simulation is not meant to reflect any actual physics, but particle physics data can contain signals like these.
## The Physics Models
### A trio of physics processes
Consider three hypothetical physics Processes, each producing independent and identically distributed Cauchy random variables such that for the $i$th sample from each:
\begin{align}
X_{1,i}&\sim \text{Cauchy}(11,4)\nonumber\\
X_{2,i}&\sim \text{Cauchy}(18,4)\nonumber\\
X_{3,i}&\sim \text{Cauchy}(14,5)\nonumber
\end{align}
### A duo of models
Consider two hypothetical physics Models that predict observations in a hypothetical experiment. Model 1 predicts that only events from Process 1 and Process 2 will be seen, while Model 2 predicts that events from Process 3 will also be seen. As informed by a binomial or trinomial process, a particular event under Model 1 or Model 2 has the form:
\begin{align}
X^{(1)}_i&=0.3X_{1,i}+0.7X_{2,i}\nonumber\\
X^{(2)}_i&=0.225X_{1,i}+0.525X_{2,i}+0.25X_{3,i}\nonumber
\end{align}
### Simulations
In particle physics experiments, analysts make use of Monte-Carlo (MC) simulations to generate physics events under a physics model of interest, as well as to simulate the detector response to such events. These are then compared to real data. For the purposes of this project the following samples were taken (a sampling sketch follows on the next slide):
\begin{itemize}
\item 200,000 events from Model 1 as MC/Training;
\item 200,000 events from Model 2 as MC/Training;
\item 20,000 events from Model 1 as Data/Testing;
\item 20,000 events from Model 2 as Data/Testing
\end{itemize}
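
### A sampling sketch

One way the samples above could be drawn in \texttt{R}, reading the Model weights as the probabilities of the underlying binomial/trinomial process. This is a hedged sketch, not the project's actual simulation code; Model 1 is the analogous two-component draw with probabilities 0.3 and 0.7.

```{r, echo = TRUE}
set.seed(42)
draw_model2 <- function(n_events) {
  # Pick a process for each event, then draw from that process's Cauchy.
  comp <- sample(1:3, n_events, replace = TRUE, prob = c(0.225, 0.525, 0.25))
  x <- numeric(n_events)
  x[comp == 1] <- rcauchy(sum(comp == 1), 11, 4)
  x[comp == 2] <- rcauchy(sum(comp == 2), 18, 4)
  x[comp == 3] <- rcauchy(sum(comp == 3), 14, 5)
  x
}
mc2 <- draw_model2(200000)   # e.g. the Model 2 MC/Training sample
summary(mc2)
```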
### Process component distributions
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{processes.png}
\end{figure}
## Simulating the detector response
### The detector responds
The effect of detector smearing is represented by i.i.d.\ random variables generated by the conditional Gaussian process:
$$\varepsilon_i\sim N\left(\mu(X_i),\sigma(X_i)^2\right),$$
the mean and variance of which are functions of $X_i$ defined by:
\begin{align}
\mu(X_i=x)&=-x^{1/4}\;\;\text{and}\nonumber\\
\sigma(X_i=x)&=\log\left(\frac{x+10}{4}\right).\nonumber
\end{align}
### Form of the observed
With the added error, the form of the observed data becomes the deceptively simple:
$$Y_i = X_i^{(?)}+\varepsilon_i$$
Even so, the probability of such an event being detected at all, as determined by the efficiency, is:
$$p(X_i=x)=1-e^{-\sqrt{x}/4}$$
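
### Sketching the detector

A rough \texttt{R} sketch (not the project's simulation code) of applying the stated smearing and acceptance to a set of true values. The smearing mean, spread, and detection probability only make sense for positive $x$, which is where these toy processes mostly live, so the sketch keeps the positive part.

```{r, echo = TRUE}
set.seed(7)
x <- rcauchy(10000, 18, 4)      # stand-in true values from one of the processes
x <- x[x > 0]                   # keep the positive part for this sketch
eps  <- rnorm(length(x), mean = -x^(1/4), sd = log((x + 10) / 4))
y    <- x + eps                 # smeared (reconstructed) values
kept <- runif(length(x)) < 1 - exp(-sqrt(x) / 4)   # detection probability p(x)
mean(kept)                      # fraction of events actually observed
```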
## The Plots
### A direct comparison
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{simulation.png}
\caption*{Distinguishing between Model 1 and Model 2 is clearly easier before detector effects are applied. Can you tell which Model the Data belongs to in the measured/reconstructed case?}
\end{figure}
### Migrations
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{migration.png}
\caption*{These heatmaps are a visual representation of the information contained in the migration matrices.}
\end{figure}
# Unfolding
## Response Matrix Inversion
### Naive approaches
As Equation \eqref{eq:mat} looks like any other matrix operating on a vector, inverting it seems like an obvious route, such that:
\begin{align}
\bm{\hat\mu}=\bm{R}^{-1}\bm{n}.\label{eq:naiveInv}
\end{align}
This would be true if this were a well-posed problem. For pedagogy, let's see why this is a bad approach.
### Weighted least squares
Weighted least squares should take us to an inverse of some sort. Taking the gradient of $\chi^2$:
\begin{align}\nabla_{\!\bm\mu}\chi^2(\bm{\mu})
&=\nabla_{\!\bm\mu}\left[(\bm{R}\bm{\mu}-\bm{n})^T\bm{\hat\Sigma}_{\nu}^{-1}(\bm{R}\bm{\mu}-\bm{n})\right]\label{eq:leastsq}\\
&=\;\;\vdots\nonumber\\
&=2\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{R}\bm{\mu}-2\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{n}\nonumber
\end{align}
Setting this to zero allows us to solve for:
\begin{align}
\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{R}\bm{\mu}&=\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{n}\nonumber\\
\implies\bm{\hat\mu}&=(\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{R})^{-1}\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{n}\\
&=\bm{R}^{+}\bm{n}.\label{eq:naivemu}
\end{align}
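
### Trying it on the toy response matrix

The same estimator in \texttt{R}, applied to the tiny made-up response matrix from earlier (an illustration, not the project's unfolding code). With only three well-separated bins the answer looks tame; the trouble on the coming slides shows up in the realistic, finely binned, strongly smeared setting.

```{r, echo = TRUE}
R  <- matrix(c(0.70, 0.15, 0.05,
               0.20, 0.65, 0.15,
               0.05, 0.10, 0.70), nrow = 3, byrow = TRUE)
mu <- c(100, 250, 150)                      # hypothetical true counts
set.seed(3)
n  <- rpois(3, R %*% mu)                    # one set of observed counts
S  <- diag(n)                               # estimated covariance of n
Rplus  <- solve(t(R) %*% solve(S) %*% R) %*% t(R) %*% solve(S)
mu_hat <- drop(Rplus %*% n)                 # weighted-least-squares estimate
round(cbind(true = mu, unfolded = mu_hat), 1)
```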
### The Pseudo-inverse
What was found, $\bm{R}^{+}=(\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{R})^{-1}\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}$, is the Moore-Penrose generalized inverse (or pseudo-inverse) \cite{Blobel2013}.
\vspace{1em}Note that $\bm{R}\bm{R}^{+}=\bm{R}(\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{R})^{-1}\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}=\bm{H}$ is just the hat matrix from ordinary linear regression.
\vspace{1em}Also note that the factor $(\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{R})^{-1}$ appearing in $\bm{R}^{+}$ is exactly the lower bound $\bm{R}^{-1}\bm{\hat\Sigma}_\nu(\bm{R}^{-1})^{T}$ from Equation \eqref{lowerB} on the covariance matrix of an unbiased estimator of $\bm{\mu}$, and it is attained here:
\begin{align}
(\bm{R}^T\bm{\hat\Sigma}_{\nu}^{-1}\bm{R})^{-1}&=\bm{R}^{+}\bm{\hat\Sigma}_\nu\bm{R}^{+^T}=\bm{R}^{+}\text{Cov}[\bm{n}]\bm{R}^{+^T}\nonumber\\
&=\text{Cov}[\bm{R}^{+}\bm{n}]=\text{Cov}[\bm{\hat\mu}]=\bm{\hat\Sigma}_\mu\nonumber.
\end{align}
And yet despite all this, the estimated true counts are...
### Ugly results
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{badresults.png}
\caption*{The solutions are unstable, even when paired with the correct Models. Additionally, the solutions have negative counts, meaning they do not exist in the space of allowable solutions.}
\end{figure}
### Why?
Overfitting. The unfolding process used here cannot account for even small random variations in the space of reconstructed counts. In pursuing an unbiased estimator we have found a completely unacceptable variance. Bias must be introduced that uses additional information contained in the data. This is done with \textbf{regularization}.
## Regularization
### Tikhonov regularization
The method of regularization called Tikhonov regularization is commonly known through ridge regression \cite{vanwieringen2021}, which is a specific case discussed in Masters-level statistics courses. Tikhonov regularization generalizes ridge by including an additional matrix $\Gamma$ in the penalty:
$$\tau\vert\vert \Gamma x\vert\vert\;\;\;\;\;\;\text{vs}\;\;\;\;\;\;\tau\vert\vert x\vert\vert$$
Technically the latter is the special case with $\Gamma=\bm{I}$, the identity matrix.
### Singular Value Decomposition
Singular value decomposition \cite{wiki:svd} (SVD) involves the factorization of an $N\times M$ matrix $\bm{A}$ into
\begin{align}
\bm{A}=\bm{USV}^T,\label{eq:svd}
\end{align}
where $\bm{U}$ and $\bm{V}$ are respectively $N\times N$ and $M\times M$ unitary matrices and $\bm{S}$ is an $N\times M$ rectangular diagonal matrix consisting of non-negative real numbers.
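
### SVD in one line of R

A quick illustration (random matrix, not analysis code) of the factorization in Equation \eqref{eq:svd} using base \texttt{R}; note that \texttt{svd()} returns the \emph{thin} factorization, with $\bm{U}$ of size $N\times M$ rather than $N\times N$.

```{r, echo = TRUE}
set.seed(11)
A <- matrix(rnorm(12), nrow = 4)              # a 4 x 3 example matrix
s <- svd(A)                                   # s$u, s$d (singular values), s$v
s$d                                           # non-negative singular values
max(abs(A - s$u %*% diag(s$d) %*% t(s$v)))    # reconstruction error ~ 0
```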
### Data and notation transformations
SVD is also how the particular form of unfolding we will use is described \cite{Hocker1995}, the first couple of steps of which involve:
\begin{enumerate}
\item redistributing information in Equation \eqref{eq:mat} to make $\bm{R}$ contain the actual counts, call it $\bm{X}$,
\item rescaling the true counts to produce $\bm\beta$ with components $\beta_i=\mu_i/\mu_i^{\text{MC}}$,
\item and changing notation for counts to $\bm{n}=\bm{Y}$.
\end{enumerate}
### SVD to distribute weights
The new form of the squares to be minimized is:
\begin{align}
\chi^2(\bm{\beta})=\left(\bm{Y}-\bm{X}\bm{\beta}\right)^T\bm{\hat\Sigma}_\nu^{-1}\left(\bm{Y}-\bm{X}\bm{\beta}\right)\label{regSq}
\end{align}
Using SVD to redistribute the squares gives
\begin{align}
\bm{\hat\Sigma}_\nu=\bm{QTQ}^T,\;\;\;\;\;\text{ where }\;\;\;\;\;\bm{\hat\Sigma}_\nu^{-1}=\bm{Q}\bm{T}^{-1}\bm{Q}^T.
\end{align}
### Distributing weights
The result of doing so is:
\begin{align}
\chi^2(\bm{\beta})&=\left(\bm{\tilde{Y}}-\bm{\tilde{X}}\bm{\beta}\right)^T\left(\bm{\tilde{Y}}-\bm{\tilde{X}}\bm{\beta}\right),
\end{align}
where
\begin{align}
\bm{\tilde{X}}=\bm{T}^{-1/2}\bm{Q}^T\bm{X}\;\;\;\;\;\;\;\text{and}\;\;\;\;\;\;\bm{\tilde{Y}}=\bm{T}^{-1/2}\bm{Q}^T\bm{Y}.
\end{align}
### Add regularization term
The chosen regularization term is one that minimizes the second derivative of $\bm\mu$, and is accomplished by
\begin{align}
\chi^2(\bm{\beta})&=\left(\bm{\tilde{Y}}-\bm{\tilde{X}}\bm{\beta}\right)^T\left(\bm{\tilde{Y}}-\bm{\tilde{X}}\bm{\beta}\right)+\tau\bm{\beta}^T\bm{C}^T\bm{C}\bm{\beta},\label{eq:regular}
\end{align}
where
\begin{align}
\bm{C}=\begin{pmatrix}
-1 & 1 & 0 & 0 & \dots & & \\
1 & -2 & 1 & 0 & \dots & & \\
0 & 1 & -2 & 1 & \ddots & & \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
 & & & & \dots & -2 & 1 \\
 & & & & \dots & 1 & -1
\end{pmatrix}
\end{align}
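
### A regularized toy solve

A sketch in \texttt{R} of the regularized normal equations implied by Equation \eqref{eq:regular}: build the second-difference matrix $\bm{C}$ and solve $(\bm{\tilde X}^T\bm{\tilde X}+\tau\bm{C}^T\bm{C})\bm\beta=\bm{\tilde X}^T\bm{\tilde Y}$. The design matrix, data, and $\tau$ are all made up, and this is not the damped least-squares route actually taken (next slide); it only shows the mechanics.

```{r, echo = TRUE}
second_diff <- function(m) {
  # Second-difference matrix with the -1 end-point convention shown above.
  C <- diag(-2, m)
  C[abs(row(C) - col(C)) == 1] <- 1
  C[1, 1] <- C[m, m] <- -1
  C
}
set.seed(5)
m   <- 6
Xt  <- diag(m) + matrix(rnorm(m * m, sd = 0.1), m)   # stand-in rescaled design
Yt  <- drop(Xt %*% rep(1, m)) + rnorm(m, sd = 0.05)  # stand-in rescaled data
tau <- 0.5
Cm  <- second_diff(m)
beta_hat <- solve(t(Xt) %*% Xt + tau * t(Cm) %*% Cm, t(Xt) %*% Yt)
round(drop(beta_hat), 3)
```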
### Reformulation
Equation \eqref{eq:mat} is reformulated again to produce
\begin{align}
\begin{bmatrix}
\bm{\tilde{X}}\\\sqrt{\tau}\,\bm{C}
\end{bmatrix}\bm{\beta} =
\begin{bmatrix}
\bm{\tilde{Y}}\\
\bm{0}_N
\end{bmatrix},\nonumber
\end{align}
but a damped least-squares approach is chosen.
### Cut for time
The remainder of the material has been cut and can be referenced in the paper.
### Choosing regularization term
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{yk.png}
\end{figure}
### Minimize sum of $\chi$s across all combinations
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{minTau.png}
\end{figure}
### Comparing
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{table.png}
\end{figure}
### Results
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{results.png}
\end{figure}
### Acknowledgements
James, Heidi, NSF, Sharmodeep, Sarah, linear algebra, Soviet era physicists and mathematicians
### References
\AtNextBibliography{\tiny}
\printbibliography