\documentclass{article}
% Packages
\usepackage[margin=1.5cm, includefoot, footskip=30pt]{geometry}
\usepackage{hyperref}
\usepackage{multicol}
\usepackage{booktabs}
\usepackage{xstring}
\usepackage{subcaption}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{standalone}
\usepackage{fancyvrb}
\usepackage{algorithm,algorithmic}
\usepackage{authblk}
\usepackage{tikz}
\usetikzlibrary{calc, shapes, patterns}
% Title
\title{Evolution Reinforces Cooperation with the Emergence of Self-Recognition
Mechanisms: an empirical study of the Moran process for the iterated
Prisoner's Dilemma}
\author[1]{Vincent Knight}
\author[2]{Marc Harper}
\author[1]{Nikoleta E. Glynatsi}
\author[3]{Owen Campbell}
\affil[1]{Cardiff University, School of Mathematics, UK}
\affil[2]{Google Inc., Mountain View, CA, USA}
\affil[3]{Not affiliated}
\date{}
\begin{document}
\maketitle
\begin{abstract}
We present insights and empirical results from an extensive numerical
study of the evolutionary dynamics of the iterated prisoner's dilemma.
Fixation probabilities for Moran processes are obtained
for all pairs of 164 different strategies including classics such as TitForTat, zero
determinant strategies, and many more sophisticated strategies.
Players with long memories and sophisticated behaviours outperform
many strategies that perform well in a two player setting. Moreover, we
introduce several strategies trained with evolutionary algorithms to
excel at the Moran process. These strategies are excellent invaders
and resistors of invasion and in some cases naturally evolve handshaking
mechanisms to resist invasion. The best invaders were those trained
to maximize total payoff, while the best resistors invoked handshake mechanisms.
This suggests that while maximizing individual payoff
can lead to the evolution of cooperation through invasion, payoff maximizing
strategies are less evolutionarily stable than strategies employing handshake
mechanisms because of their relatively weak resistance to invasion.
\end{abstract}
\section{Introduction}\label{sec:introduction}
The Prisoner's Dilemma (PD) \cite{Flood1958} is a fundamental two player game
used to model a variety of strategic interactions. Each player chooses
simultaneously and independently
between cooperation (C) or defection (D). The payoffs of
the game are defined by the matrix $\begin{pmatrix} R & S \\ T & P
\end{pmatrix}$, where $T > R > P > S$ and $2R > T + S$. The PD is a single
round game, but it is commonly studied in a repeated form in which prior
outcomes matter, called the Iterated Prisoner's Dilemma (IPD). As described in
\cite{Axelrod1980a, Knight2016, Press2012} a number of strategies have been
developed to take advantage of the history of play. More recently, a class of
strategies referred to as zero determinant (ZD) strategies \cite{Press2012} has
been shown to manipulate some opponents through extortionate mechanisms.
The Moran Process \cite{Moran1957} is a model of
evolutionary population dynamics that has been used to gain insight into
evolutionary stability in a number of settings (more details are given in
Section~\ref{sec:the_moran_process}).
Several earlier
works have studied iterated games in the context of the prisoner's dilemma
\cite{Nowak, stewart2013extortion}; however, these often make simplifying assumptions
or are limited to classes of strategies
such as memory-one strategies that only use the previous round of play.
This manuscript provides a detailed numerical analysis of agent-based simulations
of \textbf{\input{./tex/num_strategies.tex}}complex and adaptive strategies for the
IPD\@. This is made possible by the Axelrod library \cite{axelrodproject}, an
effort to provide software for reproducible research for the IPD\@. The library
now contains over \input{./tex/num_strategies_axelrod.tex}parameterized
strategies including classics like TitForTat and WinStayLoseShift, as well as
recent variants such as OmegaTFT, zero determinant and other memory one
strategies, strategies based on finite state machines, lookup tables, neural
networks, and other machine learning based strategies, and a collection of novel
strategies. Not all strategies have been considered for this study: excluded
are those that make use of knowledge of the number of turns in a match
and others that have a high
computational run time. This large number of strategies is available thanks to
the open source nature of the project, which has over 50 contributors (both
programmers and researchers) from around the world \cite{Knight2016}. Three of the considered
strategies are finite state machines trained specifically for Moran processes
(described further in Section~\ref{sec:strategies}).
In addition to providing a large collection of strategies, the Axelrod library
can conduct matches, tournaments and population
dynamics with variations including noise and spatial structure.
The strategies and simulation frameworks are
automatically tested with a very high degree of code coverage in accordance
with best research software practices.
Using the Axelrod library and the many strategies it contains, we obtain
fixation probabilities for all pairs of strategies, identifying
those that are effective invaders and those resistant to invasion, for
population sizes $N=2$ to $N=14$. Moreover, we present a number of strategies
that were created via reinforcement algorithms (evolutionary and particle
swarm algorithms) that are among the best invaders and resistors of invasion
known to date, and show that handshaking mechanisms naturally arise from these
processes as an invasion-resistance mechanism.
Recent work has argued that agent-based
simulations can provide insights in evolutionary game theory not available
via direct mathematical analysis \cite{adami2016evolutionary}. The results
and insights contained in this paper would be difficult to derive analytically.
In particular the following questions are addressed:
\begin{enumerate}
\item What strategies are good invaders?
\item What strategies are good at resisting invasion?
\item How does the population size affect these findings?
\end{enumerate}
While the results agree with some of the published literature, it is found that:
\begin{enumerate}
\item Zero determinant strategies are not particularly effective for $N > 2$.
\item Complex strategies can be effective, and in fact can naturally evolve
through evolutionary processes to outperform designed strategies.
\item The strongest resistors employ a handshake mechanism, whether
    explicitly designed or naturally evolved.
\item Strong invaders are generally cooperative strategies that do not defect
first but retaliate to varying degrees of intensity against strategies that defect.
\item Strategies evolved to maximize their total payoff can be strong invaders
and achieve mutual cooperation with many other strategies.
\end{enumerate}
\subsection{The Moran Process}\label{sec:the_moran_process}
Figure~\ref{fig:moran_process} shows a diagrammatic representation of the Moran
process, a stochastic birth death process on a finite population in which the
population size stays constant over time. Individuals are \textbf{selected}
according to a given fitness landscape. Once selected, the individual is
reproduced and similarly another individual is chosen to be removed from the
population. In some settings mutation is also considered but without mutation
(the case considered in this work) this process will arrive at an absorbing
state where the population is entirely made up of players of one strategy. The
probability with which a given strategy takes over a population is called the
\textit{fixation probability}. A more detailed analytic description of this is
given in Section~\ref{sec:validation}. In our simulations offspring do not
inherit any knowledge or history from their parents.
\begin{figure}[!hbtp]
\centering
\input{./tex/moran_process.tex}
\caption{A diagrammatic representation of a Moran process.}
\label{fig:moran_process}
\end{figure}
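To make the dynamics concrete, the following short Python sketch implements one
birth-death step and iteration to fixation. It is a minimal illustration, not
the simulation code used in this study: it assumes a fixed fitness value per
strategy label, whereas in the actual process fitness is recomputed each step
from pairwise match payoffs.
\begin{Verbatim}[fontsize=\small]
import random

def moran_step(population, fitness):
    # Birth: select an individual proportionally to fitness.
    weights = [fitness[s] for s in population]
    parent = random.choices(population, weights=weights, k=1)[0]
    # Death: replace a uniformly chosen individual with a copy of the parent.
    population[random.randrange(len(population))] = parent

def run_to_fixation(population, fitness):
    # Without mutation the process absorbs at a homogeneous population.
    while len(set(population)) > 1:
        moran_step(population, fitness)
    return population[0]
\end{Verbatim}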
The Moran process was initially introduced in \cite{Moran1957}. It has since
been used in a variety of settings including the understanding of the spread of
cooperative and non-cooperative behaviour such as cancer \cite{West2016} and the
emergence of cooperative behaviour in spatial topologies \cite{Nowak2017}.
However, these works mainly consider relatively simple strategies. Some work has
looked at the evolutionary stability of agent-based strategies within the
Prisoner's Dilemma \cite{Li2014}, but in terms of infinite population stability
rather than in the more widely used setting of the Moran process. In \cite{Baek2016}
Moran processes are studied in a theoretical framework for a small subset of
strategies. The subset included memory one strategies: strategies that recall
the events of the previous round only.
Of particular interest are the zero determinant strategies introduced in
\cite{Press2012}. It was argued in \cite{stewart2013extortion} that generous
ZD strategies are robust against invading strategies. However, in \cite{Lee2015},
a strategy using machine learning techniques was capable of resisting invasion
and also able to invade any memory one strategy. Recent work \cite{Hilbe2017}
has investigated the effect of memory length on strategy performance and the
emergence of cooperation but this is not done in a Moran process context and only
considers specific cases of memory 2 strategies. In \cite{Adami2013} it was
recognised that many zero determinant strategies do not fare well against
themselves. This is a disadvantage for the Moran process where the best
strategies cooperate well with other players using the same strategy.
\subsection{Strategies considered}\label{sec:strategies}
To carry out this numerical experiment, \input{./tex/num_strategies}
strategies, listed (with their properties) in Appendix~\ref{app:list_of_players},
are used from the Axelrod library. There are
\input{./tex/num_stochastic.tex}stochastic and
\input{./tex/num_deterministic.tex}deterministic strategies. Their memory depth,
defined by the number of rounds of history used by the strategy each round, is
shown in Table~\ref{tbl:memory_depth_count}. The memory depth is infinite if the
strategy uses the entire history of play (whatever its length). For example, a
strategy that utilizes a handshaking mechanism, where the opponent's actions in
the first few rounds of play determine the strategy's subsequent behavior, would
have infinite memory depth.
A number of these strategies have been trained with reinforcement learning
algorithms prior to this study and not specifically for the Moran process.
\begin{itemize}
\item Evolved ANN: a neural network based strategy;
\item Evolved LookerUp: a lookup table based strategy;
\item PSO Gambler: a stochastic version of the lookup table based strategy;
\item Evolved HMM: a hidden Markov model based strategy.
\end{itemize}
Apart from the PSO Gambler strategy, which was trained using a particle swarm
optimisation algorithm, these strategies were trained with an evolutionary
algorithm that perturbs strategy parameters and optimizes the mean total score
against all other opponents \cite{affenzeller2009genetic}. They were trained to
win IPD tournaments by maximizing their mean total payoffs against a variety
of opponents. Variation is
introduced via mutation and crossover of parameters, and the best performing
strategies are carried to the next generation along with new variants. Similar
methods appear in the literature \cite{Ashlock2006}.
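As a rough illustration of this scheme (the actual training code is archived at
\cite{marc_harper_2017_824264}), the following sketch applies mutation and
truncation selection to real-valued parameter vectors;
\texttt{mean\_total\_score} is a hypothetical objective function scoring a
parameter vector against a collection of opponents, and crossover is omitted
for brevity.
\begin{Verbatim}[fontsize=\small]
import random

def evolve(init_params, mean_total_score, generations=100,
           pop_size=20, keep=5, sigma=0.1):
    # Start from perturbed copies of an initial parameter vector.
    population = [[p + random.gauss(0, sigma) for p in init_params]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best performers and refill with mutated offspring.
        elites = sorted(population, key=mean_total_score, reverse=True)[:keep]
        population = elites[:]
        while len(population) < pop_size:
            parent = random.choice(elites)
            population.append([p + random.gauss(0, sigma) for p in parent])
    return max(population, key=mean_total_score)
\end{Verbatim}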
More information about each player can be obtained in the documentation for
\cite{axelrodproject} and a detailed description of the performance
of these strategies in IPD tournaments is described in~\cite{Harper2017}.
All of the training code is archived at \cite{marc_harper_2017_824264}. This
software is (like the Axelrod library) available on GitHub at
\url{https://github.com/Axelrod-Python/axelrod-dojo} with documentation to
train new strategies easily. Training
typically takes less than 100 generations and can be completed within several
hours on commodity hardware.
There are three further strategies trained specifically for this study: Trained
FSM 1, 2, and 3 (TF1--TF3). These are based on finite state machines of 16, 16,
and 8 states respectively (see Figures~\ref{fig:tf1},~\ref{fig:tf2}
and~\ref{fig:tf3}).
\begin{figure}[!hbtp]
\centering
\input{./tex/fsm_one.tex}
\caption{TF1: a 16 state finite state machine with a handshake leading to
mutual cooperation at state 4.}
\label{fig:tf1}
\end{figure}
\begin{figure}[!hbtp]
\centering
\input{./tex/fsm_two.tex}
\caption{TF2: a 16 state finite state machine with a handshake leading to
mutual cooperation at state 16.}
\label{fig:tf2}
\end{figure}
\begin{figure}[!hbtp]
\centering
\input{./tex/fsm_three.tex}
\caption{TF3: an 8 state finite state machine.}
\label{fig:tf3}
\end{figure}
As opposed to the previously described strategies, these strategies were trained
with the objective function of \textbf{mean fixation probabilities for Moran
processes} starting at initial population states consisting of \(N/2\)
individuals of the training candidate and \(N/2\) individuals of an opponent
strategy, taken from a selection of 150 opponents from the Axelrod library:
\begin{itemize}
\item TF1 \(N=12\), 0\% noise.
\item TF2 \(N=10\), 0\% noise.
\item TF3 \(N=8\), 1\% noise.
\end{itemize}
Each matchup of players was run to fixation to estimate the absorption
probabilities. The training algorithms
were run for fewer than 50 generations. Training data for this is available at
\cite{data}.
TF3 cooperates and defects with
various cycles depending on the opponent's actions. TF3 will mutually
cooperate with any cooperating strategy and tolerates only a few defections
before defecting for the rest of the match. It is similar to but not exactly the same as
Fool Me Once, a strategy that cooperates until the opponent has defected twice
(not necessarily consecutively), and defects indefinitely thereafter. Though a
product of training with a Moran objective, it differs from TF1 and TF2
in that it lacks a handshake mechanism. Figure~\ref{fig:tf3} shows
all 8 states of the strategy produced by the training process (states 3 and 8
are not reachable).
TF2 always starts with CD and will defect against opponents that start with
DD\@. It plays CDD against itself and then cooperates thereafter; Fortress3 and
Fortress4 also use a similar handshake and cooperate with TF2. Cooperation
can be rescued after a failed handshake by a complex sequence of plays
which sometimes results in mutual cooperation with Firm but Fair, Grofman, and
GTFT, and a few others with low probability. TF2 defects against all other
players in the study, barring
unusual cases arising from particular randomizations. Figure~\ref{fig:tf2} shows
all 16 states of the strategy (states 6 and 7 are not reachable).
TF1 has an initial handshake of CCD and cooperates if the opponent matches.
However if the opponent later defects, TF1 will respond in kind, so the
handshake is not permanent. Only one player (Prober 4 \cite{Prison1998}) manages to
achieve cooperation with TF1 after about 20 rounds of play. TF1 is functionally
very similar to a strategy known as ``Collective Strategy'', which has a
handshake of CD and cooperates with opponents that match the handshake,
defecting for the remainder of the match if the opponent ever defects \cite{Li2009}.
This strategy was specifically designed for evolutionary processes.
For both TF1 and TF2 a handshake
mechanism naturally emerges from the structure of the underlying finite state
machine. This behavior is an outcome of the evolutionary process and is in no
way hard-coded or included via an additional mechanism.
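For concreteness, the handshake logic can be sketched as follows. This is a
simplified, hypothetical player in the style of Collective Strategy rather than
the trained finite state machines themselves: it plays a handshake sequence,
then behaves like Grudger towards opponents that matched the handshake and
defects against everyone else.
\begin{Verbatim}[fontsize=\small]
C, D = "C", "D"

def handshake_move(my_history, opp_history, handshake=(C, D)):
    turn = len(my_history)
    if turn < len(handshake):                      # still signalling
        return handshake[turn]
    if tuple(opp_history[:len(handshake)]) != handshake:
        return D                                   # handshake failed
    if D in opp_history[len(handshake):]:
        return D                                   # punish later defections
    return C                                       # mutual cooperation
\end{Verbatim}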
\begin{table}[!hbtp]
\centering
\input{./tex/memory_depth_count.tex}
\caption{Count of strategies by memory depth.}
\label{tbl:memory_depth_count}
\end{table}
\subsection{Data collection}\label{sec:data_collection}
Each strategy pair is run
with starting population distributions of $(1, N-1)$, $(N/2, N/2)$ and
$(N-1, 1)$, for \(N\) from 2 through 14. The fixation probability is then empirically
computed for each combination of starting distribution and value of \(N\). The
Axelrod library can carry out exact simulations of the Moran process. Since some
of the strategies have a high computational cost or are stochastic, samples are
taken from a large number of match outcomes for the pairs of players for use in
computing fitnesses in the Moran process. This approach was verified to agree
with unsampled calculations to a high degree of accuracy in specific cases.
This is described in Algorithms~\ref{alg:data_collection}
and~\ref{alg:moran_process}.
\begin{algorithm}[!hbtp]
\caption{Data Collection}
\label{alg:data_collection}
\input{./tex/data_collection.tex}
\end{algorithm}
\begin{algorithm}[!hbtp]
\caption{Moran process}
\label{alg:moran_process}
\input{./tex/moran_process_simulation.tex}
\end{algorithm}
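A minimal example of the kind of computation performed, using the Axelrod
library's \texttt{MoranProcess} interface directly (the study itself used cached
match samples for efficiency, as in Algorithm~\ref{alg:data_collection}), might
look as follows; the function and parameter names here are illustrative.
\begin{Verbatim}[fontsize=\small]
import axelrod as axl

def estimate_x1(invader, resident, N=6, repetitions=100, turns=200):
    # Estimate the probability that a single invader fixates in a
    # population of N - 1 residents.
    fixations = 0
    for _ in range(repetitions):
        players = [invader()] + [resident() for _ in range(N - 1)]
        mp = axl.MoranProcess(players, turns=turns)
        mp.play()  # iterate until absorption
        if mp.winning_strategy_name == str(invader()):
            fixations += 1
    return fixations / repetitions

# For example: estimate_x1(axl.TitForTat, axl.Defector, N=6)
\end{Verbatim}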
Section~\ref{sec:validation} will further validate the methodology by comparing
simulated results to analytical results in some cases. The main results of this
manuscript are presented in Section~\ref{sec:empirical_results} which will
present a detailed analysis of all the data generated. Finally,
Section~\ref{sec:conclusion} will conclude and offer future avenues for the work
presented here.
\section{Validation}\label{sec:validation}
As described in \cite{Nowak}, consider the payoff matrix:
\begin{equation}\label{equ:payoff_matrix}
M = \begin{pmatrix}
        a & b\\
        c & d
    \end{pmatrix}
\end{equation}
The expected payoffs of \(i\) players of the first type in a population with \(N
- i\) players of the second type are given by:
\begin{equation}\label{equ:expected_payoff_one}
f_i = \frac{a(i - 1) + b(N - i)}{N - 1}
\end{equation}
\begin{equation}\label{equ:expected_payoff_two}
g_i = \frac{ci + d(N - i - 1)}{N - 1}
\end{equation}
The transitions within the birth death process that underpins the Moran process
are then given by:
\begin{align}
p_{i, i+1}&= \frac{if_i}{if_i+(N-i)g_i}\frac{N-i}{N}\label{equ:p_up}\\
p_{i, i-1}&= \frac{(N-i)g_i}{if_i+(N-i)g_i}\frac{i}{N}\label{equ:p_down}\\
p_{ii} &= 1 - p_{i, i+1} - p_{i, i-1}\label{equ:p_stay}
\end{align}
Using this, it is a known result \cite{Nowak2017} that the fixation probability
of the first strategy in a population of \(i\) individuals of the first type
and \(N-i\) individuals of the second is given by:
\begin{equation}\label{equ:fixation_probability}
x_i = \frac{1 + \sum_{j=1}^{i-1}\prod_{k=1}^{j}\gamma_k}{1 + \sum_{j=1}^{N-1}
\prod_{k=1}^{j}\gamma_k}
\end{equation}
where:
\[
\gamma_j = \frac{p_{j, j-1}}{p_{j, j+1}}
\]
A neutral strategy will have fixation probability $x_i = i/N$.
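These expressions are straightforward to evaluate numerically. The following
sketch implements
(\ref{equ:expected_payoff_one})--(\ref{equ:fixation_probability}), using the
fact that \(\gamma_j\) simplifies to \(g_j/f_j\); it assumes the payoffs are
positive so that no division by zero occurs.
\begin{Verbatim}[fontsize=\small]
def theoretical_fixation(a, b, c, d, N, i=1):
    # Fixation probability x_i for the payoff matrix [[a, b], [c, d]].
    f = lambda k: (a * (k - 1) + b * (N - k)) / (N - 1)  # type-one payoff
    g = lambda k: (c * k + d * (N - k - 1)) / (N - 1)    # type-two payoff
    gamma = lambda k: g(k) / f(k)  # p_{k,k-1} / p_{k,k+1} reduces to g_k/f_k

    def cumulative(m):  # sum over j < m of the product gamma_1 * ... * gamma_j
        total, prod = 0.0, 1.0
        for j in range(1, m):
            prod *= gamma(j)
            total += prod
        return total

    return (1 + cumulative(i)) / (1 + cumulative(N))
\end{Verbatim}
As a sanity check, setting \(a=b=c=d\) recovers the neutral fixation
probability \(i/N\).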
Comparisons of \(x_1, x_{N/2}, x_{N-1}\) are shown in
Figure~\ref{fig:comparison_deterministic}. The points represent the simulated
values and the line shows the theoretical value. Note that these are all
deterministic strategies and show a perfect match between the expected value
of (\ref{equ:fixation_probability}) and the actual Moran process for all
strategy pairs. Figure~\ref{fig:comparison_stochastic} shows the fixation probabilities for
stochastic strategies. These are no longer a good match, which highlights the
weakness of assuming that a given interaction between two IPD strategies can be
summarised by a fixed payoff matrix as in
(\ref{equ:payoff_matrix}). For any given pair of strategies it is possible to
obtain \(p_{i,i-1}, p_{i,i+1}, p_{ii}\) exactly (as opposed to the
approximations offered by (\ref{equ:p_up}), (\ref{equ:p_down}) and
(\ref{equ:p_stay})). Obtaining these requires particular analysis for a given
pair and can be quite a complex endeavour for stochastic strategies with long
memory: this is not necessary for the purposes of this work.
All data generated for this validation exercise can be found
at \cite{data}.
\begin{figure}[!hbtp]
\centering
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Alternator_v_Cooperator.pdf}
\caption{Alternator and Cooperator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Defector_v_Alternator.pdf}
\caption{Defector and Alternator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Alternator_v_TfT.pdf}
\caption{Alternator and Tit For Tat}
\end{subfigure}%
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Cooperator_v_TfT.pdf}
\caption{Cooperator and Tit For Tat}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Defector_v_Cooperator.pdf}
\caption{Defector and Cooperator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Defector_v_TfT.pdf}
\caption{Defector and Tit For Tat}
\end{subfigure}%
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/WSLS_v_TfT.pdf}
\caption{Win Stay Lose Shift and Tit For Tat}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Alternator_v_WSLS.pdf}
\caption{Alternator and Win Stay Lose Shift}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Defector_v_WSLS.pdf}
\caption{Defector and Win Stay Lose Shift}
\end{subfigure}%
\caption{Comparison of theoretic and actual Moran Process fixation
probabilities for \textbf{deterministic} strategies.}
\label{fig:comparison_deterministic}
\end{figure}
\begin{figure}[!hbtp]
\centering
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Random_v_Cooperator.pdf}
\caption{Random and Cooperator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Random_v_Defector.pdf}
\caption{Random and Defector}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Random_v_TfT.pdf}
\caption{Random and Tit For Tat}
\end{subfigure}%
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/ALLCorALLD_v_Cooperator.pdf}
\caption{All C or all D and Cooperator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/ALLCorALLD_v_Defector.pdf}
\caption{All C or all D and Defector}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Calculator_v_Random.pdf}
\caption{Calculator and Random}
\end{subfigure}%
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Calculator_v_ALLCorALLD.pdf}
\caption{Calculator and All C or all D}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Calculator_v_Arrogant_QLearner.pdf}
\caption{Calculator and Arrogant Q learner}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/ALLCorALLD_v_TfT.pdf}
\caption{All C or all D and Tit For Tat}
\end{subfigure}%
\caption{Comparison of theoretic and actual Moran Process
fixation probabilities for \textbf{stochastic} strategies.}
\label{fig:comparison_stochastic}
\end{figure}
\section{Empirical results}\label{sec:empirical_results}
This section outlines the data analysis carried out; all data for this study
are available at~\cite{data}:
\begin{itemize}
\item Section~\ref{sec:two_individuals} considers the specific case of
\(N=2\).
\item Section~\ref{sec:strong_invaders} investigates the effect of
population size on the ability of a strategy to invade another
population. This will highlight how complex strategies with long
memories outperform simpler strategies.
\item Section~\ref{sec:strong_resistors} similarly investigates the
ability to defend against an invasion.
\item Section~\ref{sec:population_size} investigates the relationship
between performance for differing population sizes as well as
taking a close look at zero determinant strategies \cite{Press2012}.
\end{itemize}
\subsection{The special case of \(N=2\)}\label{sec:two_individuals}
When $N=2$ the Moran process effectively measures the distribution of relative
mean payoffs over all possible matches between two players: the strategy
that scores higher than the other more often will fixate more often. For \(N=2\)
the two cases of \(x_1\) and \(x_{N-1}\) coincide, but will be
considered separately for larger \(N\) in sections~\ref{sec:strong_invaders}
and~\ref{sec:strong_resistors}. Figure~\ref{fig:boxplot_2} shows all fixation
probabilities for the strategies considered. The top 16 (10\%) strategies are
shown in
Table~\ref{tbl:summary_top_2}. The top five ranking strategies are:
\begin{enumerate}
\item The top strategy is the Collective Strategy (CS), which has a simple
    handshake mechanism, described in Section~\ref{sec:strategies}.
\item Defector, which always defects. Since it has no interactions with other
    defectors (recall that \(N=2\)), its aggressiveness is rewarded.
\item Aggravater, which plays like Grudger (responding to any
    defection with unconditional defection thereafter) but starts by
    playing three defections.
\item Predator, a finite state machine described in \cite{Ashlock2006}.
\item Handshake, a slightly less aggressive version of the Collective
    Strategy \cite{robson1989}. As long as the opponent plays the initial
    sequence it cooperates. Thus it will do well in a population consisting of
    many members of itself, just as the Collective Strategy does. The
    difference is that CS will defect after the handshake if the opponent
    defects, while Handshake will not.
\end{enumerate}
It is also noted that TF1, TF2 and TF3 all perform well. This is also the only
value of \(N\) for which a zero determinant strategy appears in the top 10\% of
ranked strategies: ZD-extort-4. The performance of zero determinant strategies will be
examined more closely in Section~\ref{sec:population_size}.
\begin{figure}[!hbtp]
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[height=.7\textheight]{./img/boxplot_2_invade.pdf}
\caption{The fixation probabilities for \(N=2\)}
\label{fig:boxplot_2}
\end{subfigure}%
~
\begin{subfigure}{.5\textwidth}
\scriptsize
\centering
\input{./tex/summary_top_2_invade.tex}
\caption{Top strategies for \(N=2\) (neutral fixation is \(p=0.5\))}
\label{tbl:summary_top_2}
\end{subfigure}
\caption{Performance of strategies for \(N=2\).}
\end{figure}
As will be demonstrated in Section~\ref{sec:population_size}, the results for
\(N=2\) differ from those for larger $N$. Hence these results do not concur with
the literature, which suggests that zero determinant strategies should be
effective for larger population sizes \cite{stewart2013extortion}; however, those
analyses consider stationary behaviour, while this work runs matches for a fixed
number of rounds. The stationarity assumption allows for a deterministic payoff
matrix, leading to conclusions about zero determinant strategies in the space
of memory-one strategies that do not generalize to this context.
\subsection{Strong Invaders}\label{sec:strong_invaders}
In this section the focus is on the ability of a mutant strategy to invade: the
probability of one individual of a given type successfully fixating in a
population of \(N - 1\) other individuals, denoted by \(x_1\).
The ranks of each strategy for all considered values of \(N\) according to mean
\(x_1\) are shown in Figure~\ref{fig:ranks_v_size_invade}.
\begin{figure}[!hbtp]
\centering
\includegraphics[height=.9\textheight]{./img/average_rank_vs_population_size_invade.pdf}
\caption{\textbf{Invasion}: Ranks of all strategies according to \(x_1\) for different
population sizes.}
\label{fig:ranks_v_size_invade}
\end{figure}
The fixation
probabilities are shown in
Figures~\ref{fig:boxplot_3_invade},~\ref{fig:boxplot_7_invade}
and~\ref{fig:boxplot_14_invade} for \(N\in\{3, 7, 14\}\) showing the mean
fixation as well as the neutral fixation for each given scenario.
\begin{figure}[!hbtp]
\centering
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_3_invade.pdf}
\caption{\(N=3\)}
\label{fig:boxplot_3_invade}
\end{subfigure}%
~
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_7_invade.pdf}
\caption{\(N=7\)}
\label{fig:boxplot_7_invade}
\end{subfigure}%
~
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_14_invade.pdf}
\caption{\(N=14\)}
\label{fig:boxplot_14_invade}
\end{subfigure}
\caption{The fixation probabilities \(x_1\).}
\end{figure}
The top 16 strategies are given in Table~\ref{tbl:top_invade}.
\begin{table}[!hbtp]
\centering
\scriptsize
\begin{subfigure}[t]{.5\textwidth}
\centering
\input{./tex/summary_top_3_invade.tex}
\caption{\(N=3\)}
\end{subfigure}%
~
\begin{subfigure}[t]{.5\textwidth}
\centering
\input{./tex/summary_top_7_invade.tex}
\caption{\(N=7\)}
\end{subfigure}
\begin{subfigure}[t]{.3\textwidth}
\centering
\input{./tex/summary_top_14_invade.tex}
\caption{\(N=14\)}
\end{subfigure}
\caption{Top invaders for \(N\in\{3, 7, 14\}\)}
\label{tbl:top_invade}
\end{table}
It can be seen that apart from CS, none of the strategies
of Table~\ref{tbl:summary_top_2} perform well for \(N\in\{3, 7, 14\}\). The new
top performing strategies are:
\begin{itemize}
\item Grudger (which only performs well for \(N=3\)), a strategy that starts
    by cooperating but defects thereafter if at any point the opponent has
    defected.
\item MEM2, an infinite memory strategy that switches between TFT, TF2T, and
Defector \cite{Li2014}.
\item TF3, the finite state machine trained specifically for Moran processes,
    described in Section~\ref{sec:strategies}.
\item Prober 4, a strategy which starts with a specific 20 move sequence of
cooperations and defections \cite{Prison1998}. This initial sequence serves
    as an approximate handshake.
\item PSO Gambler and Evolved Lookerup 2 2 2, strategies that make use
    of a lookup table mapping the first 2 moves of the opponent as well as
    the last 2 moves of both players to an action. PSO Gambler is a
    stochastic version of the LookerUp which maps those states to
    probabilities of cooperating. The LookerUp was described in
    \cite{Knight2016}.
\item The Evolved ANN strategies are neural networks that map a number of
attributes (first move, number of cooperations, last move, etc.) to
an action. Both of these have been trained using an evolutionary
algorithm.
\item The Evolved FSM 16 is a 16 state finite state machine trained to
perform well in tournaments.
\end{itemize}
Only one of the above strategies is stochastic, although close inspection of the
source code of PSO Gambler shows that it rarely makes stochastic decisions and
is functionally very similar to its deterministic cousin Evolved LookerUp.
The PSO Gambler Mem1 strategy is a memory one strategy that has been trained to
maximise its utility and does perform well.
Apart from TF3, the finite state machines trained specifically for
Moran processes do not appear in the top 5, while strategies trained for
tournaments do. This is due to the nature of invasion: most of the opponents
will initially be different strategies. The next section will consider the
converse situation.
\subsection{Strong resistors}\label{sec:strong_resistors}
In addition to identifying good invaders, strategies resistant to invasion by
other strategies are identified by examining the distribution of $x_{N-1}$ for
each strategy. The ranks of each strategy for all considered values of \(N\)
according to mean \(x_{N-1}\) are shown in
Figure~\ref{fig:ranks_v_size_resist}.
\begin{figure}[!hbtp]
\centering
\includegraphics[height=.9\textheight]{./img/average_rank_vs_population_size_resist.pdf}
\caption{\textbf{Resistance}: Ranks of all strategies according to \(x_{N-1}\) for different
population sizes.}
\label{fig:ranks_v_size_resist}
\end{figure}
The fixation probabilities are shown in
Figures~\ref{fig:boxplot_3_resist},~\ref{fig:boxplot_7_resist}
and~\ref{fig:boxplot_14_resist} for \(N\in\{3, 7, 14\}\) showing the mean
fixation as well as the neutral fixation for each given scenario.
\begin{figure}[!hbtp]
\centering
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_3_resist.pdf}
\caption{\(N=3\)}
\label{fig:boxplot_3_resist}
\end{subfigure}%
~
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_7_resist.pdf}
\caption{\(N=7\)}
\label{fig:boxplot_7_resist}
\end{subfigure}%
~
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_14_resist.pdf}
\caption{\(N=14\)}
\label{fig:boxplot_14_resist}
\end{subfigure}
\caption{The fixation probabilities \(x_{N-1}\).}
\end{figure}
Table~\ref{tbl:top_resist} shows the top strategies when ranked
according to \(x_{N-1}\) for \(N\in\{3, 7, 14\}\).
Once again none of the short memory strategies from
Section~\ref{sec:two_individuals} perform well for high \(N\).
\begin{table}[!hbtp]
\scriptsize
\centering
\begin{subfigure}[t]{.5\textwidth}
\centering
\input{./tex/summary_top_3_resist.tex}
\caption{\(N=3\)}
\end{subfigure}%
~
\begin{subfigure}[t]{.5\textwidth}
\centering
\input{./tex/summary_top_7_resist.tex}
\caption{\(N=7\)}
\end{subfigure}
\begin{subfigure}[t]{.3\textwidth}
\centering
\input{./tex/summary_top_14_resist.tex}
\caption{\(N=14\)}
\end{subfigure}
\caption{Top resistors for \(N\in\{3, 7, 14\}\)}
\label{tbl:top_resist}
\end{table}
Interestingly, none of these strategies is stochastic: resisting invasion
requires behaving consistently when interacting with one's own kind, since
acting stochastically increases the chance of friendly fire.
However it is possible to design a strategy with a stochastic or error-correcting
handshake that is an excellent resistor even in noisy environments \cite{Lee2015}.
There are only two new strategies that appear in the top ranks for
\(x_{N-1}\): TF1 and TF2. These two strategies are, together with CS, the
strongest resistors. They all have handshakes, and whilst the handshakes of CS
and Handshake (which ranks highly for the smaller values of \(N\)) were
programmed, the handshakes of TF1 and TF2 evolved through an evolutionary
process without any priming.
As described in Section~\ref{sec:strong_invaders}, the strategies trained with
the payoff maximizing objective are among the best invaders in the library;
however, they are not as resistant to invasion as the strategies trained using a
Moran objective function. The payoff-trained strategies include finite state
machines, yet these do not appear to have handshaking mechanisms. Therefore it
is reasonable to conclude that the objective function is the cause of the
emergence of handshaking mechanisms. More specifically, TF1 and TF2 evolved
handshakes for high invasion resistance, whereas TF3 is a better total payoff
maximizer, which makes it a better invader (along with the strategies
trained to maximize total payoff) since successful fitness proportionate
selection is necessary for invasion. Training with an initial population
mix other than $(N/2, N/2)$ may favor invasion or resistance.
The payoff maximizing strategies typically will not defect before the opponent's
first defection, possibly because the training strategy collection contains some
strategies such as Grudger and Fool Me Once that retaliate harshly by defecting
for the remainder of the match if the opponent has more than a small number of
cumulative defections. Paradoxically it is advantageous to defect (as a signal)
in order to achieve mutual cooperation with opponents using the same strategy
but not with other opponents. Nevertheless an evolutionary process is able to
tunnel through the costs and risks associated with early defections to find
better solutions, so it is not surprising in hindsight that handshaking
strategies emerge from the evolutionary training process.
A handshake requires at least one defection and there is
selective pressure to defect as few times as possible to achieve the
self-recognition mechanism. It is also unwise to defect on the first move as
some strategies additionally retaliate first round defections. So the
handshakes used by TF1, TF2, and CS are in some sense optimal.
It is evident from
Sections~\ref{sec:two_individuals},~\ref{sec:strong_invaders}
and~\ref{sec:strong_resistors} that the performance of strategies not only
depends on the initial population distribution but also differs markedly
depending on whether or not \(N>2\). This will be explored further in the next
section, looking not only at \(x_1\) and \(x_{N-1}\) but also at \(x_{N/2}\).
\subsection{The effect of population size}\label{sec:population_size}
To complement Figures~\ref{fig:ranks_v_size_invade}
and~\ref{fig:ranks_v_size_resist}, Figure~\ref{fig:ranks_v_size_coexist} shows
the rank of each strategy based on \(x_{N/2}\).
Tables~\ref{tbl:ranks_v_size_invade},~\ref{tbl:ranks_v_size_resist}
and~\ref{tbl:ranks_v_size_coexist} show the same information for a selection of
strategies:
\begin{itemize}
\item The strategies that ranked highly for \(N=2\);
\item The strategies that ranked highly for \(N=14\);
\item The zero determinant strategies.
\end{itemize}
The results for \(x_{N/2}\) show similarities to the results for \(x_{N-1}\);
in particular TF1, TF2 and TF3 rank first, third and eighth. This is to be
expected since, as described in Section~\ref{sec:strategies}, these strategies
were trained from an initial population of \((N/2, N/2)\) individuals.
For all starting populations
\(i\in\{1, N/2, N-1\}\) the ranks of strategies are relatively stable across the
different values of \(N>2\); however, for \(N=2\) there is a distinct difference.
This highlights that there is little that can be inferred about the evolutionary
performance of a strategy in a large population from its performance in a small
population. This is confirmed by the performance of the zero determinant strategies: while
some do rank relatively highly for \(N=2\) (ZD-extort-4 has rank 16) this rank
does not translate to larger populations.
\begin{table}[!hbtp]
\centering
\scriptsize
\input{./tex/change_of_rank_invade.tex}
\caption{Invasion: Fixation ranks of some strategies according to \(x_1\) for different
population sizes}
\label{tbl:ranks_v_size_invade}
\end{table}
\begin{table}[!hbtp]
\centering
\scriptsize
\input{./tex/change_of_rank_resist.tex}
\caption{Resistance: Fixation ranks of some strategies according to \(x_{N-1}\) for different
population sizes}
\label{tbl:ranks_v_size_resist}
\end{table}
\begin{figure}[!hbtp]
\centering
\includegraphics[height=.9\textheight]{./img/average_rank_vs_population_size_coexist.pdf}
\caption{Fixation ranks of all strategies according to \(x_{N/2}\) for different
population sizes.}
\label{fig:ranks_v_size_coexist}
\end{figure}
\begin{table}[!hbtp]
\centering
\scriptsize
\input{./tex/change_of_rank_coexist.tex}
\caption{Ranks of some strategies according to \(x_{N/2}\) for different
population sizes}
\label{tbl:ranks_v_size_coexist}
\end{table}
Figure~\ref{fig:correlation_coefficients} shows the correlation coefficients
of the ranks of strategies across differing population sizes. How well a strategy
performs in any Moran process for \(N>2\) has
little to do with the performance for \(N=2\). This illustrates why the strong
performance of zero determinant strategies predicted in \cite{Press2012} does
not extend to larger populations. This was discussed theoretically in
\cite{Adami2013} and observed empirically in these simulations.
\begin{figure}[!htbp]
\centering
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.9\textwidth]{./img/correlation_heatmap_invade.pdf}
\caption{Rank based on \(x_1\)}
\end{subfigure}
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.9\textwidth]{./img/correlation_heatmap_resist.pdf}
\caption{Rank based on \(x_{N - 1}\)}
\end{subfigure}
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.9\textwidth]{./img/correlation_heatmap_coexist.pdf}
\caption{Rank based on \(x_{N/2}\)}
\end{subfigure}
\caption{Heatmap of correlation coefficients of rankings by population size.}
\label{fig:correlation_coefficients}
\end{figure}
\section{Discussion}\label{sec:conclusion}
Training strategies to excel
at the Moran process leads to the evolution of cooperation, but only with like
individuals in the case of TF1 and TF2. This may have significant implications
for human social interactions such as the evolution of ingroup/outgroup mechanisms
and other sometimes costly rituals that reinforce group behavior.
While TF1 and TF2 are competent invaders, the best invaders
in the study do not appear to employ strict handshakes, and are generally
cooperative strategies. TF3, which does not use a handshake, is a better invader
than TF1 and TF2 but not as good a resistor. Nevertheless it was the result
of the same kind of training process and is a better combined invader-resistor
than the invaders that were previously trained to maximize payoff.
The strategies trained to maximize payoff in head-to-head matches are generally
cooperative and are effective invaders.
Combined with the fact that handshaking strategies are stronger resistors,
this suggests that while maximizing individual payoff can lead to the evolution
of cooperation, these strategies are not the most evolutionarily stable
in the long run. A strategy with a handshaking mechanism is still capable of
invading and is more resistant to subsequent invasions. Moreover, the
best resistor among the strategies trained to maximize payoff (Evolved LookerUp
1\_1\_1),
which always defects if the opponent defects in the first round, is effectively
employing a one-shot handshake of C. Similarly, Grudger (also known as Grim),
which emerged from training memory one strategies for the Moran process,
also effectively employs a handshake of always cooperating, as it defects
for the remainder of the match if the opponent ever defects.
The insights that payoff maximizers are better invaders and that handshakers
are better resistors suggest that a strategy
aware of the population distribution could become a handshaker at
a critical threshold and use a strategy better suited to invasion when in the
minority. Information about the population distribution was not available
to our strategies. Previous work has shown that strategies able to retain
memory across matches can infer the population distribution and act in such
a manner, resulting in a strategy effective at both invasion and resistance
\cite{Lee2015}.